.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
awk
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }"
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string,
whose numeric value is 0.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
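.PP
For example, the following program counts how often each word occurs
in its input, using the words themselves as subscripts.
The names
.I count
and
.I w
are arbitrary, chosen only for illustration.
The order in which
.B "for (w in count)"
visits the subscripts is unspecified.
.IP
.EX
# count and w are illustrative names, not built-ins
      { for (i = 1; i <= NF; i++) count[$i]++ }
END   { for (w in count) print w, count[w] }
.EE
.PP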
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present, or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on [0,1).
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value.
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
If
.I n
is omitted, the substring extends to the end of
.IR s .
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string;
if there is no match,
.B RLENGTH
is set to \-1.
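For example,
.EX
BEGIN { if (match("foobar", /o+/)) print RSTART, RLENGTH }
.EE
prints
.BR "2 2" ,
because the longest leftmost match of
.B o+
is the
.B oo
that starts at position 2.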
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
Within
.IR t ,
an
.B &
is replaced by the text that was matched.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline"
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as defined in
.IR re_format (7).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs .
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default
.BR \e034 ,
the octal escape for the ASCII FS control character)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
      { s += $1 }
END   { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
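.PP
.EX
.nf
function rev(s,    r, i) {   # rev is an illustrative name; r, i are locals
        for (i = length(s); i > 0; i--)
                r = r substr(s, i, 1)
        return r
}
      { print rev($0) }
.fi
.EE
.ns
.IP
Print each line with its characters in reverse order.
The function name
.B rev
is arbitrary.
Its excess parameters
.I r
and
.I i
serve as local variables, as described for function definitions above.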
.PP
.EX
.nf
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
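.PP
The following illustrates the conversion idioms above.
If
.B $1
is
.B 10
and
.B $2
is
.BR 9 ,
then
.EX
$1 + 0 < $2 + 0
.EE
is false because the comparison is numeric, while
.EX
$1 "" < $2 ""
.EE
is true because it compares the strings \&\f(CW"10"\fP and \&\f(CW"9"\fP.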