.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
awk
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs .
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
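.PP
The remark in the description that excess parameters act as local variables
can be sketched as follows.
This is a minimal illustration only: the function name max3, the local m,
and the sample input are hypothetical, and the shell wrapper is just one way
to run the program.

```shell
# Hypothetical example: max3 and its local m are illustrative names only.
out=$(printf '3 9 4\n7 2 5\n' | awk '
function max3(a, b, c,    m) {   # m is an excess parameter, so it is local
    m = a
    if (b > m) m = b
    if (c > m) m = c
    return m
}
{ print max3($1 + 0, $2 + 0, $3 + 0) }   # + 0 forces numeric arguments
')
printf '%s\n' "$out"
```

The extra white space before m in the parameter list is only a convention
to set locals apart from real arguments; awk itself does not require it.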
.PP
.EX
.nf
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
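.PP
The advice above on forcing a numeric interpretation can be illustrated with
a small sketch.
The variable names and values are hypothetical and chosen only to show that
the same two operands compare differently as strings and as numbers.

```shell
# Illustrative values only: a, b, s and n are hypothetical variable names.
out=$(awk 'BEGIN {
    a = "10"; b = "9"        # string constants, so a < b compares as strings
    s = (a < b)              # "10" sorts before "9" as a string: 1
    n = (a + 0 < b + 0)      # adding 0 forces the numeric comparison 10 < 9: 0
    print s, n
}')
printf '%s\n' "$out"
```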