.\" awk.1 revision 85587
.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
awk
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.IR prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
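.PP
As a minimal sketch of the options and field variables described above,
the following shell fragment splits two made-up colon-separated lines with \-F
and sets a variable with \-v before the program runs.
The input data and the variable name prefix are assumptions chosen for illustration.

```shell
# Split colon-separated records with -F and preset a variable with -v.
# The sample input and the name "prefix" are hypothetical.
out=$(printf 'root:x:0\nalice:x:1000\n' |
	awk -F: -v prefix='user=' '{ print prefix $1, NF }')
echo "$out"
```

Each output line holds the first field with the prefix prepended, followed by the field count.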
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
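.PP
The handling of multiple subscripts can be illustrated with a short sketch.
The array name count, the keys, and the helper array part are hypothetical;
the point is only that a subscript list and an explicit concatenation with SUBSEP
name the same element.

```shell
out=$(awk 'BEGIN {
	count["a", "b"] = 1              # stored under the key "a" SUBSEP "b"
	key = "a" SUBSEP "b"
	if (key in count) print "same element"
	if (("a", "b") in count) print "found by subscript list"
	n = split(key, part, SUBSEP)     # recover the constituents
	print n, part[1], part[2]
}')
echo "$out"
```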
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR printf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
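.PP
A small sketch of output redirection, assuming a temporary file created with mktemp
(the variable names tmp and file are illustrative only).
Both statements name the same string value, so they write to the same open file,
and close is called before the file is read back.

```shell
tmp=$(mktemp)
awk -v file="$tmp" 'BEGIN {
	OFS = "-"
	print "a", "b" > file                       # fields joined by OFS, ended by ORS
	printf "%5.2f|%s\n", 3.14159, "pi" > file   # same string value, same open file
	close(file)                                 # flush and close the file
}'
out=$(cat "$tmp")
rm -f "$tmp"
echo "$out"
```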
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
or of
.B $0
if no argument.
.TP
.B rand
random number greater than or equal to 0 and less than 1
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.BI substr( s , " m" , " n\fB)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fB)
splits the string
.I s
into array elements
.IB a [1] ,
.IB a [2] ,
\&...,
.IB a [ n ] ,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB )
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
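.PP
The string functions listed above can be exercised with a short sketch.
The sample string and the variable names s, t, n and parts are made up for illustration;
the comments give the value each statement prints.

```shell
out=$(awk 'BEGIN {
	s = "hello world"
	print length(s)                  # 11
	print substr(s, 7, 5)            # world
	print index(s, "o")              # 5
	match(s, /o+/)
	print RSTART, RLENGTH            # 5 1
	n = split("a:b:c", parts, ":")
	print n, parts[3]                # 3 c
	t = s
	n = gsub(/o/, "0", t)            # replaces in t, returns the count
	print n, t                       # 2 hell0 w0rld
	print toupper(s)                 # HELLO WORLD
	print sprintf("%03d", 7)         # 007
}')
echo "$out"
```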
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
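.PP
As a hedged sketch of two of these forms, the fragment below reads the output of a command
and then reads the next input record into a variable.
The command string, the sample input, and the names cmd, line and nxt are assumptions for illustration.
Testing the return value against 0 keeps the loop from spinning on an error return of \-1.

```shell
out=$(printf 'first\nsecond\n' | awk 'NR == 1 {
	cmd = "echo from-command"
	while ((cmd | getline line) > 0)   # one output line of cmd per call
		print "cmd:", line
	close(cmd)
	if ((getline nxt) > 0)             # next record of the main input, into nxt
		print "peeked:", nxt
	print "current:", $0               # $0 is left alone by the variable form
}')
echo "$out"
```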
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a relop is any of the six relational operators in C,
and a matchop is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
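.PP
A minimal sketch of these parameter rules follows.
The function names total and fill and the array v are hypothetical.
The extra parameters sum and i act as local variables, and the array is modified through a reference.

```shell
out=$(awk '
# sum and i are excess parameters, used only as local variables
function total(arr, n,    sum, i) {
	for (i = 1; i <= n; i++) sum += arr[i]
	return sum
}
# arrays are passed by reference, so the caller sees these assignments
function fill(arr) { arr[1] = 10; arr[2] = 20; arr[3] = 12 }
BEGIN {
	i = 99
	fill(v)
	print total(v, 3), i   # the global i is untouched by the local i
}')
echo "$out"
```

Separating the excess parameters from the real ones with extra white space is a common convention, not a language requirement.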
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
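.PP
The two idioms can be sketched as follows; the variable names a, b and n are illustrative.
Values assigned from string constants compare as strings until 0 is added to them.

```shell
out=$(awk 'BEGIN {
	a = "10"; b = "9"
	if (a < b)          print "string compare: 10 < 9"
	if (a+0 > b+0)      print "numeric compare: 10 > 9"
	n = 3
	if ((n "") == "3")  print "concatenation gives a string"
}')
echo "$out"
```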
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
529the syntax is worse.
530