1.de EX
2.nf
3.ft CW
4..
5.de EE
6.br
7.fi
8.ft 1
9..
11.TH AWK 1
12.CT 1 files prog_other
13.SH NAME
14awk \- pattern-directed scanning and processing language
15.SH SYNOPSIS
16.B awk
17[
18.BI \-F
19.I fs
20]
21[
22.BI \-v
23.I var=value
24]
25[
26.I 'prog'
27|
28.BI \-f
29.I progfile
30]
31[
32.I file ...
33]
34.SH DESCRIPTION
35.I Awk
36scans each input
37.I file
38for lines that match any of a set of patterns specified literally in
39.IR prog
40or in one or more files
41specified as
42.B \-f
43.IR progfile .
44With each pattern
45there can be an associated action that will be performed
46when a line of a
47.I file
48matches the pattern.
49Each line is matched against the
50pattern portion of every pattern-action statement;
51the associated action is performed for each matched pattern.
52The file name 
53.B \-
54means the standard input.
55Any
56.IR file
57of the form
58.I var=value
59is treated as an assignment, not a filename,
60and is executed at the time it would have been opened if it were a filename.
61The option
62.B \-v
63followed by
64.I var=value
65is an assignment to be done before
66.I prog
67is executed;
68any number of
69.B \-v
70options may be present.
71The
72.B \-F
73.IR fs
74option defines the input field separator to be the regular expression
.IR fs .
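.PP
As a minimal sketch of the three mechanisms above
(the input line and the variable names
.B tag
and
.B n
are illustrative assumptions, not anything predefined by
.IR awk ):
.EX
```shell
# -F: makes ":" the field separator; -v assigns tag before the program starts;
# the operand n=7 is assigned when awk reaches it, just before it reads "-" (standard input)
out=$(echo "a:b:c" | awk -F: -v tag=first '{ print tag, $2, n }' n=7 -)
echo "$out"    # first b 7
```
.EE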
76.PP
77An input line is normally made up of fields separated by white space,
78or by regular expression
79.BR FS .
80The fields are denoted
81.BR $1 ,
82.BR $2 ,
83\&..., while
84.B $0
85refers to the entire line.
86If
87.BR FS
88is null, the input line is split into one field per character.
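.PP
A small sketch of field splitting, assuming made-up input lines;
the first command relies on the default white-space splitting,
the second sets the separator to a regular expression with
.BR \-F :
.EX
```shell
# default splitting on white space: NF counts the fields, $NF is the last one
out=$(echo "alpha beta   gamma" | awk '{ print NF, $2, $NF }')
echo "$out"     # 3 beta gamma

# a run of digits as the separator
out2=$(echo "a1b22c" | awk -F'[0-9]+' '{ print $1, $2, $3 }')
echo "$out2"    # a b c
```
.EE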
89.PP
90A pattern-action statement has the form
91.IP
92.IB pattern " { " action " }
93.PP
94A missing 
95.BI { " action " }
96means print the line;
97a missing pattern always matches.
98Pattern-action statements are separated by newlines or semicolons.
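.PP
The two defaults can be sketched as follows
(the sample lines and the counter
.B n
are illustrative assumptions):
.EX
```shell
# first statement: pattern only, so each matching line is printed
# second statement: action only, so it runs for every line
out=$( (echo "keep 1"; echo "drop 2"; echo "keep 3") | awk '/keep/
{ n++ }
END { print n, "lines read" }')
echo "$out"
```
.EE
This should print the two
.B keep
lines followed by
.BR "3 lines read" .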
99.PP
100An action is a sequence of statements.
101A statement can be one of the following:
102.PP
103.EX
104.ta \w'\f(CWdelete array[expression]'u
105.RS
106.nf
107.ft CW
108if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
109while(\fI expression \fP)\fI statement\fP
110for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
111for(\fI var \fPin\fI array \fP)\fI statement\fP
112do\fI statement \fPwhile(\fI expression \fP)
113break
114continue
115{\fR [\fP\fI statement ... \fP\fR] \fP}
116\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
117print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
118printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
119return\fR [ \fP\fIexpression \fP\fR]\fP
120next	#\fR skip remaining patterns on this input line\fP
121nextfile	#\fR skip rest of this file, open next, start at top\fP
122delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
123delete\fI array\fP	#\fR delete all elements of array\fP
124exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
125.fi
126.RE
127.EE
128.DT
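.PP
A hedged sketch that exercises a few of these statements together
(the input values and the variable
.B total
are assumptions chosen for illustration):
.EX
```shell
status=0
out=$( (echo 5; echo skip; echo 3; echo 99; echo 1) | awk '
$1 == "skip" { next }                    # skip remaining patterns for this line
$1 > 50      { exit 2 }                  # stop reading input; exit status is 2
{ for (i = 1; i <= $1; i++) total += i } # add 1 + 2 + ... + $1
END          { print total }') || status=$?
echo "$out $status"    # 21 2
```
.EE
The line containing 1 is never read, because
.B exit
ends input processing at 99; the
.B END
action still runs and reports the total accumulated so far.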
129.PP
130Statements are terminated by
131semicolons, newlines or right braces.
132An empty
133.I expression-list
134stands for
135.BR $0 .
136String constants are quoted \&\f(CW"\ "\fR,
137with the usual C escapes recognized within.
138Expressions take on string or numeric values as appropriate,
139and are built using the operators
140.B + \- * / % ^
141(exponentiation), and concatenation (indicated by white space).
142The operators
143.B
144! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
145are also available in expressions.
146Variables may be scalars, array elements
147(denoted
148.IB x  [ i ] )
149or fields.
150Variables are initialized to the null string.
151Array subscripts may be any string,
152not necessarily numeric;
153this allows for a form of associative memory.
154Multiple subscripts such as
155.B [i,j,k]
156are permitted; the constituents are concatenated,
157separated by the value of
158.BR SUBSEP .
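.PP
As a sketch of string subscripts and multiple subscripts
(the array names
.B count
and
.B grid
and the input words are illustrative assumptions):
.EX
```shell
out=$(echo "a b a c a b" | awk '{
    for (i = 1; i <= NF; i++) count[$i]++    # any string may be a subscript
    grid[1,2] = "x"                          # stored under the key 1 SUBSEP 2
    found = ((1,2) in grid)                  # 1 if that element exists
    delete count["c"]                        # remove a single element
    gone = ("c" in count)                    # 0 once it has been deleted
    print count["a"], count["b"], gone, found }')
echo "$out"    # 3 2 0 1
```
.EE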
159.PP
160The
161.B print
162statement prints its arguments on the standard output
163(or on a file if
164.BI > file
165or
166.BI >> file
167is present or on a pipe if
168.BI | cmd
169is present), separated by the current output field separator,
170and terminated by the output record separator.
171.I file
172and
173.I cmd
174may be literal names or parenthesized expressions;
175identical string values in different statements denote
176the same open file.
177The
178.B printf
179statement formats its expression list according to the format
180(see
181.IR printf (3)) .
182The built-in function
183.BI close( expr )
184closes the file or pipe
185.IR expr .
186The built-in function
187.BI fflush( expr )
188flushes any buffered output for the file or pipe
189.IR expr .
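.PP
A minimal sketch of output redirection and formatting,
assuming a scratch file created with
.IR mktemp (1)
and passed in as the variable
.B f
(both are choices made for this example):
.EX
```shell
tmp=$(mktemp)
fmt=$(awk -v f="$tmp" 'BEGIN {
    OFS = "-"; ORS = ";"
    print "a", "b" > f          # goes to the file named by f
    print "c", "d" > f          # same string value, so the same open file
    close(f)
    printf "%s=%05.1f", "pi", 3.14159 }')
saved=$(cat "$tmp"); rm -f "$tmp"
echo "$saved $fmt"    # a-b;c-d; pi=003.1
```
.EE
The separators apply only to
.BR print ;
.B printf
emits exactly what its format describes.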
190.PP
191The mathematical functions
192.BR exp ,
193.BR log ,
194.BR sqrt ,
195.BR sin ,
196.BR cos ,
197and
198.BR atan2 
199are built in.
200Other built-in functions:
201.TF length
202.TP
203.B length
204the length of its argument
205taken as a string,
206or of
207.B $0
208if no argument.
209.TP
210.B rand
random number \fIn\fP, with 0 \(<= \fIn\fP < 1
212.TP
213.B srand
214sets seed for
215.B rand
216and returns the previous seed.
217.TP
218.B int
219truncates to an integer value
220.TP
221.BI substr( s , " m" , " n\fB)
222the
223.IR n -character
224substring of
225.I s
226that begins at position
227.IR m 
228counted from 1.
229.TP
230.BI index( s , " t" )
231the position in
232.I s
233where the string
234.I t
235occurs, or 0 if it does not.
236.TP
237.BI match( s , " r" )
238the position in
239.I s
240where the regular expression
241.I r
242occurs, or 0 if it does not.
243The variables
244.B RSTART
245and
246.B RLENGTH
247are set to the position and length of the matched string.
248.TP
249.BI split( s , " a" , " fs\fB)
250splits the string
251.I s
252into array elements
253.IB a [1] ,
254.IB a [2] ,
255\&...,
256.IB a [ n ] ,
257and returns
258.IR n .
259The separation is done with the regular expression
260.I fs
261or with the field separator
262.B FS
263if
264.I fs
265is not given.
266An empty string as field separator splits the string
267into one array element per character.
268.TP
269.BI sub( r , " t" , " s\fB)
270substitutes
271.I t
272for the first occurrence of the regular expression
273.I r
274in the string
275.IR s .
276If
277.I s
278is not given,
279.B $0
280is used.
281.TP
282.B gsub
283same as
284.B sub
285except that all occurrences of the regular expression
286are replaced;
287.B sub
288and
289.B gsub
290return the number of replacements.
291.TP
292.BI sprintf( fmt , " expr" , " ...\fB )
293the string resulting from formatting
294.I expr ...
295according to the
296.IR printf (3)
297format
298.I fmt
299.TP
300.BI system( cmd )
301executes
302.I cmd
303and returns its exit status
304.TP
305.BI tolower( str )
306returns a copy of
307.I str
308with all upper-case characters translated to their
309corresponding lower-case equivalents.
310.TP
311.BI toupper( str )
312returns a copy of
313.I str
314with all lower-case characters translated to their
315corresponding upper-case equivalents.
316.PD
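.PP
Several of the string functions above can be sketched in one program
(the strings and the variable names are illustrative assumptions):
.EX
```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = split("2024-01-15", part, "-")     # n is 3, part[1] is "2024"
    match(s, /o w/)                        # sets RSTART and RLENGTH
    t = s; c = gsub(/o/, "0", t)           # t is changed in place; c counts replacements
    print length(s), substr(s, 1, 5), index(s, "wor"), RSTART, RLENGTH
    print n, part[1], c, t, toupper(substr(s, 7, 5)), sprintf("%03d", 7) }')
echo "$out"
```
.EE
The first output line should read
.B "11 hello 7 5 3"
and the second
.BR "3 2024 2 hell0 w0rld WORLD 007" .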
317.PP
318The ``function''
319.B getline
320sets
321.B $0
322to the next input record from the current input file;
323.B getline
324.BI < file
325sets
326.B $0
327to the next record from
328.IR file .
329.B getline
330.I x
331sets variable
332.I x
333instead.
334Finally,
335.IB cmd " | getline
336pipes the output of
337.I cmd
338into
339.BR getline ;
340each call of
341.B getline
342returns the next line of output from
343.IR cmd .
344In all cases,
345.B getline
346returns 1 for a successful input,
3470 for end of file, and \-1 for an error.
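.PP
A sketch of reading from a command with
.BR getline ,
testing the return value as described above
(the command string and the variables
.BR n
and
.BR last
are assumptions for illustration):
.EX
```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0) {    # 1 for each line read, 0 at end of output
        n++; last = line
    }
    close(cmd)
    print n, last }')
echo "$out"    # 2 two
```
.EE
Comparing the result with
.B "> 0"
keeps the loop from spinning forever if
.B getline
returns \-1 on error.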
348.PP
349Patterns are arbitrary Boolean combinations
350(with
351.BR "! || &&" )
352of regular expressions and
353relational expressions.
354Regular expressions are as defined in
355.IR re_format (7).
356Isolated regular expressions
357in a pattern apply to the entire line.
358Regular expressions may also occur in
359relational expressions, using the operators
360.BR ~
361and
362.BR !~ .
363.BI / re /
364is a constant regular expression;
365any string (constant or variable) may be used
366as a regular expression, except in the position of an isolated regular expression
367in a pattern.
368.PP
369A pattern may consist of two patterns separated by a comma;
370in this case, the action is performed for all lines
371from an occurrence of the first pattern
through an occurrence of the second.
373.PP
374A relational expression is one of the following:
375.IP
376.I expression matchop regular-expression
377.br
378.I expression relop expression
379.br
380.IB expression " in " array-name
381.br
382.BI ( expr , expr,... ") in " array-name
383.PP
384where a relop is any of the six relational operators in C,
385and a matchop is either
386.B ~
387(matches)
388or
389.B !~
390(does not match).
391A conditional is an arithmetic expression,
392a relational expression,
393or a Boolean combination
394of these.
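.PP
The pattern forms can be sketched side by side
(the input lines, the labels, and the variable
.B re
are illustrative assumptions):
.EX
```shell
out=$( (echo "1 apple"; echo "2 banana"; echo "3 cherry"; echo "4 avocado") | awk -v re="^a" '
$2 ~ re && $1 > 1 { print "A:", $2 }    # a string used as a regular expression
/nan/, /err/      { print "R:", $2 }    # range: from a /nan/ line through an /err/ line
!/a/              { print "N:", $2 }')
echo "$out"
```
.EE
Each input line is tested against every pattern in program order,
so the expected output is
.BR "R: banana" ,
.BR "R: cherry" ,
.BR "N: cherry" ,
.BR "A: avocado" ,
one per line.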
395.PP
396The special patterns
397.B BEGIN
398and
399.B END
400may be used to capture control before the first input line is read
401and after the last.
402.B BEGIN
403and
404.B END
405do not combine with other patterns.
406.PP
407Variable names with special meanings:
408.TF FILENAME
409.TP
410.B CONVFMT
411conversion format used when converting numbers
412(default
413.BR "%.6g" )
414.TP
415.B FS
416regular expression used to separate fields; also settable
417by option
.BI \-F fs .
419.TP
420.BR NF
421number of fields in the current record
422.TP
423.B NR
424ordinal number of the current record
425.TP
426.B FNR
427ordinal number of the current record in the current file
428.TP
429.B FILENAME
430the name of the current input file
431.TP
432.B RS
433input record separator (default newline)
434.TP
435.B OFS
436output field separator (default blank)
437.TP
438.B ORS
439output record separator (default newline)
440.TP
441.B OFMT
442output format for numbers (default
443.BR "%.6g" )
444.TP
445.B SUBSEP
separates multiple subscripts (default octal 034)
447.TP
448.B ARGC
449argument count, assignable
450.TP
451.B ARGV
452argument array, assignable;
453non-null members are taken as filenames
454.TP
455.B ENVIRON
456array of environment variables; subscripts are names.
457.PD
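.PP
A brief sketch of how a few of these variables interact
(the input lines are made up; the values chosen for
.B OFS
and
.B OFMT
are arbitrary):
.EX
```shell
out=$( (echo "a b"; echo "c d e") | awk 'BEGIN { OFS = ":"; OFMT = "%.2f" }
{ $1 = $1; print NR, NF, $0 }    # assigning a field rebuilds $0 using OFS
END { print 3.14159, NR }')
echo "$out"
```
.EE
This should print
.BR 1:2:a:b ,
.BR 2:3:c:d:e ,
and
.BR 3.14:2 ,
one per line; the integer
.B NR
is printed as is, while the non-integer value goes through
.BR OFMT .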
458.PP
459Functions may be defined (at the position of a pattern-action statement) thus:
460.IP
461.B
462function foo(a, b, c) { ...; return x }
463.PP
464Parameters are passed by value if scalar and by reference if array name;
465functions may be called recursively.
466Parameters are local to the function; all other variables are global.
467Thus local variables may be created by providing excess parameters in
468the function definition.
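.PP
The parameter rules can be sketched as follows
(the function names
.B fact
and
.B fill
are hypothetical, and the extra spaces before
.B i
are only a common convention for marking locals):
.EX
```shell
out=$(awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursion
function fill(arr, k,    i) {       # i is an excess parameter used as a local
    for (i = 1; i <= k; i++) arr[i] = i * i                # arr is passed by reference
}
BEGIN { i = "global"; fill(sq, 3); print fact(5), sq[3], i }')
echo "$out"    # 120 9 global
```
.EE
The global
.B i
keeps its value because the loop inside
.B fill
uses the parameter of the same name.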
469.SH EXAMPLES
470.TP
471.EX
472length($0) > 72
473.EE
474Print lines longer than 72 characters.
475.TP
476.EX
477{ print $2, $1 }
478.EE
479Print first two fields in opposite order.
480.PP
481.EX
482BEGIN { FS = ",[ \et]*|[ \et]+" }
483      { print $2, $1 }
484.EE
485.ns
486.IP
487Same, with input fields separated by comma and/or blanks and tabs.
488.PP
489.EX
490.nf
491	{ s += $1 }
492END	{ print "sum is", s, " average is", s/NR }
493.fi
494.EE
495.ns
496.IP
497Add up first column, print sum and average.
498.TP
499.EX
500/start/, /stop/
501.EE
Print all lines from each line matching start through the next line matching stop, inclusive.
503.PP
504.EX
505.nf
506BEGIN	{	# Simulate echo(1)
507	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
508	printf "\en"
509	exit }
510.fi
511.EE
512.SH SEE ALSO
513.IR lex (1), 
514.IR sed (1)
515.br
516A. V. Aho, B. W. Kernighan, P. J. Weinberger,
517.I
518The AWK Programming Language,
519Addison-Wesley, 1988.  ISBN 0-201-07981-X
520.SH BUGS
521There are no explicit conversions between numbers and strings.
522To force an expression to be treated as a number add 0 to it;
523to force it to be treated as a string concatenate
524\&\f(CW""\fP to it.
525.br
526The scope rules for variables in functions are a botch;
527the syntax is worse.
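.PP
The conversion idioms mentioned above can be sketched as follows
(the variable names and values are illustrative assumptions):
.EX
```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"                 # string values
    x = 10;   y = 9                   # numeric values
    r1 = (a < b)                      # string comparison: "10" sorts before "9"
    r2 = (a + 0 < b + 0)              # adding 0 forces a numeric comparison
    r3 = (x < y)                      # numeric comparison
    r4 = ((x "") < (y ""))            # concatenating "" forces a string comparison
    print r1, r2, r3, r4 }')
echo "$out"    # 1 0 0 1
```
.EE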
528