.TH gawk "" "" "Command"
.PC "Pattern-scanning and -processing language"
\fBgawk [ \fIPOSIX or GNU style options \fB] \-f \fIprogram-file \fB[ -- ] \fIfile ...\fR
\fBgawk [ \fIPOSIX or GNU style options \fB] [ -- ] \fIprogram-text file ...\fR
.PP
.B gawk
is the GNU Project's implementation of the AWK programming language.
It conforms to the definition of the
language in the \*(PX 1003.2 \fICommand Language and Utilities Standard\fR.
This version in turn is based on the
description in \fIThe AWK Programming Language\fR, by Aho,
Kernighan, and Weinberger, with the additional features
defined in the System V Release 4 version of \*(UX
.BR awk .
.B gawk
also provides some GNU-specific extensions.
.PP
The command line consists of options to
.B gawk
itself, the AWK program text (if not supplied via the options
.B \-f
or
.BR \-\-file ),
and values to be made available in the predefined AWK variables
.B ARGC
and
.BR ARGV .
.SH "Command-line Options"
.B gawk
options may be either the traditional POSIX one-letter options,
or the GNU style long options.
\*(PX-style options begin with a single `\-', whereas GNU long options
begin with ``\-\-''.
GNU-style long options are provided
for both GNU-specific features and for POSIX mandated features.
Other implementations of the AWK language are
likely to only accept the traditional one-letter options.
.PP
Following the \*(PX standard,
.BR gawk -specific
options are supplied via arguments to the
.B \-W
option.
Multiple
.B \-W
options may be supplied, or multiple arguments may be supplied together
if they are separated by commas, or enclosed in quotation marks
and separated by white space.
Case is ignored in arguments to the
.B \-W
option.
Each
.B \-W
option has a corresponding GNU style long option, as detailed below.
.PP
.B gawk
recognizes the following command-line options:
.IP "\fB\-F \fIfs\fR"
.IS "\fB\-\-field-separator=\fIfs\fR"
Use
.I fs
for the input field separator (the value of the predefined variable
.BR FS ).
.IP "\fB\-v \fIvariable\^\fB=\fIvalue\fR"
.IS "\fB\-\-assign=\fIvariable\^\fB=\fIvalue\fR"
Assign
.I value
to
.I variable
before executing the program.
.I value
is available to the
.B BEGIN
block of an AWK program.
.IP "\fB\-f \fIprogram-file\fR"
.IS "\fB\-\-file=\fIprogram-file\fR"
Read the AWK program's source from file
.IR program-file ,
instead of from the first command-line argument.
The
.B gawk
command line can contain more than one
.B \-f
or
.B \-\-file
option.
.IP "\fB\-W compat\fR"
.IS "\fB\-\-compat\fR"
Run in compatibility mode.
In compatibility mode,
.B gawk
behaves identically to \*(UN
.BR awk ;
it recognizes none of the GNU-specific extensions.
These extensions are described below.
.IP "\fB\-W copyleft\fR"
.IS "\fB\-W copyright\fR"
.IS "\fB\-\-copyleft\fR"
.IS "\fB\-\-copyright\fR"
Print the short version of the GNU copyright
information message on the standard error.
.IP "\fB\-W help\fR"
.IS "\fB\-W usage\fR"
.IS "\fB\-\-help\fR"
.IS "\fB\-\-usage\fR"
Print a relatively short summary of the available options on the
standard error.
.IP "\fB\-W lint\fR"
.IS "\fB\-\-lint\fR"
Provide warnings about constructs that are
dubious or non-portable to other implementations of AWK.
.IP "\fB\-W posix\fR"
.IS "\fB\-\-posix\fR"
This turns on compatibility mode, with the
following additional restrictions:
.RS
.IP \(bu 0.3i
The `\ex' escape sequences are not recognized.
.IP \(bu
The synonym
.B func
for the keyword
.B function
is not recognized.
.IP \(bu
The operators ``**'' and ``**='' cannot be used in
place of `^' and ``^=''.
.RE
.IP "\fB\-W source=\fIprogram-text\fR"
.IS "\fB\-\-source=\fIprogram-text\fR"
Use
.I program-text
as the AWK program's source code.
This option allows the easy intermixing of library functions
(used via the options
.B \-f
and
.BR \-\-file )
with source code entered on the command line.
It is intended primarily for medium to large AWK programs used in
shell scripts.
The
.B "\-W source="
form of this option uses the rest of the command line argument for
.IR program-text ;
no other options to
.B \-W
will be recognized in the same argument.
.IP "\fB\-W version\fR"
.IS "\fB\-\-version\fR"
Print version information for this particular
copy of
.B gawk
on the standard error.
This is useful mainly for knowing if your copy of
.B gawk
is up to date with what the Free Software Foundation is distributing.
.IP "\fB\-\-\fR"
Signal the end of options.
This is useful to allow further arguments to the AWK program
itself to start with a `-'.
This is mainly for consistency with the argument parsing convention
used by most other \*(PX programs.
.PP
All other options are flagged as illegal and ignored.
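.PP
As a minimal sketch of how these options combine, assume a hypothetical
program file
.B uid.awk
run as \fBgawk \-F: \-v min=100 \-f uid.awk /etc/passwd\fR.
The file name and the variable
.B min
are illustrative assumptions, not part of
.BR gawk :
.DM
	# min is set by -v, so it is already visible in BEGIN
	BEGIN { print "users with uid >= " min }
	# -F: makes the colon the field separator, so $3 is the user id
	$3 >= min { print $1 }
.DE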
.SH "AWK Program Execution"
An AWK program consists of a sequence of pattern/action
statements, plus optional function definitions:
.DM
	pattern { action statements }
	function name(parameter list) { statements }
.DE
.PP
.B gawk
first reads the program source from the program file (or files)
if specified, or from the first non-option argument on the command line.
The option
.B -f
may be used multiple times on the command line.
.B gawk
reads the program text as if all the program-files had been concatenated.
This is useful for building libraries of AWK functions,
without having to include them in each new AWK program that uses them.
To use a library function in
a file from a program typed in on the command line, specify
.B /dev/tty
as one of the program files, type your program, and end it with a
.BR <ctrl-D> .
.PP
The environment variable
.B AWKPATH
specifies a search path to use when finding source files
named with the option
.BR \-f .
If this variable does not exist, the default path is:
.DM
	.:/usr/lib/awk:/usr/local/lib/awk
.DE
.PP
If a file name given to the
.B \-f
option contains a `/' character,
.B gawk
does not perform a path search.
.PP
.B gawk
executes AWK programs in the following order:
.IP \fB1.\fR 0.3i
.B gawk
compiles the program into an internal form.
.IP \fB2.\fR
All variable assignments specified via the
.B \-v
option are performed.
.IP \fB3.\fR
.B gawk
executes the code in the
.B BEGIN
block (or blocks), should there be any.
.IP \fB4.\fR
.B gawk
then proceeds to read each file named in the
.B ARGV
array.
If no files are named on the command line,
.B gawk
reads the standard input.
.PP
If a file name on the command line has the form
\fIvariable\^\fB=\fIvalue\^\fR,
.B gawk
treats it as a variable assignment, and assigns
.I value
to
.IR variable .
(This happens after every
.B BEGIN
block has been run.)
Command-line assignment of variables
is most useful when you wish to assign values dynamically to the
variables AWK uses to control how input is broken into fields and records.
It is also useful for controlling the state of program execution
if multiple passes are needed over a single data file.
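.PP
The multiple-pass case might look like the following sketch, which assumes
a hypothetical invocation of the form
\fBgawk \-f share.awk pass=1 data pass=2 data\fR;
the names
.BR share.awk ,
.BR pass ,
and
.B data
are illustrative only:
.DM
	# first pass: accumulate the total of the first column
	pass == 1 { total += $1 }
	# second pass: print each value with its share of the total
	pass == 2 && total != 0 { print $1, $1 / total }
.DE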
.PP
If the value of a particular element of
.B ARGV
is empty (""),
.B gawk
skips it.
.PP
For each line in the input,
.B gawk
tests to see if it matches any pattern in the AWK program.
It tests the patterns in the order they occur in the program.
For each pattern that the line matches,
.B gawk
executes the action associated with that pattern.
.PP
Finally, after all the input is exhausted,
.B gawk
executes the code in every
.B END
block.
.SH "Variables and Fields"
AWK variables are dynamic:
they come into existence when they are first used.
Their values are floating-point numbers, strings, or both,
depending upon how they are used.
AWK also has one-dimensional arrays;
multi-dimensional arrays can be simulated.
Several pre-defined variables are set as a program runs;
these are described as needed and summarized below.
.Sh Fields
As it reads a line of input,
.B gawk
splits that line into fields.
The variable
.B FS
defines how fields are separated:
.IP \(bu 0.3i
If
.B FS
is a single character, fields are separated by that character.
.IP \(bu
If
.B FS
is longer than one character, it must be a regular expression.
In this case, the value of variable
.B IGNORECASE
(described below) also affects how fields are split.
.IP \(bu
In the special case that
.B FS
is a single space character,
fields are separated by runs of one or more space or tab characters.
.PP
If variable
.B FIELDWIDTHS
is set to a space-separated list of numbers,
each field is expected to have a fixed width:
.B gawk
splits up the record using the specified widths, and
ignores the value of
.BR FS .
Assigning a new value to
.B FS
overrides the use of
.BR FIELDWIDTHS ,
and restores the default behavior.
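.PP
A short sketch of fixed-width splitting, assuming input records laid out
as a three-character code, a five-character number, and a two-character
suffix (the layout is an assumption made for illustration):
.DM
	BEGIN { FIELDWIDTHS = "3 5 2" }
	# the input line abc12345xy yields the fields abc, 12345, and xy
	{ print $1 "|" $2 "|" $3 }
.DE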
.PP
Each field in the input line can be referenced by its position:
.BR $1 ,
.BR $2 ,
and so on.
.B $0
is the whole line.
.PP
The value of a field may be assigned to as well.
Fields need not be referenced by constants.
For example, the AWK statements
.DM
	n = 5
	print $n
.DE
.PP
print the fifth field in the input line.
The variable
.B NF
holds the total number of fields in the input line.
.PP
References to non-existent fields (i.e.,
fields after
.BR $NF )
produce the null string.
However, assigning to a nonexistent field
(e.g., \fB$(NF+2) = 5\fR) increases the value of
.BR NF ;
creates any intervening fields, with the null string
as the value of each; and causes the value of
.B $0
to be recomputed, with the fields being separated by the value of
.BR OFS .
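.PP
The following sketch illustrates this rule, assuming the three-field
input line
.BR "a b c" :
.DM
	BEGIN { OFS = "-" }
	{
		$(NF+2) = 5	# NF grows from 3 to 5; $4 is the null string
		print		# prints a-b-c--5
		print NF	# prints 5
	}
.DE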
.Sh "Built-in Variables"
The following variables are built into AWK:
.IP \fBARGC\fR
The number of command-line arguments.
Note that this does not include options to
.BR gawk ,
or the program source.
.IP \fBARGIND\fR
The index in
.B ARGV
of the file now being processed.
.IP \fBARGV\fR
Array of command-line arguments.
The array is
indexed from zero through \fBARGC\fR minus one.
Dynamically changing the contents of
.B ARGV
can control the files used for data.
.IP \fBCONVFMT\fR
The conversion format for numbers \(em by default, ``%.6g''.
.IP \fBENVIRON\fR
An array containing the values of the current environment.
The array is indexed by the environment variables, each element being the
value of that variable (e.g., \fBENVIRON["HOME"]\fR
might be
.BR /u/arnold ).
Changing this array does not affect the environment seen by programs
which \fBgawk\fR spawns via redirection or the function
.BR system() .
(This may change in a future version of
.BR gawk .)
.IP \fBERRNO\fR
If a system error occurs while performing redirection for
.BR getline ,
during a read for
.BR getline ,
or during a close,
.B ERRNO
contains a string describing the error.
.IP \fBFIELDWIDTHS\fR
A white-space separated list of field widths.
When set,
.B gawk
parses the input into fields of fixed width,
instead of using the value of the variable
.B FS
as the field separator.
The fixed field-width facility is still experimental;
expect the semantics to change as
.B gawk
evolves over time.
.IP \fBFILENAME\fR
The name of the current input file.
If no files are specified on the command line, the
value of
.B FILENAME
is `-'.
However,
.B FILENAME
is undefined within the
.B BEGIN
block.
.IP \fBFNR\fR
The number of the record within the current input file
that is now being processed.
.IP \fBFS\fR
The input field separator.
By default, this is a blank.
.IP \fBIGNORECASE\fR
Tell
.BR gawk 's
pattern-matching features to ignore the case when they compare
text with a pattern.
When
.B IGNORECASE
is set to a nonzero value, the following features of
.B gawk
are affected:
.RS
.IP \(bu 0.3i
Pattern-matching within rules.
.IP \(bu
Field splitting with
.BR FS .
.IP \(bu
Regular expression matching with `~' and ``!~''.
.IP \(bu
The operation of the pre-defined
.B gawk
functions
.BR gsub() ,
.BR index() ,
.BR match() ,
.BR split() ,
and
.BR sub() .
.RE
.IP
Thus, if
.B IGNORECASE
is not equal to zero, pattern
.DM
	/aB/
.DE
.IP
matches all of the following:
.DM
	ab
	aB
	Ab
	AB
.DE
.IP
As with all AWK variables, the initial value of
.B IGNORECASE
is zero, so all regular expression operations are normally case-sensitive.
.IP \fBNF\fR
The number of fields in the current input record.
.IP \fBNR\fR
The total number of input records seen so far.
.IP \fBOFMT\fR
The output format for numbers \(em by default ``%.6g''.
.IP \fBOFS\fR
The output-field separator \(em by default a space character.
.IP \fBORS\fR
The output-record separator \(em by default a newline.
.IP \fBRS\fR
The input record separator \(em by default a newline.
.B RS
is exceptional in that only the first character of its string value
is used to separate records.
(This will probably
change in a future release of
.BR gawk .)
If
.B RS
is set to the null string, then records are separated by blank lines.
When
.B RS
is set to the null string, then the newline character always
acts as a field separator, in addition to whatever value
.B FS
may have.
.IP \fBRSTART\fR
The index of the first character matched by the
.B gawk
function
.BR match() :
zero if no match.
.IP \fBRLENGTH\fR
The length of the string matched by
.BR match() :
\-1 if no match.
.IP \fBSUBSEP\fR
The character used to separate multiple subscripts in array elements \(em
by default ``\e034''.
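.PP
As an illustration of several of these variables working together, the
following sketch reads paragraphs separated by blank lines; it assumes
only the behavior of
.B RS
described above:
.DM
	BEGIN { RS = "" }	# records are separated by blank lines
	# newline also separates fields here, so NF counts every word
	{ print FILENAME ": record " NR " has " NF " fields" }
.DE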
.Sh Arrays
Arrays are subscripted with an expression between square
brackets (`[' and `]').
If the expression is an expression list
.DS
.I
	(expr, expr ...)
.DE
.PP
then the array subscript is a
string consisting of the concatenation of the (string)
value of each expression, separated by the value of the variable
.BR SUBSEP .
This facility simulates multi-dimensional arrays.
For example,
.DM
	i = "A" ; j = "B" ; k = "C"
	x[i, j, k] = "hello, world\en"
.DE
.PP
assigns the string
.DM
	"hello, world\en"
.DE
.PP
to the element of the
array \fBx\fR that is indexed by the string
.DM
	"A\e034B\e034C".
.DE
.PP
All arrays in AWK are associative, i.e., indexed by string values.
.PP
The special operator
.B in
may be used in an
.B if
or
.B while
statement to see if an array has an index that consists of a particular value:
.DS
.I
	if (val in array)
		print array[val]
.DE
.PP
If the array has multiple subscripts, use
.BR "(i, j) in array" .
.PP
You can also use the construct
.B in
within a
.B for
loop to iterate through all the elements of an array.
.PP
An element can be deleted from an array using the statement
.BR delete .
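.PP
The array operations described above can be sketched as follows.
The array names
.B seen
and
.B part
and the test values "a" and "b" are illustrative assumptions:
.DM
	# count each pair formed by the first two fields
	{ seen[$1, $2]++ }
	END {
		# membership test with a multiple subscript
		if (("a", "b") in seen)
			print "pair a b occurred", seen["a", "b"], "times"
		# iterate over every index; SUBSEP splits it back apart
		for (key in seen) {
			split(key, part, SUBSEP)
			print part[1], part[2], seen[key]
			delete seen[key]
		}
	}
.DE
.PP
The order in which a
.B for
loop visits the indices is not specified.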
.Sh "Variable Typing And Conversion"
Variables and fields can be floating-point numbers, strings, or both.
How the value of a variable is interpreted depends upon its context.
If a variable or field is used in a numeric expression,
.B gawk
treats it as a number;
if used as a string,
.B gawk
treats it as a string.
To force a variable to be treated as a number, add zero to it;
to force it to be treated as a string, concatenate it with the null string.
.PP
When a string must be converted to a number,
the conversion is accomplished by the library function
.BR atof() .
A number is converted
to a string by using the value of
.B CONVFMT
as a format
string for
.BR sprintf() ,
with the numeric value of the variable as the argument.
However, even though all numbers in AWK are floating point,
integral values are always converted as integers.
Thus, given
.DM
	CONVFMT = "%2.2f"
	a = 12
	b = a ""
.DE
.PP
the variable
.B b
has a value of 12, not 12.00.
.PP
.B gawk
performs comparisons as follows:
.IP \(bu 0.3i
If two variables are
numeric, they are compared numerically.
.IP \(bu
If one value is numeric and the other has a string value that is a
``numeric string,'' then comparisons are also done numerically.
.IP \(bu
Otherwise, the numeric value is converted to a
string and a string comparison is performed.
.PP
Two strings are compared, of course, as strings.
According to the \*(PX standard, even if two strings are numeric strings,
a numeric comparison is performed;
however, this is clearly incorrect, and
.B gawk
does not do this.
.PP
Uninitialized variables have the numeric value zero and the
string value "" (the null, or empty, string).
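.PP
A small sketch of these rules, using two string constants whose values
happen to look like numbers (the variable names are arbitrary):
.DM
	BEGIN {
		x = "10" ; y = "9"
		# both values are strings, so the comparison is a string one
		if (x < y)
			print "as strings, 10 sorts before 9"
		# adding zero forces a numeric comparison
		if (x + 0 > y + 0)
			print "as numbers, 10 is greater than 9"
	}
.DE
.PP
Both messages are printed.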
.SH "Patterns and Actions"
AWK is a line-oriented language:
the pattern comes first, and then the action.
Action statements are enclosed in `{' and `}'.
Either the pattern may be missing, or the action may be missing,
but (of course) not both.
If the pattern is missing, AWK executes the action for every line of input.
A missing action is equivalent to
.DM
	{ print }
.DE
.PP
which prints the entire line.
.PP
Comments begin with the character `#', and continue to the end of the line.
Blank lines can be used to separate statements.
Normally, a statement ends with a newline;
however, this is not the case for lines ending in any of the
following characters:
.DM
	,	{	?	:	&&	||
.DE
.PP
Lines that end in one of the above characters
have their statements automatically continued on the following line.
In other cases, a line can be continued by ending it with a `\e',
in which case the newline will be ignored.
.PP
Multiple statements may be put on one line by separating
them with a `;'.
This applies to both the statements within the action part of
a pattern/action pair (the usual case),
and to the pattern/action statements themselves.
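.PP
The continuation and separator rules might be used as in this sketch,
in which the variables
.B n
and
.B sum
are illustrative:
.DM
	# the first line ends in &&, so the pattern continues on the next
	$1 > 0 &&
	$2 > 0	{ n++ ; sum += $1 * $2 }	# two statements joined by ;
	END	{ print n + 0, sum + 0 }
.DE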
.Sh Patterns
AWK patterns may be one of the following:
.DM
	BEGIN
	END
	/regular expression/
	relational expression
	pattern && pattern
	pattern || pattern
	pattern ? pattern : pattern
	(pattern)
	! pattern
	pattern1, pattern2
.DE
.PP
.B BEGIN
and
.B END
are two special patterns that are not tested against the input.
The action parts of all
.B BEGIN
patterns are merged as if all the statements had
been written in a single
.B BEGIN
block.
They are executed before any of the input is read.
Likewise,
.B gawk
merges all the
.B END
patterns and executes them when all the input is exhausted
(or when an
.B exit
statement is executed).
.B BEGIN
and
.B END
patterns cannot be combined with other patterns in pattern expressions.
.B BEGIN
and
.B END
patterns must have action parts.
.PP
For
.DM
	/regular expression/
.DE
.PP
patterns, the associated statement is executed for each input line
that matches the regular expression.
Regular expressions are the same as those described in the Lexicon entry
for the shell
.BR sh ,
and are summarized below.
.PP
A relational expression may use any of the operators
defined below in the section on actions.
These generally test whether certain fields match certain regular expressions.
.PP
The operators
.BR && ,
.BR || ,
and
.B !
are logical AND, logical OR, and logical NOT, respectively, as in C.
They do short-circuit evaluation, also as in C, and are used for combining
more primitive pattern expressions.
As in most languages, parentheses may be used to change the order of
evaluation.
.PP
The operator
.B ?:
is like the same operator in C.
If the first pattern is true then the pattern used for testing is
the second pattern, otherwise it is the third.
Only one of the second and third patterns is evaluated.
.PP
The
.DS
	\fIpattern1, pattern2\fR
.DE
.PP
form of an expression is called a ``range pattern''.
It matches all input records starting with a line that matches
.IR pattern1 ,
and continues until it reads a record that matches
.IR pattern2 ,
inclusive.
It does not combine with any other sort of pattern expression.
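.PP
The pattern forms above can be sketched in one short program.
The marker strings START and STOP are assumptions about the input,
chosen only for illustration:
.DM
	BEGIN	{ print "scanning" }
	# range pattern: every record from a START line through a STOP line
	/^START/, /^STOP/	{ print FNR ": " $0 }
	# relational expression combined with a negated regular expression
	NF > 0 && ! /^#/	{ data++ }
	END	{ print data + 0, "non-comment records" }
.DE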
.Sh "Regular Expressions"
Regular expressions are the extended kind found in the shell
.BR sh .
They are composed of characters, as follows:
.IP \fIc\fR
Match the non-meta-character
.IR c .
.IP \fB\e\fIc\fR
Match the literal character
.IR c .
.IP \fB.\fR
Match any character except newline.
.IP \fB^\fR
Match the beginning of a line or a string.
.IP \fB$\fR
Match the end of a line or a string.
.IP \fB[\fIabc...\fB]\fR
Character class:
Match any of the characters
.IR abc... .
.IP \fB[^\fIabc...\fB]\fR
Negated character class:
Match any character except
.I abc...
and newline.
.IP \fIr1\fB|\fIr2\fR
Alternation:
match either
.I r1
or
.IR r2 .
.IP \fIr1r2\fR
Concatenation:
Match
.IR r1 ,
then
.IR r2 .
.IP \fIr\fB+\fR
Match one or more
.IR r 's.
.IP \fIr\fB*\fR
Match zero or more
.IR r 's.
.IP \fIr\fB?\fR
Match zero or one
.IR r 's.
.IP \fB(\fIr\fB)\fR
Grouping: match
.IR r .
.PP
The escape sequences that are valid in string constants
(see below) are also legal in regular expressions.
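.PP
As a hedged illustration of these operators, the following sketch
combines character classes, grouping, alternation, and repetition;
the patterns themselves are arbitrary examples:
.DM
	# a capitalized word ending in "ing" or "ed" that fills the line
	/^[A-Z][a-z]+(ing|ed)$/	{ print "word: " $0 }
	# second field consists only of digits, with an optional sign
	$2 ~ /^[+-]?[0-9]+$/	{ print $2 " is an integer" }
.DE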
.Sh Actions
Action statements are enclosed in braces, `{' and `}'.
Action statements consist of the usual assignment, conditional,
and looping statements found in most languages.
The operators, control statements, and input/output statements
available are patterned after those in C.
.Sh Operators
The following gives AWK's operators, in order of increasing precedence:
.DS
	= += -=
	*= /= %= ^= (assignment)
.DE
.PP
Both absolute assignment (\fIvar\fB = \fIvalue\^\fR)
and operator-assignment (the other forms) are supported.
.IP "\fB?: \fR\(em the C conditional expression"
This has the form
.DS
	\fIexpr1 \fB? \fIexpr2 \fB: \fIexpr3\fR
.DE
.IP
If
.I expr1
is true, the value of the expression is
.IR expr2 ;
otherwise it is
.IR expr3 .
Only one of
.I expr2
and
.I expr3
is evaluated.
.IP "\fB|| \fR\(em logical OR"
.IS "\fB&& \fR\(em logical AND"
.IS "\fB~ \fR\(em Regular expression match"
.IS "\fB!~ \fR\(em Negated match"
Do not use a constant regular expression (\fB/foo/\fR)
on the left-hand side of a `~' or `!~'.
Only use one on the right-hand side.
The expression
.DS
	\fB/foo/ ~ \fIexp\fR
.DE
.IP
has the same meaning as:
.DS
	\fB(($0 ~ /foo/) ~ \fIexp\^\fB)\fR
.DE
.IP
This is usually not what was intended.
.IP "\fB< >\fR"
.IS "\fB<= >=\fR"
.IS "\fB!=\fR"
.IS "\fB==\fR"
The regular relational operators.
.IP \fB<blank>\fR
String concatenation.
.IP "\fB+\fR"
.IS "\fB\-\fR"
Addition and subtraction.
.IP "\fB*\fR"
.IS "\fB/\fR"
.IS "\fB%\fR"
Multiplication, division, and modulus.
.IP "\fB+\fR"
.IS "\fB\-\fR"
.IS "\fB!\fR"
Unary plus, unary minus, and logical negation.
.IP \fB^\fR
Exponentiation.
The operator `**' may also be used, and `**='
for the assignment operator.
.IP \fB++\fR
.IS \fB\-\-\fR
Increment and decrement, both prefix and suffix.
.IP \fB$\fR
Field reference.
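.PP
The effect of these precedence rules can be seen in a brief sketch
(the values are arbitrary):
.DM
	BEGIN {
		print 2 ^ 3 ^ 2		# ^ groups right to left: 512
		print -2 ^ 2		# ^ binds tighter than unary minus: -4
		print 1 " " 2 + 3	# + binds tighter than concatenation: 1 5
		x = 7
		print (x % 2 ? "odd" : "even")	# prints odd
	}
.DE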
.Sh "Control Statements"
The control statements are as follows:
.DS
	\fBif (\fIcondition\^\fB) \fIstatement \fB[ else \fIstatement \fB]\fR
	\fBwhile (\fIcondition\^\fB) \fIstatement\fR
	\fBdo \fIstatement \fBwhile (\fIcondition\^\fB)\fR
	\fBfor (\fIexpr1\^\fB; \fIexpr2\^\fB; \fIexpr3\^\fB) \fIstatement\fR
	\fBfor (\fIvar in array\^\fB) \fIstatement\fR
	\fBbreak\fR
	\fBcontinue\fR
	\fBdelete array[\fIindex\^\fB]\fR
	\fBexit [ \fIexpression \fB]\fR
	\fB{ \fIstatements \fB}\fR
.DE
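.PP
A minimal sketch using several of these statements; it prints the
fields of each record in reverse order and stops at the first record
whose first field is the (assumed) marker
.BR quit :
.DM
	$1 == "quit"	{ exit 0 }
	{
		for (i = NF; i > 0; i--)
			printf "%s%s", $i, (i > 1 ? OFS : ORS)
	}
.DE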
.Sh "I/O Statements"
AWK recognizes the following input/output statements:
.IP \fBclose(\fIfilename\^\fB)\fR
Close file or pipe.
.IP \fBgetline\fR
Set
.B $0
from next input record.
This statement also sets the built-in variables
.BR NF ,
.BR NR ,
and
.BR FNR .
.IP "\fBgetline <\fIfile\fR"
Set
.B $0
from next record of file.
This statement also sets the built-in variable
.BR NF .
.IP "\fBgetline \fIvar\fR"
Set
.I var
from next input record.
This statement also sets the built-in variables
.B NR
and
.BR FNR .
.IP "\fBgetline \fIvar \fB<\fIfile\fR"
Set
.I var
from next record of
.IR file .
.IP "\fBnext\fR"
Stop processing the current input record.
The next input record is read and processing starts over with
the first pattern in the AWK program.
If the end of the input data is reached, each
.B END
block is executed.
.IP "\fBnext \fIfile\fR"
Stop processing the current input file.
The next input record read comes from the next input file.
.B FILENAME
is updated,
.B FNR
is reset to one, and processing starts over with
the first pattern in the AWK program.
If the end of the input data is reached, every
.B END
block is executed.
.IP "\fBprint\fR"
Print the current record.
.IP "\fBprint \fIexpr-list\fR"
Print each expression in
.IR expr-list .
.IP "\fBprint \fIexpr-list \fB>\fIfile\fR"
Print expressions on file.
.IP "\fBprintf \fIfmt\^\fB, \fIexpr-list\fR"
Format and print.
.IP "\fBprintf \fIfmt\^\fB, \fIexpr-list \fB>\fIfile\fR"
Format and print into
.IR file .
.IP "\fBsystem(\fIcmd-line\^\fB)\fR"
Execute the command
.IR cmd-line ,
and return its exit status.
.PP
Other input/output redirections are also allowed.
For
.B print
and
.BR printf ,
\fB>>\fIfile\fR appends output onto
.IR file ,
whereas a `\fB|\fR' command writes onto a pipe.
Likewise, \fIcommand \fB| getline\fR pipes into
.BR getline .
.B getline
returns zero when it reads EOF, and \-1 if an error occurs.
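.PP
The following sketch shows one way to use these forms.
The file
.B /etc/passwd
and the command
.B date
are assumed to exist on the system; the variable names are arbitrary:
.DM
	BEGIN {
		# test for > 0 so that an error (-1) also ends the loop
		while ((getline line < "/etc/passwd") > 0)
			n++
		close("/etc/passwd")
		print n + 0, "lines read"
		# read one line of output from a command
		"date" | getline now
		close("date")
		print "run at", now
	}
.DE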
.Sh "The printf Statement"
The AWK statement
.B printf
and the function
.B sprintf()
(see below) accept the following conversion specification formats:
.IP \fB%c\fR
An ASCII character.
If the argument used for
.B %c
is numeric, it is treated as a character and printed.
Otherwise, the argument is assumed to be a string,
and only the first character of that string is printed.
.IP \fB%d\fR
A decimal number (the integer part).
.IP \fB%i\fR
Just like
.BR %d .
.IP \fB%e\fR
A floating-point number of the form \fB[-]\fId\^\fB.\fIdddddd\fBE[+-]\fIdd\fR.
.IP \fB%f\fR
A floating-point number of the form \fB[-]\fIddd\^\fB.\fIdddddd\fR.
.IP \fB%g\fR
Use `e' or `f' conversion, whichever is shorter,
with nonsignificant zeros suppressed.
.IP \fB%o\fR
An unsigned octal number (again, an integer).
.IP \fB%s\fR
A character string.
.IP \fB%x\fR
An unsigned hexadecimal number (an integer).
.IP \fB%X\fR
Like
.BR %x ,
but using ``ABCDEF'' instead of ``abcdef''.
.IP \fB%%\fR
A single `%' character; no argument is converted.
.PP
There are optional, additional parameters that may lie
between the `%' and the control letter:
.IP \fB\-\fR
The expression should be left-justified within its field.
.IP \fIwidth\fR
The field should be padded to this width.
If the width has a leading zero, then the field will be
padded with zeroes;
otherwise, it is padded with blanks.
.IP \fB.\fIprec\fR
A number that indicates the maximum width of a string, or the
number of digits to print to the right of the decimal point.
.PP
The dynamic width and precision capabilities of the ANSI C
.B printf()
routines are supported.
A `*' in place of either the width or precision
specification causes AWK to take its value
from the argument list to
.B printf
or
.BR sprintf() .
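.PP
A short sketch of these conversions and modifiers; the values are
arbitrary:
.DM
	BEGIN {
		# left-justified string, padded and zero-padded integers,
		# a float with three decimal places, and a hexadecimal number
		printf "%-10s|%5d|%05d|%8.3f|%x\en", "gawk", 42, 42, 3.14159, 255
		# the width (6) is taken from the argument list
		printf "%*d\en", 6, 42
		s = sprintf("%c%c", 65, "hello")	# the characters A and h
		print s
	}
.DE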
.Sh "Special File Names"
When doing I/O redirection from either
.B print
or
.B printf
into a file, or via
.B getline
from a file,
.B gawk
recognizes certain special file names internally.
These file names allow access to open file descriptors inherited from
.BR gawk 's
parent process (usually the shell).
Other special file names provide access to information about the running
.B gawk
process.
The file names are as follows:
.IP \fB/dev/pid\fR
Reading this file returns the identifier of
the current process, in decimal, terminated with a newline.
.IP \fB/dev/ppid\fR
Reading this file returns the identifier of the current process's parent,
in decimal, terminated with a newline.
.IP \fB/dev/pgrpid\fR
Reading this file returns the current process's group identifier,
in decimal, terminated with a newline.
.IP \fB/dev/user\fR
Reading this file returns a single record terminated with a newline.
The fields are separated with blanks.
.B $1
is the value of the system call
.BR getuid() ;
.B $2
is the value of the system call
.BR geteuid() ;
.B $3
is the value of the system call
.BR getgid() ;
and
.B $4
is the value of the system call
.BR getegid() .
If there are any
additional fields, they are the group identifiers
returned by
.BR getgroups() .
.IP \fB/dev/stdin\fR
The standard input.
.IP \fB/dev/stdout\fR
The standard output.
.IP \fB/dev/stderr\fR
The standard error output.
.IP \fB/dev/fd/\fIn\fR
The file associated with the open-file descriptor
.IR n .
.PP
These are particularly useful for error messages.
For example, these files let you use the statement
.DM
	print "You blew it!" > "/dev/stderr"
.DE
.PP
where otherwise you would have had to say:
.DM
	print "You blew it!" | "cat 1>&2"
.DE
.PP
These file names may also be used on the command line to name data files.
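.PP
For instance, a program might report malformed records on the standard
error while passing good ones through.
This is only a sketch; the three-field rule is an assumption made for
the example:
.DM
	NF != 3 {
		print FILENAME ":" FNR ": expected 3 fields, found " NF > "/dev/stderr"
		next
	}
	{ print }
.DE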
.Sh "Numeric Functions"
AWK contains the following pre-defined arithmetic functions:
.IP "\fBatan2(\fIy\^\fB, \fIx\fB)\fR"
Return the arctangent of \fIy\fB/\fIx\fR, in radians.
.IP "\fBcos(\fIexpr\^\fB)\fR"
Return the cosine of \fIexpr\fR, which is in radians.
.IP "\fBexp(\fIexpr\^\fB)\fR"
The exponential function.
.IP "\fBint(\fIexpr\^\fB)\fR"
Truncate to integer.
.IP "\fBlog(\fIexpr\^\fB)\fR"
The natural-logarithm function.
.IP "\fBrand()\fR"
Return a random number between zero and one.
.IP "\fBsin(\fIexpr\^\fB)\fR"
Return the sine of \fIexpr\fR, which is in radians.
.IP "\fBsqrt(\fIexpr\^\fB)\fR"
The square-root function.
.IP "\fBsrand(\fIexpr\^\fB)\fR"
Use
.I expr
as a new seed for the random number generator.
If no
.I expr
is provided, the time of day will be used.
The return value is the previous seed for the random number generator.
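.PP
A brief sketch of the arithmetic functions; the variable
.B pi
is defined by the example itself and is not built in:
.DM
	BEGIN {
		pi = 4 * atan2(1, 1)
		print sin(pi / 2), int(7.9), sqrt(16)	# prints 1 7 4
		srand()				# seed from the time of day
		print int(rand() * 6) + 1	# an integer from 1 through 6
	}
.DE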
.Sh "String Functions"
AWK has the following pre-defined string functions:
.IP "\fBgsub(\fIr\^\fB, \fIs\^\fB, \fIt\^\fB)\fR"
For each substring matching the regular expression
.I r
in the string
.IR t ,
substitute the string
.I s
and return the number of substitutions.
If
.I t
is not supplied, use
.BR $0 .
.IP "\fBindex(\fIs\^\fB, \fIt\fB)\fR"
Return the index of the string
.I t
in the string
.IR s ,
or zero if
.I t
is not present.
.IP "\fBlength(\fIs\^\fB)\fR"
Return the length of the string
.IR s ,
or the length of
.B $0
if
.I s
is not supplied.
.IP "\fBmatch(\fIs\^\fB, \fIr\^\fB)\fR"
Return the position in
.I s
where the regular expression
.I r
occurs, or zero if
.I r
is not present, and set the values of
.B RSTART
and
.BR RLENGTH .
.IP "\fBsplit(\fIs\^\fB, \fIa\^\fB, \fIr\^\fB)\fR"
Split the string
.I s
into the array
.I a
on the regular expression
.IR r ,
and return the number of fields.
If
.I r
is omitted, use
.B FS
instead.
.IP "\fBsprintf(\fIfmt\^\fB, \fIexpr-list\^\fB)\fR"
Print
.I expr-list
according to
.IR fmt ,
and return the resulting string.
.IP "\fBsub(\fIr\^\fB, \fIs\^\fB, \fIt\^\fB)\fR"
Just like
.BR gsub() ,
but only the first matching substring is replaced.
.IP "\fBsubstr(\fIs\^\fB, \fIi\^\fB, \fIn\^\fB)\fR"
Return the \fIn\fR-character substring of
.I s
starting at
.IR i .
If
.I n
is omitted, the rest of
.I s
is used.
.IP "\fBtolower(\fIstr\^\fB)\fR"
Return a copy of the string
.IR str ,
with all the upper-case characters in
.I str
translated to their corresponding lower-case counterparts.
Non-alphabetic characters are left unchanged.
.IP "\fBtoupper(\fIstr\^\fB)\fR"
Return a copy of the string
.IR str ,
with all the lower-case characters in
.I str
translated to their corresponding upper-case counterparts.
Non-alphabetic characters are left unchanged.
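.PP
Several of these functions are combined in the sketch below, which
squeezes runs of white space and then reports the first number found
in each record.
The variable names are arbitrary:
.DM
	{
		line = $0	# work on a copy so that $0 is not modified
		n = gsub(/[ \et]+/, " ", line)
		# match() sets RSTART and RLENGTH for use by substr()
		if (match(line, /[0-9]+/))
			print n, substr(line, RSTART, RLENGTH), toupper(line)
	}
.DE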
.Sh "Time Functions"
Because one of the primary uses of AWK programs is processing
log files that contain time stamp information,
.B gawk
provides the following two functions for obtaining time
stamps and formatting them.
.IP \fBsystime()\fR
Return the current time of day as the number of
seconds since 00:00:00 hours on January 1, 1970 GMT.
.IP "\fBstrftime(\fIformat\^\fB, \fItimestamp\^\fB)\fR"
Format
.I timestamp
according to the specification within
.IR format .
.I timestamp
should be of the same form as returned by
.BR systime() .
If
.I timestamp
is
missing, the current time of day is used.
See the Lexicon entry for
.B strftime()
for the format conversions that are guaranteed to be available.
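.PP
A minimal sketch of the two functions together.
The conversions used here are common ones, but which conversions are
available depends on the local
.BR strftime() :
.DM
	BEGIN {
		now = systime()
		print "seconds since the epoch:", now
		print strftime("%Y-%m-%d %H:%M:%S", now)
	}
.DE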
.Sh "String Constants"
String constants in AWK are sequences of characters
enclosed between quotation marks `"'.
Within a string, the following escape sequences are recognized:
.DS
	\fB\e\e\fR	Literal backslash
	\fB\ea\fR	The BEL character
	\fB\eb\fR	Backspace
	\fB\ef\fR	Form-feed
	\fB\en\fR	New line
	\fB\er\fR	Carriage return
	\fB\et\fR	Horizontal tab
	\fB\ev\fR	Vertical tab
	\fB\ex\fIXX\fR	Character with hexadecimal value \fIXX\fR
	\fB\e\fIOOO\fR	Character represented by octal digits \fIOOO\fR
	\fB\e\fIc\fR	The literal character \fIc\fR
.DE
.PP
The escape sequences may also be used within constant regular expressions
(e.g., \fB/[\et\ef\en\er\ev]/\fR matches whitespace characters).
.SH Functions
AWK defines a function as follows:
.DS
	\fBfunction \fIname\fB(\fIparameter list\^\fB) { \fIstatements \fB}\fR
.DE
.PP
AWK executes a function when it is called from within the action
part of a regular \fIpattern\fB/\fIaction\fR statement.
The parameters supplied in the function call are used to instantiate
the formal parameters declared within the function.
Arrays are passed by reference; other variables are passed by value.
.PP
Because functions were not originally part of the AWK language,
the provision for local variables is rather clumsy:
they are declared as extra parameters in the parameter list.
The convention is to separate local variables from
real parameters by extra spaces in the parameter list.
For example:
.DM
	function f(p, q,     a, b) { # a & b are local
		.....
	}
.sp \n(pDu
	/abc/ { ...
	; f(1, 2) ; ...
	}
.DE
.PP
The left parenthesis in a function call is required to
immediately follow the function name, without any intervening white space.
This is to avoid a syntactic ambiguity with the concatenation operator.
This restriction
does not apply to the built-in functions listed above.
.PP
Functions may call each other and may be recursive.
Function parameters used as local variables are initialized to
the null string and the number zero upon function invocation.
.PP
The word
.B func
may be used in place of
.BR function .
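.PP
The following sketch pulls these rules together.
The names
.BR fill ,
.BR sum ,
and
.B v
are illustrative only.
The array is passed by reference, so
.B fill()
can populate it for the caller, whereas
.B i
and
.B total
are local variables that start out null on every call:
.DM
	# Copy the fields of the current record into arr.
	function fill(arr,     i) {
		for (i = 1; i <= NF; i++)
			arr[i] = $i
	}
.sp \n(pDu
	# Return the sum of arr[1] through arr[n].
	function sum(arr, n,     i, total) {
		for (i = 1; i <= n; i++)
			total += arr[i]
		return total + 0
	}
.sp \n(pDu
	{ fill(v); print sum(v, NF) }
.DE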
.SH Examples
Print and sort the login names of every user on your system:
.DM
	BEGIN { FS = ":" }
	{ print $1 | "sort" }
.DE
.PP
Count lines in a file:
.DM
	{ nlines++ }
	END { print nlines }
.DE
.PP
Precede each line by its number in the file:
.DM
	{ print FNR, $0 }
.DE
.PP
Concatenate and line number (a variation on a theme):
.DM
	{ print NR, $0 }
.DE
.SH Compatibility
A primary goal for
.B gawk
is compatibility with the \*(PX standard,
as well as with the latest version of \*(UN
.BR awk .
To this end,
.B gawk
incorporates the following user-visible features that
are not described in the AWK book, but are part of
.B awk
in System V Release 4, and are in the \*(PX standard:
.IP \(bu 0.3i
The option
.B \-v
for assigning variables before program execution starts is new.
The book indicates that command
line variable assignment happens when awk would otherwise
open the argument as a file, which is after the
.B BEGIN
block is executed.
However, in earlier implementations,
when such an assignment appeared before any file names,
the assignment would happen before the
.B BEGIN
block was run.
Applications came to depend on this ``feature.''
When
.B awk
was changed to match its documentation,
this option was added to accommodate applications that depended
upon the old behavior.
(This feature was agreed upon by both the AT&T and GNU developers.)
.IP \(bu
The option
.B \-W
for implementation-specific features is from the \*(PX standard.
.IP \(bu
When processing arguments,
.B gawk
uses the special option
``\-\-'' to signal the end of arguments, and warns about,
but otherwise ignores, undefined options.
.IP \(bu
The AWK book does not define the return value of
.BR srand() .
The System V Release 4 version of \*(UN awk (and the \*(PX
standard)
has it return the seed it was using,
to allow keeping track of random number sequences.
Therefore,
.B srand()
in
.B gawk
also returns its current seed.
.IP \(bu
Other new features include the following:
use of multiple
.B \-f
options (from MKS \fBawk\fR); the
.B ENVIRON
array; the escape sequences \fB\ea\fR and \fB\ev\fR
(done originally in
.B gawk
and fed back into AT&T's); the built-in functions
.B tolower()
and
.B toupper()
(from AT&T); and the ANSI-C conversion specifications in
.B printf
(done first in AT&T's version).
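.PP
As a small illustration of the
.B srand()
behavior described above, the following program prints 42,
the seed that was in use when the second call was made.
The seed value itself is arbitrary:
.DM
	BEGIN {
		srand(42)
		print srand()
	}
.DE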
.SH "GNU Extensions"
.B gawk
has some extensions to \*(PX
.BR awk .
They are described in this section.
All the extensions described here can be
disabled by invoking
.B gawk
with the command-line option
.BR "\-W compat" .
The following features of
.B gawk
are not available in \*(PX
.BR awk :
.IP \(bu 0.3i
The escape sequence \fB\ex\fR.
.IP \(bu
The functions
.B systime()
and
.BR strftime() .
.IP \(bu
The special-file names available for I/O redirection.
.IP \(bu
The special variables
.B ARGIND
and
.BR ERRNO .
.IP \(bu
The variable
.B IGNORECASE
and its side effects.
.IP \(bu
The variable
.B FIELDWIDTHS
and fixed-width field splitting.
.IP \(bu
The path search for files named via the option
.BR \-f ,
and therefore the special meaning of the environment variable
.BR AWKPATH .
.IP \(bu
The use of
.B "next file"
to abandon processing of the current input file.
.PP
The AWK book does not define the return value of the function
.BR close() .
.BR gawk 's
.B close()
returns the value from
.B fclose()
or
.B pclose()
when closing a file or pipe, respectively.
.PP
When
.B gawk
is invoked with the option
.BR "\-W compat" ,
if the
.I fs
argument to option
.B \-F
is `t', then
.B FS
will be set to the tab character.
Because this is a rather ugly special case, it is not the default behavior.
This behavior also does not occur if
.B \-Wposix
has been specified.
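.PP
The status returned by
.B close()
can be tested like any other value.
In this sketch the command and the message are arbitrary, and it assumes that
.B sort
exits with status zero on success:
.DM
	BEGIN {
		cmd = "sort"
		print "pear" | cmd
		print "apple" | cmd
		if (close(cmd) != 0)
			print "sort failed" > "/dev/stderr"
	}
.DE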
.SH "Historical Features"
There are two features of historical AWK implementations that
.B gawk
supports.
First, it is possible to call the
.B length()
built-in function not only with no argument, but even without parentheses!
Thus
.DM
	a = length
.DE
.PP
is the same as either of
.DM
	a = length()
	a = length($0)
.DE
.PP
This feature is marked as ``deprecated'' in the \*(PX
standard, and
.B gawk
will issue a warning about its use if option
.B \-Wlint
is specified on the command line.
.PP
The other feature is the use of the
.B continue
statement outside the body of a
.BR while ,
.BR for ,
or
.B do
loop.
Traditional
AWK implementations have treated such usage as equivalent
to the
.B next
statement.
.B gawk
supports this usage if
.B \-Wposix
has not been specified.
.SH "See Also"
.Xr "awk," awk
.Xr "commands," commands
.Xr "Programming COHERENT" programmi
.br
\fIIntroduction to the awk Language\fR, tutorial.
.br
Aho, Alfred V.; Kernighan, Brian W.; Weinberger, Peter J.:
\fIThe AWK Programming Language.\fR
Englewood Cliffs, NJ, Addison-Wesley, Inc., 1988
(ISBN 0-201-07981-X).
.br
\fIThe GAWK Manual,\fR
ed 0.15.
Boston, The Free Software Foundation, 1993.
.SH Notes
The option
.B \-F
is not necessary given the command-line
variable assignment feature;
it remains only for backwards compatibility.
.PP
If your system actually has support for
.B /dev/fd
and the associated
.BR /dev/stdin ,
.BR /dev/stdout ,
and
.B /dev/stderr
files, you may get different output from
.B gawk
than you would get on a system without those files.
When
.B gawk
interprets these files internally, it synchronizes output to the
standard output with output to
.BR /dev/stdout ,
while on a system with those files, the output is actually to different
open files.
.I "Caveat utilitor."
.PP
This man page documents gawk, version 2.15.
Please note that with this version,
.B gawk
no longer recognizes the command-line options
.BR \-c ,
.BR \-V ,
.BR \-C ,
.BR \-a ,
and
.B \-e
that had been recognized by version 2.11.
.PP
.II "Aho, Alfred"
.II "Kernighan, Brian"
.II "Weinberger, Peter"
The original version of \*(UN awk was designed and implemented by
Alfred Aho, Peter Weinberger, and Brian Kernighan of AT&T Bell Laboratories.
Brian Kernighan continues to maintain and enhance it.
.PP
.II "Rubin, Paul"
.II "Fenlason, Jay"
Paul Rubin and Jay Fenlason, of the Free Software Foundation,
wrote
.B gawk
to be compatible with the original version of awk distributed in
\*(UN version 7.
John Woods contributed a number of bug fixes.
David Trueman, with contributions from Arnold Robbins, made
.B gawk
compatible with the new version of \*(UN awk.
.PP
Brian Kernighan of AT&T Bell Laboratories provided valuable assistance
during testing and debugging.
The authors thank him.
.PP
Finally, please note that
.B gawk
and its associated documentation (including this manual page)
are protected by the Free Software Foundation's ``copyleft''.
For details on your rights and obligations, see the file
.B COPYING
in the source code for
.BR gawk ,
which is available through the Mark Williams BBS and other public-domain
systems.
