.TH lex "" "" Command
.PC "Lexical analyzer generator"
\fBlex [\-t][\-v][\fIfile\^\fB]\fR
\fBcc lex.yy.c \-ll\fR
.PP
.HS
.SH Options:
.IC \fB\-t\fR
Write to standard output instead of \fBlex.yy.c\fR
.IC \fB\-v\fR
Give statistics about generated tables
.HE
.II libl
.II "library^lex"
Many programs, e.g., compilers, process highly structured input according to
rules.
Two of the most complicated parts of such programs are
.I "lexical analysis"
and
.I parsing
(also called
.IR "syntax analysis" ).
The \*(CO system includes two powerful tools called
.B lex
and
.B yacc
to help you construct these parts of a program.
.B lex
converts a set of lexical rules into a lexical analyzer, and
.B yacc
converts a set of parsing rules into a parser.
.PP
The output of
.B lex
may be used directly, or may be used by a parser generated by
.BR yacc .
.PP
.B lex
reads a specification from the given
.I file
(or from the standard input if no file is given) and
generates a C function called
.BR yylex() .
.B lex
writes the generated function to the file
.BR lex.yy.c ,
or to the standard output if you use the
.B \-t
option.
The
.B \-v
option prints some statistics about the generated tables.
.PP
The
tutorial on
.B lex
that appears in this manual describes
.B lex
in detail.
In brief, the generated function
.B yylex()
matches portions of its input
against the patterns (sometimes called regular expressions) in a set of rules,
or
.IR context ,
and executes the C commands associated with the pattern that matched.
Unmatched portions of the input are copied to the output stream.
.B yylex()
returns EOF when input has been exhausted.
.PP
.B lex
uses the following macros, which you may redefine
after removing the default definition with the preprocessor directive
.BR #undef :
\fBinput()\fR (read a character from the standard input stream) and
\fBoutput(\fIc\^\fB)\fR (write the character
.I c
to the standard output stream).
You may also replace the following functions if you wish:
\fBmain()\fR (main function),
\fBerror(...)\fR (print error messages; takes the same arguments as
.BR printf ),
and
\fByywrap()\fR (handle events at the end of a file).
If you want an action performed at end of file, such as arranging for more
input,
.B "yywrap()"
should perform it
and return zero to continue scanning.
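.PP
As a minimal sketch of a replacement
.BR yywrap() ,
the following code could be placed in the final section of a specification.
It assumes that the default
.B input()
reads the standard input through the standard I/O library, so that
.B freopen()
changes what is scanned, and that a nonzero return ends the scan,
as in other versions of
.BR lex .
The file name
.B more.txt
and the variable
.B switched
are hypothetical.
.DM
	static int switched = 0;	/* hypothetical flag: second file opened? */
	int yywrap()
	{
		if (switched)
			return 1;	/* assumed: nonzero ends the scan */
		switched = 1;
		if (freopen("more.txt", "r", stdin) == NULL)
			return 1;	/* no more input available */
		return 0;		/* zero: keep going with the new input */
	}
.DE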
.PP
A full
.B lex
specification has the following format:
.IP \(bu 0.3i
Macro definitions, of the form:
.DM
	name	pattern
.DE
.IP \(bu
Start condition declarations:
.DM
	%S	NAME ...
.DE
.IP \(bu
Context declarations:
.DM
	%C	NAME ...
.DE
.IP \(bu
Code to be included in the header section:
.DM
	%{
	anything
	%}
	<tab or space> anything
.DE
.IP \(bu
Rules section delimiter (must always be present):
.DM
	%%
.DE
.IP \(bu
Code to appear at the start of \fByylex()\fR:
.DM
	<tab or space> anything
.DE
.IP \(bu
Rules for initial context, in any of the forms:
.DM
	rule		action;
	rule		| (means use next action)
	rule		{
	<tab or space>	action;
	<tab or space>	}
.DE
.IP \(bu
For each additional context:
.DM
	%C	NAME
	...rules for this context...
.DE
.IP \(bu
End of rules section delimiter:
.DM
	%%
.DE
.IP \(bu
Code to be copied verbatim, such as user-provided
\fBinput()\fR, \fBoutput()\fR, \fByywrap()\fR, or other functions.
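.PP
Putting these parts together, a small specification might look like the
following sketch.
The macro name
.B digit
and the printed text are illustrative only;
.B main()
is taken from the library.
.DM
	digit	[0-9]
	%{
	#include <stdio.h>
	%}
	%%
	{digit}+	printf("number: %s\en", yytext);
	[\et\en]+	;
	%%
.DE
.PP
With this specification, runs of digits are reported, tabs and newlines are
discarded, and all other input is unmatched and therefore copied to the output.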
.PP
.B lex
matches the longest string possible;
if two rules match a string of the same length, the rule specified
first takes precedence.
.B lex
puts the matched string, or
.IR token ,
in the
.B char
array
.BR "yytext[]" ,
and sets the variable
.B yyleng
to its length.
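.PP
The following two rules are a sketch of how these precedence rules interact;
the printed messages are illustrative, and
.B yyleng
is assumed to be an
.BR int .
.DM
	if	printf("keyword\en");
	[a-z]+	printf("identifier %s (%d chars)\en", yytext, yyleng);
.DE
.PP
The input
.B if
matches both rules with the same length, so the first rule is used.
The input
.B iffy
matches the second rule with a longer string, so the second rule is used
and
.B yyleng
is set to four.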
.PP
Actions may use the following:
.LB
\fBECHO\fR	Output the token
\fBREJECT\fR	Perform action for lower precedence match
\fBBEGIN \fINAME\fR	Set start condition to \fINAME\fR
\fBBEGIN 0\fR	Clear start condition
\fByyswitch(\fINAME\^\fB)\fR	Switch to context \fINAME\^\fR, return current
\fByyswitch(0)\fR	Switch to initial context
\fByynext()\fR	Steal next character from input
\fByyback(\fIc\^\fB)\fR	Put character \fIc\fR back into input
\fByyless(\fIn\^\fB)\fR	Reduce token length to \fIn\fR, put rest back
\fByymore()\fR	Append next token to this one
\fByylook()\fR	Return number of characters in input buffer
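.PP
As a sketch of how
.B BEGIN
and start conditions fit together, the following specification reports the
text found between quotation marks.
The start condition name
.B QUOTE
and the printed message are illustrative.
.DM
	%S	QUOTE
	%%
	<QUOTE>\e"	BEGIN 0;
	\e"		BEGIN QUOTE;
	<QUOTE>[^"\en]+	printf("string: %s\en", yytext);
.DE
.PP
The rule for the closing quotation mark is listed first, so that it takes
precedence should the unprefixed rule also be active under
.BR QUOTE ,
as it is in other versions of
.BR lex .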
.PP
.B lex
rules are contiguous strings of the form
.DS
	[ <\fINAME,...\^\fR> ][ ^ ] \fItoken\fR [ \fI/lookahead\fR ][ $ ]
.DE
.PP
where brackets `[\|]' indicate optional items.
.LB
<\fINAME,...\fR>	Match only under given start conditions
\fB^\fR	Match the beginning of a line
\fB$\fR	Match the end of a line
\fItoken\fR	Pattern that a given token is to match
\fI/lookahead\fR	Pattern that given trailing text is to match
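.PP
The following rules are a sketch of the anchor and lookahead forms;
the patterns and messages are illustrative.
It is assumed here, as in other versions of
.BR lex ,
that the lookahead text is not included in
.BR yytext[] .
.DM
	^#.*		printf("comment: %s\en", yytext);
	[0-9]+/px	printf("pixels: %s\en", yytext);
.DE
.PP
The first rule matches a
.B #
only at the beginning of a line, together with the rest of that line.
The second matches a run of digits only when it is followed by
.BR px .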
.PP
Pattern elements:
.LB
\fBa\fR	The character \fBa\fR
\e\fBa\fR	The character \fBa\fR, even if special
\fB.\fR	Any character except newline
\fB[abx-z]\fR	Any of \fBa, b,\fR or \fBx\fR through \fBz\fR
\fB[^abx-z]\fR	Any except \fBa, b,\fR or \fBx\fR through \fBz\fR
\fB"abc"\fR	The string \fBabc\fR, even if any characters are special
\fB{\fIname\^\fB}\fR	The macro definition \fIname\fR
\fB(\fIexp\^\fB)\fR	The pattern \fIexp\fR (grouping operator)
.PP
Optional operators on elements:
.LB
\fIe\fB?\fR	Zero or one occurrence of \fIe\fR
\fIe\fB*\fR	Zero or more consecutive \fIe\fRs
\fIe\fB+\fR	One or more consecutive \fIe\fRs
\fIe\^\fB{\fIn\^\fB}\fR	\fIn\fP (a decimal number) consecutive \fIe\fRs
\fIe\^\fB{\fIm,n\^\fB}\fR	\fIm\fR through \fIn\fR consecutive \fIe\fRs
.PP
Patterns may be of the form:
.LB
\fIe1e2\fR	Matches the sequence \fIe1 e2\fR
\fIe1|e2\fR	Matches either \fIe1\fR or \fIe2\fR
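.PP
The following sketch combines several of these elements and operators;
the macro names
.B digit
and
.B letter
and the messages are illustrative.
.DM
	digit	[0-9]
	letter	[A-Za-z_]
	%%
	{letter}({letter}|{digit})*	printf("name\en");
	{digit}{4}			printf("four digits\en");
	{digit}+(\e.{digit}+)?		printf("number\en");
.DE
.PP
Here
.B 1234
matches the last two rules with the same length, so the earlier one is used;
.B 12345
and
.B 3.14
are matched in full only by the last rule.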
.PP
.B lex
recognizes the standard C escapes:
\fB\en\fR, \fB\et\fR, \fB\er\fR, \fB\eb\fR, \fB\ef\fR, and \fB\e\fIooo\fR
(octal representation).
The special characters
.DM
	 \e ( ) < > { } % * + ? [ - ] ^ / $ . |
.DE
.PP
must be prefixed with \e
or enclosed within quotation marks
(except for \fB"\fR and \e) to be treated as ordinary characters.
Within classes, only the characters . ^ - \e and ] are special.
.SH Files
\fB/usr/lib/libl.a
.br
/usr/src/libl/*\fR \(em library source code
.SH "See Also"
.Xr "commands," commands
.Xr "yacc" yacc
.br
\fIIntroduction to lex, the Lexical Analyzer\fR
