lleexx -- Command

Lexical analyzer generator
lleexx [-tt][-vv][_f_i_l_e]
cccc lleexx.yyyy.cc -llll

Many programs, e.g., compilers, process highly structured input according
to rules. Two of the most complicated parts of such programs are lexical
analysis and parsing (also called syntax analysis). The COHERENT system
includes two powerful tools called lex and yacc to help you construct these
parts of a program. lex converts a set of lexical rules into a lexical
analyzer, and yacc converts a set of parsing rules into a parser.

The lexical analyzer that lex generates may be used directly, or it may be
called by a parser generated by yacc.

lleexx reads a  specification from the given _f_i_l_e (or  from the standard input
if  none),  and generates  a  C  function called  yyyylleexx().  lleexx writes  the
generated function in  the file lleexx.yyyy.cc, or on standard  output if you use
the -tt  option.  The -vv  option prints some statistics  about the generated
tables.

The tutorial on lex that appears in this manual describes lex in detail. In
brief, the generated function yylex() matches portions of its input to one
pattern (sometimes called a regular expression) from a set of rules, or
context, and executes the associated C commands. Unmatched portions of the
input are copied to the output stream. yylex() returns EOF when the input
has been exhausted.
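To make this behavior concrete, here is a minimal hand-written sketch in C of what a generated yylex() does, under simplifying assumptions: the input comes from a string rather than the standard input, there is a single rule ([0-9]+) whose action returns a hypothetical token code NUMBER, and unmatched characters go to a buffer that stands in for the output stream. The names scan, text, leng, out, and NUMBER are hypothetical; this is not the code that lex generates.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>              /* EOF, size_t */

#define NUMBER 257              /* hypothetical token code */

static const char *src;         /* input text; stands in for stdin */
static size_t pos;              /* index of the next unread character */
static char text[64];           /* matched token, like yytext[] */
static int leng;                /* token length, like yyleng */
static char out[256];           /* unmatched input; stands in for stdout */
static size_t outlen;

/* Hypothetical one-rule scanner: [0-9]+ returns NUMBER. */
static int scan(void)
{
    while (src[pos] != '\0') {
        if (isdigit((unsigned char)src[pos])) {
            leng = 0;
            while (isdigit((unsigned char)src[pos])
                   && leng < (int)sizeof text - 1)
                text[leng++] = src[pos++];
            text[leng] = '\0';
            return NUMBER;              /* the rule's action */
        }
        /* No rule matches here: copy the character to the output. */
        assert(outlen < sizeof out - 1);
        out[outlen++] = src[pos++];
        out[outlen] = '\0';
    }
    return EOF;                         /* input exhausted */
}
```

Scanning a12 b345 this way yields NUMBER twice (with text set to 12 and then 345), then EOF, while a, the space, and b are copied to the output.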

lleexx uses  the following macros  that you may replace  with the preprocessor
directive #uunnddeeff if you wish: iinnppuutt() (read the standard input stream), and
oouuttppuutt(_c) (write  the character _c to the standard  output stream).  You may
also replace  the following functions if you  wish: mmaaiinn() (main function),
eerrrroorr(...)  (print error  messages; takes  same  arguments as  pprriinnttff), and
yyyywwrraapp() (handle events at the end  of a file).  If an action is desired on
end of file, such as arranging  for more input, yyyywwrraapp() should perform it,
returning zero to keep going.
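The following sketch shows one way to supply a replacement input routine and a yywrap() that arranges for more input, assuming, as described above, that the analyzer calls yywrap() whenever input runs out and keeps going if it returns zero. The functions str_input(), str_yywrap(), and read_all() and the sources[] array are hypothetical, and using EOF as the end-of-input value is an assumption; read_all() merely stands in for the analyzer's main loop, and the block does not include lex.yy.c. In a real specification you would #undef input and define it in terms of a routine like str_input().

```c
#include <assert.h>
#include <stdio.h>              /* EOF */

/* Hypothetical input sources; a real program might open files instead. */
static const char *sources[] = { "ab", "", "c" };
#define NSOURCES (int)(sizeof sources / sizeof sources[0])
static int cur = 0;             /* index of the current source */
static int off = 0;             /* read offset within the current source */

/* Replacement for input(): next character, or EOF at the end of a source. */
static int str_input(void)
{
    char c = sources[cur][off];

    if (c == '\0')
        return EOF;
    off++;
    return (unsigned char)c;
}

/* Called at end of input: arrange for more and return 0, or return 1. */
static int str_yywrap(void)
{
    if (cur + 1 < NSOURCES) {
        cur++;                  /* switch to the next source */
        off = 0;
        return 0;               /* keep going */
    }
    return 1;                   /* really finished */
}

/* Stands in for the analyzer's main loop: collect every character. */
static int read_all(char *buf, int size)
{
    int n = 0, c;

    for (;;) {
        c = str_input();
        if (c == EOF) {
            if (str_yywrap() == 0)
                continue;       /* more input was arranged */
            break;
        }
        assert(n < size - 1);
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

Because str_yywrap() returns zero while sources remain, the loop reads straight through all three strings (including the empty one) and collects abc.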

A full lleexx specification has the following format:

-> Macro definitions, of the form:

       name    pattern

-> Start condition declarations:

       %S  NAME ...

-> Context declarations:

       %C  NAME ...

-> Code to be included in the header section:

       %{
       anything
       %}
       <tab or space> anything

-> Rules section delimiter (must always be present):

       %%

-> Code to appear at the start of yylex():

       <tab or space> anything

-> Rules for initial context, in any of the forms:

       rule        action;
       rule        | (means use next action)
       rule        {
       <tab or space>  action;
       <tab or space>  }

-> For each additional context:

       %C  NAME
       ...rules for this context...

-> End of rules section delimiter:

       %%

-> Code to be copied verbatim, such as user-provided input(), output(),
   yywrap(), or other functions.

lleexx matches the longest string possible; if two rules match the same length
string, the  rule specified first  takes precedence.  lleexx  puts the matched
string, or _t_o_k_e_n, in the cchhaarr  array yyyytteexxtt[], and sets the variable yyyylleenngg
to its length.
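The precedence rule can be modeled in a few lines of C. In this sketch each rule is a hypothetical function that returns how many characters it matches at a given position (0 for no match): rule 0 matches the keyword if, and rule 1 matches [a-z]+. pick_rule() keeps the longest match and, because it replaces its current choice only on a strictly longer match, lets the earlier rule win ties. A real analyzer presumably gets the same effect from its generated tables rather than by trying rules one at a time.

```c
#include <assert.h>
#include <string.h>

/* Rule 0: the literal string "if". Returns the match length, or 0. */
static int rule_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 1: [a-z]+ (an identifier). Returns the match length, or 0. */
static int rule_ident(const char *s)
{
    int n = 0;

    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Rules in specification order. */
static int (*const rules[])(const char *) = { rule_if, rule_ident };
#define NRULES (int)(sizeof rules / sizeof rules[0])

/* Choose a rule the way lex does: longest match; ties go to the earlier
   rule. Stores the length in *leng (like yyleng); returns -1 if none match. */
static int pick_rule(const char *s, int *leng)
{
    int best = -1, bestlen = 0;

    assert(s != NULL && leng != NULL);
    for (int i = 0; i < NRULES; i++) {
        int len = rules[i](s);
        if (len > bestlen) {            /* strictly longer only */
            best = i;
            bestlen = len;
        }
    }
    *leng = bestlen;
    return best;
}
```

With input iffy, both rules match, but the identifier rule matches four characters and wins; with if (x), both match two characters and the keyword rule, listed first, wins.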

Actions may use the following:

EECCHHOO...........Output the token
RREEJJEECCTT.........Perform action for lower precedence match
BBEEGGIINN _N_A_M_E.....Set start condition to _N_A_M_E
BBEEGGIINN 00........Clear start condition
yyyysswwiittcchh(_N_A_M_E).Switch to context _N_A_M_E, return current
yyyysswwiittcchh(00)....Switch to initial context
yyyynneexxtt().......Steal next character from input
yyyybbaacckk(_c)......Put character _c back into input
yyyylleessss(_n)......Reduce token length to _n, put rest back
yyyymmoorree().......Append next token to this one
yyyyllooookk().......Returns number of chars in input buffer

lleexx rules are contiguous strings of the form

    [ <_N_A_M_E,...> ][ ^ ] _t_o_k_e_n [ /_l_o_o_k_a_h_e_a_d ][ $ ]

where brackets `[]' indicate optional items.

<_N_A_M_E,...>.....Match only under given start conditions
^..............Match the beginning of a line
$..............Match the end of a line
_t_o_k_e_n..........Pattern that a given token is to match
/_l_o_o_k_a_h_e_a_d.....Pattern that given trailing text is to match

Pattern elements:

a         The character a
\a        The character a, even if special
.         Any character except newline
[abx-z]   Any of a, b, or x through z
[^abx-z]  Any except a, b, or x through z
"abc"     The string abc, even if any are special
{name}    The macro definition name
(exp)     The pattern exp (grouping operator)

Optional operators on elements:

e?        Zero or one occurrence of e
e*        Zero or more consecutive occurrences of e
e+        One or more consecutive occurrences of e
e{n}      n (a decimal number) consecutive occurrences of e
e{m,n}    m through n consecutive occurrences of e

Patterns may be of the form:

e1e2      Matches the sequence e1 e2
e1|e2     Matches either e1 or e2

lleexx recognizes the standard C escapes:  \nn, \tt, \rr, \bb, \ff, and \_o_o_o (octal
representation).  The special characters

     \ ( ) < > { } % * + ? [ - ] ^ / $ . |

must be prefixed with \ or enclosed within quotation marks (excepting " and
\) to  be normal.  Within  classes, only the characters  . ^ - \  and ] are
special.
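The escape sequences listed above can be decoded as in the following sketch. decode_escape() is a hypothetical helper that is called with a pointer to the character after the backslash; any character that is not one of the listed escapes stands for itself, which is how \a means the character a even if it is special. The sketch does not reject octal values above \377.

```c
#include <assert.h>

/* Hypothetical: decode the escape that starts just after a backslash.
   Advances *pp past the sequence and returns the character it denotes. */
static int decode_escape(const char **pp)
{
    const char *p = *pp;
    int c;

    assert(*p != '\0');
    switch (*p) {
    case 'n': c = '\n'; p++; break;
    case 't': c = '\t'; p++; break;
    case 'r': c = '\r'; p++; break;
    case 'b': c = '\b'; p++; break;
    case 'f': c = '\f'; p++; break;
    default:
        if (*p >= '0' && *p <= '7') {
            int i;
            c = 0;              /* \ooo: up to three octal digits */
            for (i = 0; i < 3 && *p >= '0' && *p <= '7'; i++, p++)
                c = c * 8 + (*p - '0');
        } else {
            c = (unsigned char)*p++;    /* any other character: itself */
        }
    }
    *pp = p;
    return c;
}
```

For example, \101 decodes to the character with code 65 and leaves the pointer at whatever follows the third digit.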

_F_i_l_e_s
/uussrr/lliibb/lliibbll.aa
/uussrr/ssrrcc/lliibbll/* -- library source code

_S_e_e _A_l_s_o
ccoommmmaannddss, yyaacccc
_I_n_t_r_o_d_u_c_t_i_o_n _t_o _l_e_x, _t_h_e _L_e_x_i_c_a_l _A_n_a_l_y_z_e_r
