diff options
Diffstat (limited to 'tools/bison++/bison.info-2')
-rw-r--r-- | tools/bison++/bison.info-2 | 1334 |
1 files changed, 1334 insertions, 0 deletions
diff --git a/tools/bison++/bison.info-2 b/tools/bison++/bison.info-2 new file mode 100644 index 000000000..2e392509a --- /dev/null +++ b/tools/bison++/bison.info-2 @@ -0,0 +1,1334 @@ +This is bison.info, produced by makeinfo version 4.1 from bison.texinfo. + +START-INFO-DIR-ENTRY +* bison: (bison). GNU Project parser generator (yacc replacement). +END-INFO-DIR-ENTRY + + This file documents the Bison parser generator. + + Copyright (C) 1988, 89, 90, 91, 92, 93, 95, 98, 1999 Free Software +Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided also +that the sections entitled "GNU General Public License" and "Conditions +for Using Bison" are included exactly as in the original, and provided +that the entire resulting derived work is distributed under the terms +of a permission notice identical to this one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that the sections entitled "GNU General Public +License", "Conditions for Using Bison" and this permission notice may be +included in translations approved by the Free Software Foundation +instead of in the original English. + + +File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc + +Grammar Rules for `rpcalc' +-------------------------- + + Here are the grammar rules for the reverse polish notation +calculator. + + input: /* empty */ + | input line + ; + + line: '\n' + | exp '\n' { printf ("\t%.10g\n", $1); } + ; + + exp: NUM { $$ = $1; } + | exp exp '+' { $$ = $1 + $2; } + | exp exp '-' { $$ = $1 - $2; } + | exp exp '*' { $$ = $1 * $2; } + | exp exp '/' { $$ = $1 / $2; } + /* Exponentiation */ + | exp exp '^' { $$ = pow ($1, $2); } + /* Unary minus */ + | exp 'n' { $$ = -$1; } + ; + %% + + The groupings of the rpcalc "language" defined here are the +expression (given the name `exp'), the line of input (`line'), and the +complete input transcript (`input'). Each of these nonterminal symbols +has several alternate rules, joined by the `|' punctuator which is read +as "or". The following sections explain what these rules mean. + + The semantics of the language is determined by the actions taken +when a grouping is recognized. The actions are the C code that appears +inside braces. *Note Actions::. + + You must specify these actions in C, but Bison provides the means for +passing semantic values between the rules. In each action, the +pseudo-variable `$$' stands for the semantic value for the grouping +that the rule is going to construct. Assigning a value to `$$' is the +main job of most actions. The semantic values of the components of the +rule are referred to as `$1', `$2', and so on. + +* Menu: + +* Rpcalc Input:: +* Rpcalc Line:: +* Rpcalc Expr:: + + +File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules + +Explanation of `input' +...................... + + Consider the definition of `input': + + input: /* empty */ + | input line + ; + + This definition reads as follows: "A complete input is either an +empty string, or a complete input followed by an input line". Notice +that "complete input" is defined in terms of itself. This definition +is said to be "left recursive" since `input' appears always as the +leftmost symbol in the sequence. *Note Recursive Rules: Recursion. + + The first alternative is empty because there are no symbols between +the colon and the first `|'; this means that `input' can match an empty +string of input (no tokens). We write the rules this way because it is +legitimate to type `Ctrl-d' right after you start the calculator. It's +conventional to put an empty alternative first and write the comment +`/* empty */' in it. + + The second alternate rule (`input line') handles all nontrivial +input. It means, "After reading any number of lines, read one more +line if possible." The left recursion makes this rule into a loop. +Since the first alternative matches empty input, the loop can be +executed zero or more times. + + The parser function `yyparse' continues to process input until a +grammatical error is seen or the lexical analyzer says there are no more +input tokens; we will arrange for the latter to happen at end of file. + + +File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules + +Explanation of `line' +..................... + + Now consider the definition of `line': + + line: '\n' + | exp '\n' { printf ("\t%.10g\n", $1); } + ; + + The first alternative is a token which is a newline character; this +means that rpcalc accepts a blank line (and ignores it, since there is +no action). The second alternative is an expression followed by a +newline. This is the alternative that makes rpcalc useful. The +semantic value of the `exp' grouping is the value of `$1' because the +`exp' in question is the first symbol in the alternative. The action +prints this value, which is the result of the computation the user +asked for. + + This action is unusual because it does not assign a value to `$$'. +As a consequence, the semantic value associated with the `line' is +uninitialized (its value will be unpredictable). This would be a bug if +that value were ever used, but we don't use it: once rpcalc has printed +the value of the user's input line, that value is no longer needed. + + +File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules + +Explanation of `expr' +..................... + + The `exp' grouping has several rules, one for each kind of +expression. The first rule handles the simplest expressions: those +that are just numbers. The second handles an addition-expression, +which looks like two expressions followed by a plus-sign. The third +handles subtraction, and so on. + + exp: NUM + | exp exp '+' { $$ = $1 + $2; } + | exp exp '-' { $$ = $1 - $2; } + ... + ; + + We have used `|' to join all the rules for `exp', but we could +equally well have written them separately: + + exp: NUM ; + exp: exp exp '+' { $$ = $1 + $2; } ; + exp: exp exp '-' { $$ = $1 - $2; } ; + ... + + Most of the rules have actions that compute the value of the +expression in terms of the value of its parts. For example, in the +rule for addition, `$1' refers to the first component `exp' and `$2' +refers to the second one. The third component, `'+'', has no meaningful +associated semantic value, but if it had one you could refer to it as +`$3'. When `yyparse' recognizes a sum expression using this rule, the +sum of the two subexpressions' values is produced as the value of the +entire expression. *Note Actions::. + + You don't have to give an action for every rule. When a rule has no +action, Bison by default copies the value of `$1' into `$$'. This is +what happens in the first rule (the one that uses `NUM'). + + The formatting shown here is the recommended convention, but Bison +does not require it. You can add or change whitespace as much as you +wish. For example, this: + + exp : NUM | exp exp '+' {$$ = $1 + $2; } | ... + +means the same thing as this: + + exp: NUM + | exp exp '+' { $$ = $1 + $2; } + | ... + +The latter, however, is much more readable. + + +File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc + +The `rpcalc' Lexical Analyzer +----------------------------- + + The lexical analyzer's job is low-level parsing: converting +characters or sequences of characters into tokens. The Bison parser +gets its tokens by calling the lexical analyzer. *Note The Lexical +Analyzer Function `yylex': Lexical. + + Only a simple lexical analyzer is needed for the RPN calculator. +This lexical analyzer skips blanks and tabs, then reads in numbers as +`double' and returns them as `NUM' tokens. Any other character that +isn't part of a number is a separate token. Note that the token-code +for such a single-character token is the character itself. + + The return value of the lexical analyzer function is a numeric code +which represents a token type. The same text used in Bison rules to +stand for this token type is also a C expression for the numeric code +for the type. This works in two ways. If the token type is a +character literal, then its numeric code is the ASCII code for that +character; you can use the same character literal in the lexical +analyzer to express the number. If the token type is an identifier, +that identifier is defined by Bison as a C macro whose definition is +the appropriate number. In this example, therefore, `NUM' becomes a +macro for `yylex' to use. + + The semantic value of the token (if it has one) is stored into the +global variable `yylval', which is where the Bison parser will look for +it. (The C data type of `yylval' is `YYSTYPE', which was defined at +the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc +Decls..) + + A token type code of zero is returned if the end-of-file is +encountered. (Bison recognizes any nonpositive value as indicating the +end of the input.) + + Here is the code for the lexical analyzer: + + /* Lexical analyzer returns a double floating point + number on the stack and the token NUM, or the ASCII + character read if not a number. Skips all blanks + and tabs, returns 0 for EOF. */ + + #include <ctype.h> + + yylex () + { + int c; + + /* skip white space */ + while ((c = getchar ()) == ' ' || c == '\t') + ; + /* process numbers */ + if (c == '.' || isdigit (c)) + { + ungetc (c, stdin); + scanf ("%lf", &yylval); + return NUM; + } + /* return end-of-file */ + if (c == EOF) + return 0; + /* return single chars */ + return c; + } + + +File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc + +The Controlling Function +------------------------ + + In keeping with the spirit of this example, the controlling function +is kept to the bare minimum. The only requirement is that it call +`yyparse' to start the process of parsing. + + main () + { + yyparse (); + } + + +File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc + +The Error Reporting Routine +--------------------------- + + When `yyparse' detects a syntax error, it calls the error reporting +function `yyerror' to print an error message (usually but not always +`"parse error"'). It is up to the programmer to supply `yyerror' +(*note Parser C-Language Interface: Interface.), so here is the +definition we will use: + + #include <stdio.h> + + yyerror (s) /* Called by yyparse on error */ + char *s; + { + printf ("%s\n", s); + } + + After `yyerror' returns, the Bison parser may recover from the error +and continue parsing if the grammar contains a suitable error rule +(*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We +have not written any error rules in this example, so any invalid input +will cause the calculator program to exit. This is not clean behavior +for a real calculator, but it is adequate in the first example. + + +File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc + +Running Bison to Make the Parser +-------------------------------- + + Before running Bison to produce a parser, we need to decide how to +arrange all the source code in one or more source files. For such a +simple example, the easiest thing is to put everything in one file. +The definitions of `yylex', `yyerror' and `main' go at the end, in the +"additional C code" section of the file (*note The Overall Layout of a +Bison Grammar: Grammar Layout.). + + For a large project, you would probably have several source files, +and use `make' to arrange to recompile them. + + With all the source in a single file, you use the following command +to convert it into a parser file: + + bison FILE_NAME.y + +In this example the file was called `rpcalc.y' (for "Reverse Polish +CALCulator"). Bison produces a file named `FILE_NAME.tab.c', removing +the `.y' from the original file name. The file output by Bison contains +the source code for `yyparse'. The additional functions in the input +file (`yylex', `yyerror' and `main') are copied verbatim to the output. + + +File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc + +Compiling the Parser File +------------------------- + + Here is how to compile and run the parser file: + + # List files in current directory. + % ls + rpcalc.tab.c rpcalc.y + + # Compile the Bison parser. + # `-lm' tells compiler to search math library for `pow'. + % cc rpcalc.tab.c -lm -o rpcalc + + # List files again. + % ls + rpcalc rpcalc.tab.c rpcalc.y + + The file `rpcalc' now contains the executable code. Here is an +example session using `rpcalc'. + + % rpcalc + 4 9 + + 13 + 3 7 + 3 4 5 *+- + -13 + 3 7 + 3 4 5 * + - n Note the unary minus, `n' + 13 + 5 6 / 4 n + + -3.166666667 + 3 4 ^ Exponentiation + 81 + ^D End-of-file indicator + % + + +File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples + +Infix Notation Calculator: `calc' +================================= + + We now modify rpcalc to handle infix operators instead of postfix. +Infix notation involves the concept of operator precedence and the need +for parentheses nested to arbitrary depth. Here is the Bison code for +`calc.y', an infix desk-top calculator. + + /* Infix notation calculator--calc */ + + %{ + #define YYSTYPE double + #include <math.h> + %} + + /* BISON Declarations */ + %token NUM + %left '-' '+' + %left '*' '/' + %left NEG /* negation--unary minus */ + %right '^' /* exponentiation */ + + /* Grammar follows */ + %% + input: /* empty string */ + | input line + ; + + line: '\n' + | exp '\n' { printf ("\t%.10g\n", $1); } + ; + + exp: NUM { $$ = $1; } + | exp '+' exp { $$ = $1 + $3; } + | exp '-' exp { $$ = $1 - $3; } + | exp '*' exp { $$ = $1 * $3; } + | exp '/' exp { $$ = $1 / $3; } + | '-' exp %prec NEG { $$ = -$2; } + | exp '^' exp { $$ = pow ($1, $3); } + | '(' exp ')' { $$ = $2; } + ; + %% + +The functions `yylex', `yyerror' and `main' can be the same as before. + + There are two important new features shown in this code. + + In the second section (Bison declarations), `%left' declares token +types and says they are left-associative operators. The declarations +`%left' and `%right' (right associativity) take the place of `%token' +which is used to declare a token type name without associativity. +(These tokens are single-character literals, which ordinarily don't +need to be declared. We declare them here to specify the +associativity.) + + Operator precedence is determined by the line ordering of the +declarations; the higher the line number of the declaration (lower on +the page or screen), the higher the precedence. Hence, exponentiation +has the highest precedence, unary minus (`NEG') is next, followed by +`*' and `/', and so on. *Note Operator Precedence: Precedence. + + The other important new feature is the `%prec' in the grammar section +for the unary minus operator. The `%prec' simply instructs Bison that +the rule `| '-' exp' has the same precedence as `NEG'--in this case the +next-to-highest. *Note Context-Dependent Precedence: Contextual +Precedence. + + Here is a sample run of `calc.y': + + % calc + 4 + 4.5 - (34/(8*3+-3)) + 6.880952381 + -56 + 2 + -54 + 3 ^ 2 + 9 + + +File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples + +Simple Error Recovery +===================== + + Up to this point, this manual has not addressed the issue of "error +recovery"--how to continue parsing after the parser detects a syntax +error. All we have handled is error reporting with `yyerror'. Recall +that by default `yyparse' returns after calling `yyerror'. This means +that an erroneous input line causes the calculator program to exit. +Now we show how to rectify this deficiency. + + The Bison language itself includes the reserved word `error', which +may be included in the grammar rules. In the example below it has been +added to one of the alternatives for `line': + + line: '\n' + | exp '\n' { printf ("\t%.10g\n", $1); } + | error '\n' { yyerrok; } + ; + + This addition to the grammar allows for simple error recovery in the +event of a parse error. If an expression that cannot be evaluated is +read, the error will be recognized by the third rule for `line', and +parsing will continue. (The `yyerror' function is still called upon to +print its message as well.) The action executes the statement +`yyerrok', a macro defined automatically by Bison; its meaning is that +error recovery is complete (*note Error Recovery::). Note the +difference between `yyerrok' and `yyerror'; neither one is a misprint. + + This form of error recovery deals with syntax errors. There are +other kinds of errors; for example, division by zero, which raises an +exception signal that is normally fatal. A real calculator program +must handle this signal and use `longjmp' to return to `main' and +resume parsing input lines; it would also have to discard the rest of +the current line of input. We won't discuss this issue further because +it is not specific to Bison programs. + + +File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples + +Multi-Function Calculator: `mfcalc' +=================================== + + Now that the basics of Bison have been discussed, it is time to move +on to a more advanced problem. The above calculators provided only five +functions, `+', `-', `*', `/' and `^'. It would be nice to have a +calculator that provides other mathematical functions such as `sin', +`cos', etc. + + It is easy to add new operators to the infix calculator as long as +they are only single-character literals. The lexical analyzer `yylex' +passes back all non-number characters as tokens, so new grammar rules +suffice for adding a new operator. But we want something more +flexible: built-in functions whose syntax has this form: + + FUNCTION_NAME (ARGUMENT) + +At the same time, we will add memory to the calculator, by allowing you +to create named variables, store values in them, and use them later. +Here is a sample session with the multi-function calculator: + + % mfcalc + pi = 3.141592653589 + 3.1415926536 + sin(pi) + 0.0000000000 + alpha = beta1 = 2.3 + 2.3000000000 + alpha + 2.3000000000 + ln(alpha) + 0.8329091229 + exp(ln(beta1)) + 2.3000000000 + % + + Note that multiple assignment and nested function calls are +permitted. + +* Menu: + +* Decl: Mfcalc Decl. Bison declarations for multi-function calculator. +* Rules: Mfcalc Rules. Grammar rules for the calculator. +* Symtab: Mfcalc Symtab. Symbol table management subroutines. + + +File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc + +Declarations for `mfcalc' +------------------------- + + Here are the C and Bison declarations for the multi-function +calculator. + + %{ + #include <math.h> /* For math functions, cos(), sin(), etc. */ + #include "calc.h" /* Contains definition of `symrec' */ + %} + %union { + double val; /* For returning numbers. */ + symrec *tptr; /* For returning symbol-table pointers */ + } + + %token <val> NUM /* Simple double precision number */ + %token <tptr> VAR FNCT /* Variable and Function */ + %type <val> exp + + %right '=' + %left '-' '+' + %left '*' '/' + %left NEG /* Negation--unary minus */ + %right '^' /* Exponentiation */ + + /* Grammar follows */ + + %% + + The above grammar introduces only two new features of the Bison +language. These features allow semantic values to have various data +types (*note More Than One Value Type: Multiple Types.). + + The `%union' declaration specifies the entire list of possible types; +this is instead of defining `YYSTYPE'. The allowable types are now +double-floats (for `exp' and `NUM') and pointers to entries in the +symbol table. *Note The Collection of Value Types: Union Decl. + + Since values can now have various types, it is necessary to +associate a type with each grammar symbol whose semantic value is used. +These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations +are augmented with information about their data type (placed between +angle brackets). + + The Bison construct `%type' is used for declaring nonterminal +symbols, just as `%token' is used for declaring token types. We have +not used `%type' before because nonterminal symbols are normally +declared implicitly by the rules that define them. But `exp' must be +declared explicitly so we can specify its value type. *Note +Nonterminal Symbols: Type Decl. + + +File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc + +Grammar Rules for `mfcalc' +-------------------------- + + Here are the grammar rules for the multi-function calculator. Most +of them are copied directly from `calc'; three rules, those which +mention `VAR' or `FNCT', are new. + + input: /* empty */ + | input line + ; + + line: + '\n' + | exp '\n' { printf ("\t%.10g\n", $1); } + | error '\n' { yyerrok; } + ; + + exp: NUM { $$ = $1; } + | VAR { $$ = $1->value.var; } + | VAR '=' exp { $$ = $3; $1->value.var = $3; } + | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); } + | exp '+' exp { $$ = $1 + $3; } + | exp '-' exp { $$ = $1 - $3; } + | exp '*' exp { $$ = $1 * $3; } + | exp '/' exp { $$ = $1 / $3; } + | '-' exp %prec NEG { $$ = -$2; } + | exp '^' exp { $$ = pow ($1, $3); } + | '(' exp ')' { $$ = $2; } + ; + /* End of grammar */ + %% + + +File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc + +The `mfcalc' Symbol Table +------------------------- + + The multi-function calculator requires a symbol table to keep track +of the names and meanings of variables and functions. This doesn't +affect the grammar rules (except for the actions) or the Bison +declarations, but it requires some additional C functions for support. + + The symbol table itself consists of a linked list of records. Its +definition, which is kept in the header `calc.h', is as follows. It +provides for either functions or variables to be placed in the table. + + /* Data type for links in the chain of symbols. */ + struct symrec + { + char *name; /* name of symbol */ + int type; /* type of symbol: either VAR or FNCT */ + union { + double var; /* value of a VAR */ + double (*fnctptr)(); /* value of a FNCT */ + } value; + struct symrec *next; /* link field */ + }; + + typedef struct symrec symrec; + + /* The symbol table: a chain of `struct symrec'. */ + extern symrec *sym_table; + + symrec *putsym (); + symrec *getsym (); + + The new version of `main' includes a call to `init_table', a +function that initializes the symbol table. Here it is, and +`init_table' as well: + + #include <stdio.h> + + main () + { + init_table (); + yyparse (); + } + + yyerror (s) /* Called by yyparse on error */ + char *s; + { + printf ("%s\n", s); + } + + struct init + { + char *fname; + double (*fnct)(); + }; + + struct init arith_fncts[] + = { + "sin", sin, + "cos", cos, + "atan", atan, + "ln", log, + "exp", exp, + "sqrt", sqrt, + 0, 0 + }; + + /* The symbol table: a chain of `struct symrec'. */ + symrec *sym_table = (symrec *)0; + + init_table () /* puts arithmetic functions in table. */ + { + int i; + symrec *ptr; + for (i = 0; arith_fncts[i].fname != 0; i++) + { + ptr = putsym (arith_fncts[i].fname, FNCT); + ptr->value.fnctptr = arith_fncts[i].fnct; + } + } + + By simply editing the initialization list and adding the necessary +include files, you can add additional functions to the calculator. + + Two important functions allow look-up and installation of symbols in +the symbol table. The function `putsym' is passed a name and the type +(`VAR' or `FNCT') of the object to be installed. The object is linked +to the front of the list, and a pointer to the object is returned. The +function `getsym' is passed the name of the symbol to look up. If +found, a pointer to that symbol is returned; otherwise zero is returned. + + symrec * + putsym (sym_name,sym_type) + char *sym_name; + int sym_type; + { + symrec *ptr; + ptr = (symrec *) malloc (sizeof (symrec)); + ptr->name = (char *) malloc (strlen (sym_name) + 1); + strcpy (ptr->name,sym_name); + ptr->type = sym_type; + ptr->value.var = 0; /* set value to 0 even if fctn. */ + ptr->next = (struct symrec *)sym_table; + sym_table = ptr; + return ptr; + } + + symrec * + getsym (sym_name) + char *sym_name; + { + symrec *ptr; + for (ptr = sym_table; ptr != (symrec *) 0; + ptr = (symrec *)ptr->next) + if (strcmp (ptr->name,sym_name) == 0) + return ptr; + return 0; + } + + The function `yylex' must now recognize variables, numeric values, +and the single-character arithmetic operators. Strings of alphanumeric +characters with a leading nondigit are recognized as either variables or +functions depending on what the symbol table says about them. + + The string is passed to `getsym' for look up in the symbol table. If +the name appears in the table, a pointer to its location and its type +(`VAR' or `FNCT') is returned to `yyparse'. If it is not already in +the table, then it is installed as a `VAR' using `putsym'. Again, a +pointer and its type (which must be `VAR') is returned to `yyparse'. + + No change is needed in the handling of numeric values and arithmetic +operators in `yylex'. + + #include <ctype.h> + yylex () + { + int c; + + /* Ignore whitespace, get first nonwhite character. */ + while ((c = getchar ()) == ' ' || c == '\t'); + + if (c == EOF) + return 0; + + /* Char starts a number => parse the number. */ + if (c == '.' || isdigit (c)) + { + ungetc (c, stdin); + scanf ("%lf", &yylval.val); + return NUM; + } + + /* Char starts an identifier => read the name. */ + if (isalpha (c)) + { + symrec *s; + static char *symbuf = 0; + static int length = 0; + int i; + + /* Initially make the buffer long enough + for a 40-character symbol name. */ + if (length == 0) + length = 40, symbuf = (char *)malloc (length + 1); + + i = 0; + do + { + /* If buffer is full, make it bigger. */ + if (i == length) + { + length *= 2; + symbuf = (char *)realloc (symbuf, length + 1); + } + /* Add this character to the buffer. */ + symbuf[i++] = c; + /* Get another character. */ + c = getchar (); + } + while (c != EOF && isalnum (c)); + + ungetc (c, stdin); + symbuf[i] = '\0'; + + s = getsym (symbuf); + if (s == 0) + s = putsym (symbuf, VAR); + yylval.tptr = s; + return s->type; + } + + /* Any other character is a token by itself. */ + return c; + } + + This program is both powerful and flexible. You may easily add new +functions, and it is a simple job to modify this code to install +predefined variables such as `pi' or `e' as well. + + +File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples + +Exercises +========= + + 1. Add some new functions from `math.h' to the initialization list. + + 2. Add another array that contains constants and their values. Then + modify `init_table' to add these constants to the symbol table. + It will be easiest to give the constants type `VAR'. + + 3. Make the program report an error if the user refers to an + uninitialized variable in any way except to store a value in it. + + +File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top + +Bison Grammar Files +******************* + + Bison takes as input a context-free grammar specification and +produces a C-language function that recognizes correct instances of the +grammar. + + The Bison grammar input file conventionally has a name ending in +`.y'. + +* Menu: + +* Grammar Outline:: Overall layout of the grammar file. +* Symbols:: Terminal and nonterminal symbols. +* Rules:: How to write grammar rules. +* Recursion:: Writing recursive rules. +* Semantics:: Semantic values and actions. +* Declarations:: All kinds of Bison declarations are described here. +* Multiple Parsers:: Putting more than one Bison parser in one program. + + +File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File + +Outline of a Bison Grammar +========================== + + A Bison grammar file has four main sections, shown here with the +appropriate delimiters: + + %{ + C DECLARATIONS + %} + + BISON DECLARATIONS + + %% + GRAMMAR RULES + %% + + ADDITIONAL C CODE + + Comments enclosed in `/* ... */' may appear in any of the sections. + +* Menu: + +* C Declarations:: Syntax and usage of the C declarations section. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* C Code:: Syntax and usage of the additional C code section. + + +File: bison.info, Node: C Declarations, Next: Bison Declarations, Up: Grammar Outline + +The C Declarations Section +-------------------------- + + The C DECLARATIONS section contains macro definitions and +declarations of functions and variables that are used in the actions in +the grammar rules. These are copied to the beginning of the parser +file so that they precede the definition of `yyparse'. You can use +`#include' to get the declarations from a header file. If you don't +need any C declarations, you may omit the `%{' and `%}' delimiters that +bracket this section. + + +File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline + +The Bison Declarations Section +------------------------------ + + The BISON DECLARATIONS section contains declarations that define +terminal and nonterminal symbols, specify precedence, and so on. In +some simple grammars you may not need any declarations. *Note Bison +Declarations: Declarations. + + +File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline + +The Grammar Rules Section +------------------------- + + The "grammar rules" section contains one or more Bison grammar +rules, and nothing else. *Note Syntax of Grammar Rules: Rules. + + There must always be at least one grammar rule, and the first `%%' +(which precedes the grammar rules) may never be omitted even if it is +the first thing in the file. + + +File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline + +The Additional C Code Section +----------------------------- + + The ADDITIONAL C CODE section is copied verbatim to the end of the +parser file, just as the C DECLARATIONS section is copied to the +beginning. This is the most convenient place to put anything that you +want to have in the parser file but which need not come before the +definition of `yyparse'. For example, the definitions of `yylex' and +`yyerror' often go here. *Note Parser C-Language Interface: Interface. + + If the last section is empty, you may omit the `%%' that separates it +from the grammar rules. + + The Bison parser itself contains many static variables whose names +start with `yy' and many macros whose names start with `YY'. It is a +good idea to avoid using any such names (except those documented in this +manual) in the additional C code section of the grammar file. + + +File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File + +Symbols, Terminal and Nonterminal +================================= + + "Symbols" in Bison grammars represent the grammatical classifications +of the language. + + A "terminal symbol" (also known as a "token type") represents a +class of syntactically equivalent tokens. You use the symbol in grammar +rules to mean that a token in that class is allowed. The symbol is +represented in the Bison parser by a numeric code, and the `yylex' +function returns a token type code to indicate what kind of token has +been read. You don't need to know what the code value is; you can use +the symbol to stand for it. + + A "nonterminal symbol" stands for a class of syntactically equivalent +groupings. The symbol name is used in writing grammar rules. By +convention, it should be all lower case. + + Symbol names can contain letters, digits (not at the beginning), +underscores and periods. Periods make sense only in nonterminals. + + There are three ways of writing terminal symbols in the grammar: + + * A "named token type" is written with an identifier, like an + identifier in C. By convention, it should be all upper case. Each + such name must be defined with a Bison declaration such as + `%token'. *Note Token Type Names: Token Decl. + + * A "character token type" (or "literal character token") is written + in the grammar using the same syntax used in C for character + constants; for example, `'+'' is a character token type. A + character token type doesn't need to be declared unless you need to + specify its semantic value data type (*note Data Types of Semantic + Values: Value Type.), associativity, or precedence (*note Operator + Precedence: Precedence.). + + By convention, a character token type is used only to represent a + token that consists of that particular character. Thus, the token + type `'+'' is used to represent the character `+' as a token. + Nothing enforces this convention, but if you depart from it, your + program will confuse other readers. + + All the usual escape sequences used in character literals in C can + be used in Bison as well, but you must not use the null character + as a character literal because its ASCII code, zero, is the code + `yylex' returns for end-of-input (*note Calling Convention for + `yylex': Calling Convention.). + + * A "literal string token" is written like a C string constant; for + example, `"<="' is a literal string token. A literal string token + doesn't need to be declared unless you need to specify its semantic + value data type (*note Value Type::), associativity, precedence + (*note Precedence::). + + You can associate the literal string token with a symbolic name as + an alias, using the `%token' declaration (*note Token + Declarations: Token Decl.). If you don't do that, the lexical + analyzer has to retrieve the token number for the literal string + token from the `yytname' table (*note Calling Convention::). + + *WARNING*: literal string tokens do not work in Yacc. + + By convention, a literal string token is used only to represent a + token that consists of that particular string. Thus, you should + use the token type `"<="' to represent the string `<=' as a token. + Bison does not enforces this convention, but if you depart from + it, people who read your program will be confused. + + All the escape sequences used in string literals in C can be used + in Bison as well. A literal string token must contain two or more + characters; for a token containing just one character, use a + character token (see above). + + How you choose to write a terminal symbol has no effect on its +grammatical meaning. That depends only on where it appears in rules and +on when the parser function returns that symbol. + + The value returned by `yylex' is always one of the terminal symbols +(or 0 for end-of-input). Whichever way you write the token type in the +grammar rules, you write it the same way in the definition of `yylex'. +The numeric code for a character token type is simply the ASCII code for +the character, so `yylex' can use the identical character constant to +generate the requisite code. Each named token type becomes a C macro in +the parser file, so `yylex' can use the name to stand for the code. +(This is why periods don't make sense in terminal symbols.) *Note +Calling Convention for `yylex': Calling Convention. + + If `yylex' is defined in a separate file, you need to arrange for the +token-type macro definitions to be available there. Use the `-d' +option when you run Bison, so that it will write these macro definitions +into a separate header file `NAME.tab.h' which you can include in the +other source files that need it. *Note Invoking Bison: Invocation. + + The symbol `error' is a terminal symbol reserved for error recovery +(*note Error Recovery::); you shouldn't use it for any other purpose. +In particular, `yylex' should never return this value. + + +File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File + +Syntax of Grammar Rules +======================= + + A Bison grammar rule has the following general form: + + RESULT: COMPONENTS... + ; + +where RESULT is the nonterminal symbol that this rule describes and +COMPONENTS are various terminal and nonterminal symbols that are put +together by this rule (*note Symbols::). + + For example, + + exp: exp '+' exp + ; + +says that two groupings of type `exp', with a `+' token in between, can +be combined into a larger grouping of type `exp'. + + Whitespace in rules is significant only to separate symbols. You +can add extra whitespace as you wish. + + Scattered among the components can be ACTIONS that determine the +semantics of the rule. An action looks like this: + + {C STATEMENTS} + +Usually there is only one action and it follows the components. *Note +Actions::. + + Multiple rules for the same RESULT can be written separately or can +be joined with the vertical-bar character `|' as follows: + + RESULT: RULE1-COMPONENTS... + | RULE2-COMPONENTS... + ... + ; + +They are still considered distinct rules even when joined in this way. + + If COMPONENTS in a rule is empty, it means that RESULT can match the +empty string. For example, here is how to define a comma-separated +sequence of zero or more `exp' groupings: + + expseq: /* empty */ + | expseq1 + ; + + expseq1: exp + | expseq1 ',' exp + ; + +It is customary to write a comment `/* empty */' in each rule with no +components. + + +File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File + +Recursive Rules +=============== + + A rule is called "recursive" when its RESULT nonterminal appears +also on its right hand side. Nearly all Bison grammars need to use +recursion, because that is the only way to define a sequence of any +number of somethings. Consider this recursive definition of a +comma-separated sequence of one or more expressions: + + expseq1: exp + | expseq1 ',' exp + ; + +Since the recursive use of `expseq1' is the leftmost symbol in the +right hand side, we call this "left recursion". By contrast, here the +same construct is defined using "right recursion": + + expseq1: exp + | exp ',' expseq1 + ; + +Any kind of sequence can be defined using either left recursion or +right recursion, but you should always use left recursion, because it +can parse a sequence of any number of elements with bounded stack +space. Right recursion uses up space on the Bison stack in proportion +to the number of elements in the sequence, because all the elements +must be shifted onto the stack before the rule can be applied even +once. *Note The Bison Parser Algorithm: Algorithm, for further +explanation of this. + + "Indirect" or "mutual" recursion occurs when the result of the rule +does not appear directly on its right hand side, but does appear in +rules for other nonterminals which do appear on its right hand side. + + For example: + + expr: primary + | primary '+' primary + ; + + primary: constant + | '(' expr ')' + ; + +defines two mutually-recursive nonterminals, since each refers to the +other. + + +File: bison.info, Node: Semantics, Next: Declarations, Prev: Recursion, Up: Grammar File + +Defining Language Semantics +=========================== + + The grammar rules for a language determine only the syntax. The +semantics are determined by the semantic values associated with various +tokens and groupings, and by the actions taken when various groupings +are recognized. + + For example, the calculator calculates properly because the value +associated with each expression is the proper number; it adds properly +because the action for the grouping `X + Y' is to add the numbers +associated with X and Y. + +* Menu: + +* Value Type:: Specifying one data type for all semantic values. +* Multiple Types:: Specifying several alternative data types. +* Actions:: An action is the semantic definition of a grammar rule. +* Action Types:: Specifying data types for actions to operate on. +* Mid-Rule Actions:: Most actions go at the end of a rule. + This says when, why and how to use the exceptional + action in the middle of a rule. + + +File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics + +Data Types of Semantic Values +----------------------------- + + In a simple program it may be sufficient to use the same data type +for the semantic values of all language constructs. This was true in +the RPN and infix calculator examples (*note Reverse Polish Notation +Calculator: RPN Calc.). + + Bison's default is to use type `int' for all semantic values. To +specify some other type, define `YYSTYPE' as a macro, like this: + + #define YYSTYPE double + +This macro definition must go in the C declarations section of the +grammar file (*note Outline of a Bison Grammar: Grammar Outline.). + + +File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics + +More Than One Value Type +------------------------ + + In most programs, you will need different data types for different +kinds of tokens and groupings. For example, a numeric constant may +need type `int' or `long', while a string constant needs type `char *', +and an identifier might need a pointer to an entry in the symbol table. + + To use more than one data type for semantic values in one parser, +Bison requires you to do two things: + + * Specify the entire collection of possible data types, with the + `%union' Bison declaration (*note The Collection of Value Types: + Union Decl.). + + * Choose one of those types for each symbol (terminal or nonterminal) + for which semantic values are used. This is done for tokens with + the `%token' Bison declaration (*note Token Type Names: Token + Decl.) and for groupings with the `%type' Bison declaration (*note + Nonterminal Symbols: Type Decl.). + + +File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics + +Actions +------- + + An action accompanies a syntactic rule and contains C code to be +executed each time an instance of that rule is recognized. The task of +most actions is to compute a semantic value for the grouping built by +the rule from the semantic values associated with tokens or smaller +groupings. + + An action consists of C statements surrounded by braces, much like a +compound statement in C. It can be placed at any position in the rule; +it is executed at that position. Most rules have just one action at +the end of the rule, following all the components. Actions in the +middle of a rule are tricky and used only for special purposes (*note +Actions in Mid-Rule: Mid-Rule Actions.). + + The C code in an action can refer to the semantic values of the +components matched by the rule with the construct `$N', which stands for +the value of the Nth component. The semantic value for the grouping +being constructed is `$$'. (Bison translates both of these constructs +into array element references when it copies the actions into the parser +file.) + + Here is a typical example: + + exp: ... + | exp '+' exp + { $$ = $1 + $3; } + +This rule constructs an `exp' from two smaller `exp' groupings +connected by a plus-sign token. In the action, `$1' and `$3' refer to +the semantic values of the two component `exp' groupings, which are the +first and third symbols on the right hand side of the rule. The sum is +stored into `$$' so that it becomes the semantic value of the +addition-expression just recognized by the rule. If there were a +useful semantic value associated with the `+' token, it could be +referred to as `$2'. + + If you don't specify an action for a rule, Bison supplies a default: +`$$ = $1'. Thus, the value of the first symbol in the rule becomes the +value of the whole rule. Of course, the default rule is valid only if +the two data types match. There is no meaningful default action for an +empty rule; every empty rule must have an explicit action unless the +rule's value does not matter. + + `$N' with N zero or negative is allowed for reference to tokens and +groupings on the stack _before_ those that match the current rule. +This is a very risky practice, and to use it reliably you must be +certain of the context in which the rule is applied. Here is a case in +which you can use this reliably: + + foo: expr bar '+' expr { ... } + | expr bar '-' expr { ... } + ; + + bar: /* empty */ + { previous_expr = $0; } + ; + + As long as `bar' is used only in the fashion shown here, `$0' always +refers to the `expr' which precedes `bar' in the definition of `foo'. + + +File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics + +Data Types of Values in Actions +------------------------------- + + If you have chosen a single data type for semantic values, the `$$' +and `$N' constructs always have that data type. + + If you have used `%union' to specify a variety of data types, then +you must declare a choice among these types for each terminal or +nonterminal symbol that can have a semantic value. Then each time you +use `$$' or `$N', its data type is determined by which symbol it refers +to in the rule. In this example, + + exp: ... + | exp '+' exp + { $$ = $1 + $3; } + +`$1' and `$3' refer to instances of `exp', so they all have the data +type declared for the nonterminal symbol `exp'. If `$2' were used, it +would have the data type declared for the terminal symbol `'+'', +whatever that might be. + + Alternatively, you can specify the data type when you refer to the +value, by inserting `<TYPE>' after the `$' at the beginning of the +reference. For example, if you have defined types as shown here: + + %union { + int itype; + double dtype; + } + +then you can write `$<itype>1' to refer to the first subunit of the +rule as an integer, or `$<dtype>1' to refer to it as a double. + |