Re: [vox-tech] another C question (lex/yacc for parsing input)
On Mon, Apr 09, 2001 at 12:11:27PM -0700, Peter Jay Salzman wrote:
> i use memmove to do things like:
>
> 1. convert double spaces to single spaces
> 2. get rid of superfluous tokens like "pretty please" and "Computer".
These things are traditionally handled by the lexical scanner...
>
> and then i ship off the resulting string
>
> auto-destruct 520010
>
> to my parser which tokenizes the nouns and executes the relevant code for
> auto destruction.
>
>
> i know there are other ways of doing this (like searching the string for
> certain tokens and building up a list from the start). however, i tend to
> like this method better, because i can more easily see what's going on when my
> parser handles "auto-destruct 520010" rather than (12 52 00 10). there are a
> lot of tokens, and i'd rather work with "auto-destruct" than "12".
LOL! So would I - but that's why I wouldn't use 12, I'd use
AUTO_DESTRUCT, which I would've conveniently #defined to 12 (or
other). Actually, I'd use a number >= 256, so it doesn't conflict
with single-character tokens, for which I'd usually just want to use
the character-code as the token-code.
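To make that numbering convention concrete, here is a minimal sketch. The names classify_word, TOK_EOF and TOK_UNKNOWN are made up for illustration and are not part of lex, yacc or any library; the only point is that named tokens start above the 0..255 character range.

```c
#include <assert.h>
#include <string.h>

/* Named tokens start at 256, above the unsigned char range, so they
   can never collide with a single-character token. */
#define AUTO_DESTRUCT 256
#define TOK_EOF       0     /* hypothetical end-of-input code */
#define TOK_UNKNOWN   (-1)  /* hypothetical "no such token" code */

/* Hypothetical helper: map one already-split word to a token code. */
int classify_word (const char *word)
{
  assert (word != NULL);
  if (word[0] == '\0')
    return TOK_EOF;
  if (strcmp (word, "auto-destruct") == 0)
    return AUTO_DESTRUCT;
  if (word[1] == '\0')
    return (unsigned char) word[0]; /* character code doubles as token code */
  return TOK_UNKNOWN;
}
```

With this scheme a parser can test for '+' or ';' directly and still use readable names for the longer tokens.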
>
> it occurs to me that this is the sort of thing that's perfect for lex/yacc, but
> i've been dragging my feet because it looks fairly complicated. i think maybe
> i'll take a look at it though.
>
> pete
It's really not too complicated - if I were anywhere near you, I could
come over and teach you in probably an hour. The info pages are very
good, but they could be better, too. I find, amazingly, that a lot of
what I'm learning in the Dragon book (I think I mistakenly said
"Wizard" before - must be playing too much nethack ;) ) is directly
applicable to using bison.
"Computer, pretty please initiate auto-destruct sequence 520010."
Here's a pseudo-code example of a top-down parser, which would be
recursive-descent like what Mark was talking about, were there any
recursion ;). You could use flex or write your own lexical scanner
(flex is easy to learn, though - read the Texinfo manual - if you
still hate info just read the HTML version from www.gnu.org).
Some caveats about the following code - it doesn't use its look-ahead
token to choose between alternative rules, so it is very inflexible
about what grammars it can implement.
It is not a good general-purpose solution, but should work for simple
constructions.
/* All of these except for NUM represent actual textual tokens which
are similar to their name. NUM is used for numeric sequences, and
is the only token for which VALUE is set */
#define COMPUTER 257
#define INITIATE 258
#define AUTO_DESTRUCT 259
#define SEQUENCE 260
#define PRETTY 261
#define PLEASE 262
#define NUM 263
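int token;   /* the current token */
long value;  /* set by the scanner whenever it returns NUM */

/* Prototypes. The first four are not defined in this example, so
   their exact signatures are assumptions. */
int get_next_token (void);
int is_valid_sequence_code (long code);
void destroy_ship (void);
void handle_error_and_quit (void);
void auto_destruct_command (void);
void politeness (void);
void sequence_construct (void);
void match (int mtoken);
void match_opt (int mtoken);

int main (void)
{
  token = get_next_token(); /* This calls the lexical scanner,
                               undefined here. Should skip whitespace
                               and punctuation characters. If it
                               matches a number, it should put the
                               value into the global VALUE. */
  auto_destruct_command();
  /* If we've gotten this far, then we've successfully parsed an
     auto-destruct sequence. The destruction sequence code is left in
     the global variable VALUE. */
  if (is_valid_sequence_code (value))
    destroy_ship(); /* I don't define this here. Its purpose should
                       be obvious ;) */
  else
    handle_error_and_quit();
  return 0; /* Obviously, we'll never get to this either way ;) */
}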

void auto_destruct_command (void)
{
  match (COMPUTER);
  politeness ();
  match (INITIATE);
  match (AUTO_DESTRUCT);
  sequence_construct ();
}

void politeness (void)
{
  match_opt (PRETTY);
  match_opt (PLEASE);
}

void sequence_construct (void)
{
  match (SEQUENCE);
  match (NUM);
}

/* Generates error if it can't match a certain token */
void match (int mtoken)
{
  if (mtoken != token)
    handle_error_and_quit ();
  else
    token = get_next_token ();
}

/* Reads next token if it matches mtoken, otherwise leaves the token
   alone for next match attempt */
void match_opt (int mtoken)
{
  if (mtoken == token)
    token = get_next_token ();
}
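
In case you want to write the scanner by hand rather than use flex, here is a rough sketch of what get_next_token could look like. Everything beyond the token codes is my own assumption: it reads from a string set with a hypothetical set_input(), returns a made-up END_OF_INPUT (0) at the end of the text and a made-up UNKNOWN (-1) for words it doesn't know, ignores case, and treats '-' as part of a word so that "auto-destruct" stays one token.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define COMPUTER 257
#define INITIATE 258
#define AUTO_DESTRUCT 259
#define SEQUENCE 260
#define PRETTY 261
#define PLEASE 262
#define NUM 263
#define END_OF_INPUT 0   /* assumption: not in the original code */
#define UNKNOWN (-1)     /* assumption: not in the original code */

long value;                      /* set when NUM is returned */
static const char *cursor = "";  /* next unread character */

/* Hypothetical helper: point the scanner at a new input string. */
void set_input (const char *text)
{
  assert (text != NULL);
  cursor = text;
}

int get_next_token (void)
{
  static const struct { const char *text; int code; } words[] = {
    { "computer", COMPUTER }, { "initiate", INITIATE },
    { "auto-destruct", AUTO_DESTRUCT }, { "sequence", SEQUENCE },
    { "pretty", PRETTY }, { "please", PLEASE },
  };
  char word[32];
  size_t len = 0, i;

  /* Skip whitespace and punctuation. */
  while (*cursor != '\0' && !isalnum ((unsigned char) *cursor))
    cursor++;
  if (*cursor == '\0')
    return END_OF_INPUT;

  /* A run of digits becomes NUM, with its value in the global. */
  if (isdigit ((unsigned char) *cursor))
    {
      char *end;
      value = strtol (cursor, &end, 10);
      cursor = end;
      return NUM;
    }

  /* Collect a lower-cased word; over-long words are truncated and
     will simply fail to match anything below. */
  while (isalnum ((unsigned char) *cursor) || *cursor == '-')
    {
      if (len < sizeof word - 1)
        word[len++] = (char) tolower ((unsigned char) *cursor);
      cursor++;
    }
  word[len] = '\0';

  for (i = 0; i < sizeof words / sizeof words[0]; i++)
    if (strcmp (word, words[i].text) == 0)
      return words[i].code;
  return UNKNOWN;
}
```

Call set_input() with the command text before the first get_next_token(). A linear search through the word table is fine for a handful of keywords; with a lot of tokens you'd probably want flex to build the matcher for you.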