Re: [vox-tech] another C question (lex/yacc for parsing input)
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [vox-tech] another C question (lex/yacc for parsing input)
- Subject: Re: [vox-tech] another C question (lex/yacc for parsing input)
- From: "Mark K. Kim" <mkkimMAPS@ucdavis.edu>
- Date: Sun, 08 Apr 2001 17:39:58 -0700
- References: 20010408155050.A17762@dirac.org
BISON (GNU counterpart of YACC) is an LALR(1) parser. LALR(1) parsers
parse text by reading one token to its right ('R' stands for "right" and
'(1)' stands for "one token read-ahead".) This is good for a relatively
simple language, but it's nearly impossible for a language like C or C++.
Here is an excerpt from "The Design and Evolution of C++" by C++'s
creator, Bjarne Stroustrup (1994, Addison-Wesley Publishing):
3.3.2 Parsing C++
In 1982 when I first planned Cfront [, C++'s predecessor, -mk] I wanted
to use a recursive descent parser because I had experience writing and
maintaining such a beast, because I liked such parsers' ability to
produce good error messages, and because I liked the idea of having the
full power of a general-purpose programming language available when
decisions had to be made in the parser. However, being a conscientious
young computer scientist I asked the experts. Al Aho and Steve Johnson
were in the Computer Science Research Center and they, primarily Steve,
convinced me that writing a parser by hand was most old-fashioned,
would be an inefficient use of my time, would almost certainly result
in a hard-to-understand and hard-to-maintain parser, and would be prone
to unsystematic and therefore unreliable error recovery. The right way
was to use an LALR(1) parser generator, so I used Al and Steve's YACC
[Aho, 1986].
For most projects, it would have been the right choice. For almost
every project writing an experimental language from scratch, it would
have been the right choice. For most people, it would have been the
right choice. In retrospect, for me and C++ it was a bad mistake. C++
was not a new experimental language, it was an almost compatible
superset of C -- and at the time nobody had been able to write an
LALR(1) grammar for C. The LALR(1) grammar used by ANSI C was
constructed by Tom Pennello about a year and a half later -- far too
late to benefit me and C++. Even Steve Johnson's PCC, which was the
preeminent C compiler at the time, cheated at details that were to
prove troublesome to C++ parser writers. For example, PCC didn't
handle redundant parentheses correctly so that "int(x);" wasn't
accepted as a declaration of "X". Worse, it seems that some people
have a natural affinity to some parser strategies and others work much
better with other strategies. My bias towards top-down parsing has
shown itself many times over the years in the form of constructs that
are hard to fit into a YACC grammar. To this day, Cfront has a YACC
parser supplemented by much lexical trickery relying on recursive
descent techniques. On the other hand, it *is* possible to write an
efficient and reasonably nice recursive descent parser for C++.
Several modern C++ compilers use recursive descent.
(Note: I had to modify the text a bit to accomodate the lack of
fonts in e-mails -mk)
So it's kind of up to you and it depends on your language. On the other
hand, using FLEX (a language to describe tokens like strings, identifiers,
etc.) is probably a good idea since they're similar across many languages
(you could probably find some premade ones online, too!). But in either
case, you should be comfortable with regex since both languages use regex
to describe the language you parse. And remember that you don't have all
the luxuries of PERL regex.
-Mark
PS: I personally like recursive descent (writing code directly in, for
example, C, because of the control and my lack of confidence in my ability
to regex an entire language, even if you do have the help of LEX/YACC :)
On Sun, 8 Apr 2001, Peter Jay Salzman wrote:
> this is mostly a question of style. suppose i have a program that does a lot
> of user input processing. i have two options:
>
> 1. roll my own parser
> 2. use lex/yacc (or whatever the modern day equivalent is. bison/flex?)
>
>
> i've done option 1. it works well, but it's not pretty. lots of calls to
> functions like memmove() to chop, slice and dice strings.
>
> but it has the nice feature that i don't need to learn yat (yet another thing),
> and a lot can be said for that. plus, i've already coded it. i'm looking at
> a semi-major rewrite of my parser to enable some more features, and if i'm
> going to change over to lex/yacc, now would be the time to do it.
>
> otoh, lex/yacc looks VERY powerful, but it also looks like it has a steep
> learning curve. i've heard that lex/yacc is pretty slow too. but i have no
> grounding rod to know "how slow is slow".
>
> has anyone here used lex/yacc for a parser? any comments or opinions to help
> me make a choice?
>
> pete
>
> --
> "Coffee... I've conquered the Borg on coffee!" p@dirac.org
> -- Kathryn Janeway on the virtues of coffee www.dirac.org/p
>
---
Mark K. Kim
http://www.cbreak.org/mark/
PGP key available upon request.
|