l i n u x - u s e r s - g r o u p - o f - d a v i s
Next Meeting:
July 7: Social gathering
Next Installfest:
Latest News:
Jun. 14: June LUGOD meeting cancelled
Page last updated:
2001 Dec 30 17:04

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Report this post as spam:

(Enter your email address)
Re: [vox-tech] another C question (lex/yacc for parsing input)
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [vox-tech] another C question (lex/yacc for parsing input)

  • Subject: Re: [vox-tech] another C question (lex/yacc for parsing input)
  • From: "Mark K. Kim" <mkkimMAPS@ucdavis.edu>
  • Date: Sun, 08 Apr 2001 17:39:58 -0700
  • References: 20010408155050.A17762@dirac.org

BISON (GNU counterpart of YACC) is an LALR(1) parser.  LALR(1) parsers
parse text by reading one token to its right ('R' stands for "right" and
'(1)' stands for "one token read-ahead".)  This is good for a relatively
simple language, but it's nearly impossible for a language like C or C++.
Here is an excerpt from "The Design and Evolution of C++" by C++'s
creator, Bjarne Stroustrup (1994, Addison-Wesley Publishing):

   3.3.2 Parsing C++

   In 1982 when I first planned Cfront [, C++'s predecessor, -mk] I wanted
   to use a recursive descent parser because I had experience writing and
   maintaining such a beast, because I liked such parsers' ability to
   produce good error messages, and because I liked the idea of having the
   full power of a general-purpose programming language available when
   decisions had to be made in the parser.  However, being a conscientious
   young computer scientist I asked the experts.  Al Aho and Steve Johnson
   were in the Computer Science Research Center and they, primarily Steve,
   convinced me that writing a parser by hand was most old-fashioned,
   would be an inefficient use of my time, would almost certainly result
   in a hard-to-understand and hard-to-maintain parser, and would be prone
   to unsystematic and therefore unreliable error recovery.  The right way
   was to use an LALR(1) parser generator, so I used Al and Steve's YACC
   [Aho, 1986].

   For most projects, it would have been the right choice.  For almost
   every project writing an experimental language from scratch, it would
   have been the right choice.  For most people, it would have been the
   right choice.  In retrospect, for me and C++ it was a bad mistake.  C++
   was not a new experimental language, it was an almost compatible
   superset of C -- and at the time nobody had been able to write an
   LALR(1) grammar for C.  The LALR(1) grammar used by ANSI C was
   constructed by Tom Pennello about a year and a half later -- far too
   late to benefit me and C++.  Even Steve Johnson's PCC, which was the
   preeminent C compiler at the time, cheated at details that were to
   prove troublesome to C++ parser writers.  For example, PCC didn't
   handle redundant parentheses correctly so that "int(x);" wasn't
   accepted as a declaration of "X".  Worse, it seems that some people
   have a natural affinity to some parser strategies and others work much
   better with other strategies.  My bias towards top-down parsing has
   shown itself many times over the years in the form of constructs that
   are hard to fit into a YACC grammar.  To this day, Cfront has a YACC
   parser supplemented by much lexical trickery relying on recursive
   descent techniques.  On the other hand, it *is* possible to write an
   efficient and reasonably nice recursive descent parser for C++.
   Several modern C++ compilers use recursive descent.

   (Note: I had to modify the text a bit to accomodate the lack of
   fonts in e-mails  -mk)

So it's kind of up to you and it depends on your language.  On the other
hand, using FLEX (a language to describe tokens like strings, identifiers,
etc.) is probably a good idea since they're similar across many languages
(you could probably find some premade ones online, too!).  But in either
case, you should be comfortable with regex since both languages use regex
to describe the language you parse.  And remember that you don't have all
the luxuries of PERL regex.


PS: I personally like recursive descent (writing code directly in, for
example, C, because of the control and my lack of confidence in my ability
to regex an entire language, even if you do have the help of LEX/YACC :)

On Sun, 8 Apr 2001, Peter Jay Salzman wrote:

> this is mostly a question of style.  suppose i have a program that does a lot
> of user input processing.  i have two options:
> 1. roll my own parser
> 2. use lex/yacc (or whatever the modern day equivalent is.  bison/flex?)
> i've done option 1.  it works well, but it's not pretty.  lots of calls to
> functions like memmove() to chop, slice and dice strings.
> but it has the nice feature that i don't need to learn yat (yet another thing),
> and a lot can be said for that.  plus, i've already coded it.   i'm looking at
> a semi-major rewrite of my parser to enable some more features, and if i'm
> going to change over to lex/yacc, now would be the time to do it.
> otoh, lex/yacc looks VERY powerful, but it also looks like it has a steep
> learning curve.  i've heard that lex/yacc is pretty slow too.  but i have no
> grounding rod to know "how slow is slow".
> has anyone here used lex/yacc for a parser?   any comments or opinions to help
> me make a choice?
> pete
> --
> "Coffee... I've conquered the Borg on coffee!"               p@dirac.org
>        -- Kathryn Janeway on the virtues of coffee           www.dirac.org/p

Mark K. Kim
PGP key available upon request.

LUGOD Group on LinkedIn
Sign up for LUGOD event announcements
Your email address:
LUGOD Group on Facebook
'Like' LUGOD on Facebook:

Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.

Sponsored in part by:
O'Reilly and Associates
For numerous book donations.