Misc SHACC notes:


Quick overview of what it is, and why it is:

SHACC stands for Shaggy's Homemade Alternative Compiler Compiler.
It's my alternative to yacc, and it came about as I was working
on parsing C++ for my thesis (developing my CParse classes).
I was originally trying to do everything with yacc, but
I kept having to do more and more special pre-parsing
prior to giving yacc tokens (complex stages in between
the lexer and parser), and I kept having to make my
grammar more and more convoluted to handle all of the
cases correctly and unambiguously.  At first, it seemed
that my ActionStack class would handle most of the hard
cases, allowing ambiguous areas of the parse tree to be
built up and saved, without executing any real actions on
them until something dis-ambiguated them.   But still,
there were other conflicts in the grammar that couldn't
be resolved so easily.  I finally gave up and decided
I was using the wrong tool for the job.  I was already
used to using yacc's .y grammar, and decided there was
nothing wrong with the general style of the yacc language---
it just needed a more powerful parser applied to it,
one that wasn't limited to LALR(1).  So, I started writing
shacc with the idea of making a parser that could take
a completely generic BNF and parse it correctly,
allowing the user to put in rules to disambiguate
parts of the grammar that are still ambiguous.

I accomplished my goal, but the result was much slower
than I had originally planned, since my parser had
to do a lot more work than I had expected.  I had
hoped to keep all information on stacks, but this
turned out not to be possible.  Also, due to all
the data-structure sharing that had to go on,
I had to add a garbage collection scheme to shacc,
since a simple malloc/free was not sufficient.  It
uses reference counting where possible, and 
a mark and sweep algorithm to clean up cyclic
structures.  All of this results in fairly poor
performance, but at least it allows grammars to
be written in a fairly simple form, and it parses
them correctly.  

In the future, I hope to do more of the work in
the shacc program itself instead of in the parser
that gets attached to the result.  Currently, shacc
just turns the grammar into simple tables which
describe it.  In the future, it should do enough
work up-front to be able to generate tables that
represent PDA's (push-down automatas) so that
the work at runtime can be done much faster 
using stacks, without the overhead of a garbage
collection scheme.

--------------------------------------------------------------------------------
 

Well, here it is.  Read the man-page, and use it as you 
would use yacc, except currently without %prec and %union.

Since there's no limit of one-token lookahead, you can
often use something pretty close to general BNF to write
your grammars (no "flattening" needed).   Whereever there
may be ambiguities, use %uniq to find them faster,
and %rprec to eliminate them by giving certain rules a
higer reduction precedence.  Note that reduction precedence
is only applied after a %uniq is hit that results in
an ambiguous grammar, so the two must be used together.

See lib/cparse/CParse.y for a large-scale example 
shacc source file.  Perhaps I should change the extension,
since normal yacc can't read shacc grammars, even though
the reverse can be true.


At some point, I'll write an in-depth tutorial on
using shacc.  Perhaps I'll devote a chapter to this
in my thesis.  But for now, you've got the code
and the man page.  Good luck!


