Re: Some clarification on "Pre-Parsers" and other bits
- From: "randyhyde@xxxxxxxxxxxxx" <randyhyde@xxxxxxxxxxxxx>
- Date: 23 May 2006 12:56:36 -0700
zcoder wrote:
> To me, C is a parser; it parses the C language into assembly text, which
> is then fed to an assembler.
As best I can tell, the "C" you are talking about is a language. In
theory, there could be a specific product (even named "C", though that
would be confusing) that is a parser for the C language. And what such a
product would do (despite what C is to you) is check the syntax of a C
program fed to it.
Now I realize that a lot of people around here are confused by the term
"parser" and the verb "to parse", so it's worthwhile to post the
definition (from dictionary.com):
parse v. parsed, pars·ing, pars·es
v. tr.
1. To break (a sentence) down into its component parts of speech with an
explanation of the form, function, and syntactical relationship of each
part.
2. To describe (a word) by stating its part of speech, form, and
syntactical relationships in a sentence.
3. To examine closely or subject to detailed analysis, especially by
breaking up into components: "What are we missing by parsing the
behavior of chimpanzees into the conventional categories recognized
largely from our own behavior?" (Stephen Jay Gould).
To make sense of; comprehend: I simply couldn't parse what you just
said.
Computer Science. To analyze or separate (input, for example) into more
easily processed components.
BTW, their "computer science" definition is incorrect, though this is a
common misuse of the term by people in CS. What they're describing is
"lexical analysis" (also known as "scanning"), not syntactical analysis
(parsing). But the remainder of the definition is correct.
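To make the distinction concrete, here is a minimal Python sketch of a scanner for a hypothetical toy language (names, integers, and the symbols + * ( ) = ;). It does what the dictionary's "computer science" sense describes, breaking input into more easily processed components, and nothing more: it never consults a grammar, so a meaningless sequence such as "= ; x +" scans without complaint. The token kinds and the language itself are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Token classes for a hypothetical toy language (illustrative only).
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+*()=;]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text: str) -> List[Tuple[str, str]]:
    """Lexical analysis only: split text into (kind, lexeme) pairs.

    No grammar is consulted, so any sequence of valid tokens is accepted,
    whether or not it forms a legal statement.
    """
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "ERROR":
            raise ValueError(f"lexical error: unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens
```

For example, scan("x = 2 + y;") yields [('ID', 'x'), ('OP', '='), ('NUM', '2'), ('OP', '+'), ('ID', 'y'), ('OP', ';')]. Deciding whether that sequence is a legal statement is left to the parser.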
A typical compiler (and assemblers are just special cases of compilers)
contains many phases. Three of those phases are the following:
1. lexical analysis - separating the input stream into lexical items
(words, numbers, and other atomic symbols in the underlying language).
This is accomplished by program code known as the scanner or lexical
analyzer.
2. syntax analysis - breaking down the stream of lexemes (produced by
the scanner) into syntactical sequences (sentences, or "statements" in
the language). This is accomplished by program code known as the
"parser". *** The job of the parser, therefore, is to determine whether
the input is syntactically correct. ***
3. semantic analysis - taking the syntactically correct statements in
the language and determining their meaning, or using contextual
information (e.g., earlier declarations in the program) to guide the
analysis.
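The three phases can be sketched in Python for a hypothetical toy language of assignment statements over +, *, and parentheses. The grammar, token kinds, and function names are assumptions made for illustration, not any particular compiler's design. Note that the phase 2 recognizer produces nothing but a yes/no answer, and that a program can pass phase 2 yet still fail phase 3.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, lexeme)


def classify(lexeme: str) -> str:
    if re.fullmatch(r"\d+", lexeme):
        return "NUM"
    if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
        return "ID"
    return "OP"  # any other single character; phase 2 rejects unknown ones


def scan(text: str) -> List[Token]:
    """Phase 1 (lexical analysis): text -> list of (kind, lexeme) pairs."""
    return [(classify(lx), lx) for lx in re.findall(r"\d+|[A-Za-z_]\w*|\S", text)]


class Recognizer:
    """Phase 2 (syntax analysis) for a hypothetical grammar:

        program := stmt*
        stmt    := ID '=' expr ';'
        expr    := term ('+' term)*
        term    := factor ('*' factor)*
        factor  := NUM | ID | '(' expr ')'

    It produces no output; it either finishes or raises SyntaxError.
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.toks, self.pos = tokens, 0

    def peek(self) -> Token:
        return self.toks[self.pos] if self.pos < len(self.toks) else ("EOF", "")

    def expect(self, kind: str, lexeme: str = "") -> None:
        k, lx = self.peek()
        if k != kind or (lexeme and lx != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {lx or 'end of input'}")
        self.pos += 1

    def program(self) -> None:
        while self.peek()[0] != "EOF":
            self.stmt()

    def stmt(self) -> None:
        self.expect("ID")
        self.expect("OP", "=")
        self.expr()
        self.expect("OP", ";")

    def expr(self) -> None:
        self.term()
        while self.peek() == ("OP", "+"):
            self.pos += 1
            self.term()

    def term(self) -> None:
        self.factor()
        while self.peek() == ("OP", "*"):
            self.pos += 1
            self.factor()

    def factor(self) -> None:
        kind, lx = self.peek()
        if kind in ("NUM", "ID"):
            self.pos += 1
        elif (kind, lx) == ("OP", "("):
            self.pos += 1
            self.expr()
            self.expect("OP", ")")
        else:
            raise SyntaxError(f"unexpected {lx or 'end of input'}")


def is_syntactically_valid(tokens: List[Token]) -> bool:
    try:
        Recognizer(tokens).program()
        return True
    except SyntaxError:
        return False


def undeclared_uses(tokens: List[Token]) -> List[str]:
    """Phase 3 (semantic analysis); assumes the tokens already passed phase 2.

    Uses context from earlier statements: a name may be read only after an
    earlier statement has assigned it.
    """
    declared, problems, i = set(), [], 0
    while i < len(tokens):
        target = tokens[i][1]  # stmt := ID '=' expr ';'
        j = i + 2              # first token of the right-hand side
        while tokens[j] != ("OP", ";"):
            if tokens[j][0] == "ID" and tokens[j][1] not in declared:
                problems.append(tokens[j][1])
            j += 1
        declared.add(target)
        i = j + 1
    return problems
```

With these definitions, "a = 2; b = a * (a + 3);" passes both checks, "b = a * ;" fails phase 2, and "b = c + 1;" passes phase 2 while phase 3 reports c as undeclared: syntactically correct, semantically wrong.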
Note that there has been no discussion of code generation in these
phases. Parsing has nothing to do with producing output. All a parser
does is check the input.
People get confused about "parsers" because many compilers today use a
technique known as "syntax-directed translation" (SDT) to produce code.
SDT works as follows: as the parser determines that a sequence is
syntactically correct, the SDT module, in parallel, generates something
(an abstract syntax tree [AST], or even target code emitted directly).
Although the SDT module operates in parallel with the parser, that does
not make it part of the parser (indeed, most compiler modules operate in
parallel with the parser; that doesn't make the lexical analyzer a
parser, and it doesn't make the code optimizer a parser, for example).
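A minimal Python sketch of that division of labor, assuming a hypothetical expression-only grammar. The Parser class only checks syntax; each time it recognizes an operand or an operator application, it calls out to an optional actions object. Swapping CodeEmitter (direct emission for an assumed stack machine) for TreeBuilder (an AST of nested tuples) changes what gets produced without touching the parser, and passing no actions at all leaves a pure recognizer. All class and method names are illustrative.

```python
import re
from typing import List, Optional


def scan(text: str) -> List[str]:
    """Scanner for a hypothetical expression language: integers, names, + * ( ).

    Any other non-blank character becomes a one-character token that the
    parser will reject.
    """
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", text)


class CodeEmitter:
    """One SDT module: emits code for an assumed stack machine."""
    def __init__(self) -> None:
        self.code: List[str] = []

    def operand(self, tok: str) -> None:
        self.code.append(f"PUSH {tok}")

    def binary(self, op: str) -> None:
        self.code.append({"+": "ADD", "*": "MUL"}[op])


class TreeBuilder:
    """Another SDT module: builds an AST out of nested tuples."""
    def __init__(self) -> None:
        self.stack: list = []

    def operand(self, tok: str) -> None:
        self.stack.append(tok)

    def binary(self, op: str) -> None:
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append((op, left, right))


class Parser:
    """Recursive-descent recognizer for
         expr := term ('+' term)*    term := factor ('*' factor)*
         factor := NUM | ID | '(' expr ')'
    It only checks syntax; the optional actions object does the translating."""

    def __init__(self, tokens: List[str], actions: Optional[object] = None) -> None:
        self.toks, self.pos, self.actions = tokens, 0, actions

    def peek(self) -> Optional[str]:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def notify(self, event: str, arg: str) -> None:
        # The SDT hook: runs alongside parsing but lives outside the parser.
        if self.actions is not None:
            getattr(self.actions, event)(arg)

    def parse(self) -> None:
        self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")

    def expr(self) -> None:
        self.term()
        while self.peek() == "+":
            self.pos += 1
            self.term()
            self.notify("binary", "+")

    def term(self) -> None:
        self.factor()
        while self.peek() == "*":
            self.pos += 1
            self.factor()
            self.notify("binary", "*")

    def factor(self) -> None:
        tok = self.peek()
        if tok is not None and re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            self.pos += 1
            self.notify("operand", tok)
        elif tok == "(":
            self.pos += 1
            self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected {tok!r}")
```

For "a + 2 * (b + 3)", CodeEmitter collects PUSH a, PUSH 2, PUSH b, PUSH 3, ADD, MUL, ADD, while TreeBuilder ends with the single tree ('+', 'a', ('*', '2', ('+', 'b', '3'))). The parser code is identical in both runs; only the translation module changes.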
In earlier days, back when machines had a whole lot less memory, it was
common for the phases of a compiler to run serially, one at a time. For
example, I remember using a FORTRAN IV compiler with 20 phases on an IBM
1130 that had 8K words of memory (16 Kbytes). Each phase would load from
disk, read its input (from cards or from disk), write its output to
disk, and quit; then the next phase would load, read its input from
disk, and write its output to disk. This repeated up to the last phase.
The exact nature of the intermediate output was irrelevant; the end
result is what mattered: FORTRAN IV source code went in, and executable
code came out the other end.
Though today's machines have far more memory, there's no reason a
compiler cannot still operate the same way. GCC, for example, reads
C/C++ source code and writes intermediate files (such as assembly
language) to be handled by other modules (e.g., Gas, the GNU assembler).
Either way, C/C++ source code goes in at one end and an executable comes
out the other. The same is also true for HLA.
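The pass structure can be sketched in Python. Three hypothetical passes (scan, translate to postfix, generate code for an assumed stack machine) run one after another, and the only thing connecting them is an intermediate file in a temporary directory; no pass shares in-memory state with another. The pass names, file names, and toy expression language are assumptions for illustration, not how the 1130 compiler or GCC is actually organized.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, List

Pass = Callable[[Path, Path], None]  # reads one file, writes another


def pass_scan(src: Path, dst: Path) -> None:
    """Pass 1: source text -> one token per line."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", src.read_text())
    dst.write_text("\n".join(tokens) + "\n")


def pass_to_postfix(src: Path, dst: Path) -> None:
    """Pass 2: parse the token file; write the expression in postfix order."""
    toks = src.read_text().split()
    out: List[str] = []
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # expr := term ('+' term)*
        term()
        while peek() == "+":
            advance()
            term()
            out.append("+")

    def term():  # term := factor ('*' factor)*
        factor()
        while peek() == "*":
            advance()
            factor()
            out.append("*")

    def factor():  # factor := NUM | ID | '(' expr ')'
        tok = peek()
        if tok is not None and re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            advance()
            out.append(tok)
        elif tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            raise SyntaxError(f"unexpected {tok!r}")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    dst.write_text("\n".join(out) + "\n")


def pass_codegen(src: Path, dst: Path) -> None:
    """Pass 3: postfix -> instructions for an assumed stack machine."""
    ops = {"+": "ADD", "*": "MUL"}
    dst.write_text("".join(ops.get(t, f"PUSH {t}") + "\n" for t in src.read_text().split()))


def compile_in_passes(source: str, passes: List[Pass]) -> str:
    """Run each pass in turn; the only link between passes is a file on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        current = Path(tmp) / "pass0.txt"
        current.write_text(source)
        for i, run_pass in enumerate(passes, start=1):
            nxt = Path(tmp) / f"pass{i}.txt"
            run_pass(current, nxt)
            current = nxt
        return current.read_text()
```

Calling compile_in_passes("a + 2 * (b + 3)", [pass_scan, pass_to_postfix, pass_codegen]) returns the instructions PUSH a, PUSH 2, PUSH b, PUSH 3, ADD, MUL, ADD, one per line. A real driver such as GCC's runs separate programs (the compiler proper, then the assembler and linker) rather than Python functions, but the shape is the same: source in at one end, output out the other, with files in between.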
Bottom line: the parser is but one phase of a compiler system, and the
way most people around here use the term is incorrect.
Hopefully, this brief explanation has straightened some things out. For
more information, feel free to look up the concepts in a book on
compiler construction. P.D. Terry's book discusses this concept in
Chapter 10
(http://webster.cs.ucr.edu/AsmTools/RollYourOwn/CompilerBook/CHAP10.PDF)
Cheers,
Randy Hyde