Re: Can this be restructured?
- From: "Rob Thorpe" <rthorpe@xxxxxxxxxxxxxxxxx>
- Date: 30 Jan 2007 11:23:50 -0800
The structure you describe is very common for lexical scanners.
Anyone who has seen a lexical scanner interface will recognise the
principles straight away.
Generally the split is done like this:-
* The scanner separates the characters into tokens, works out what
each one is, and returns a struct describing the token it parsed.
* The caller examines the scanner's return value and responds
appropriately.
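The basic split can be sketched in Python. The declarations later in this post are Oberon; Python is used here only for brevity. This is a minimal sketch under assumed names: `Token`, `scan`, `describe` and the three classes Int, Real and Symbol are hypothetical, and string literals are left out. `scan` plays the scanner, returning one record per token, and `describe` is the caller that looks at each result and responds.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Union

# Hypothetical token classes, standing in for the "class" codes in the post.
INT, REAL, SYMBOL = "Int", "Real", "Symbol"

@dataclass
class Token:
    kind: str                       # which class of token was scanned
    value: Union[int, float, str]   # the parsed value

# Optional leading whitespace, then a real, an integer or a name.
_TOKEN_RE = re.compile(r"\s*(?:(\d+\.\d+)|(\d+)|([A-Za-z_]\w*))")

def scan(text: str) -> Iterator[Token]:
    """Scanner: split text into tokens, classify each, yield a Token."""
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            break  # trailing whitespace or an unrecognised character: stop
        pos = m.end()
        real, integer, name = m.groups()
        if real is not None:
            yield Token(REAL, float(real))
        elif integer is not None:
            yield Token(INT, int(integer))
        else:
            yield Token(SYMBOL, name)

def describe(text: str) -> list:
    """Caller: look at each token's kind and respond appropriately."""
    out = []
    for tok in scan(text):
        if tok.kind == INT:
            out.append(f"int {tok.value}")
        elif tok.kind == REAL:
            out.append(f"real {tok.value}")
        else:
            out.append(f"symbol {tok.value}")
    return out
```

For example, `describe("x 42 3.5")` gives `["symbol x", "int 42", "real 3.5"]`.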
This can be split up further. The scanner itself can be split into
two:-
* The "primitive scanner" simply finds the tokens; it returns a
string and some indicator of the token type (called "class" in your
scanner).
* The "scanner" looks at the string and the type, and creates the
required objects: floats, ints, symbols, etc.
This makes things clearer within the scanner code.
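A minimal sketch of the two-stage version, again in Python with hypothetical names (`primitive_scan`, `scan`) and only three token classes. `primitive_scan` only locates tokens and reports their class and raw text; `scan` builds on it and does the conversion to ints, floats and symbols.

```python
import re
from typing import Iterator, Tuple, Union

# Hypothetical class codes standing in for the "class" values in the post.
INT, REAL, SYMBOL = "Int", "Real", "Symbol"

# Group names double as class codes, so m.lastgroup tells us the class.
_TOKEN_RE = re.compile(
    r"\s*(?:(?P<Real>\d+\.\d+)|(?P<Int>\d+)|(?P<Symbol>[A-Za-z_]\w*))")

def primitive_scan(text: str) -> Iterator[Tuple[str, str]]:
    """Only find tokens: yield (class, raw string) pairs, no conversion."""
    pos = 0
    while pos < len(text):
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            break  # trailing whitespace or an unrecognised character: stop
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)

def scan(text: str) -> Iterator[Tuple[str, Union[int, float, str]]]:
    """Build on primitive_scan: turn each raw string into the right object."""
    converters = {INT: int, REAL: float, SYMBOL: str}
    for cls, raw in primitive_scan(text):
        yield cls, converters[cls](raw)
```

With this split, adding a token class means adding one named group to the pattern and one entry to the converter table; the token-finding loop stays untouched.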
The data structures can also be made clearer. In particular, the
scanner returns both the type of an object and the object itself.
These can be separated.
TYPE
  DEBUG_INFO = RECORD
    nextCh: CHAR;   (* Character immediately following the last symbol... *)
    line: INTEGER   (* Number of carriage returns scanned so far. *)
  END;

  PROG_OBJ = RECORD
    i: LONGINT;
    x: REAL;
    y: LONGREAL;
    c: CHAR;
    len: SHORTINT;  (* Length of name or string scanned. *)
    s: ARRAY 64 OF CHAR;
    obj: Objects.Object
  END;

  Scanner = RECORD (Reader)  (* Scanner for symbol streams. *)
    debug_info: DEBUG_INFO;  (* Info for emitting syntax errors. *)
    class: INTEGER;          (* Scan result: Int, Real, String etc. *)
    prog_obj: PROG_OBJ       (* The returned entity. *)
  END;
The debug_info and class fields are filled in by the primitive_scanner
function, and prog_obj is filled in by the scanner function, which
calls primitive_scanner.
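The separated records might look roughly like this in Python. This is a sketch under several assumptions, not a translation of any existing scanner: `DebugInfo`, the `raw` field and the regular expression are hypothetical, `class` is renamed `cls` because it is a Python keyword, and the variant fields of PROG_OBJ collapse into one dynamically typed `prog_obj`. `primitive_scanner` fills in only `debug_info`, `cls` and the raw token text; `scanner` calls it and then fills in `prog_obj`.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Union

# Leading whitespace (kept so newlines can be counted), then one token.
_TOKEN_RE = re.compile(
    r"(?P<ws>\s*)(?:(?P<Real>\d+\.\d+)|(?P<Int>\d+)|(?P<Symbol>[A-Za-z_]\w*))")

@dataclass
class DebugInfo:
    next_ch: str = ""  # character immediately following the last symbol
    line: int = 0      # number of newlines scanned so far

@dataclass
class Scanner:
    text: str
    pos: int = 0
    debug_info: DebugInfo = field(default_factory=DebugInfo)
    cls: str = ""      # scan result: Int, Real or Symbol ("class" is reserved)
    raw: str = ""      # token text handed from primitive_scanner to scanner
    prog_obj: Optional[Union[int, float, str]] = None  # the returned entity

    def primitive_scanner(self) -> bool:
        """Find the next token; fill in only debug_info, cls and raw."""
        m = _TOKEN_RE.match(self.text, self.pos)
        if m is None:
            return False  # end of input or an unrecognised character
        self.debug_info.line += m.group("ws").count("\n")
        self.pos = m.end()
        self.cls, self.raw = m.lastgroup, m.group(m.lastgroup)
        self.debug_info.next_ch = self.text[self.pos:self.pos + 1]
        return True

    def scanner(self) -> bool:
        """Call primitive_scanner, then build prog_obj from cls and raw."""
        if not self.primitive_scanner():
            return False
        convert = {"Int": int, "Real": float, "Symbol": str}[self.cls]
        self.prog_obj = convert(self.raw)
        return True
```

An error reporter then needs only `debug_info`, and the parser needs only `cls` and `prog_obj`.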
Also, the improvement suggested by H.S.Lahmann can be introduced.
That is, you write a function, let's call it "super_scanner", which
takes functions as arguments: one argument for each possible type of
token. So you would call it:-
super_scanner (int_proc, float_proc, string_proc, symbol_proc, ....);
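A minimal Python sketch of super_scanner, assuming only three token classes (string literals are omitted) and handler names modelled on the call above; the regular expression and the handler bodies are hypothetical. The scanner owns the loop and the class test, and each handler only says what to do with one kind of token.

```python
import re
from typing import Callable, List

_TOKEN_RE = re.compile(
    r"\s*(?:(?P<Real>\d+\.\d+)|(?P<Int>\d+)|(?P<Symbol>[A-Za-z_]\w*))")

def super_scanner(text: str,
                  int_proc: Callable[[int], None],
                  float_proc: Callable[[float], None],
                  symbol_proc: Callable[[str], None]) -> None:
    """Scan text and call the handler supplied for each token's class."""
    pos = 0
    while True:
        m = _TOKEN_RE.match(text, pos)
        if m is None:
            return  # end of input or an unrecognised character
        pos = m.end()
        raw = m.group(m.lastgroup)
        if m.lastgroup == "Int":
            int_proc(int(raw))
        elif m.lastgroup == "Real":
            float_proc(float(raw))
        else:
            symbol_proc(raw)

# Example handlers: one small named function per token type.
collected: List[str] = []

def int_proc(n: int) -> None:
    collected.append(f"int {n}")

def float_proc(x: float) -> None:
    collected.append(f"real {x}")

def symbol_proc(s: str) -> None:
    collected.append(f"symbol {s}")

super_scanner("x 42 3.5", int_proc, float_proc, symbol_proc)
```

After the call, `collected` holds `["symbol x", "int 42", "real 3.5"]`. Note that even this toy example needs one named function per token type.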
Structuring the code this way works well if the individual handler
functions are often similar. In parsers, though, this is rarely the
case: most handlers differ slightly from one another, so the
programmer has to write many small functions, and that itself causes
code repetition. So the above trick isn't used much AFAIK.
In languages with reasonable macros or lambda expressions, though, the
super_scanner approach is easier to use and might be worthwhile.
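For example, in Python the handlers can be lambdas written at the call site, so no separately named functions are needed. This sketch uses hypothetical names and passes a dictionary from token class to handler instead of one positional argument per class, a small variation on the super_scanner call shown earlier.

```python
import re
from typing import Callable, Dict, List

_TOKEN_RE = re.compile(
    r"\s*(?:(?P<Real>\d+\.\d+)|(?P<Int>\d+)|(?P<Symbol>[A-Za-z_]\w*))")

def super_scanner(text: str, handlers: Dict[str, Callable[[str], None]]) -> None:
    """Call handlers[class](raw_text) for every token found in text."""
    pos = 0
    while (m := _TOKEN_RE.match(text, pos)) is not None:
        pos = m.end()
        handlers[m.lastgroup](m.group(m.lastgroup))

def collect(text: str) -> List[str]:
    """Each handler is a one-line lambda defined right where it is used."""
    out: List[str] = []
    super_scanner(text, {
        "Int":    lambda raw: out.append(f"int {int(raw)}"),
        "Real":   lambda raw: out.append(f"real {float(raw)}"),
        "Symbol": lambda raw: out.append(f"symbol {raw}"),
    })
    return out
```

The walrus operator requires Python 3.8 or later. Because each lambda can capture local state (here `out`), the per-token behaviour stays next to the code that needs it.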