Re: Can this be restructured ?



The structure you describe is very common for lexical scanners.
Anyone who has seen a lexical scanner interface will recognise the
principles straight away.

Generally the split is done like this:-
* The scanner separates the input characters into tokens, works out
what kind of token each one is, and returns a record containing the
token it has parsed.
* The calling code looks at what the scanner returned, branches on it,
and responds appropriately.
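As a rough illustration of that split, here is a minimal sketch in Python rather than Oberon. The token grammar, the class names and the `Token`, `scan` and `parse` names are all hypothetical. `scan` classifies tokens and returns them as records, and `parse` stands in for the calling code that branches on each token's class.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical token classes standing in for the scanner's "class" codes.
INT, REAL, NAME, CHAR = "Int", "Real", "Name", "Char"

# A real (digits.digits), an integer, an identifier, or any other
# single non-blank character, each preceded by optional whitespace.
_TOKEN_RE = re.compile(r"\s*(?:(\d+\.\d+)|(\d+)|([A-Za-z_]\w*)|(\S))")


@dataclass
class Token:
    cls: str       # what kind of token this is
    value: object  # the converted value: int, float or str


def scan(text: str) -> Iterator[Token]:
    """Scanner: split text into tokens, classify each one, yield a Token."""
    pos = 0
    while (m := _TOKEN_RE.match(text, pos)) is not None:
        pos = m.end()
        real, integer, name, char = m.groups()
        if real is not None:
            yield Token(REAL, float(real))
        elif integer is not None:
            yield Token(INT, int(integer))
        elif name is not None:
            yield Token(NAME, name)
        else:
            yield Token(CHAR, char)


def parse(text: str) -> List[str]:
    """Caller: look at each token's class and respond to it."""
    actions = []
    for tok in scan(text):
        if tok.cls == INT:
            actions.append(f"push int {tok.value}")
        elif tok.cls == REAL:
            actions.append(f"push real {tok.value}")
        elif tok.cls == NAME:
            actions.append(f"look up {tok.value}")
        else:
            actions.append(f"operator {tok.value}")
    return actions
```

For example, `parse("y + 3 * 4.5")` gives `["look up y", "operator +", "push int 3", "operator *", "push real 4.5"]`.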

This can be split up more. The scanner can be split into two:-
* The "primitive scanner" simply finds the tokens. It returns a string
and some indicator of the token type (called "class" in your scanner).
* The "scanner" looks at the string and the type, and creates the
required objects: floats, ints, symbols, etc.
This makes things clearer within the scanner code.
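A minimal Python sketch of the two layers follows, assuming a toy grammar of integers, reals, identifiers and single-character tokens. The names `primitive_scan`, `scan` and `_CONVERT` are hypothetical. The primitive layer only finds token boundaries and yields raw strings with a class. The scanner layer turns each string into a value through a per-class converter. Symbols are left as plain strings here, although a real scanner might intern them in a symbol table.

```python
from typing import Iterator, List, Tuple

# Hypothetical class names standing in for the original "class" codes.
INT, REAL, SYMBOL, CHAR = "Int", "Real", "Symbol", "Char"


def primitive_scan(text: str) -> Iterator[Tuple[str, str]]:
    """Only finds the tokens: yields (raw string, class) with no conversion."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isdecimal():
            while i < n and text[i].isdecimal():
                i += 1
            # A "." followed by a digit turns the number into a real.
            if i + 1 < n and text[i] == "." and text[i + 1].isdecimal():
                i += 1
                while i < n and text[i].isdecimal():
                    i += 1
                yield text[start:i], REAL
            else:
                yield text[start:i], INT
        elif ch.isalpha() or ch == "_":
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1
            yield text[start:i], SYMBOL
        else:
            i += 1
            yield ch, CHAR


# The scanner layer: one converter per class builds the required object.
_CONVERT = {INT: int, REAL: float, SYMBOL: str, CHAR: str}


def scan(text: str) -> List[Tuple[str, object]]:
    """Uses primitive_scan, then converts each raw string to a value."""
    return [(cls, _CONVERT[cls](raw)) for raw, cls in primitive_scan(text)]
```

Because all the conversion lives in `_CONVERT`, supporting a new token type only requires a new branch in `primitive_scan` and a new converter entry.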

The data structures can also be made clearer. In particular, the
scanner returns both the type of an object and the object itself in a
single record. These can be separated.

TYPE
  DebugInfo = RECORD
    nextCh: CHAR;  (* Character immediately following the last symbol. *)
    line: INTEGER  (* Number of carriage returns scanned so far. *)
  END;

  ProgObj = RECORD
    i: LONGINT;
    x: REAL;
    y: LONGREAL;
    c: CHAR;
    len: SHORTINT;  (* Length of name or string scanned. *)
    s: ARRAY 64 OF CHAR;
    obj: Objects.Object
  END;

  Scanner = RECORD (Reader)  (* Scanner for symbol streams. *)
    debug_info: DebugInfo;  (* Info for emitting syntax errors. *)
    class: INTEGER;         (* Scan result: Int, Real, String etc. *)
    prog_obj: ProgObj       (* The returned entity. *)
  END;

The debug_info and class fields are filled in by the primitive_scanner
function, and prog_obj by the scanner function, which calls
primitive_scanner.
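Here is a minimal Python sketch of that division of labour, mirroring the record split. Several details are assumptions. The field is called `cls` because `class` is reserved in Python. Only the `i`, `x`, `c` and `s` fields of ProgObj are modelled. The class codes are hypothetical. Instead of extending Reader, `Scanner` simply holds a string and a position. `primitive_scan` updates only debug_info and cls, and `scan` builds prog_obj from the raw text.

```python
from dataclasses import dataclass, field

# Hypothetical class codes; the Oberon original uses INTEGER constants.
INVALID, INT, REAL, NAME, CHAR = range(5)


@dataclass
class DebugInfo:
    next_ch: str = ""  # character immediately following the last symbol
    line: int = 0      # number of line breaks scanned so far


@dataclass
class ProgObj:
    i: int = 0      # value when cls == INT
    x: float = 0.0  # value when cls == REAL
    c: str = ""     # value when cls == CHAR
    s: str = ""     # text when cls == NAME


@dataclass
class Scanner:
    text: str
    pos: int = 0
    debug_info: DebugInfo = field(default_factory=DebugInfo)
    cls: int = INVALID  # "class" is a reserved word in Python
    prog_obj: ProgObj = field(default_factory=ProgObj)

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _advance(self) -> None:
        # Count line breaks so syntax errors can report a line number.
        if self.text[self.pos] == "\n":
            self.debug_info.line += 1
        self.pos += 1

    def _take_while(self, pred) -> None:
        while self._peek() and pred(self._peek()):
            self._advance()

    def primitive_scan(self) -> str:
        """Fill in debug_info and cls; return the raw token text."""
        self._take_while(str.isspace)
        start = self.pos
        ch = self._peek()
        if not ch:
            self.cls = INVALID
        elif ch.isdecimal():
            self._take_while(str.isdecimal)
            self.cls = INT
            # "." followed by a digit turns the number into a real
            if self._peek() == "." and self.text[self.pos + 1:self.pos + 2].isdecimal():
                self._advance()
                self._take_while(str.isdecimal)
                self.cls = REAL
        elif ch.isalpha():
            self._take_while(str.isalnum)
            self.cls = NAME
        else:
            self._advance()
            self.cls = CHAR
        self.debug_info.next_ch = self._peek()
        return self.text[start:self.pos]

    def scan(self) -> None:
        """Call primitive_scan, then build prog_obj from the raw text."""
        raw = self.primitive_scan()
        self.prog_obj = ProgObj()
        if self.cls == INT:
            self.prog_obj.i = int(raw)
        elif self.cls == REAL:
            self.prog_obj.x = float(raw)
        elif self.cls == NAME:
            self.prog_obj.s = raw
        elif self.cls == CHAR:
            self.prog_obj.c = raw
```

With `S = Scanner("count\n  = 12.5")`, the first `S.scan()` yields a NAME with `S.prog_obj.s == "count"` and `S.debug_info.next_ch == "\n"`. The second yields the CHAR `"="` on line 1. The third yields a REAL with `S.prog_obj.x == 12.5`.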

The improvement suggested by H.S.Lahmann can also be introduced. That
is, you have some function, let's call it "super_scanner", which takes
procedures as arguments, one for each possible type of token. So you
would call it:-
super_scanner (int_proc, float_proc, string_proc, symbol_proc, ....);
Structuring the code this way would work well if the individual
procedures were often similar. In parsers, though, this is rarely the
case: mostly they differ slightly from one another, so the programmer
has to write many small functions, and that itself causes code
repetition. So, AFAIK, the above trick isn't used much.

In languages that have reasonable macros or lambda expressions, the
super_scanner approach is easier to use, though, and might be useful.
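Here is a minimal Python sketch of the super_scanner idea using lambdas. It assumes a toy grammar of reals, integers, double-quoted strings, identifiers and single other characters. The exact signature, including the extra `other_proc` parameter, is hypothetical. Each token is handed to the procedure for its type, and the results are collected. With lambdas, each per-token action is a one-line expression at the call site rather than a separate named function.

```python
import re
from typing import Any, Callable, List

# Hypothetical token grammar: reals, ints, double-quoted strings,
# identifiers, and any other single non-blank character.
_TOKEN_RE = re.compile(r'\s*(?:(\d+\.\d+)|(\d+)|"([^"]*)"|([A-Za-z_]\w*)|(\S))')


def super_scanner(text: str,
                  int_proc: Callable[[int], Any],
                  float_proc: Callable[[float], Any],
                  string_proc: Callable[[str], Any],
                  symbol_proc: Callable[[str], Any],
                  other_proc: Callable[[str], Any]) -> List[Any]:
    """Scan text and hand each token to the procedure for its type."""
    results = []
    pos = 0
    while (m := _TOKEN_RE.match(text, pos)) is not None:
        pos = m.end()
        real, integer, string, symbol, other = m.groups()
        if real is not None:
            results.append(float_proc(float(real)))
        elif integer is not None:
            results.append(int_proc(int(integer)))
        elif string is not None:  # may be "" for an empty string literal
            results.append(string_proc(string))
        elif symbol is not None:
            results.append(symbol_proc(symbol))
        else:
            results.append(other_proc(other))
    return results


out = super_scanner('print "hi" 3 2.5 ;',
                    int_proc=lambda n: ("int", n * 2),
                    float_proc=lambda x: ("float", x),
                    string_proc=lambda s: ("str", s.upper()),
                    symbol_proc=lambda s: ("sym", s),
                    other_proc=lambda c: ("other", c))
```

Here `out` is `[("sym", "print"), ("str", "HI"), ("int", 6), ("float", 2.5), ("other", ";")]`. The same caveat from the post applies: if every call site needs slightly different handlers, the lambdas still have to be written out each time.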
