Re: Writing HTML parser wasn't as hard as I thought it'd be
- From: gisle@xxxxxxxxxxxxxx (Gisle Sælensminde)
- Date: 30 Apr 2007 17:50:21 +0200
Kent M Pitman <pitman@xxxxxxxxxxx> writes:
> Modularizing the task into something that corrects bad HTML to good
> and something that displays good HTML is probably the way to go.
> Parsers for bad HTML don't have to know about HTML "meaning", just its
> structure.
This is in fact what at least one web browser does internally. In an earlier
job I did web browser development, and that browser first parsed the HTML
into a DOM tree. Before the tree was sent to the rendering engine, it went
through a so-called "DOM-fixer". The DOM-fixer was basically a set of rules
for rewriting bad HTML, so that the rendering engine did not have to deal
with it. These rules were constantly rewritten so that we could display all
the pages the other guys could display. This code was required for the
browser to show what people expected a browser to be able to display.
I would guess that most web browsers do something similar.
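
To make the idea concrete, here is a minimal sketch of such a rule-based
fixer pass in Python. It is not the code from that browser. The Node class,
the two rules, and the names fix_dom and to_html are all made up for
illustration, and a real fixer would have far more rules than this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    tag: str                                   # element name, or "#text" for text
    children: List["Node"] = field(default_factory=list)
    text: str = ""                             # only used by "#text" nodes


# A rule looks at one node, may rewrite its children, and reports
# whether it changed anything.
Rule = Callable[[Node], bool]


def wrap_stray_list_items(node: Node) -> bool:
    """Hypothetical rule: runs of <li> outside a list get wrapped in a <ul>."""
    if node.tag in ("ul", "ol"):
        return False
    if not any(child.tag == "li" for child in node.children):
        return False
    fixed: List[Node] = []
    wrapper = None
    for child in node.children:
        if child.tag == "li":
            if wrapper is None:                # start a new synthetic list
                wrapper = Node("ul")
                fixed.append(wrapper)
            wrapper.children.append(child)
        else:
            wrapper = None                     # a non-<li> ends the current run
            fixed.append(child)
    node.children = fixed
    return True


def drop_empty_inline(node: Node) -> bool:
    """Hypothetical rule: remove childless <b>, <i> and <span> elements."""
    kept = [c for c in node.children
            if not (c.tag in ("b", "i", "span") and not c.children)]
    changed = len(kept) != len(node.children)
    node.children = kept
    return changed


RULES: List[Rule] = [wrap_stray_list_items, drop_empty_inline]


def fix_dom(root: Node, rules: List[Rule] = RULES, max_passes: int = 10) -> Node:
    """Apply every rule to every node until a full pass changes nothing."""
    for _ in range(max_passes):
        changed = False
        stack = [root]
        while stack:
            node = stack.pop()
            for rule in rules:
                if rule(node):
                    changed = True
            stack.extend(node.children)
        if not changed:
            break
    return root


def to_html(node: Node) -> str:
    """Serialize the tree, just enough to inspect the result."""
    if node.tag == "#text":
        return node.text
    inner = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"
```

The point of the design is that the renderer only ever sees the tree that
fix_dom returns, so supporting one more kind of broken page means adding or
adjusting a rule in RULES, not touching the rendering code.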
--
Gisle Sælensminde, PhD student, Scientific programmer
Computational biology unit, BCCS, University of Bergen, Norway,
Email: gisle@xxxxxxxxxx
The best way to travel is by means of imagination