Re: Logic failure..

From: Liz (liz_wants_no_spam_at_xcalibur.nospam.co.uk)
Date: 01/16/05


Date: 16 Jan 2005 09:00:41 -0800

RandomAccess wrote:

>
> Hi Liz,
>
> I've been writing an HTML parser (importer) and ran into pretty much
> My first attempt was a real mess, and I finally decided to implement
> the system as a state machine. I'm loading the entire text into a
> single buffer and scanning the text 1 character at a time. The
> "state" mechanism handles what to do with characters as they are read
> based on the current state. The buffer is simply a string and the
> parser uses a single index into the string.
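
A minimal sketch of the kind of single-index state machine described above, assuming only two states and assuming that a URL starts at "http://" and ends at whitespace or a double quote. The names `State` and `scan` are made up for illustration and are not from either poster's code.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()
    URL = auto()


def scan(buffer: str):
    """Split buffer into (kind, text) runs using one index and one state."""
    runs = []
    state = State.TEXT
    start = 0  # where the current run began
    i = 0      # the single index into the buffer
    while i < len(buffer):
        if state is State.TEXT and buffer.startswith("http://", i):
            # Close the plain-text run (if any) and enter the URL state.
            if i > start:
                runs.append(("text", buffer[start:i]))
            start = i
            state = State.URL
        elif state is State.URL and (buffer[i].isspace() or buffer[i] == '"'):
            # Assumed URL terminators: whitespace or a double quote.
            runs.append(("url", buffer[start:i]))
            start = i
            state = State.TEXT
        i += 1
    if start < len(buffer):
        kind = "url" if state is State.URL else "text"
        runs.append((kind, buffer[start:]))
    return runs
```

The output is a list of runs rather than text with tags spliced in, so later steps can still tell visible characters apart from markup.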

This is kind of where I'm at: I have a character-by-character deal, but
the problem comes with the word wrapping. For example:

Some text http://www.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyy

So, for example, imagine the y's make it too long for the line.

The code correctly highlights the URL, with a URL start code at the
front and a URL end code at the end.

It then works through the characters till it gets to the end of the
line and goes, "Hmm, I need to find me a good point to wrap". So it
searches back along the line for the wrap point and finds the space.
The problem is that by then it has the URL settings in its mind as
current, and what I end up with is

some text
<wraptag><highlightUrlcode>http://.................>

which would be OK, except that if you have, for example,

Some text "http://www.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyy"

before the wrap you have

Some text "<url
tag>http://www.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyy url>"

But because it goes through and finds the need to wrap at the y's, it
already has the tags, so I get

some text
<wraptag><url tag>"<url tag>
http://www.............................. url>"

which is obviously wrong..
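
One way around this, sketched under assumptions: choose the wrap point using visible characters only, and emit the tags afterwards, per line. The names `wrap_runs` and `render` are hypothetical, and the `<url>` and `</url>` markers stand in for whatever start and end codes the real program uses. The input is a list of (kind, text) runs where kind is "text" or "url".

```python
def wrap_runs(runs, width):
    """Wrap (kind, text) runs into lines of at most `width` visible chars."""
    # One (char, kind) cell per visible character; no tags are stored here.
    cells = [(ch, kind) for kind, text in runs for ch in text]
    lines = []
    while len(cells) > width:
        # Search back from the limit for a space: that is the wrap point.
        cut = next((i for i in range(width, 0, -1) if cells[i][0] == " "), None)
        if cut is None:
            # No space found: hard break at the width.
            lines.append(cells[:width])
            cells = cells[width:]
        else:
            # Break at the space and drop the space itself.
            lines.append(cells[:cut])
            cells = cells[cut + 1:]
    lines.append(cells)
    return lines


def render(line):
    """Emit one wrapped line, opening and closing the URL marker as needed."""
    out = []
    current = "text"
    for ch, kind in line:
        if kind != current:
            out.append("<url>" if kind == "url" else "</url>")
            current = kind
        out.append(ch)
    if current == "url":
        out.append("</url>")  # never leave a marker open across lines
    return "".join(out)
```

Because the backward search only ever looks at characters, it cannot pick up "current" URL settings, and the quote before the URL stays outside the markers.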

If I don't do a search and delete all the <wraptag><url tag> pairs when
I unwrap it, I end up with extra <url tag>s as resizing events occur,
and you can end up with something like

Some text <url tag><url tag><url tag>.....<url tag>http://...

So my logic is pants somewhere, but finding it seems a pain in the
hiney..
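
A possible way to stop the tags piling up, as a sketch only: never unwrap at all. Keep the original unwrapped text untouched and rebuild the wrapped lines from it on every resize. `WrappedView` is a hypothetical name, and the standard library's `textwrap.wrap` stands in for the real wrapping code.

```python
import textwrap


class WrappedView:
    """Holds the raw text and derives wrapped lines from it on demand."""

    def __init__(self, raw: str):
        self.raw = raw    # source of truth, never modified
        self.lines = []   # derived data, thrown away on each resize

    def resize(self, width: int) -> None:
        # Rebuild from the raw text; nothing from the previous wrap survives,
        # so wrap markers and URL tags cannot accumulate.
        self.lines = textwrap.wrap(self.raw, width)
```

With this shape, resizing to 20 columns, then 40, then 20 again gives the same lines as the first 20-column wrap.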

And while I seem to have it right when I first wrap it, I've had to make
the assumption that there are no codes before the URL tag (but of course
there could easily be another colour code set there), and it's just
driving me mad..

I'm not actually dealing with HTML, I'm dealing with telnet and ANSI
colours, just to make it more complex. The codes could technically
delete half the stuff already there as it is.
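
For the ANSI side, a rough sketch of separating colour codes from visible text before any wrapping is done. It assumes only SGR (colour/attribute) sequences of the form ESC [ params m. Erase and cursor sequences, which are the ones that can delete what is already there, are not handled. `tokenize` and `visible_length` are hypothetical names.

```python
import re

# ESC [ <digits and semicolons> m  is an SGR (colour/attribute) sequence.
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def tokenize(data: str):
    """Split data into ("text", str) and ("sgr", params) tokens."""
    tokens = []
    pos = 0
    for m in SGR.finditer(data):
        if m.start() > pos:
            tokens.append(("text", data[pos:m.start()]))
        tokens.append(("sgr", m.group(1)))
        pos = m.end()
    if pos < len(data):
        tokens.append(("text", data[pos:]))
    return tokens


def visible_length(data: str) -> int:
    """Count only the characters that take up space on screen."""
    return sum(len(value) for kind, value in tokenize(data) if kind == "text")
```

The wrap width can then be checked against `visible_length` rather than the raw string length, so colour codes in front of a URL no longer throw the count off.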


