Re: Parsing 'dirty/corrupt data'. Advice wanted

burlo_stumproot_at_yahoo.se
Date: 10/30/04


Date: Sat, 30 Oct 2004 13:07:01 GMT

James Willmore <jwillmore@adelphia.net> writes:

> burlo_stumproot@yahoo.se wrote:
> <snip>

<snip of example data>

> Know your data. Know why one line is valid and another isn't. The
> data may appear to have no "logic" or "pattern" to it, but it's there
> somewhere.

I know how my data looks, and the regular expressions to find the lines
are simple. What I was hoping for by asking here was whether anyone had a
better strategy than the one I have now. What I do now is extensive
lookahead in the file until I find a line that matches. If I can't find it
before a new "block header" line is found, I report the block as broken.
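
The lookahead strategy described above can be sketched roughly as follows.
This is only an illustration in Python, not the code actually in use: the
example data was snipped from the thread, so the BLOCK and VALUE line
formats, the two patterns, and the function name parse_blocks are all
hypothetical stand-ins.

```python
import re

# Hypothetical patterns: the real data format was snipped from the thread.
HEADER = re.compile(r"^BLOCK\s+(\S+)")   # assumed "block header" line
VALUE = re.compile(r"^VALUE\s+(-?\d+)")  # assumed line each block must contain


def parse_blocks(lines):
    """Return (good, broken): good maps block id -> value, broken lists ids."""
    good, broken = {}, []
    current = None  # id of the block still waiting for its VALUE line
    for line in lines:
        header = HEADER.match(line)
        if header:
            # A new header arrived before the previous block was satisfied,
            # so the previous block is reported as broken.
            if current is not None:
                broken.append(current)
            current = header.group(1)
            continue
        value = VALUE.match(line)
        if value and current is not None:
            good[current] = int(value.group(1))
            current = None  # block satisfied; skip everything until next header
        # Any other line is treated as noise and skipped.
    if current is not None:
        broken.append(current)  # input ended before a matching line was found
    return good, broken


sample = ["BLOCK a", "garbage", "VALUE 12", "BLOCK b", "@@##",
          "BLOCK c", "VALUE 7", "VALUE 9"]
good, broken = parse_blocks(sample)
```

Written this way, the "lookahead" is a single forward pass with one piece of
state (the block still waiting for its line), so the file never has to be
rewound or buffered.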

> what is required for a valid line. That's at first glance and without
> having any clue as to what the data is supposed to be/represent.

I wish I knew what it represented too. I have it in the documentation
but have not had the time to read it yet. I have to do that soon since
I plan to do a reasonability test[1] on the data.

And later I have to make pretty pictures of it in Excel and PowerPoint!
Ooh JOY!!

[1] Hmm, can't find a good word for this right now.

/PM
From address valid but rarely read.


