Re: Regular expression to find <tr> tags in 2nd level HTML tables

From: Tad McClellan (tadmc_at_augustmail.com)
Date: 01/23/04


Date: Thu, 22 Jan 2004 21:26:57 -0600


[
  Non-existent newsgroup removed again.
  comp.lang.javascript removed, no JavaScript content.
  Followups set.
]

[ I couldn't figure out how to repair the top-posting, so
  I just snipped the full-quote at the end.
]

Shannon Jacobs <shanen@my-deja.com> wrote:

[ context: comp.lang.perl was rmgroup'd long ago ]

> I'm so sorry to hear that the Google Groups system has been "neglected
> for many years", as you put it so thoughtfully.

Usenet was administered successfully for 15 years before
DejaNews (now Google Groups).

Google is not a part of Usenet.

Google does not administer Usenet.

Google is an archive of everything that appears on Usenet, without
regard to whether it is _supposed_ to appear on Usenet or not.

There is a protocol (the "P" in "NNTP") for administering Usenet.

There was a proper protocol message removing the comp.lang.perl
newsgroup many years ago when the other Perl newsgroups were formed.

It appears that you do not have enough experience with Usenet's
operation and history to be acting in an authoritative role.

But NONE of that is of much importance when compared to this:

   The clued do not read comp.lang.perl

Why would you want to ask questions where there are few people with clues?

Post your Perl question to comp.lang.perl and 20 newbies will read it.

Post it to comp.lang.perl.(misc|moderated) and hundreds of people who have used
Perl for years will read it.

You choose who you want answers from...

> It really is
> unfortunate that so many people regard Google as a useful information
> resource, isn't it?

I did not say anything at all about Google, so I don't know
what you are going on about there...

> Incidentally, when I finally had a bit of free time this morning, I
> rethought the technical problem and did come up with a trivial
> regex-based solution. It did exactly what I required on the first
> attempt, confirming that the technical problem was pretty much as
> trivial as I had thought it was.

Working once with one set of data is not what most people
would term "confirmation".

Perhaps you have just not tried it with a data set that exposes
its weaknesses (if any)?

> Why did the newsgroups fail to produce the technically trivial answer?

Because there was no technically trivial answer.

> I asked a simple technical question,
> and wound up being dragged into a religious war about proper ways to
> handle HTML.

It was not a religious war. It was a mathematical war.

> Not very useful.

Mathematics (Set Theory) is very useful for parsing grammars.

> I still regard regular expressions as useful and worthy of further
> study.

Me too. I did not say that they were not.

Regexes are great for tokenizing, but hopeless for the real parsing.
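
A rough sketch of that division of labour, written in Python rather than Perl: a regex only picks out the table and tr tags (the tokenizing), and an ordinary counter tracks how deeply the tables are nested (the parsing). The names TAG and rows_at_table_depth and the sample markup are made up for illustration, and the tokenizer is deliberately simplistic: it assumes well-formed markup and ignores comments, scripts, and attribute values that contain ">".

```python
import re

# Tokenizer: a regex is well suited to spotting individual tags.
TAG = re.compile(r"<(/?)(table|tr)\b[^>]*>", re.IGNORECASE)

def rows_at_table_depth(html, wanted_depth=2):
    """Return the opening <tr> tags that sit inside tables nested
    exactly `wanted_depth` levels deep."""
    depth = 0    # nesting state, kept outside the regex
    found = []
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if name == "table":
            depth += -1 if closing else 1
        elif not closing and depth == wanted_depth:
            found.append(match.group(0))
    return found

sample = (
    '<table><tr id="outer"><td>'
    '<table><tr id="inner1"></tr><tr id="inner2"></tr></table>'
    '</td></tr></table>'
)
print(rows_at_table_depth(sample))
```

The counter is the part that matters: it is state that grows with the nesting, which is what a plain regular expression has no way to keep. For real pages a proper HTML parser is still the safer choice.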

> I cannot say the same thing about most of the people who
> responded so religiously to my trivial question.

Your question was not as trivial as you seem to think.
(That was the point of many of the followups.)

It can be proven (in the mathematical sense) that a Regular
Expression cannot parse a Context Free Grammar (which HTML is).[1]

You can do it in special highly-restricted cases, but not in
the general case. It is impossible.

[1] but Perl's regular expressions are no longer mathematical
    Regular Expressions, they've mutated.
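
A small illustration of why "it worked on my data" proves little, again sketched in Python with made-up sample strings. The pattern NAIVE_TABLE is a hypothetical example of the kind of regex people reach for, not the one from the original question. It is correct on a flat table and wrong as soon as a table is nested.

```python
import re

# A tempting pattern: "a table, anything, the end of the table".
NAIVE_TABLE = re.compile(r"<table>.*?</table>", re.DOTALL)

flat = "<table><tr></tr></table>"
nested = "<table><tr><td><table><tr></tr></table></td></tr></table>"

# Flat input: the whole table is matched, so the pattern looks correct.
flat_match = NAIVE_TABLE.search(flat).group(0)

# Nested input: the non-greedy match stops at the FIRST </table>,
# which closes the inner table, so the "outer table" is cut short.
nested_match = NAIVE_TABLE.search(nested).group(0)

print(flat_match == flat)      # the flat case succeeds
print(nested_match == nested)  # the nested case does not
print(nested_match)
```

Making the quantifier greedy does not rescue it: the match then runs from the first opening tag to the last closing tag, which glues two sibling tables into one. Whichever way it is tuned, the pattern cannot count how many tables are still open.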

-- 
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas

