Re: Reasons for preferring Lisp, and for what

From: Tuang (tuanglen_at_hotmail.com)
Date: 11/08/03


Date: 8 Nov 2003 01:03:24 -0800

Edi Weitz <edi@agharta.de> wrote in message news:<87r80jrj6k.fsf@bird.agharta.de>...
> On 7 Nov 2003 12:35:11 -0800, tuanglen@hotmail.com (Tuang) wrote:
>
> > Actually, the built-in regex-based features let you make just about
> > any collection of data appear to be such a data structure, which you
> > can then apply the other Perl tools to. An example of this would be
> > fetching an HTML file from a URL, locating just the table of
> > interest amidst the other clutter, and parsing the lines out of the
> > table.
>
> The problem with Perl's (admittedly very good) regexp capabilities is
> that Perl users tend to think that they can attack almost every
> problem with regular expressions. (Been there, done that.)
>
> Trying to parse HTML or XML with regular expressions is really a bad
> idea....

It depends on what you mean. If you mean that a general-purpose HTML
or XML manipulation library should be built expressly for that
purpose, based on the formal specs, then I definitely agree. Perl has
such libraries.
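
For comparison, here is a minimal sketch of the purpose-built-parser
route. It is written in Python rather than Perl, using the standard
library's html.parser module as a stand-in for the Perl libraries
mentioned above. The CellCollector class and the sample markup are
made up for illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Hypothetical helper for illustration only; it ignores nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TR> and <tr> look the same here.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:
                # Tolerate a stray <td> that appears outside any <tr>.
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_cell:
            self.rows[-1][-1] += data


def table_cells(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample with mixed case and attributes, which a naive regex
# written for one exact layout would likely miss.
SAMPLE = '<table><TR><TD class="name">Widget</TD><td>4.50</td></TR></table>'
print(table_cells(SAMPLE))
```

The parser copes with attribute and case variations without any change
to the code, which is the main thing a spec-based library buys you.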

If you're just quickly extracting some specific data from one
specific Web page for yourself, though, then regular expressions are
excellent. It doesn't matter whether the page is HTML, XML, or
anything else. Just find some "landmarks" in the text that you can use
as delimiters, base a regex on them, and let it extract your data.
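
A rough sketch of the landmark idea, in Python rather than Perl. The
page text, the BEGIN/END comment landmarks, and the extract_rows
helper are all invented for illustration; a real page would need its
own landmarks picked by eye.

```python
import re

# Made-up page; in practice this would be the HTML fetched from the URL.
PAGE = """
<html><body>
<div class="nav">Home | About</div>
<!-- BEGIN PRICES -->
<table>
<tr><td>Widget</td><td>4.50</td></tr>
<tr><td>Gadget</td><td>12.00</td></tr>
</table>
<!-- END PRICES -->
<div class="footer">Contact us</div>
</body></html>
"""


def extract_rows(page):
    """Return (name, price) pairs found between the two landmarks."""
    # Step 1: use the landmarks to cut away the surrounding clutter.
    region = re.search(
        r"<!-- BEGIN PRICES -->(.*?)<!-- END PRICES -->", page, re.DOTALL
    )
    if region is None:
        return []
    # Step 2: pull both cells out of each row inside that region.
    # This pattern assumes every row has exactly this two-cell layout.
    return re.findall(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>", region.group(1))


for name, price in extract_rows(PAGE):
    print(name, price)
```

If the page author adds an attribute to a tag or moves a landmark, the
patterns silently stop matching, so this only suits throwaway scripts.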

Clearly this isn't a scalable approach. It's for one-time tasks, not
real applications, and, as you point out, Perl users are famous for
taking it too far.


