Re: replacing tags between tags



beartiger@xxxxxxxxx wrote:
> Jürgen Exner wrote:
>> beartiger@xxxxxxxxx wrote:
>>> John Bokma wrote:
>> [...]
>>>> You could use s/// to do this, but it might fail. Better to parse
>>>> the HTML, fix it, and write it out.
>>>
>>> That answers the specific example, but I was looking for something
>>> to answer the general case.
>>
>> Why do you think parsing the HTML would _not_ work in the general
>> case?
>
> I don't. Would you please illustrate what you mean?

Well, John wrote:
<quote>Better to parse the HTML, fix it, and write it out.</quote>

You replied:
<quote>
>>> That answers the specific example, but I was looking for something
>>> to answer the general case.
</quote>

To me that seems to imply that you believe parsing the HTML would work
only for the specific example and not for the general case. If this was
not what you meant, then I obviously misunderstood what you wrote.

Anyway, this topic has been discussed a gazillion times before. To parse
HTML, use a proper HTML parser, because contrary to popular belief parsing
HTML is not trivial. For further details please see DejaNews and the FAQ
(perldoc -q HTML: "How do I remove HTML from a string?").
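To illustrate "parse the HTML, fix it, and write it out", here is a minimal sketch. It is written in Python with the standard library's html.parser rather than in Perl; in Perl the usual counterparts are the CPAN modules HTML::Parser and HTML::TreeBuilder. The original poster's actual example is not quoted in this thread, so renaming <b> to <strong> is only a hypothetical stand-in for "replacing tags". The parser reports real start and end tags separately from text and comments. That means a <b> inside a comment, or a <b class="x"> with attributes, is handled correctly. A plain s/<b>/<strong>/g would get both of those wrong.

```python
from html import escape
from html.parser import HTMLParser


class TagRenamer(HTMLParser):
    """Re-serialize HTML, renaming one element type to another.

    Hypothetical helper for illustration: only real start/end tags are
    renamed; text, comments, declarations and entities are copied through.
    """

    def __init__(self, old: str, new: str) -> None:
        # convert_charrefs=False: entity/char references arrive via their own
        # callbacks, so they can be written back instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.old = old.lower()  # HTMLParser lowercases tag names it reports
        self.new = new
        self.out: list[str] = []

    def _name(self, tag: str) -> str:
        return self.new if tag == self.old else tag

    def _attrs(self, attrs) -> str:
        # Attribute values come back unescaped, so escape them again on output.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{self._name(tag)}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{self._name(tag)}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{self._name(tag)}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        # Comments are passed through untouched, even if they contain "<b>".
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def rename_tag(html: str, old: str, new: str) -> str:
    """Parse `html`, rename every `old` element to `new`, and write it out."""
    parser = TagRenamer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    src = '<p>Use <b>bold</b> here<!-- old <b> markup --></p>'
    # prints: <p>Use <strong>bold</strong> here<!-- old <b> markup --></p>
    print(rename_tag(src, "b", "strong"))
```

The output is a re-serialization, not a byte-for-byte copy. Attribute quoting and escaping are normalized, for example. That is the usual trade-off of going through a parser, and for the general case it is a far smaller problem than a regex silently rewriting the wrong thing.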

jue
