RE: HTML to Text



-----Original Message-----

<HTML>
<HEAD>
<TITLE>Bin Server</TITLE>
</HEAD>
<BODY>
<p>Data that I need</p>
<p>Data that I need</p>
</BODY>
</HTML>

I want the output to be just the "Data that I need" lines, stored in a
string that I can work through one line at a time. Having them in an array
or something similar would also be great.

I would first strip out everything from the start to <BODY>, then everything
from </BODY> to the end...

# /s lets . match across newlines; /i matches <body> in any case
$test2convert =~ s/^.*<BODY>//si ;
$test2convert =~ s/<\/BODY>.*$//si ;

Remove any existing newlines...

$test2convert =~ s/\n//g ;

Are you sure that the paragraph tags are always paired up? If so, you could
strip out each <p> and replace each </p> with \n . If you don't have control
over the input, though, that is a big assumption to make.

But it might give you ideas to start with based on how complex the data
is...
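
Putting the steps above together, here is a rough sketch of how it might
look. The sub name body_paragraphs is made up for illustration, and the whole
thing assumes every <p> has a matching </p> and that no other markup appears
inside the body, which may not hold for real input.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the text of each
# paragraph inside <BODY> as a list. Assumes every <p> is closed by a </p>.
sub body_paragraphs {
    my ($html) = @_;

    $html =~ s/^.*?<BODY[^>]*>//si;   # drop everything up to and including <BODY>
    $html =~ s/<\/BODY>.*$//si;       # drop </BODY> and everything after it
    $html =~ s/\r?\n//g;              # remove every existing newline
    $html =~ s/<p\b[^>]*>//gi;        # strip the opening paragraph tags
    $html =~ s/<\/p>/\n/gi;           # turn each closing tag into a newline

    # Split on the newlines just inserted and skip any empty pieces.
    return grep { length } split /\n/, $html;
}

my $test2convert = <<'END_HTML';
<HTML>
<HEAD>
<TITLE>Bin Server</TITLE>
</HEAD>
<BODY>
<p>Data that I need</p>
<p>Data that I need</p>
</BODY>
</HTML>
END_HTML

my @lines = body_paragraphs($test2convert);
print "$_\n" for @lines;
```

With the sample page this should leave two elements in @lines, one per
paragraph, so you can loop over them one at a time. For input you don't
control, a real HTML parser from CPAN is likely a safer bet than regexes.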

Good luck.

-r



