Re: reg exp

From: Gunnar Hjalmarsson (noreply_at_gunnar.cc)
Date: 08/30/04

    Date: Mon, 30 Aug 2004 17:15:54 GMT
    
    

    Ken Chesak wrote:
    > A Perl script is formatting text for an HTML page. It changes things
    > like an & to &amp;, but it should not change &nbsp;. It uses \ as an
    > escape character, so \&nbsp; will become &nbsp;. The final results
    > are correct, but is there a better way to do this?
    >
    > Input file test.txt
    > \HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
    >
    > 1st change
    > 1a= \HOME &amp; \&nbsp; BORN \& FREE BORN FREE '' \' HELP &quot; \"
    > w\\\\\\\w
    > 2nd changes
    > 1b= HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; "
    > w\\\w
    >
    > #!/usr/local/bin/perl5
    > #
    > %encode = ( '&' => '&amp;',
    > '"' => '&quot;',
    > '\'' => '\'\'' );
    >
    > $data = `cat test.txt`;
    > print "Oa= $data\n";
    > $data =~ s/(?<!\\)(.)/defined($encode{$1})?$encode{$1}:$1/eg;
    > print "1a= $data\n";
    > $data =~ s/(\\)(.)/$2/g;
    > print "1b= $data\n";

    Don't know about better, but this does it with one substitution, and
    does not require escaping of HTML entities in the original text:

         $data =~ s{(&#?\w+;)|\\(.)|([&"'])}
                   { defined $1 ? $1 : defined $2 ? $2 : $encode{$3} }eg;
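
To compare the two approaches side by side, here is a minimal self-contained sketch. It assumes `%encode` maps `&` to `&amp;`, `"` to `&quot;` and `'` to `''`, reads the sample line from a here-doc instead of shelling out to `cat`, and wraps each approach in a sub; the names `two_pass` and `one_pass` are made up for illustration. The one-pass version tests `$1` and `$2` with `defined` rather than for truth, so that an escaped `0` (which is false in Perl) is not mistaken for a missing match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed mapping: HTML-escape & and ", double the single quote.
my %encode = (
    '&' => '&amp;',
    '"' => '&quot;',
    "'" => "''",
);

# Hypothetical wrapper around the two substitutions from the question.
sub two_pass {
    my ($text) = @_;
    # Pass 1: encode every character that is not preceded by a backslash.
    $text =~ s/(?<!\\)(.)/defined($encode{$1}) ? $encode{$1} : $1/eg;
    # Pass 2: drop each escaping backslash, keeping the character after it.
    $text =~ s/\\(.)/$1/g;
    return $text;
}

# Hypothetical wrapper around the single substitution from the reply.
# At each position the alternatives are tried left to right:
#   $1  an existing entity such as &nbsp; or &#160;, kept unchanged
#   $2  a backslash-escaped character, emitted without the backslash
#   $3  a bare &, " or ', replaced from %encode
sub one_pass {
    my ($text) = @_;
    $text =~ s{(&#?\w+;)|\\(.)|([&"'])}
              { defined $1 ? $1 : defined $2 ? $2 : $encode{$3} }eg;
    return $text;
}

# Sample line from test.txt; a single-quoted here-doc keeps every
# backslash literally (no \\ -> \ collapsing as in '...' or q{}).
my $input = <<'END';
\HOME & \&nbsp; BORN \& FREE BORN FREE ' \' HELP " \" w\\\\\\\w
END
chomp $input;

print two_pass($input), "\n";
print one_pass($input), "\n";
# Both print: HOME &amp; &nbsp; BORN & FREE BORN FREE '' ' HELP &quot; " w\\\w

# An escaped backslash followed by a bare &.
my $edge = '\\\\&';    # the three characters \, \ and &
print two_pass($edge), "\n";    # prints \&
print one_pass($edge), "\n";    # prints \&amp;
```

On the sample line both subs produce the same text as the 1b output in the question. They differ on an escaped backslash followed by a bare ampersand: the look-behind in the first pass of `two_pass` still sees a backslash in front of the `&`, so that `&` is never encoded, whereas `one_pass` consumes `\\` as one escape and then encodes the `&`.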

    Another thing is that I'm a bit confused about the wider purpose of
    the exercise...

    -- 
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    
