Regexp issue . . .

From: MichaelC (mickyc_at_NOshaSPAMw.ca)
Date: 11/25/03

  • Next message: Eric J. Roode: "Re: Regexp issue . . ."
    Date: Tue, 25 Nov 2003 07:03:37 GMT
    
    

    Hi all. I am having a particularly difficult time with a Perl script that I
    am writing. The problem area is a place where I need to strip some newlines
    out of a file.

    My source data is text in paragraph form, but with line breaks inside the
    paragraphs. I need to automate as much of the processing as possible in
    order to minimise the number of manual changes that I have to make.

    Sample text is as follows:

    "This document is intended to give you an
    overview of DG as well as highlight some of
    the features. This is a brought to your handheld using DG."
    With DG you can view and edit word processing and spreadsheet files on
    your handheld. Simple push-button synchronization of
    the handheld with the desktop will maintain the most up-to-date
    version of a file on both the desktop and handheld.

    I want these to be parsed as follows:

    "This document is intended to give you an overview of DG as well as
    highlight some of the features. This is a brought to your handheld using
    DG." With DG you can view and edit word processing and spreadsheet files on
    your handheld. Simple push-button synchronization of the handheld with the
    desktop will maintain the most up-to-date version of a file on both the
    desktop and handheld.

    --
    One way that I thought might work is to catch every line that begins with
    an upper-case letter, prepend a line break to it and strip its trailing
    break, then catch every line that begins with a lower-case letter and
    strip its trailing break as well, so that it joins the line before it.
    Repeat this until the lower-case test no longer matches, then clean up any
    extra line breaks.
    I came up with this . . . but all it seems to do is strip all newlines out:

    while( <infl> ) {
      my $x = $_;
      if ( $x =~ ?^[^a-z]? ) { $x =~ s!(.*)\n!\n\1 ! }
      else { $x =~ s!(.*)\n!\1 ! }
      print outfl $x;
    }

    Any help would be greatly appreciated.
    Michael
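
For comparison, here is a minimal sketch of one possible way to get the output described above. It is not the code from the post. It assumes that paragraphs in the source file are separated by blank lines, which the sample does not confirm, and it re-wraps each paragraph with the core Text::Wrap module. The width of 76, the helper name reflow and the sample text are all assumptions made for illustration. One detail that may explain the behaviour reported above: in Perl, a match written as ?PATTERN? succeeds only once between calls to reset(), so after the first matching line every later line would take the else branch and simply lose its newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Assumption: 76 is an arbitrary width chosen for this sketch.
# Text::Wrap keeps every output line shorter than this value.
$Text::Wrap::columns = 76;

# Hypothetical helper: join the lines of one paragraph, then re-wrap it.
sub reflow {
    my ($paragraph) = @_;
    $paragraph =~ s/\s+/ /g;     # collapse newlines and runs of spaces
    $paragraph =~ s/^ | $//g;    # trim one leading and one trailing space
    return wrap('', '', $paragraph) . "\n";
}

# Hypothetical sample standing in for the real input file.
my $sample = <<'END';
"This document is intended to give you an
overview of DG as well as highlight some of
the features.

With DG you can view and edit files on
your handheld.
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @paragraphs;
{
    local $/ = "";    # paragraph mode: a record ends at one or more blank lines
    @paragraphs = map { reflow($_) } <$in>;
}
close $in;

# Print the reflowed paragraphs with a blank line between them.
print join("\n", @paragraphs);
```

To run this against a real file, the in-memory handle could be replaced with one opened on the file itself. If the source has no blank lines between paragraphs, paragraph mode will not help and a per-line test written with m/^[a-z]/ rather than ?...? would be needed instead.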
    
