Re: Simple text parsing gets difficult when line continues to next line



Jacob Rael wrote:
Hello,

I have a simple script to parse a text file (a visual basic program)
and convert key parts to tcl. Since I am only working on specific
sections and I need it quick, I decided not to learn/try a full blown
parsing module. My simple script works well until it runs into
functions that straddle multiple lines. For example:

Call mass_write(&H0, &HF, &H4, &H0, &H5, &H0, &H6, &H0, &H7, &H0, &H8, &H0, _
    &H9, &H0, &HA, &H0, &HB, &H0, &HC, &H0, &HD, &H0, &HE, &H0, &HF, &H0, -1)


I read in each line with:

for line in open(fileName).readlines():

I would like to identify if a line continues (if line.endswith('_'))
and concatenate it with the next line:

line = line + nextLine

How can I get the next line when I am in a for loop using readlines?

Don't do that. I'm rather dubious about approaches that try to grab the
next line on the fly, e.g. with fp.next(). Here's a function that takes a
list of lines and returns a new list with all trailing whitespace removed
and the continued lines glued together. It uses a simple state machine
approach.

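def continue_join(linesin):
    """Return a new list with lines ending in '_' joined to the next line."""
    linesout = []
    buff = ""
    NORMAL = 0   # not inside a continued statement
    PENDING = 1  # previous line ended with '_', more text expected
    state = NORMAL
    for line in linesin:
        line = line.rstrip()  # drops the newline and any trailing blanks
        if state == NORMAL:
            if line.endswith('_'):
                # start of a continued statement: keep text, drop the '_'
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                # still continuing: accumulate and stay in PENDING
                buff += line[:-1]
            else:
                # final piece: emit the joined line and reset
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        raise ValueError("last line is continued: %r" % line)
    return linesout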

import sys

fp = open(sys.argv[1])
rawlines = fp.readlines()
fp.close()
cleanlines = continue_join(rawlines)
for line in cleanlines:
    print(repr(line))
===
Tested with the following files:
C:\junk>type contlinet1.txt
only one line

C:\junk>type contlinet2.txt
line 1
line 2

C:\junk>type contlinet3.txt
line 1
line 2a _
line 2b _
line 2c
line 3

C:\junk>type contlinet4.txt
line 1
_
_
line 2c
line 3

C:\junk>type contlinet5.txt
line 1
_
_
line 2c
line 3 _

C:\junk>
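
For anyone who wants to try those five inputs without creating the files, here is a rough sketch. The CASES dictionary is my own stand-in for the contlinet*.txt files (it assumes every line ends in a plain newline with no other trailing whitespace), and continue_join is restated with the same logic only so that the block runs by itself.

```python
def continue_join(linesin):
    """Glue lines ending in '_' onto the line(s) that follow them."""
    linesout = []
    buff = ""
    NORMAL = 0
    PENDING = 1
    state = NORMAL
    for line in linesin:
        line = line.rstrip()
        if state == NORMAL:
            if line.endswith('_'):
                buff = line[:-1]
                state = PENDING
            else:
                linesout.append(line)
        else:
            if line.endswith('_'):
                buff += line[:-1]
            else:
                buff += line
                linesout.append(buff)
                buff = ""
                state = NORMAL
    if state == PENDING:
        raise ValueError("last line is continued: %r" % line)
    return linesout


# Hypothetical in-memory copies of the five test files (an assumption,
# not the real files): each value is what readlines() might return.
CASES = {
    "contlinet1": ["only one line\n"],
    "contlinet2": ["line 1\n", "line 2\n"],
    "contlinet3": ["line 1\n", "line 2a _\n", "line 2b _\n",
                   "line 2c\n", "line 3\n"],
    "contlinet4": ["line 1\n", "_\n", "_\n", "line 2c\n", "line 3\n"],
    "contlinet5": ["line 1\n", "_\n", "_\n", "line 2c\n", "line 3 _\n"],
}

for name in sorted(CASES):
    try:
        print(name, continue_join(CASES[name]))
    except ValueError as exc:
        # Only the input whose last line ends in '_' should land here.
        print(name, "ValueError:", exc)
```

With these stand-in inputs, contlinet3 comes out as ['line 1', 'line 2a line 2b line 2c', 'line 3'] (the blank before each '_' is kept, only the underscore is dropped), and contlinet5 raises ValueError because its last line is still continued.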

HTH,
John
