Re: file reading by record separator (not line by line)
- From: Tijs <tijs_news@xxxxxxxxxxxxxxx>
- Date: Thu, 31 May 2007 15:14:11 +0200
Lee Sander wrote:
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee
Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for example).
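def chunkreader(f):
    # Yield (name, lines) per record; a record starts at a line beginning with '>'.
    name = None
    lines = []
    while True:
        line = f.readline()
        if not line: break
        if line[0] == '>':
            # New header: emit the previous record, if any, and start a new one.
            if name is not None:
                yield name, lines
            name = line[1:].rstrip()
            lines = []
        else:
            lines.append(line)
    # Emit the last record, which has no following header to trigger it.
    if name is not None:
        yield name, lines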
if __name__ == '__main__':
    from StringIO import StringIO
    s = \
""">name1
line1
line2
line3
>name2
line 4
line 5
line 6"""
    f = StringIO(s)
    for name, lines in chunkreader(f):
        print '***', name
        print ''.join(lines)
$ python test.py
*** name1
line1
line2
line3
*** name2
line 4
line 5
line 6
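For the case where '>' is not always at the start of a line, the do-it-yourself
buffering mentioned above could look roughly like the sketch below. It is not
the code from socket.py, only a minimal illustration written for Python 3. The
name split_records and the parameters sep and chunk_size are made up for this
example. It reads fixed-size chunks, yields everything up to the last separator
seen so far, and keeps the unfinished tail in a buffer. Empty records (for
instance the empty text before a leading '>') are skipped.

```python
import io


def split_records(f, sep=">", chunk_size=64 * 1024):
    """Yield the text between occurrences of sep, reading f in chunks.

    Hypothetical helper for illustration; f must be a text-mode file object.
    """
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Every piece except the last is a complete record. The last piece
        # may be cut off (or end in half a separator), so keep it buffered.
        parts = buf.split(sep)
        buf = parts.pop()
        for part in parts:
            if part:  # skip empty records
                yield part
    # Whatever is left after end of file is the final record.
    if buf:
        yield buf


if __name__ == "__main__":
    data = io.StringIO(">name1\nline1\nline2>name2\nline 4\n")
    for record in split_records(data, chunk_size=5):
        print(repr(record))
```

Each record still contains its name as the first line, so it can be split off
with record.split('\n', 1). Note that the buffer is re-split on every chunk, so
a single very large record costs more than it strictly has to; that is the
price of keeping the sketch short.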
--
Regards,
Tijs