Re: FAQ 5.38 How do I select a random line from a file?
- From: xhoster@xxxxxxxxx
- Date: 09 Jun 2008 16:16:42 GMT
brian d foy <brian.d.foy@xxxxxxxxx> wrote:
In article <484adab2$0$30196$4c368faf@xxxxxxxxxxxxxx>, Dan Rumney
<danrumney@xxxxxxxxxxxx> wrote:
Peter J. Holzer wrote:
On 2008-06-04 19:03, PerlFAQ Server <brian@xxxxxxxxxxxxxx> wrote:
5.38: How do I select a random line from a file?
If your file is large, and the lines are roughly of equal length, it
is probably preferrable to just do a random seek into the file and
read the line you've hit (by searching backwards and forwards for
$/).
The problem with that is that the probability of selecting a line
becomes a function of that line's length.
The algorithm above ensures that the probability of selecting a line is
precisely 1/N where N is the total number of lines in the file.
From time to time, I think about how I would solve this problem if I
actually needed it for something important. That is, not as some
thought experiment or quote-for-the-sig thing.
Has anyone solved this for something non-trivial? Say, for something
with huge numbers of lines or large file sizes, or where you want to
choose several random lines?
The solution that I think about (but have never implemented), is some
sort of pre-indexing of line endings so you have a list of file
positions where all the lines line (or start, or whatever). When you
want a random line, you choose a random element from that list. Open
the file, seek, and read a line.
If space wasn't at a premium, and the file was static, I might just
determine the maximum line length and then pad out all lines to that length
(putting the padding after the \n, so they are really part of the next
line), and then encode that fixed length either in the code or in the file
name. Then seek to a random record. If the maximum line length was an
order of magnitude more than the typical, I might not do that.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
.
- Follow-Ups:
- Re: FAQ 5.38 How do I select a random line from a file?
- From: nolo contendere
- Re: FAQ 5.38 How do I select a random line from a file?
- References:
- FAQ 5.38 How do I select a random line from a file?
- From: PerlFAQ Server
- Re: FAQ 5.38 How do I select a random line from a file?
- From: Peter J. Holzer
- Re: FAQ 5.38 How do I select a random line from a file?
- From: Dan Rumney
- Re: FAQ 5.38 How do I select a random line from a file?
- From: brian d foy
- FAQ 5.38 How do I select a random line from a file?
- Prev by Date: Re: TCP apllications
- Next by Date: Re: select count in perl
- Previous by thread: Re: FAQ 5.38 How do I select a random line from a file?
- Next by thread: Re: FAQ 5.38 How do I select a random line from a file?
- Index(es):
Relevant Pages
|