Re: how to efficiently find line number k

From: James Dennett (jdennett_at_acm.org)
Date: 09/29/04


Date: Tue, 28 Sep 2004 21:13:01 -0700

b83503104 wrote:

> I have a big text file with millions of lines, and given any number k
> as input, I want to output line k. What is the most efficient way to
> do this, other than checking end-of-line k times?

In general there's no more efficient way possible. You could
cache that information (i.e., build an index) or use a format
with fixed-size records, or various combinations, but there's
no magical way to quickly get to line 500000 of a text file
without reading the previous 499999 lines.

For some platforms, that may not be the case; some filesystems
are inherently record-oriented, and C or C++ implementations
targetting such filesystems may well offer extensions to do
record based random access. That's not true of any platform
on which I've worked though; modern OS's would tend to push
that kind of functionality off to a database application.

-- James



Relevant Pages

  • Re: how to efficiently find line number k
    ... > For some platforms, that may not be the case; some filesystems ... file systems, as implemented in BeFS. ...
    (alt.comp.lang.learn.c-cpp)
  • Re: write to the same file from multiple processes at the same time?
    ... gabor wrote: ... This works across all platforms and filesystems likely to be encountered by a Python program. ... it seems to do what i need (the flock() call waits until he can get access).. ...
    (comp.lang.python)
  • Re: How to handle >16TB devices on 32 bit hosts ??
    ... I expect that the VFS could be made to work with 64-bit pgoff_t fairly ... radix-trees use a ulong index, so we would need a new ... The bigger problem is filesystems - they'll each need to be checked, ... Filesystems that work correctly> 16TB on 64-bit platforms should continue ...
    (Linux-Kernel)
  • Re: OO in Tcl (goodbye tcl)
    ... > Provide binaries for a zillion platforms? ... I don't think it would be necessary to provide binaries for a zillion ... standardized TCL library of extensions could include a requirement for ... standard table that has columns for each supported platform. ...
    (comp.lang.tcl)
  • Re: silent semantic changes with reiser4
    ... Andrew Morton wrote: ... >b) accept the reiser4-only extensions with a view to turning them into ... >offer features which other filesystems do not and cannot offer makes me ... Reiser4 has done that, finally. ...
    (Linux-Kernel)