Regex on whole (large) text file

From: Rune Johansen (rune[insert_current_year_here)
Date: 06/27/04


Date: Sun, 27 Jun 2004 14:02:38 +0200

Hi,

I'm sorry if these questions are trivial, but I've searched the net and
haven't had any luck finding the information I need.

I need to perform a regular expression search and replace on a large
text file. The patterns I need to match span multiple lines, so I can't
process the file one line at a time. Instead, I currently read the entire
text file into a string using the code below.

File fin = new File("input.txt");
FileInputStream fis = new FileInputStream(fin);
BufferedReader in = new BufferedReader(new InputStreamReader(fis));
String aLine = null;
String theText = "";
while((aLine = in.readLine()) != null) {
    theText = theText + aLine + "\n";
}

The problem is that the first couple of thousand lines are read very
quickly, but each line takes longer than the one before, and by around
line 4000 reading has become really slow.
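
A slowdown like this is consistent with how string concatenation in a loop behaves: each theText + aLine builds a new string and copies everything read so far, so the total work grows roughly with the square of the line count. Below is a minimal sketch of the usual alternative, assuming a Java version that has StringBuilder (StringBuffer is the older equivalent). The class name WholeFileRegex, the method readAll, and the begin.*?end pattern are placeholders for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

class WholeFileRegex {

    // Reads everything from the reader into one String, ending each line with "\n".
    // StringBuilder appends in place, so earlier text is not recopied for every line.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file here so the sketch is self-contained.
        String text = readAll(new StringReader("begin\nfoo\nend\nkeep\n"));

        // DOTALL lets '.' match line terminators, so one match can span several lines.
        Pattern block = Pattern.compile("begin.*?end", Pattern.DOTALL);
        System.out.println(block.matcher(text).replaceAll("BLOCK"));
    }
}
```

For the real file, passing new FileReader("input.txt") to readAll should work the same way. The whole text still ends up in memory, which is usually fine for a file of a few thousand lines.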

Is there a better way to read an entire text file into a string?

Is storing the entire text file in a string a bad idea? And if so, what
are the alternatives?

Is it possible to match multi-line regular expressions against a text
file without loading the whole file into memory?
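
One possible approach to searching without building the whole string first is java.util.Scanner, whose findWithinHorizon method ignores delimiters and reads more input as it needs it. The following is only a sketch under assumptions: it needs a Java version that includes Scanner, it finds matches rather than replacing them, and if no further match exists the scanner may still end up buffering the rest of the input. The names StreamingRegex and findAll and the begin.*?end pattern are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class StreamingRegex {

    // Collects every match of the pattern from the reader, in order of appearance.
    static List<String> findAll(Reader source, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String match;
            // A horizon of 0 means no limit: the scanner keeps reading input
            // until it finds a match or reaches the end.
            while ((match = scanner.findWithinHorizon(pattern, 0)) != null) {
                matches.add(match);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators, so a match can span lines.
        Pattern block = Pattern.compile("begin.*?end", Pattern.DOTALL);
        Reader input = new StringReader("begin\na\nend\nskip\nbegin\nb\nend\n");
        for (String m : findAll(input, block)) {
            // Show embedded newlines as \n so each match prints on one line.
            System.out.println(m.replace("\n", "\\n"));
        }
    }
}
```

A pattern that can match the empty string would need extra care in this loop, since the scanner would not advance past an empty match.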

Thanks in advance,
Rune


