Re: Fastest way to split a file by columns?



"Kevin" <kaidizhao@xxxxxxxxxxxx> wrote or quoted in
Message-ID: <1139009028.703566.52430@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>:


I just want to know the best way to do this course coding task. :-)

Task: split a big file into many files by column. Each
resulting file contains one column of the original file.


I assume the following:

a) Spaces are valid data characters (they are not field separators; fields are separated by ',').
b) The line terminator is "\r\n" or "\n".
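As a minimal in-memory sketch of what assumptions (a) and (b) mean for the output, the hypothetical helper `splitColumns` below turns the text into a list of columns. It assumes ',' is the field separator (the character the code below switches on), keeps spaces as ordinary data, and accepts both "\r\n" and "\n" as line endings. It holds everything in memory, so it only illustrates the transformation and is not meant for big files.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSplitDemo {

    // Hypothetical helper: returns column i of the input as list i.
    // Assumes ',' separates fields and lines end with "\r\n" or "\n".
    public static List<List<String>> splitColumns(String text) {
        List<List<String>> columns = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            // limit -1 keeps trailing empty fields, e.g. "a,b," has 3 fields
            String[] fields = line.split(",", -1);
            for (int i = 0; i < fields.length; i++) {
                if (columns.size() <= i) {
                    columns.add(new ArrayList<>());
                }
                columns.get(i).add(fields[i]);  // spaces are kept as data
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String input = "a b,c,d\r\n1,2 ,3\n";
        List<List<String>> cols = splitColumns(input);
        for (int i = 0; i < cols.size(); i++) {
            System.out.println("column " + i + ": " + cols.get(i));
        }
    }
}
```

Each inner list corresponds to one output file in the real task; for the sample input, column 0 holds "a b" and "1", with the space preserved.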

Perhaps something like this:

<code>
(This code is not tested.)

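import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MainFrame {

    static InputStream in;
    static final int colNum = 200;  // number of columns in the input file
    static OutputStream[] out = new OutputStream[colNum];

    public static void main(String[] args) {

        try {
            initializeStream();

            ByteBuffer buf = new ByteBuffer(512);
            int colIndex = 0;
            int c;

            // read() returns the next byte (0-255), or -1 at end of file
            while ((c = in.read()) >= 0) {
                switch (c) {
                case ',':  // end of a field
                    writeField(colIndex++, buf);
                    break;

                case '\r': // skip; "\r\n" is handled at '\n'
                    break;

                case '\n': // end of line: the last field has no trailing ','
                    writeField(colIndex, buf);
                    colIndex = 0;
                    break;

                default:
                    buf.add((byte) c);
                    break;
                }
            }
            // the last line may not end with a newline
            if (buf.size() > 0 || colIndex > 0) {
                writeField(colIndex, buf);
            }
        }
        catch (IOException e) {
            e.printStackTrace();  // process exception.
        }
        finally {
            closeStream();
        }
    }

    // writes the buffered field and a newline to column file 'col'
    static void writeField(int col, ByteBuffer buf) throws IOException {
        out[col].write(buf.get(), 0, buf.size());
        out[col].write('\n');
        buf.clear();
    }

    static void initializeStream() throws IOException {

        in = getInputStream();

        // a new array is already filled with null, so no Arrays.fill is needed
        for (int i = 0; i < out.length; i++) {
            out[i] = getOutputStream(i);
        }
    }

    // closes every stream, even if an earlier close() fails
    static void closeStream() {
        try {
            if (in != null) in.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        for (int i = 0; i < out.length; i++) {
            try {
                if (out[i] != null) out[i].close();  // also flushes the buffer
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    static InputStream getInputStream() throws IOException {
        File f = new File("your input file");
        return new BufferedInputStream(new FileInputStream(f));
    }

    static OutputStream getOutputStream(int index) throws IOException {
        File f = new File("your output file" + index + ".txt");
        return new BufferedOutputStream(new FileOutputStream(f));
    }

    // minimal growable byte array holding the current field
    static class ByteBuffer {

        int pos;
        byte[] buf;

        public ByteBuffer(int initialCapacity) {
            buf = new byte[initialCapacity];
        }

        void add(byte b) {
            if (pos >= buf.length) {
                resize();
            }
            buf[pos++] = b;
        }

        void resize() {
            byte[] b = new byte[buf.length * 2];
            System.arraycopy(buf, 0, b, 0, buf.length);
            buf = b;
        }

        byte[] get() {
            return buf;
        }

        int size() {
            return pos;
        }

        void clear() {
            pos = 0;
        }
    }
}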

</code>
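If the number of columns is not known in advance, a line-oriented variant may be simpler. The sketch below is an alternative, not the code above: it uses `BufferedReader.readLine()`, which already accepts "\n", "\r\n" and "\r" as line terminators, splits each line on ',' and opens column files lazily, so nothing like `colNum` has to be fixed. The class `ColumnSplitter`, the method `split` and the `<prefix><i>.txt` naming are hypothetical, and it assumes UTF-8 text input (malformed bytes would make the reader throw).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ColumnSplitter {

    // Hypothetical helper: writes field i of every line to
    // outDir/<prefix><i>.txt, one value per line. Returns the column count.
    public static int split(Path input, Path outDir, String prefix) throws IOException {
        List<Writer> writers = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            // readLine() strips "\n", "\r\n" or "\r" from each line
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1);  // keep empty fields
                for (int i = 0; i < fields.length; i++) {
                    // open column files on first use
                    while (writers.size() <= i) {
                        Path p = outDir.resolve(prefix + writers.size() + ".txt");
                        writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
                    }
                    writers.get(i).write(fields[i]);
                    writers.get(i).write('\n');
                }
            }
            for (Writer w : writers) {
                w.close();  // normal path: flush/close errors propagate
            }
        } finally {
            // error path: close anything still open; closing an already
            // closed writer has no effect
            for (Writer w : writers) {
                try {
                    w.close();
                } catch (IOException ignored) {
                    // keep closing the remaining writers
                }
            }
        }
        return writers.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cols");
        Path in = dir.resolve("input.csv");
        Files.write(in, "a b,c,d\r\n1,2 ,3\n".getBytes(StandardCharsets.UTF_8));
        int n = split(in, dir, "col");
        for (int i = 0; i < n; i++) {
            byte[] data = Files.readAllBytes(dir.resolve("col" + i + ".txt"));
            System.out.print(new String(data, StandardCharsets.UTF_8));
        }
    }
}
```

The byte-at-a-time version avoids creating a `String` per line and per field, so it may well be faster on very large files; the `readLine()` version trades that for shorter code and no fixed column count. Both keep one file open per column, which can run into the operating system's limit on open files when there are many columns.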




