Re: Transforming ASCII file (pseudo database) into proper database



p. wrote:
I need to take a series of ASCII files and transform the data
contained therein so that it can be inserted into an existing
database. The ASCII files are just a series of lines, each line
containing fields separated by the '|' character. Relations among the
data in the various files are denoted through an integer identifier, a
pseudo key if you will. Unfortunately, the relations in the ASCII files
do not match up with those in the database into which I need to insert
the data, i.e., I need to transform the data from the files before
inserting it into the database. Now, this would all be relatively simple
if not for the following fact: the ASCII files are each around 800MB,
so pulling everything into memory and matching up the relations before
inserting the data into the database is impossible.

My questions are:
1. Has anyone done anything like this before,

More than once, yes.

and if so, do you have
any advice?

1/ use the csv module to parse your text files (it accepts a custom delimiter such as '|')

2/ use a temporary database (whose schema mimics the one in the flat files), so you can work with the appropriate tools, i.e. the RDBMS will take care of disk/memory management, and you'll have a specialized, high-level language (namely, SQL) to reassemble your data the right way.
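
The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: sqlite3 stands in for the temporary database (any RDBMS would do), every staging column is stored as TEXT, and the function name load_pipe_file, the table name, the column names and the batch size are all made up for the example. It also assumes every line has the same number of fields as the column list.

```python
import csv
import sqlite3


def load_pipe_file(conn, path, table, columns, batch_size=10_000):
    """Stream a '|'-separated file into a staging table, batch by batch.

    Hypothetical helper: the table and column names are supplied by the
    caller and are assumed to mirror the layout of the flat file.
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    total = 0
    with open(path, newline="") as fh:
        # QUOTE_NONE: assume the files use no quoting, only '|' separators.
        reader = csv.reader(fh, delimiter="|", quoting=csv.QUOTE_NONE)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                # Only one batch is ever held in memory at a time.
                conn.executemany(insert_sql, batch)
                total += len(batch)
                batch = []
        if batch:
            conn.executemany(insert_sql, batch)
            total += len(batch)
    conn.commit()
    return total
```

For 800MB inputs you would pass a file path to sqlite3.connect() rather than ":memory:", so that the database engine, not your script, decides what stays in RAM.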


2. In the abstract, can anyone think of a way of amassing all the
related data for a specific identifier from all the individual files
without pulling all of the files into memory and without having to
repeatedly open, search, and close the files over and over again?

Answer above.
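
To make that concrete: once the files sit in staging tables, gathering everything related to one identifier is a join on the pseudo key, and an index on that key spares the database a full scan per lookup. A minimal sketch, again using sqlite3 as a stand-in; the customers/orders tables, their columns and the related_rows function are invented for the example and are not the poster's actual schema.

```python
import sqlite3

# In-memory database for the demo only; a file-backed one would be
# used for inputs of the size described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'ink');
    -- Index the pseudo key so lookups by identifier avoid a table scan.
    CREATE INDEX idx_orders_cust ON orders (cust_id);
""")


def related_rows(conn, cust_id):
    """Return all rows tied to one identifier across both staging tables."""
    return conn.execute(
        """
        SELECT c.name, o.order_id, o.item
        FROM customers AS c
        JOIN orders AS o ON o.cust_id = c.cust_id
        WHERE c.cust_id = ?
        ORDER BY o.order_id
        """,
        (cust_id,),
    ).fetchall()


print(related_rows(conn, 1))
```

The same SELECT can feed an INSERT ... SELECT into the target schema, so the reshaping happens inside the database rather than in Python.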