Re: Transforming ascii file (pseduo database) into proper database



On Jan 22, 7:51 am, "p." <ppetr...@xxxxxxxxx> wrote:
I need to take a series of ascii files and transform the data
contained therein so that it can be inserted into an existing
database. The ascii files are just a series of lines, each line
containing fields separated by '|' character. Relations amongst the
data in the various files are denoted through an integer identifier, a
pseudo key if you will. Unfortunately, the relations in the ascii file
do not match up with those in the database in which i need to insert
the data, i.e., I need to transform the data from the files before
inserting into the database. Now, this would all be relatively simple

I'm glad you think so ... we don't have a clear understanding of the
problem.

if not for the following fact: The ascii files are each around 800MB,
so pulling everything into memory and matching up the relations before
inserting the data into the database is impossible.

My questions are:
1. Has anyone done anything like this before, and if so, do you have
any advice?
2. In the abstract, can anyone think of a way of amassing all the
related data for a specific identifier from all the individual files
without pulling all of the files into memory and without having to
repeatedly open, search, and close the files over and over again?

It would help enormously if you clarified whether the "identifier"
identifies entities or relationships between entities. Is there a
different series of "identifiers" for each (say) type of relationship?

1. Sort each of the N ascii files into "identifier" order. Do an N-way
merge to get all the data for each "identifier" in the same place at
the same time, rearrange it, and insert it into the final database.

OR

2. Load each of the N ascii files into a table in a project-temporary
database. Create a suitably-structured index on the "identifier"
column in each table. Do an N-way merge using SQL inner/outer
joins ...


How big is N anyway? How many unique "identifiers"?
.



Relevant Pages

  • Transforming ascii file (pseduo database) into proper database
    ... I need to take a series of ascii files and transform the data ... do not match up with those in the database in which i need to insert ... inserting the data into the database is impossible. ...
    (comp.lang.python)
  • Re: Transforming ascii file (pseduo database) into proper database
    ... The ascii files are just a series of lines, ... do not match up with those in the database in which i need to insert ... inserting the data into the database is impossible. ... Albert van der Horst, UTRECHT,THE NETHERLANDS ...
    (comp.lang.python)
  • Re: Transforming ascii file (pseduo database) into proper database
    ... The ascii files are just a series of lines, ... do not match up with those in the database in which i need to insert ... so pulling everything into memory and matching up the relations before ... inserting the data into the database is impossible. ...
    (comp.lang.python)
  • Re: some ideas about db rheory
    ... The database records not only the ... to the 'concept' using an identifier attribute. ... option ?Intrinsic Properties?. ... error committed when using standard relational techniques, ...
    (comp.databases.theory)
  • Re: Web Parts Instances
    ... For the cookie, I'm sure it is necessary for personalization. ... their corresponding database record upon reload (even once the user is ... Are you saying that Unique Identifier = pageURL+ ...
    (microsoft.public.dotnet.framework.aspnet)