Re: Best way to parse file into db-type layout?



Thanks for the comments. Quick notes:
1. Yes, ultimate goal is to output various fields into Excel, but I need to
enter the data into a DB as well to avoid having any strange problems. That may
be handled by a completely different process. I should have been more exact in
saying that my current goal is to get this data into a database.

2. I'm relatively new to Python and definitely new to trying to parse
Ragged-right type CSV files that are pairs of data. I'm trying to get my syntax
correct when wording my questions/phrases - please bear with me. :-) Your help
(all posters) is quite appreciated.

3. regarding read/write each row - I started thinking about working my way
through the dictionary. This file could have up to 1900 pairs of data per row.
That's an awful lot of columns to put into a database table, especially if quite
a few will be NULL. I starting thinking of stepping through each pair by each
row. The rows are identified by Letter type and an account number (always the
same field #). I would want to use those, and insert something like:
LetterType, Account #, Field#, Field Data, (current date)
I wasn't quite sure how to phrase that and apologize for being unclear. I
guess if I'm doing row-by-row, I can hold that for each row without too much
trouble.

4. No control over file format. I can define a lookup table that will match
Field# to a real name, perhaps code that into my scripts, but I'm stuck with
that. As to other posters' responses - it's ANSI-coded, CSV, No Quotes (no
Commas in values). Coming from the mainframe, I don't have to worry about
embedded special characters within the file as they can't be entered into these
fields to start with.
File always starts with letter_type, 1 - no idea why they didn't do that as
0, letter_type for consistency but it isn't my file. (Heck, I would have just
specified to output all fields all the time and avoid any need to do special
parsing :-)

Anyway, I appreciate the help. The code looks like it will work for me and I'll
be doing some basic tests to get up to speed on splitting/parsing the data and
putting it into a form that will be usable by all. I may have to write certain
rows out to a new file if they don't match what I'm looking for. I don't
anticipate any issues there - just read the row, write it to a new file if it
doesn't match my criteria.

Thanks to all. Off to search for dictionaries and databases now. :-)

-Pete

John Machin <sjmachin@xxxxxxxxxxx> wrote:

> On Fri, 29 Apr 2005 18:54:54 GMT, Peter A. Schott
> <paschott@xxxxxxxxxxxxxxxxxx> wrote:
>
> >That looks promising.
>
> > The field numbers are pre-defined at the mainframe level.
>
> Of course. Have you managed to acquire a copy of the documentation, or
> do you have to reverse-engineer it?
>
> >This may help me get to my ultimate goal which is to pump these into a DB on a
> >row-by-row basis ( :-P )
>
> That's your *ultimate* goal? Are you running a retro-computing museum
> or something? Don't you want to *USE* the data?
>
> > I'll have to do some playing around with this. I
> >knew that it looked like a dictionary, but wasn't sure how best to handle this.
> >
> >One follow-up question: I'll end up getting multiple records for each "type".
>
> What does that mean?? If it means that more than one customer will get
> the "please settle your account" letter, and more than one customer
> will get the "please buy a spangled fritzolator, only $9.99" letter,
> you are stating the obvious -- otherwise, please explain.
>
> >Would I be referencing these by row[#][field#]?
>
> Not too sure what you mean by that -- whether you can get away with a
> (read a row, write a row) way of handling the data depends on its
> structure (like what are the relationships if any between different
> rows) and what you want to do with it -- both murky concepts at the
> moment.
>
> >
> >Minor revision to the format is that starts like:
> >###,1,1,val_1,....
>
> How often do these "minor revisions" happen? How flexible do you have
> to be? And the extra "1" means what? Is it ever any other number?
>
> >
> >
> >I think right now the plan is to parse through the file and insert the pairs
> >directly into a DB table. Something like RowID, LetterType, date, Field#,
> >Value.
>
> Again, I'd recommend you lose the "Field#" in favour of a better
> representation, ASAP.
>
> > I can get RowID and LetterType overall, date is a constant, the rest
> >would involve reading each pair and inserting both values into the table. Time
> >to hit the books a little more to get up to speed on all of this.
>
> What you need is (a) a clear appreciation of what you are trying to do
> with the data at a high level (b) then develop an understanding of
> what is the underlying data model (c) then and only then worry about
> technical details.
>
> Good luck,
> John
>

.



Relevant Pages

  • Re: FSI Indices with translates the answer
    ... nature of dictionaries as creation devices - any dodo can change them. ... this would allow an application account to be attached ... WEBSITE1-account should be used to lookup the PRODUCTS-file. ... fact that a Basic program can perform an [EXECUTE "logto ...
    (comp.databases.pick)
  • Re: Please! Doesnt anyone know a better way to do this?
    ... account, they need to automatically be directed to the page to enter data ... session variable on the Account page. ... I assume here that you're checking a database when the user attempts to ... When a new user attempts to login or clicks to register, ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Help with WSS 3.0 Server Farm Config - Backend SQL 2005
    ... I had to use only "sharepoint" to get the ... What interest me though is that the database get created but fails after ... Virtual Server with DBSVR ... an account local to the WEBSVR) to create and access the SQL server, ...
    (microsoft.public.sharepoint.windowsservices)
  • RE: Take This Value and Shove It
    ... because of the way I must acquire the loan account numbers and borrower ... To get that data into the Access database, and keep it updated, I have ... Your subform record source should be tblNotes or a query based on it. ...
    (microsoft.public.access.formscoding)
  • Re: Relationships, forms and command buttons
    ... asking how to do a particular thing in terms of the database objects itself. ... If they change their bank account, ... tblIndividuals and tblCAFGAYEDons) with linking tables with an autonumber PK ...
    (microsoft.public.access.gettingstarted)