Re: Best way to parse file into db-type layout?



On Tue, 03 May 2005 19:12:07 GMT, Peter A. Schott
<paschott@xxxxxxxxxxxxxxxxxx> wrote:

>Thanks for the comments. Quick notes:
>1. Yes, ultimate goal is to output various fields into Excel,

Peter, (again) I'd like to suggest that putting some or all of the
data into storage of type X (whether X is a database or Excel or 5x3
index cards) is unlikely to be the true ultimate goal ...

> but I need to
>enter the data into a DB as well to avoid having any strange problems. That may
>be handled by a completely different process. I should have been more exact in
>saying that my current goal is to get this data into a database.

Let me get this straight: Process 1 is going to put some fields into
Excel. Process 2 is going to put the data into a DB to avoid "any
strange problems" -- with what? The (completely different) process 1?

>
>3. regarding read/write each row - I started thinking about working my way
>through the dictionary. This file could have up to 1900 pairs of data per row.
>That's an awful lot of columns to put into a database table, especially if quite
>a few will be NULL.

Focussing (briefly) on the storage details, it's also an awful lot
compared to the max 255 columns in Excel. Dragging ourselves back up
to the live situation that your data is modelling, it's also an awful
lot of attributes for any object (whether real or virtual) to have. It
must be possible to break each line down further. It's extremely
unlikely to be stored like that on the mainframe; it's just been spat
out like that to go to the letter-printing bureau. You may like to
consider talking to the mainframe folk about their data structures,
and how you can more or less reconstruct those structures from the
letter file.

> I starting thinking of stepping through each pair by each
>row. The rows are identified by Letter type and an account number (always the
>same field #). I would want to use those, and insert something like:
>LetterType, Account #, Field#, Field Data, (current date)

This method of representing the data is close to the other extreme,
compared to to 1900 columns per row. With this method, the five or so
components of the customer address (for example) will be in different
rows in the database i.e. not grouped together with a handle that says
"address". It doesn't appear to me to be any more usable than the
letter file.

If the customer can have more than one letter of the same type on the
same date, you will need to store also something that uniquely
identifies the letter e.g. the line number in the file; otherwise the
data from multiple letters will be mingled. Even if "they" say it's
not possible, and you believe them, it might be a good idea anyway --
might make it easier than working with (letter_type, acct_num) when
you are trying to glue the fragments back together.

If it's not possible because the 1900 fields = something like 100
fields x an allowance up to 19 repetitions of a similar scenario, then
you know what to do next :-)

You probably mean "effective date as at which the file was produced"
rather than "current date". Only one file per day? There may be a
backlog when you start this exercise, there may be a backlog in the
future after a disruption in the pipeline. You may like to check if
the mainframe assigns a unique name/number to such files and use that
instead as part of your composite key.

> No Quotes (no
>Commas in values). Coming from the mainframe, I don't have to worry about
>embedded special characters within the file as they can't be entered into these
>fields to start with.

Yes, peace in our time, the cheque's in the mail, and of course
they'll still love you in the morning :-)

>Thanks to all. Off to search for dictionaries and databases now. :-)

Searching for dictionaries???

As we are by now way OT (even by this group's standards), I'll just
leave you with my previous comments:
>>
>> What you need is (a) a clear appreciation of what you are trying to do
>> with the data at a high level (b) then develop an understanding of
>> what is the underlying data model (c) then and only then worry about
>> technical details.
>>

It's been nice kibitzing your game.
Lots of luck,
John



.



Relevant Pages

  • Re: WHY
    ... >>proposition than being tied into one mainframe vendor. ... How do you use a database to hedge a commercial bond portfolio ... but all come with Excel. ... >>build reports than using Microsoft excel. ...
    (microsoft.public.excel)
  • Re: Is Excel the best MS Office tool to use for this business appl
    ... I did download several of the MS Office Excel ... database for these items. ... the other end of this topic is the fact that my friend has no ... Compile and maintain my friends list of products and their cost, ...
    (microsoft.public.office.misc)
  • Re: Word 2007/Excell 2007 - random loss of field contents on mailm
    ... I think, for now, I will try to put a few dummy rows in the database while I ... the document to get data from Excel and insert it. ... use that as the data source for the merge. ... I want the field with data to insert that data in the Word doc. ...
    (microsoft.public.word.mailmerge.fields)
  • Re: My first database (long)
    ... > Hi all - feeling my way trying to create first Access database. ... > I created a database in Excel, which for various reasons would be better ... > The database contains client records. ... One worksheet ...
    (microsoft.public.access.gettingstarted)
  • Re: Best practice to store/manipulate Excel spreadsheet cells
    ... part of an event managment database. ... Ofcourse there are quite a few "cells" that correspond to the aisles/alleys ... excel to have squares properly... ... If it's a module in a database application and the "floor plans" are ...
    (microsoft.public.access.formscoding)