Translate tab-delimited to fixed width text

From: mike beck (cmichaelbeck_at_hotmail.com)
Date: 09/21/04


Date: 21 Sep 2004 10:48:48 -0700

I'm writing a script to take a tab-delimited input file where the
format is known and transform it to a fixed-width text file. I thought
that the script might be easier to troubleshoot or edit later (if the
file formats change, for example) if I specified the column names for
the input and output files, which is why I didn't just map input
column 1 -> export column 14.

Even though I've spent some quality time with the archives and the
fine manual, I'm pretty sure that I'm doing some WRONG THINGS. Or
perhaps some SUB-OPTIMAL THINGS. And possibly REALLY WEIRD THINGS. Or
just maybe THINGS THAT WILL WORK FOR A LITTLE WHILE BUT WILL SURELY
CAUSE DIRE PROBLEMS AT SOME LATER, INCREDIBLY INCONVENIENT TIME
RESULTING IN HORRIBLE, TRAGIC CONSEQUENCES FOR EVERYONE INVOLVED.

The script works, but the use of multiple dictionaries seems ... how
should I say ... unintuitive. Is there a better way to move values
from one dictionary to another than what I've done here? I'd
appreciate any other tips (especially those that fit into the
CATEGORIES listed above).

And I've omitted the read from file and write to file business for the
sake of brevity...

Standing on the shoulders of giants (meaning, you all!),
Mike

---
"""
Transforms a tab delimited text file to a fixed-width text file. 
"""
import copy, string
DEFFIELDSEP = '\t';
DEFLINESEP = '\n';
DEFHASHEADER = 1;                   # 1 if yes, 0 if no
import_field_order = (  'Email',
                        'First Name(s)',
                        'Last Name',
                        'Address Line 1',
                        'Address Line 2',
                        'City',
                        'State',
                        'Zip Code', );
# export columns (keys) to import columns (values)
mapped_fields = {       'EMAIL_ADDRESS_1':'Email',
                        'FIRST_NAME_1':'First Name(s)',
                        'LAST_NAME_1':'Last Name',
                        'ADDRESS_1':'Address Line 1',
                        'CITY':'City',
                        'STATE':'State',
                        'ZIP_CODE':'Zip Code', };
# required length and justification of export fields
export_fields = {       'RECORD_TYPE':[1,'left'],
                        'ACCOUNT_ID':[8,'left'],
                        'TITLE_1':[8,'left'],
                        'FIRST_NAME_1':[20,'left'],
                        'MIDDLE_NAME_1':[20,'left'],
                        'LAST_NAME_1':[20,'left'],
                        'SUFFIX_1':[8,'left'],
                        'NAME_ID_1':[2,'left'],
                        'ADDRESS_1':[256,'left'],
                        'CITY':[30,'left'],
                        'STATE':[2,'left'],
                        'ZIP_CODE':[5,'left'],
                        'ZIP_EXTENSION':[4,'left'],
                        'EMAIL_ADDRESS_1':[100,'left'],
                        'EMAIL_ADDRESS_TYPE_1':[2,'left'],  };
export_field_order = (  'RECORD_TYPE',
                        'ACCOUNT_ID',
                        'TITLE_1',
                        'FIRST_NAME_1',
                        'MIDDLE_NAME_1',
                        'LAST_NAME_1',
                        'SUFFIX_1',
                        'NAME_ID_1',
                        'ADDRESS_1',
                        'CITY',
                        'STATE',
                        'ZIP_CODE',
                        'ZIP_EXTENSION',
                        'EMAIL_ADDRESS_1',
                        'EMAIL_ADDRESS_TYPE_1',    );
def pad ( elem, max_length, justification=None):
    """
    Pad a string to a predetermined length. Truncate the string 
    if necessary. By default, left justify and pad with spaces.
    """
    padded_value = '';
    # what if elem is none?
    if (justification is 'right'):
        padded_value = elem.rjust(max_length)[:max_length]
    else:
        padded_value = elem.ljust(max_length)[:max_length]
    return padded_value
def glue ( var, ref, order ):
    """
    Build a string with padded dictionary values.
    Var and ref are dictionaries, order is a tuple describing the 
    field order.
    """
    finished_line = []
    # pad the items
    for k in var.keys():
        # pad (value, length, justification)
        var[k] = pad( str(var[k]), ref[k][0], ref[k][1])
    # build the line in order
    for elem in order:
        finished_line.append(dict.get(var, elem))
    # join the elements and return a string
    return ''.join([x for x in finished_line])
def build ( line, field_sep = DEFFIELDSEP ):
    """
    Create a single record by transposing fields via 
    multiple dictionaries.
    """
    fields = string.split(line, field_sep);
    # trim whitespace
    fields = map(string.strip, fields)
    # create dict of import column names (keys) and import 
    # data (values)
    step_one = dict([(q,a) for q,a in zip(import_field_order,
fields)])
    # create a 'throwaway' copy of the mapped fields dict to 
    # do the transform
    step_two = copy.deepcopy(mapped_fields)
    # replace the column names of the import file with the 
    # values from the import file
    for k,v in step_two.items():
        step_two[k] = dict.get(step_one, v)
    # create an empty dict with the export column names
    step_three = dict([(q,'') for q in export_field_order])
    dict.update(step_three, step_two)
    # special cases
    step_three['RECORD_TYPE'] = 'D'
    step_three['EMAIL_ADDRESS_TYPE_1'] = 'I'
    return glue(step_three, export_fields, export_field_order)
    
def decapitate(infile,has_header=DEFHASHEADER,line_sep=DEFLINESEP):
    """
    Return file without header row
    """
    if (has_header):
        sans_head = string.split(infile, line_sep)[1:]
    else:
        sans_head = string.split(infile, line_sep)
    return sans_head
    
if __name__ == '__main__':
    
    # init vars
    out_lst = []
    
    # sample data. Names have been changed ... ;)
    fin = {'data': 'Email\tFirst Name(s)\tLast Name\tAddress Line
1\tAddress Line 2\tCity\tState\tZip
Code\nme@test.com\tBill\tMunroe\t416
Main\t\tManhattan\tNY\t10471\ngodaddy@test.com\tGeorge\tScott\t111
Blue Street 2005\t\tSan
Francisco\tCA\t94144\nmmike@test.net\tMike\tBork\t22 Oak
Rd\t\tAlbuquerque\tNM\t01720-5303\npsycho@test.com\tIma\tCrazy\t1111 E
Maple Rd\t\tDenver\tCo\t80206-6139'};
    
    # get a list of lines without a header row
    lines = decapitate(fin['data']);
    
    for line in lines:
        # check for blank lines
        if (line):
            out_lst.append(build(line))
        
    print out_lst;


Relevant Pages

  • Merry Christmas!
    ... dictionaries and every variant possibility has a separate "word" entry. ... The byte string of the "word", whose length is specified by a four ... match is found for a source byte sequence in the dictionary. ...
    (rec.arts.sf.written)
  • Merry Christmas! Linux RULES! New applications to develop!
    ... dictionaries and every variant possibility has a separate "word" ... Each entry in the dictionary contains: ... The byte string of the "word", whose length is specified by a four ... addresses whose entry is selected by the first byte of the sequence. ...
    (comp.os.linux.misc)
  • Re: CollectionsUtil class ... usage?
    ... Add(Key As String, Item As Parameter) ... Private m_oHash As Hashtable ... except that class' Enumerator would only allow iteration over the keys, ... generic collection - they have dictionaries, lists, sorted list, etc. ...
    (microsoft.public.dotnet.general)
  • Re: non-static method gotcha
    ... enum possibilities are their own little classes. ... @param pad URL of the pad xml file we are ... String submit ... The error message spoke of non-static method in a static context because the only way for a nested class to access a private method of the outer class is via an outer reference, in this case a non-static one. ...
    (comp.lang.java.help)
  • Re: Collections challenge (MRU)
    ... At first glance List(Of String) might work. ... how about one of the dictionaries? ... Moya ... Implement a full featured MRU list using ...
    (microsoft.public.dotnet.languages.vb)