Case tagging and python



Hi,

I'm relatively new to programming in general, and totally new to python,
and I've been told that this language is particularly good for what I
need to do. Let me explain.
I have a large corpus of English text, in the form of several files.

First of all I would like to scan each file. Then, for each word I find,
I'd like to examine its case status, and write the (lower case) word back
to another text file - with, appended, a tag stating the case it had in the original file.

An example. Suppose we have three possible "case conditions"
-all lowercase
-all uppercase
-initial uppercase only

Three corresponding tags for each of these might be, respectively:
-nocap
-allcaps
-cap

Therefore, given the string

"The Chairman of BP was asleep"

I would like to produce

"the/cap chairman/cap of/nocap /bp/allcaps was/nocap /asleep/nocap"

and writing this into a file.


I have the following algorithm in mind:

-open input file
-open output file
-get line of text
-split line into words
-for each word
-tag = checkCase(word)
-newword = lowercase(word) + append(tag)
rejoin words into line
write line into output file

Now, I managed to write the following initial code

for s in file:
lines += 1
if lines % 1000 == 0:
print '%d lines' % We print the total lines
sent = s.split() #split string by spaces
#...


But then I don't quite know what would be the fastest/best way to do this. Could I use the join function to reform the string? And, regarding the casetest() function, what do you suggest to do? Should I test each character of each word or there are faster methods?

Thanks very much,

F.



.



Relevant Pages

  • Re: glob in specific directory
    ... > I am writing a DTS ... I have to use full path names because I don't run the perl ... It's a user supplied string run by the SQL server...did you remember ... > The output file is created in $mydirectory with no problem, ...
    (comp.lang.perl.misc)
  • RE: Open file, Copy a column and paste this into a new Excel doc - Pl
    ... Code is assumed to be in in your output file. ... Function FileNameOnlyAs String ... "Harish Mohanbabu" wrote: ... > 3) And paste them all into a single Excel file ...
    (microsoft.public.excel.programming)
  • Re: Again: Please hear my plea: print without softspace
    ... the second parameter of the join function ... I really can't discuss the implementation problems, ... On the aesthetics, ... better string methods". ...
    (comp.lang.python)
  • Re: search and select from a textfile
    ... Create an output file. ... Create a stringlist to use as a buffer. ... Check the current line for the trigger search string, ... We are at the end of a section, now check if the foundit flag is set. ...
    (comp.lang.pascal.delphi.misc)
  • Re: Lowercase
    ... It adds a useful function (Translate) and, like the Cooper Redwine version, doesn't depend on IACHAR and friends, which have a degree of processor dependence. ... It may be a little slower than others, but string functions are not usually time critical for me. ...
    (comp.lang.fortran)