spell checking...capitalization of proper names



I am looking for an efficient way to check the capitalization of
proper names. I am currently using a Perl script with quite a lot of
regexes, and it works fine, but rather than reinvent the wheel, I
thought there was probably something already out there, or maybe just
a better method of doing it. I run the script every day and find that
I am adding 15-30 new words to the regex list each time.

Currently the script has 1500 regexes, each looking for an individual
word or a 2-3 word phrase (New York City, for example):

$data =~ s/[ ][Nn]ew([ ]*)(\[\d+:\d+:\d+][ ]*)*[Yy]ork([ ]*)(\[\d+:\d+:\d+][ ]*)*[Cc]ity/ New$1$2York$3$4City/msg;
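A pattern like the one above could be generated from the properly capitalized phrase itself, so that adding a phrase means adding data rather than writing a new regex. The following is only a sketch of that idea: the names fix_phrase and $GAP are made up for illustration, and the gap pattern assumes that words may be separated by any whitespace (including a line break) plus optional [hh:mm:ss] timestamps.

```perl
use strict;
use warnings;

# Assumed gap between two words of a phrase: whitespace (including a line
# break), optionally followed by one or more [hh:mm:ss] caption timestamps.
my $GAP = qr/\s+(?:\[\d+:\d+:\d+\]\s*)*/;

# fix_phrase is a hypothetical helper: it restores the capitalization of
# $phrase wherever it occurs in $text, on one line or split across lines.
sub fix_phrase {
    my ($text, $phrase) = @_;
    my @words   = split ' ', $phrase;        # e.g. ('New', 'York', 'City')
    my $pattern = join $GAP, map('(' . quotemeta($_) . ')', @words);

    # Record where each word matched, then overwrite it in place. Only the
    # letter case changes, so the recorded offsets stay valid.
    my @fixes;
    while ($text =~ /\b$pattern\b/gi) {
        push @fixes, [ $-[$_], $words[ $_ - 1 ] ] for 1 .. @words;
    }
    substr($text, $_->[0], length $_->[1], $_->[1]) for @fixes;
    return $text;
}

my $sample = "[17:16:35] receiving knee treatment at walter\n"
           . "[17:16:36] reed army medical\n";
print fix_phrase($sample, 'Walter Reed');
```

Because the timestamps and spacing are never rewritten, only the matched words change, and the phrase list can live in a plain text file that is read at startup.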


The text is actually a closed captioning script from a television
newscast, formatted like the lines below. The content is about 70%
local Kansas City, MO news and 30% national US news.

<snip>
[17:16:32] Since the first gulf war --
[17:16:33] Cliff Standby has been
[17:16:35] receiving knee treatment at Walter
[17:16:36] Reed Army Medical
[17:16:38] Center. And within the last
<snip>

The goofy-looking first regex below looks for 'walter reed' either
on one line or split across two lines. The rest simply check for a
certain word and capitalize it.

$data is the text of the entire file. The files average 500-800 lines
of text.

<snip>
$data =~ s/[ ][Ww]alter([ ]*)(\[\d+:\d+:\d+][ ]*)*[Rr]eed/ Walter$1$2Reed/msg;
$data =~ s/[ ]ward([ .!?:,;\'-])/ Ward$1/msg;
$data =~ s/[ ]warner([ .!?:,;\'-])/ Warner$1/msg;
$data =~ s/[ ]warren([ .!?:,;\'-])/ Warren$1/msg;
<snip>
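The single-word rules above all share one shape, so they could be driven by a word list instead of one statement per name. This is a rough sketch under that assumption, not a drop-in replacement: @names, %proper and fix_names are illustrative names, and the three entries stand in for the real list. It keeps the same case-sensitive lowercase match and the same set of trailing delimiters as the statements above.

```perl
use strict;
use warnings;

# Hypothetical word list; in practice this might be read from a text file
# with one properly capitalized name per line.
my @names = qw(Ward Warner Warren);

# Map each lowercase form to its proper form, then build one alternation.
my %proper = map { lc($_) => $_ } @names;
my $alt    = join '|', map { quotemeta($_) } keys %proper;

# Lookarounds check the surrounding space and punctuation without
# consuming them, so two names in a row are both corrected.
my $re = qr/(?<=[ ])($alt)(?=[ .!?:,;'-])/;

sub fix_names {
    my ($text) = @_;
    $text =~ s/$re/$proper{$1}/g;
    return $text;
}

print fix_names("[17:16:33] mayor warren met ward, and warner.\n");
```

With this layout the daily additions become new lines in the word list, and the text is scanned once per file rather than once per name.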

I have just begun to look at
http://search.cpan.org/~hank/Text-Aspell/Aspell.pm

It appears that it just takes a string one word at a time, checks it,
and reports whether it is correct or offers suggestions. Maybe I am
not grasping its capabilities.


I would appreciate any suggestions.

jbl