Re: Parsing a chemical formula

From: Luotao Fu (luotao_at_milliways.kammer.uni-hannover.de)
Date: 02/25/05


Date: 25 Feb 2005 15:20:39 GMT

Hi,
GreenLeaf <newspost@kohombanDELETE.net> wrote:
> Abigail wrote:
>
>> I wouldn't use split, just parse what you want to keep. What you want is
>> very simple: exactly one capital letter, followed by zero or more lower
>> case letters, followed by zero or more numbers. Written as a regex, this
>> is:
>

@Abigail:
Nice idea! Now the famous question to myself: if this is so simple, why
didn't I come up with it myself? ;-) It works like a charm, thanks a lot.
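
For anyone reading along later, here is a minimal sketch of the rule as
Abigail describes it in words (one capital letter, zero or more lower-case
letters, zero or more digits). The helper name parse_simple and the sample
input are my own assumptions, not taken from her post:

```perl
use strict;
use warnings;

# parse_simple() is a made-up helper name, not from the thread.
# Rule as described: one capital letter, zero or more lower-case
# letters, zero or more digits.
sub parse_simple {
    my ($formula) = @_;
    my @atoms;
    # With two capture groups and /g, each loop iteration yields one
    # symbol and its (possibly empty) count.
    while ($formula =~ /([A-Z][a-z]*)(\d*)/g) {
        my ($symbol, $count) = ($1, $2);
        $count = 1 if $count eq '';    # a missing count means 1
        push @atoms, [$symbol, $count];
    }
    return @atoms;
}

# Sample input chosen for illustration only.
print join(' ', map { "$_->[0]=$_->[1]" } parse_simple('NaHCO3')), "\n";
```

Matching what you want to keep, rather than splitting on what you want to
drop, means anything that does not look like a symbol is simply skipped.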

> to OP:
>
> If this is an exercise, then considering the real-world scenario, you
> might want to apply the rule that an element symbol is always exactly one
> capital letter followed by _zero or one lowercase letter_, with the
> exception of elements that start with Uu. I'm assuming here that yours
> is a program for learning, since you admitted to writing it 'since days'
>:). Taking these facts into account will make your regex more robust.

;-) Actually it's not an exercise: the Perl script should format database
files for my C program, which deals with CT scanners. On the other hand,
I am indeed learning Perl through writing this. I could also have written
it in C, but I chose Perl to refresh my memory of regexps.

>
> You might also want to consider radicals (such as hydroxyl, -OH),
> because they are sure to lead to incorrect results if you just ignore
> the parentheses: for instance Fe(OH)3. You can do this by first capturing
> the parentheses and the number that follows, then applying the same simple
> rules that you used for the no-parentheses case to the token within each
> set of parentheses. Something along the lines of
>
> my @atoms = /((?:\(.+\)|Uu.|[A-Z][a-z]?)\d*)/g;
>
> would work here.
>
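
A rough sketch of how the two-step idea could look, for the record. The
helper name count_atoms is hypothetical, and I have assumed one change to
the quoted pattern: \([^()]+\) instead of \(.+\), because the greedy .+
would swallow everything between the first "(" and the last ")" when a
formula contains two groups. Nested parentheses are not handled.

```perl
use strict;
use warnings;

# count_atoms() is a hypothetical helper, not code from the thread.
# Step 1: tokenise into "(group)N" or "SymbolN".
# Step 2: apply the plain symbol rule inside each group and multiply.
sub count_atoms {
    my ($formula) = @_;
    my %count;
    for my $token ($formula =~ /((?:\([^()]+\)|Uu.|[A-Z][a-z]?)\d*)/g) {
        if ($token =~ /^\(([^()]+)\)(\d*)$/) {
            # Parenthesised group with an optional multiplier.
            my ($inner, $mult) = ($1, $2 eq '' ? 1 : $2);
            while ($inner =~ /(Uu.|[A-Z][a-z]?)(\d*)/g) {
                $count{$1} += ($2 eq '' ? 1 : $2) * $mult;
            }
        }
        else {
            # Plain symbol with an optional count.
            my ($symbol, $n) = $token =~ /^(Uu.|[A-Z][a-z]?)(\d*)$/;
            $count{$symbol} += ($n eq '' ? 1 : $n);
        }
    }
    return %count;
}

my %c = count_atoms('Fe(OH)3');
print join(' ', map { "$_=$c{$_}" } sort keys %c), "\n";
```

For Fe(OH)3 this should count one Fe and three each of O and H.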

Thanks for the advice, I hadn't thought about this one. However, it might
not be a serious problem for me. We have limited the input to substances
containing only the first 100 elements of the periodic table. More
importantly, I define the format rules for the input files. I'll note
in the README that such formats are forbidden :-).
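
If the README rule should also be enforced by the script, a check along
these lines might do. This is only a sketch: is_plain_formula is a made-up
name, and the pattern only checks the shape of the input (symbol plus
optional count, repeated). It does not verify that each symbol is a real
element, let alone one of the first 100.

```perl
use strict;
use warnings;

# is_plain_formula() is a hypothetical name. It accepts only a
# sequence of symbol-plus-optional-count tokens, so parentheses,
# spaces and empty strings are rejected.
sub is_plain_formula {
    my ($formula) = @_;
    return $formula =~ /\A(?:[A-Z][a-z]?\d*)+\z/ ? 1 : 0;
}

# Sample inputs chosen for illustration only.
for my $f ('H2O', 'Fe(OH)3') {
    printf "%-8s %s\n", $f, is_plain_formula($f) ? 'ok' : 'rejected';
}
```

Anchoring with \A and \z makes the whole string count, so one stray
character is enough to reject the line.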

> Since Abigail's post clearly gave you almost everything you need to
> know, it would be quite straightforward to implement these simple
> changes. Good luck! :)
>
> Hope this helps,

Thanks a lot
> sat

Cheers
Luotao Fu


