Re: Parsing a chemical formula

From: Luotao Fu (luotao_at_milliways.kammer.uni-hannover.de)
Date: 02/25/05


Date: 25 Feb 2005 15:20:39 GMT

Hi,
GreenLeaf <newspost@kohombanDELETE.net> wrote:
> Abigail wrote:
>
>> I wouldn't use split, just parse what you want to keep. What you want is
>> very simple: exactly one capital letter, followed by zero or more lower
>> case letters, followed by zero or more numbers. Written as a regex, this
>> is:
>

@Abigail:
Nice idea! Now the famous question to myself: if this is so simple, why
didn't I come up with it myself? ;-) It works like a charm, thanks a lot.
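
For the archive, here is a minimal sketch of how I read that description as a
regex: one capital letter, zero or more lowercase letters, zero or more
digits. The sub name parse_formula and the rule that a missing number means 1
are my own assumptions for illustration, not something taken from Abigail's
post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [symbol, count] pairs.
# Assumption: a symbol without a trailing number counts as 1.
sub parse_formula {
    my ($formula) = @_;
    my @atoms;
    # One capital letter, zero or more lowercase letters, zero or more digits.
    while ($formula =~ /([A-Z][a-z]*)(\d*)/g) {
        push @atoms, [ $1, length($2) ? $2 : 1 ];
    }
    return @atoms;
}

print join(' ', map { "$_->[0]=$_->[1]" } parse_formula('C6H12O6')), "\n";
# prints: C=6 H=12 O=6
```

Matching what you want to keep, rather than splitting on what you want to
throw away, means the symbol and its count come out of the same match.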

> to OP:
>
> If this is an exercise, then considering the real-world scenario, you might
> want to apply the rule that an element symbol is always exactly one
> capital letter followed by _zero or one lowercase letter_, with the
> exception of elements that start with Uu. I'm assuming here that yours
> is a program for learning, since you admitted to writing it 'since days'
> :). Taking these facts into account will make your regex more robust.

;-) Actually it's not an exercise; the Perl script is supposed to format
database files for my C program, which deals with CT scanners. On the other
hand, I am indeed learning Perl through writing this. I could also have
written it in C, but I chose Perl to refresh my memory on regexps.

>
> You might also want to consider the radicals (such as hydroxyl -OH),
> because they are sure to lead to incorrect results if you just ignore
> parentheses: for instance Fe(OH)3. You can do this by first capturing
> parenthesized groups and the numbers that follow, then applying the same
> simple rules that you used for the no-parentheses case to the tokens
> within each set of parentheses. Something along the lines of
>
> my @atoms = /((?:\(.+\)|Uu.|[A-Z][a-z]?)\d*)/g;
>
> would work here.
>

Thanks for the advice, I hadn't thought about that one. However, it might
not be a serious problem for me. We have limited the input to substances
containing only the first 100 elements of the periodic table. More
importantly, I define the format rules for the input files myself. I'll note
in the README that such formats are forbidden :-).
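
In case someone reading along does need the parentheses, the two-pass idea
from the quote could be sketched roughly like this. It is only a sketch under
some assumptions of mine: the sub name count_atoms is made up, only one level
of parentheses is handled, the Uu case is left out because my input stops at
element 100, and I use [^()]+ instead of .+ so that two groups in one formula
are not swallowed as a single group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of element symbol => atom count,
# expanding one level of parentheses such as Fe(OH)3.
# Assumptions: no nested parentheses, no Uu-style three-letter symbols.
sub count_atoms {
    my ($formula) = @_;
    my %count;
    # First pass: each token is a parenthesized group or a single element
    # symbol, followed by an optional multiplier.
    while ($formula =~ /(?:\(([^()]+)\)|([A-Z][a-z]?))(\d*)/g) {
        my ($group, $symbol, $n) = ($1, $2, $3);
        $n = 1 unless length $n;
        if (defined $group) {
            # Second pass: the same simple rule applied inside the group,
            # with every count multiplied by the group's multiplier.
            while ($group =~ /([A-Z][a-z]?)(\d*)/g) {
                $count{$1} += (length($2) ? $2 : 1) * $n;
            }
        }
        else {
            $count{$symbol} += $n;
        }
    }
    return %count;
}

my %c = count_atoms('Fe(OH)3');
print join(' ', map { "$_=$c{$_}" } sort keys %c), "\n";
# prints: Fe=1 H=3 O=3
```

The captures are copied into lexicals before the inner loop runs, because the
inner match overwrites $1 and $2.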

> Since Abigail's post clearly gave you almost everything you need to
> know, it would be quite straightforward to implement these simple
> changes. Good luck! :)
>
> Hope this helps,

Thanks a lot
> sat

Cheers
Luotao Fu


