Re: Parsing a chemical formula
From: Luotao Fu (luotao_at_milliways.kammer.uni-hannover.de)
Date: 02/25/05
Date: 25 Feb 2005 15:20:39 GMT
Hi,
GreenLeaf <newspost@kohombanDELETE.net> wrote:
> Abigail wrote:
>
>> I wouldn't use split, just parse what you want to keep. What you want is
>> very simple: exactly one capital letter, followed by zero or more lower
>> case letters, followed by zero or more numbers. Written as a regex, this
>> is:
>
@Abigail:
Nice idea! Now the famous question to myself: if this is so simple, why
didn't I come up with it myself? ;-) It works like a charm, thanks a lot.
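For reference, the rule described above (one capital letter, zero or more lowercase letters, zero or more digits) could be sketched roughly as follows. The regex from the original post is not quoted here, so this is only an assumed reading of the description, and `parse_formula` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split a formula such as "C6H12O6" into
# [symbol, count] pairs. The pattern follows the rule quoted above:
# one capital letter, zero or more lowercase letters, zero or more digits.
sub parse_formula {
    my ($formula) = @_;
    my @atoms;
    while ($formula =~ /([A-Z][a-z]*)(\d*)/g) {
        # A missing number is treated as a count of 1 (an assumption).
        push @atoms, [ $1, $2 eq '' ? 1 : $2 ];
    }
    return @atoms;
}

print join(' ', map { "$_->[0]=$_->[1]" } parse_formula('C6H12O6')), "\n";
# prints: C=6 H=12 O=6
```

Capturing the symbol and the count separately avoids a second pass over each token.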
> to OP:
>
> If this is an exercise, considering the real world scenario, you might
> want to consider the rule that an element name is always exactly one
> capital letter followed by _exactly zero or one simple letter_, with the
> exception of elements that start with Uu. I'm assuming here that yours
> is a program for learning, since you admitted to write it 'since days'
>:). Considering these facts will make your re more robust.
;-) Actually it's not an exercise: the Perl script formats database
files for my C program, which deals with CT scanners. On the other hand,
I am indeed learning Perl through writing this. I could also have written
it in C, but I chose Perl to refresh my memory of regular expressions.
>
> You might also want to consider the radicals (such as hydroxyl -OH)
> because they are sure to lead to incorrect results if you just ignore
> parenthesis: for instance Fe(OH)3. You can do this by first capturing
> parenthesis and numbers that follow, then running the same simple rules
> that you used to capture no-parenthesis case for the token within each
> set of parenthesis. Something along the line of
>
> my @atoms = /((?:\(.+\)|Uu.|[A-Z][a-z]?)\d*)/g;
>
> would work here.
>
Thanks for the advice, I hadn't thought about this one. However, it might
not be a serious problem for me. We have limited the input to substances
containing only the first 100 elements of the periodic table. More
importantly, I define the format rules for the input files myself. I'll
note in the README that such formats are forbidden :-).
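Since the first 100 elements all have one- or two-letter symbols, a rule like this could also be enforced in the script rather than only in the README. The following is a minimal sketch, not the actual script; `is_flat_formula` is a hypothetical name, and it checks only the shape of the tokens, not whether each symbol is a real element:

```perl
use strict;
use warnings;

# Hypothetical check: accept a formula only if the whole string is a
# sequence of element tokens (capital letter, optional lowercase letter,
# optional count). Parentheses, as in Fe(OH)3, make the match fail.
sub is_flat_formula {
    my ($formula) = @_;
    return $formula =~ /\A(?:[A-Z][a-z]?\d*)+\z/ ? 1 : 0;
}

# Fe2O3 is accepted, Fe(OH)3 is rejected.
for my $f ('Fe2O3', 'Fe(OH)3') {
    printf "%-8s %s\n", $f, is_flat_formula($f) ? 'ok' : 'rejected';
}
```

Anchoring the pattern with `\A` and `\z` is what makes stray characters anywhere in the string cause a rejection instead of being silently skipped.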
> Since Abigail's post clearly gave you almost everything you need to
> know, it would be quite straightforward to implement these simple
> changes. Good luck! :)
>
> Hope this helps,
Thanks a lot
> sat
Cheers
Luotao Fu