Re: Parsing file
- From: rcoops@xxxxxxxxx (Rob Coops)
- Date: Thu, 2 Jun 2011 14:44:49 +0200
On Thu, Jun 2, 2011 at 1:28 PM, venkates <venkates@xxxxxxxxxx> wrote:
> On 6/2/2011 12:46 PM, John SJ Anderson wrote:
>> On Thu, Jun 2, 2011 at 06:41, venkates<venkates@xxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> I want to parse a file with contents that looks as follows:
>> [ snip ]
>>
>> Have you considered using this module? ->
>> <http://search.cpan.org/dist/BioPerl/Bio/SeqIO/kegg.pm>
>>
>> Alternatively, I think somebody on the BioPerl mailing list was
>> working on another KEGG parser...
>>
>> chrs,
>> j.
>
> I am doing this as an exercise to learn parsing techniques so guidance
> help needed.
>
> Aravind

This is a simple and ugly way of parsing your file:
use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = parse("ko");

sub parse {
    my $keggFile = shift;
    my $keggHash;
    my $counter = 1;    # record number, bumped at each /// terminator

    # 'or' rather than '||': '||' binds to $keggFile, so croak would never run
    open my $fh, '<', $keggFile or croak("Cannot open file '$keggFile': $!");
    while ( <$fh> ) {
        chomp;
        # '///' ends the current record
        if ( $_ =~ m!///! ) {
            $counter++;
            next;
        }
        # ENTRY line: keep the first field after the keyword
        if ( $_ =~ /^ENTRY\s+(.+?)\s/ ) {
            ${$keggHash}{$counter} = { 'ENTRY' => $1 };
        }
        # NAME line: comma-separated list of names
        if ( $_ =~ /^NAME\s+(.*)$/ ) {
            my $temp = $1;
            $temp =~ s/,\s/,/g;
            my @names = split /,/, $temp;
            push @{${$keggHash}{$counter}{'NAME'}}, @names;
        }
    }
    close $fh;
    print Dumper $keggHash;
    return $keggHash;
}
The output being:
$VAR1 = {
          '1' => {
                   'NAME' => [
                               'E1.1.1.1',
                               'adh'
                             ],
                   'ENTRY' => 'K00001'
                 },
          '3' => {
                   'NAME' => [
                               'U18snoRNA',
                               'snR18'
                             ],
                   'ENTRY' => 'K14866'
                 },
          '2' => {
                   'NAME' => [
                               'U14snoRNA',
                               'snR128'
                             ],
                   'ENTRY' => 'K14865'
                 }
        };
Which to me looks sort of like what you are looking for.
The main thing I did was read the file one line at a time to prevent an
unexpectedly large file from causing memory issues on your machine (in the
end, the structure that you are building will cause enough issues
when handling a large file).
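If memory does become a problem, one option (not what the script above does)
is to hand each record to a callback as soon as its /// terminator is seen,
rather than keeping every record in one hash. A minimal sketch, assuming each
record ends with a line starting with ///; the helper name each_record and the
sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per record instead of
# collecting all records in one big structure.
sub each_record {
    my ($fh, $callback) = @_;
    my %record;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ m{^///} ) {
            # Record finished: pass a copy on, then start afresh.
            $callback->( {%record} ) if %record;
            %record = ();
            next;
        }
        if ( $line =~ /^ENTRY\s+(\S+)/ ) {
            $record{ENTRY} = $1;
        }
    }
    # Handle a last record that has no trailing /// line.
    $callback->( {%record} ) if %record;
    return;
}

# Demo on an in-memory filehandle; the sample lines are an assumed layout.
my $sample = "ENTRY       K00001    KO\n///\nENTRY       K14865    KO\n///\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @seen;
each_record( $fh, sub { push @seen, $_[0]{ENTRY} } );
close $fh;
print "@seen\n";    # prints "K00001 K14865"
```

Only one record is held in memory at a time, so what you keep is decided by
the callback rather than by the parser.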
You already dealt with the ENTRY bit, so I won't go into that; I changed the
regex slightly, but nothing spectacular there.
The NAME bit is simple: I pull out the whole list, remove the whitespace
after each comma and split the result into an array. Feed the array to the
hash and hop, time for the next step, which is up to you ;-)
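The NAME step can also be done in one go by splitting on a comma together
with any whitespace around it. A small sketch; split_names is a hypothetical
helper and the sample line is an assumed layout, not copied from the real
file:

```perl
use strict;
use warnings;

# Hypothetical helper: return the individual names from a NAME line,
# or an empty list if the line is not a NAME line.
sub split_names {
    my ($line) = @_;
    return () unless $line =~ /^NAME\s+(.*)$/;
    my $names = $1;
    # Split on a comma plus any whitespace on either side of it.
    return split /\s*,\s*/, $names;
}

my @names = split_names('NAME        E1.1.1.1, adh');
print join('|', @names), "\n";    # prints "E1.1.1.1|adh"
```

This replaces the separate substitution and split with a single split.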
I hope it helps you a bit, regards,
Rob