Re: Parsing file



On Thu, Jun 2, 2011 at 1:28 PM, venkates <venkates@xxxxxxxxxx> wrote:

> On 6/2/2011 12:46 PM, John SJ Anderson wrote:
>
>> On Thu, Jun 2, 2011 at 06:41, venkates<venkates@xxxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> I want to parse a file with contents that looks as follows:
>>>
>>> [ snip ]
>>
>> Have you considered using this module? ->
>> <http://search.cpan.org/dist/BioPerl/Bio/SeqIO/kegg.pm>
>>
>> Alternatively, I think somebody on the BioPerl mailing list was
>> working on another KEGG parser...
>>
>> chrs,
>> j.
>
> I am doing this as an exercise to learn parsing techniques so guidance
> help needed.
>
> Aravind






This is a simple and ugly way of parsing your file:

use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = parse("ko");
print Dumper $set;

sub parse {
    my $keggFile = shift;
    my $keggHash;

    my $counter = 1;    # record number, used as the top-level hash key

    # 'or' rather than '||': '||' would bind to $keggFile, not to open
    open my $fh, '<', $keggFile or croak("Cannot open file '$keggFile': $!");
    while ( <$fh> ) {
        chomp;
        if ( $_ =~ m!///! ) {    # '///' ends a record
            $counter++;
            next;
        }

        if ( $_ =~ /^ENTRY\s+(.+?)\s/sm ) { ${$keggHash}{$counter} = { 'ENTRY' => $1 }; }
        if ( $_ =~ /^NAME\s+(.*)$/sm ) {
            my $temp = $1;
            $temp =~ s/,\s/,/g;    # "a, b" becomes "a,b"
            my @names = split /,/, $temp;
            push @{${$keggHash}{$counter}{'NAME'}}, @names;
        }
    }
    close $fh;
    return $keggHash;
}

The output being:

$VAR1 = {
'1' => {
'NAME' => [
'E1.1.1.1',
'adh'
],
'ENTRY' => 'K00001'
},
'3' => {
'NAME' => [
'U18snoRNA',
'snR18'
],
'ENTRY' => 'K14866'
},
'2' => {
'NAME' => [
'U14snoRNA',
'snR128'
],
'ENTRY' => 'K14865'
}
};

Which to me looks sort of like what you are looking for.
The main thing I did was read the file one line at a time, so that an
unexpectedly large file does not cause memory issues on your machine (in the
end, the structure that you are building will cause enough issues of its own
when handling a large file).
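
If the size of that structure ever does become a problem, one option is to hand each record to a callback as soon as its '///' line is reached, rather than keeping them all in one hash. What follows is only a sketch under assumptions: each_record() is a made-up helper name, and the sample data is reconstructed from the Dumper output in this thread, not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data, reconstructed from the Dumper output in this
# thread; the real "ko" file has more fields per record.
my $sample = <<'END';
ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
///
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
///
END

# each_record() is a made-up helper: it calls $callback once per record
# and then starts a fresh one, so only one record is held at a time.
sub each_record {
    my ( $fh, $callback ) = @_;
    my %record;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ m{^///} ) {    # end-of-record marker
            if (%record) {
                my %copy = %record;
                $callback->( \%copy );
            }
            %record = ();
            next;
        }
        if ( $line =~ /^ENTRY\s+(\S+)/ ) {
            $record{ENTRY} = $1;
        }
        elsif ( my ($names) = $line =~ /^NAME\s+(.*)$/ ) {
            push @{ $record{NAME} }, split /,\s*/, $names;
        }
    }
    # Cover a final record that is not followed by '///'.
    $callback->( {%record} ) if %record;
    return;
}

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @seen;
each_record(
    $fh,
    sub {
        my $rec = shift;
        push @seen, "$rec->{ENTRY}: " . join( '|', @{ $rec->{NAME} } );
    }
);
close $fh;
print "$_\n" for @seen;
```

The callback decides what to keep, so memory use depends on what you store rather than on the size of the file.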

You already dealt with the Entry bit, so I'll leave that open, though I
slightly changed the regex; nothing spectacular there.
The Name bit is simple: I just pull out all of the names, remove the space
after each comma and split them into an array, feed the array to the hash,
and hop, time for the next step, which is up to you ;-)
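
To see the Name step on its own, here is a small sketch. The input line is an assumption, reconstructed from the first record in the output above. It runs the two-step version from parse() next to a one-step alternative where the split pattern swallows the whitespace itself.

```perl
use strict;
use warnings;

# Assumed input line, reconstructed from the first record in the output.
my $line = 'NAME        E1.1.1.1, adh';

my ( @two_step, @one_step );
if ( my ($names) = $line =~ /^NAME\s+(.*)$/ ) {

    # As in parse(): drop the space after each comma, then split on commas.
    ( my $temp = $names ) =~ s/,\s/,/g;
    @two_step = split /,/, $temp;

    # Alternative: let the split pattern absorb whitespace around the comma.
    @one_step = split /\s*,\s*/, $names;
}

print join( '|', @two_step ), "\n";
print join( '|', @one_step ), "\n";
```

For this input both arrays hold the same two names; the one-step form also copes with several spaces or none after a comma.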

I hope it helps you a bit, regards,

Rob

