Database filtering - (Sep/03/2007)
Hello everyone,
Does anybody know how to filter databases? I have databases of maize sequences in FASTA format, but I'm only interested in the ones from endosperm. The entries in the databases look like this:
>id acc
sequence
How could I extract only the entries that mention endosperm in the ID? Thank you for your help.
-rodpck-
QUOTE (rodpck @ Sep 3 2007, 11:58 AM)
Hello everyone,
Does anybody know how to filter databases? I have databases of maize sequences in FASTA format, but I'm only interested in the ones from endosperm. The entries in the databases look like this:
>id acc
sequence
How could I extract only the entries that mention endosperm in the ID? Thank you for your help.
Use Perl. Something like the script below should work. I haven't tested it, and I'm a bit frazzled from thesis writing, so it may not work.
CODE
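#!/usr/bin/perl
use warnings;
use strict;

# Usage: perl filter.pl your_file.fasta > endosperm.fasta
my $fileName = $ARGV[0] or die "I need the location and name of the FASTA file\n";
open my $fh, '<', $fileName or die "Cannot open $fileName: $!\n";

# True while we are inside a record whose header line mentions endosperm
my $keep = 0;
while (my $line = <$fh>) {
    if ($line =~ /^>/) {
        # New record: case-insensitive match; no /g, so every header is tested from the start
        $keep = ($line =~ /endosperm/i) ? 1 : 0;
    }
    # Print the header and every sequence line of kept records, in file order
    print $line if $keep;
}
close $fh;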
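Why the header test should use `/i` rather than `/ig`: with `/g` in scalar context, Perl remembers where the last match on that string ended (its `pos`), so if the same header is tested again for each sequence line of a multi-line record, the next search starts after "endosperm", fails, resets, and then succeeds again. A minimal demonstration, assuming a made-up header string (`$header`, `@with_g` and `@without_g` are illustrative names only):

```perl
use strict;
use warnings;

# A made-up header line, standing in for the stored header in the script.
my $header = ">seq1 endosperm library\n";

# Test the same header four times, as happens for a record with four sequence lines.
my @with_g;
push @with_g, ($header =~ /endosperm/ig) ? 1 : 0 for 1 .. 4;

my @without_g;
push @without_g, ($header =~ /endosperm/i) ? 1 : 0 for 1 .. 4;

print "with /g:    @with_g\n";      # prints: with /g:    1 0 1 0
print "without /g: @without_g\n";   # prints: without /g: 1 1 1 1
```

With `/g`, every other sequence line of a matching record would be dropped; without it, the result is the same on every test.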
-perlmunky-
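The script matches "endosperm" anywhere on the header line. If "in the ID" should be taken literally, so that only the first whitespace-delimited field after `>` counts, the same streaming idea can be wrapped in a reusable subroutine. This is a sketch under that assumption; `filter_fasta` is a hypothetical helper name, and the records below are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: copies every record whose ID (first whitespace-delimited
# token after '>') matches $pattern from $in to $out, keeping all sequence lines.
sub filter_fasta {
    my ($in, $out, $pattern) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S*)/;      # ID field only, may be empty
            $keep = ($id =~ $pattern) ? 1 : 0;
        }
        print {$out} $line if $keep;
    }
}

# Made-up records: only the first ID contains "endosperm"; the second record
# mentions it in the accession field, so it is skipped.
my $fasta = ">ZmEndosperm_001 acc1\nACGT\nTTGA\n>ZmLeaf_002 endosperm_lib\nGGCC\n";
open my $in,  '<', \$fasta    or die $!;
open my $out, '>', \my $result or die $!;
filter_fasta($in, $out, qr/endosperm/i);
close $in;
close $out;
print $result;
```

To run it on a real file, open the FASTA file for reading and pass `\*STDOUT` as the output handle, e.g. `filter_fasta($fh, \*STDOUT, qr/endosperm/i);`.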
Hi perlmunky,
The script worked really nicely, thank you very much. I hope you finish your PhD thesis soon; it's a pain in the ass. I'm sure my institute needs good bioinformatics people, just in case you're interested in moving to México... hahaha.
Thanks again.
-rodpck-