Extracting sequence from Genbank - (Mar/09/2006 )

Hi,
I am trying to extract a sequence from GenBank with the following code, and I get the following errors.

CODE:
package Bio::Perl;
use Bio::Perl;
$seq = get_sequence ('genbank',"K01298");
write_sequence (">hum",'fasta',$seq);

Output
-------------------- WARNING ---------------------
MSG: HTTP/1.1 404 Not Found
Connection: close
Date: Thu, 09 Mar 2006 10:22:05 GMT
Server: Nde
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Thu, 09 Mar 2006 10:31:10 GMT
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
Title: NCBI/xpubmed1 - WWW Error 404 Diagnostic

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
<head>
<title>NCBI/xpubmed1 - WWW Error 404 Diagnostic</title>
<style type="text/css">
h1.error {color: red; font-size: 40pt}
div.diags {text-indent: 0.5in }
</style>
</head>
<body>
<h1>Can't find the requested web page.</h1>

<p>The page you requested ( http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=n&ampdopt=genbank&ampform=1&amptitle=no&ampterm=K01298 ) could not be found on our web server.
This is usually caused by a error in the web request;
however, it could also be caused by a problem on our server.
You might be able to find what you want via
<a href="/entrez/query.fcgi?db=ncbisearch">NCBI Site Search</a>. </p>
<p>If you know what the request should be, please check it and correct it
if necessary. If not, you may be able to find the desired page by browsing
our site at <a href="http://www.ncbi.nlm.nih.gov">http://www.ncbi.nlm.nih.gov</a>.</p>
<p> If you need additional assistance, send e-mail to webadmin@ncbi.nlm.nih.gov. Please include
the following information in your problem report.</p>

<div class="diags">Error: 404</div>

<div class="diags">URL: http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=n&ampdopt=genbank&ampform=1&amptitle=no&ampterm=K01298 </div>

<div class="diags">Server: xpubmed1</div>

<div class="diags">Time: Thu Mar 9 05:22:05 EST 2006</div>


<p>NOTE: The above may be an internal URL which differs from the one you used to address the page.</p><p>Rev. 02/15/05</p>


<p>
<a href="http://validator.w3.org/check?uri=referer"><img src="http://www.w3.org/Icons/valid-xhtml11" alt="Valid XHTML 1.1!" height="31" width="88" /></a>
</p></body>
</html>


---------------------------------------------------
------------- EXCEPTION -------------

MSG: WebDBSeqI Request Error

STACK Bio::DB::WebDBSeqI::_request /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:523
STACK Bio::DB::WebDBSeqI::get_seq_stream /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:375
STACK Bio::DB::NCBIHelper::get_Stream_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/NCBIHelper.pm:466
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/lib/perl5/site_perl/5.8.0/Bio/DB/WebDBSeqI.pm:161
STACK Bio::Perl::get_sequence /usr/lib/perl5/site_perl/5.8.0/Bio/Perl.pm:377
STACK toplevel 1.pl:3

--------------------------------------



Sanjib Gupta

-sanjibgupta-

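That's a standard HTTP 404 ("file not found") response. It means the URL that the get_sequence method requested from NCBI (the htbin-post/Entrez/query address shown in the error page) does not exist on their server, so the query itself was incorrect.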

If all you're trying to do is retrieve a sequence by accession number and save a copy locally in fasta format, try it this way:

CODE
#!/usr/bin/perl -w
use strict;
use LWP::Simple;

my $db = 'nucleotide';
my $acc = 'K01298';
my $form = 'fasta';
my $type = 't';
my $seqfile = $acc . ".$form";

# Build the query URL for the NCBI sequence viewer.
my $link = 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?' .
    'db=' . $db .
    '&val=' . $acc .
    '&dopt=' . $form .
    '&sendto=' . $type;

get_gb_seq($link);

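# Fetch the record (up to three attempts) and write it to $seqfile.
sub get_gb_seq {
    my $link = shift;
    my $seq = get($link);    # LWP::Simple::get returns undef on failure

    unless (defined $seq) {
        warn "Having trouble contacting GenBank.  Sleeping once...\n";
        sleep 3;
        $seq = get($link);
    }
    unless (defined $seq) {
        warn "Still having trouble contacting GenBank.  Sleeping twice...\n";
        sleep 5;
        $seq = get($link);
    }
    unless (defined $seq) {
        die "Three attempts to retrieve sequence data for $acc were unsuccessful...\n";
    }

    # Open the output file only after the download succeeded, so a failed
    # run does not leave an empty file behind.
    open(my $out, '>', $seqfile) or die "Can't open $seqfile: $!\n";
    chomp($seq);
    print {$out} $seq;
    close($out) or die "Can't close $seqfile: $!\n";
}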

print "Done.\n";
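
If you end up fetching more than one accession, the repeated "try, sleep, try again" steps can be folded into a loop. Below is only a rough sketch of that idea, not anything LWP::Simple provides: fetch_with_retry is a made-up helper name, and the list of delays is an assumption you would tune yourself. The stub at the bottom stands in for the real download so the example runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP::Simple): call $fetch->() once, then
# once more after each delay in @$delays, stopping as soon as a defined
# value comes back. Returns that value, or undef if every attempt failed.
sub fetch_with_retry {
    my ($fetch, $delays) = @_;

    my $result = $fetch->();
    for my $delay (@$delays) {
        last if defined $result;
        warn "Fetch failed; retrying in $delay second(s)...\n";
        sleep $delay;
        $result = $fetch->();
    }
    return $result;
}

# Stub that fails twice and then succeeds (placeholder data, not a real
# GenBank record). With LWP::Simple loaded, the code reference would be
# sub { get($link) } instead.
my $calls = 0;
my $stub  = sub { return ++$calls < 3 ? undef : ">stub record\nACGT\n" };

my $seq = fetch_with_retry($stub, [0, 0]);
print defined $seq
    ? "Fetched after $calls attempts\n"
    : "Gave up after $calls attempts\n";
```

With the real script you would pass something like [3, 5] for the delays to match the sleeps used above, and die if the return value is still undefined.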

-HomeBrew-

Thanks, it works fine.
sanjibgupta

-sanjibgupta-