How to use NCBI? - (May/23/2012)
Hello all,
I have a problem with NCBI.
I have a sequence and I want to know what it might be.
Now what I do is the following:
I go to the NCBI website and run a BLAST search with it.
When I do this, I get a page with some results (see picture 1).
I see a lot of names with results.
But what's next?
What I do is check the best result and look up the position of that gene.
E.g., here it states that the gene starts at position 24,409 (see picture 2).
But how do I find that gene without manually searching for it on the page that holds the genome sequence of the bacterium? Or is this the only way to do it, so that I need to manually look up position 24,409 in the genome of that bacterium to find my gene (see picture 3)? Or is there an easier method?
I hope my question is reasonably clear.
Another question: what if you have more than one result with 99% similarity? Here I have several. I can't look them all up manually, can I? That would take days.
Also, I did it with one gene and checked two results, and each result gives me a different gene (one says it's a phage DNA packaging gene and the other says it's a bacterial gene).
Hi lyok
Maybe you're looking for a BLAST parser like this one, http://kirill-kryuko...s/blast-parser/, to get the chromosome locations in a simple output. Also see this post: http://www.biostars....rom-the-genome/
I think you should consider the E-value rather than percent similarity for homology prediction. Also, if you do not need to look for regulatory elements in the DNA, it's easier to work with proteins: they have conserved signatures and belong to protein families, which helps you infer the function of unknown sequences.
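To give an idea of what such a parser does, here is a minimal sketch in Python. It assumes you saved the BLAST results in tabular form (the default 12-column layout of `-outfmt 6` in command-line BLAST+) and that you have the subject genome as one plain string. The function names `parse_hit` and `extract_region` are made up for this example, they are not part of any NCBI tool.

```python
# Column names of BLAST tabular output (-outfmt 6), in the default order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_hit(line):
    """Turn one tab-separated BLAST hit line into a dict with typed values."""
    hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    return hit


def extract_region(genome, sstart, send):
    """Return the subject region of a hit.

    BLAST coordinates are 1-based and inclusive. When sstart > send the hit
    lies on the minus strand, so the reverse complement is returned.
    """
    if sstart <= send:
        return genome[sstart - 1:send]
    return genome[send - 1:sstart].translate(_COMPLEMENT)[::-1]


# Hypothetical hit line and toy genome, only to show the calls.
example_line = "query1\tcontig1\t99.0\t6\t0\t0\t1\t6\t4\t9\t1e-50\t200"
hit = parse_hit(example_line)
region = extract_region("AAACCGTGTTT", hit["sstart"], hit["send"])
```

With the start and end taken from the hit line, you never have to scroll through the genome page by hand. For a real genome you would first read the FASTA file into a string.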
Ok, thanks for the links.
Why should I use the E-value instead?
BTW: in my specific situation here, the E-value is the same for the 99% hits (or at least for some of them).
And to be honest, I am not that inclined to check all the "high scoring" genes. It would take me days, so I just checked one or two each time.
I am not sure what you mean by "it's easier to work with proteins".
How can I work with a protein if I just have a DNA sequence?
Can I simply translate the DNA sequence into a protein? But how can I do this? I can't know in advance which part of the sequence codes for the protein, right?
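The E-value tells you how many matches with a score at least that good you would expect to find purely by chance in a database of that size, so the hits with the lowest values are the most significant.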
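As a rough illustration, the E-value can be approximated from the bit score S' as E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below assumes raw lengths, whereas BLAST itself uses corrected "effective" lengths, so the numbers will not match the web page exactly. The names `expect_value` and `rank_hits` are invented for this example. When several hits share the same E-value, sorting by bit score as a tie-breaker is one reasonable choice.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2^(-S').

    Assumption: raw lengths are used here; BLAST applies effective lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def rank_hits(hits):
    """Sort hits by ascending E-value, then by descending bit score."""
    return sorted(hits, key=lambda h: (h["evalue"], -h["bitscore"]))


# Hypothetical hits, only to show the ordering.
hits = [
    {"sseqid": "hitA", "evalue": 1e-50, "bitscore": 200.0},
    {"sseqid": "hitB", "evalue": 1e-50, "bitscore": 250.0},
    {"sseqid": "hitC", "evalue": 1e-10, "bitscore": 60.0},
]
ranked = rank_hits(hits)
```

A higher bit score always gives a lower E-value for the same query and database, which is why the E-value says more than percent identity alone: a short 99% match can still be likely to occur by chance.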
If you want to get BLAST results for proteins, you should use blastx against Swiss-Prot.
This online training material could help you a lot in grasping the BLAST basics: http://bioinfbook.or...pter4/index.php. It is from the book Bioinformatics and Functional Genomics, one of the best bioinformatics books I have read.
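About not knowing which part of the DNA is coding: blastx gets around this by translating the query in all six reading frames (three on each strand) and searching with every one of them. Here is a minimal sketch of that idea in Python, assuming the standard genetic code and a sequence with only A, C, G and T. The function names are made up for the example, and anything that is not a known codon is written as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

Usually only one frame gives a long stretch without stop codons, and that is the frame most likely to be the real protein.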
OK, thanks.
But in the end, I am just looking for genes, not so much the proteins themselves.
So you would recommend that book? Is it more about statistics and bioinformatics in general, or also about the practical use of databases?
And what is that last link? It takes me to a Twitter page.