Looking for FASTA format of DNA of a gene amongst various species - (May/27/2010 )

Hi

I need the DNA sequences (in FASTA format) of a specific gene (the tissue transglutaminase gene) for as many species as possible.

I've been able to find it for a few species in the Entrez Gene and Entrez Nucleotide databases.

Specifically, I find these sequences by searching these databases for the gene name (TG2, the abbreviation for my gene of interest).

The problem is that I have only found the TG2 sequence for a handful of species (about 7), while the gene is supposed to be present in almost all species.


Is there a more comprehensive way to find the FASTA sequence of a gene across all species?


Thanks.

-humalog-

EnsEMBL may help - otherwise GenBank

-perlmunky-

Go to the GenBank protein page for one of your examples and click the "BLink" link. This will do a BLAST search against GenBank for similar proteins and list them by similarity. The search is based on the sequence, so the name used in the annotation plays no part in it. You could also look for CDD (Conserved Domain Database) matches and find other proteins containing those domains.
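BLink showed BLAST hits for a protein you already had. As a rough programmatic counterpart, NCBI's BLAST Common URL API accepts a search submission (`CMD=Put`) and a later retrieval by request ID (`CMD=Get`). This minimal sketch only builds those URLs, assuming the parameter names `PROGRAM`, `DATABASE`, `QUERY`, `RID` and `FORMAT_TYPE` as documented for that API; `NP_HYPOTHETICAL` is a placeholder accession, not a real record.

```python
from urllib.parse import urlencode

# Base endpoint of NCBI's BLAST Common URL API (assumed per its documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_url(query: str, program: str = "blastp", database: str = "nr") -> str:
    """Build a URL that submits a BLAST search (CMD=Put).

    `query` may be an accession or a raw sequence; the search is by
    sequence similarity, not by the annotated gene name.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return f"{BLAST_URL}?{urlencode(params)}"


def blast_get_url(rid: str) -> str:
    """Build a URL that retrieves plain-text results for a request ID (RID)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": "Text"}
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical accession used only to show the URL shape.
    print(blast_put_url("NP_HYPOTHETICAL"))
```

For DNA, `blast_put_url(seq, program="blastn", database="nt")` would be the nucleotide counterpart. The Put response contains an RID to pass to `blast_get_url`; check NCBI's usage guidelines on polling frequency before automating this.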

-phage434-

Try searching for "transglutaminase" in NCBI's Protein Clusters database. If you find a cluster you're interested in, say PRK03187, which has 39 proteins in the cluster, you can get all the sequences in FASTA format by going to the bottom of the page, and selecting "Protein FASTA" from the Display menu.
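Once the "Protein FASTA" text has been saved, a quick check is how many distinct organisms it covers. NCBI protein FASTA headers usually end with the organism name in square brackets (e.g. `[Homo sapiens]`); this sketch assumes that convention and falls back to "unknown" when it is missing. The input in the demo is a made-up toy file, not real sequence data.

```python
import re
from collections import defaultdict

# Assumed convention: organism name in square brackets at the end of the header.
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split multi-FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_organism(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map organism name -> headers, using the bracketed suffix of each header."""
    groups = defaultdict(list)
    for header, _seq in records:
        match = ORGANISM_RE.search(header)
        groups[match.group(1) if match else "unknown"].append(header)
    return dict(groups)


if __name__ == "__main__":
    # Toy input with hypothetical accessions and sequences.
    toy = ">ACC1 transglutaminase [Species one]\nMAEE\nLKV\n>ACC2 transglutaminase [Species two]\nMSTR\n"
    print(sorted(group_by_organism(parse_fasta(toy))))
```

Grouping by organism rather than counting records matters here because a cluster can contain several proteins from the same species.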

-HomeBrew-

Replying to humalog's original post (May 27 2010, 11:02 AM):

Just search for the gene name (not the abbreviation) in the NCBI Protein database. You can then select each record you want, choose FASTA from the Display drop-down menu, and also run a BLASTp search from this point...

Oops, you said DNA (nucleotide). You should be able to do the same with the NCBI Gene database and run a BLASTn search...
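For DNA from many species, the search-then-FASTA steps can also be scripted with NCBI's E-utilities: ESearch returns the IDs matching a query, and EFetch with `rettype=fasta` returns those records as FASTA. This minimal sketch only builds the request URLs and assumes the nucleotide database (`nuccore`), since FASTA output comes from the sequence databases rather than from Entrez Gene. Note that the official symbol for human tissue transglutaminase is TGM2, so searching for that symbol or the full name may catch records that "TG2" alone misses.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 500) -> str:
    """ESearch URL returning the UIDs of records that match `term`.

    retmax is raised because ESearch returns only 20 IDs by default.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(ids: list[str], db: str = "nuccore") -> str:
    """EFetch URL downloading the given UIDs as plain-text FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("TGM2"))
```

The `<Id>` values in the ESearch XML response are what goes into `efetch_fasta_url`. For very long ID lists NCBI recommends POST requests or its history server, which this sketch does not cover.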

-guyleonard-

Thanks
Most of you thought I was looking for amino acid sequences, but some of the answers were helpful.

I wanted to ask: when I search for a gene on the EnsEMBL website and get the results, how can I view their sequences in FASTA format?
Thanks.

-humalog-

Replying to humalog (Jul 5 2010, 01:37 PM):

Click "Export data" in the left-hand menu. You can select FASTA format there.
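The "Export data" step can also be done through the Ensembl REST API. This sketch assumes two endpoints: `homology/symbol/:species/:symbol` (orthologues of a gene in other species, which addresses the original cross-species question) and `sequence/id/:id` (a single sequence, returned as FASTA when the `Content-Type` header is `text/x-fasta`). It only constructs the requests; `ENSG_HYPOTHETICAL` is a placeholder, not a real stable ID.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Public Ensembl REST server; the response format is chosen via the
# Content-Type request header (assumed per the Ensembl REST documentation).
ENSEMBL_REST = "https://rest.ensembl.org"


def _ensembl_request(path: str, params: dict, fmt: str) -> Request:
    """Build (but do not send) a GET request for an Ensembl REST endpoint."""
    url = f"{ENSEMBL_REST}/{path}?{urlencode(params)}"
    return Request(url, headers={"Content-Type": fmt})


def orthologues_request(symbol: str, species: str = "human") -> Request:
    """Orthologues of `symbol` in other species, with cDNA sequences, as JSON."""
    path = f"homology/symbol/{quote(species)}/{quote(symbol)}"
    return _ensembl_request(path, {"type": "orthologues", "sequence": "cdna"}, "application/json")


def sequence_fasta_request(stable_id: str, seq_type: str = "genomic") -> Request:
    """One sequence as FASTA.

    Assumption: 'genomic' suits a gene ID; types such as 'cdna' are
    intended for transcript IDs.
    """
    path = f"sequence/id/{quote(stable_id)}"
    return _ensembl_request(path, {"type": seq_type}, "text/x-fasta")


if __name__ == "__main__":
    print(orthologues_request("TGM2").full_url)
```

A request would then be sent with `urllib.request.urlopen(req).read().decode()`; the homology response is JSON, from which the target gene IDs can be passed to `sequence_fasta_request`.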

-Trof-

Thank you, Trof.

-humalog-