Untranslated Region (UTR) - (Nov/27/2007)

Hello.

I am wondering whether the UTRs of all mRNAs of one gene are always the same (I think so).
The other question: what ways are there to fetch the complete UTR sequence of a gene?
One pretty easy way I know of is BioMart (MartView), where one can directly select it for download.
Another one I read about is the UCSC browser, but I am not very familiar with that browser, nor with the identifiers it uses.
Is there an easy way to get those UTR sequences from Entrez (as my identifiers are Entrez Gene IDs)?
I am not talking about looking up one or two UTRs, but about downloading the complete set for one species.
The only Entrez way I have found so far is to download one RefSeq transcript sequence for that specific Entrez Gene ID (is one enough? see my first question) and then take the sequence between the last exon and the poly-A tail.
Is this procedure correct? Is there an easier Entrez way, like the BioMart one?
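
As a rough sketch of the RefSeq-based idea: assuming the annotated CDS coordinates of the transcript are known (a RefSeq mRNA record carries them in its CDS feature), the UTRs are simply whatever lies outside the CDS. On that reading the 3' UTR starts right after the stop codon, which usually sits inside the last exon, and runs to the end of the transcript. The function name and the toy sequence are made up for illustration; fetching the records from Entrez is not shown.

```python
def split_utrs(transcript_seq, cds_start, cds_end):
    """Split an mRNA sequence into (5' UTR, CDS, 3' UTR).

    cds_start and cds_end are 1-based and inclusive, the way a
    GenBank-style CDS feature location such as "7..15" is written.
    """
    if not (1 <= cds_start <= cds_end <= len(transcript_seq)):
        raise ValueError("CDS coordinates fall outside the transcript")
    five_prime = transcript_seq[:cds_start - 1]     # everything before the start codon
    cds = transcript_seq[cds_start - 1:cds_end]     # start codon through stop codon
    three_prime = transcript_seq[cds_end:]          # everything after the stop codon
    return five_prime, cds, three_prime


# Toy transcript (hypothetical): 6 nt 5' UTR, 9 nt CDS, 12 nt 3' UTR.
toy = "GGCACC" + "ATGGCTTAA" + "TTGCAATAAAGT"
five_prime, cds, three_prime = split_utrs(toy, 7, 15)
print(five_prime, cds, three_prime)
```

If a gene has several RefSeq transcripts, this would have to be run once per transcript, since their UTRs need not agree.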

Kai

-monstercoccus-

I can't help you with retrieving the sequences, but what I can tell you is that not all transcripts of a given gene carry the same UTR. For the 3' UTR you can have different polyadenylation sites, and I also know of a gene that shows alternative splicing in its 5' UTR. I think these cases are rather exceptional, but they do exist.

-dpo-

Thanks for the hint. Do you know some gene symbols of such exceptions?
I will check the just downloaded Ensembl data for that.

-monstercoccus-

Hi, yes, it is quite easy to download UTR sequences for all genes from the UCSC Genome Browser using the 'Table Browser' page.
Go to http://genome.ucsc.edu/
Click on "Tables" in the top row.
Now choose your organism and assembly (for me, Human, March 2006).

If I wanted RefSeq UTRs, I would then choose group "Genes and gene predictions tracks",
track "RefSeq Genes",
table "refGene".
Ensure region is set to "genome".
Change output format to "custom track".
Click "get output".

On the next screen you can choose the name of your custom track, e.g. "RefSeq_UTRs",
and choose to have only the 5' UTR exons in the track.
Then click "get custom track in table browser".

Now, back at the Table Browser, choose group "custom tracks"
and choose the track you just created.

Set output format to "sequence",
enter a filename,
and click "get output".
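
For anyone who would rather script the "5' UTR exons" step on a downloaded refGene table: each row gives the CDS bounds (cdsStart, cdsEnd) and the exon bounds (exonStarts, exonEnds), so the UTR blocks can be computed by clipping every exon against the CDS. This is a minimal sketch, assuming 0-based half-open coordinates as UCSC tables use them, and assuming that an empty CDS marks a non-coding transcript. The function name and the toy coordinates are invented, and this is not the Table Browser's own code.

```python
def utr_blocks(strand, cds_start, cds_end, exon_starts, exon_ends):
    """Return (five_prime, three_prime) lists of (start, end) genomic blocks.

    Coordinates are 0-based, half-open. Blocks are listed in genomic order
    on both strands.
    """
    if cds_start >= cds_end:
        # Assumption: empty CDS means non-coding, so there is no UTR to report.
        return [], []
    left, right = [], []
    for start, end in zip(exon_starts, exon_ends):
        # Part of the exon that lies to the left of the CDS on the genome.
        if start < min(end, cds_start):
            left.append((start, min(end, cds_start)))
        # Part of the exon that lies to the right of the CDS on the genome.
        if max(start, cds_end) < end:
            right.append((max(start, cds_end), end))
    # On the minus strand the genomic right side is the transcript's 5' end.
    return (left, right) if strand == "+" else (right, left)


# Toy transcript (hypothetical): three exons, CDS from 320 to 550.
exon_starts, exon_ends = [100, 300, 500], [200, 400, 600]
print(utr_blocks("+", 320, 550, exon_starts, exon_ends))
print(utr_blocks("-", 320, 550, exon_starts, exon_ends))
```

For minus-strand transcripts the extracted genomic sequence still has to be reverse-complemented, and the blocks joined in reverse order, to obtain the UTR as it appears in the mRNA.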

good luck, hope this helps!


-frozenlyse-

For the alternative polyA sites, you can check these articles on Nanos1 and Evi5:

Strumane K, Bonnomet A, Stove C, Vandenbroucke R, Nawrocki-Raby B, Bruyneel E, Mareel M, Birembaut P, Berx G, van Roy F.
E-cadherin regulates human Nanos1, which interacts with p120ctn and induces tumor cell migration and invasion.
Cancer Res. 2006 Oct 15;66(20):10007-15.
PMID: 17047063

Faitar SL, Dabbeekeh JT, Ranalli TA, Cowell JK.
EVI5 is a novel centrosomal protein that binds to alpha- and gamma-tubulin.
Genomics. 2005 Nov;86(5):594-605. Epub 2005 Jul 19.
PMID: 16033705

As for the different 5' UTRs, a lot of genes use alternative promoters, which automatically leads to different 5' UTRs, for instance the BORIS gene (Renaud et al., Nucl Acids Res 2007). This gene also shows alternative splicing in its 5' UTR.

-dpo-

@dpo: thanks a lot for the articles. Sounds very interesting.
@frozenlyse: thanks a lot for the little howto. It helped me a lot and showed me what a powerful tool the UCSC Table Browser can be. I will look into the full manual.

I will now try to extract the UTRs directly from downloaded Entrez files and compare them to the data I already have (Ensembl and UCSC). Perhaps I will enter the blogging scene (I always wanted to) and start with a little post about these different methods for retrieving UTR sequence data. Let's see.

Kai

-monstercoccus-

I checked the BioMart data for genes with more than one distinct 3' UTR. As dpo already reported, there are some.
One of them is ENSG00000197530 (symbol: MIB2).
In Ensembl this gene has three different 3' UTRs across seven transcripts.
But in Entrez this gene (Gene ID 142678) has only one transcript.
How can this inconsistency be explained? Is this normal when comparing Entrez with Ensembl? (I have never done this before.)
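
For reference, the check itself can be sketched in a few lines, assuming the export has been read into (gene ID, transcript ID, 3' UTR sequence) rows. The function name and the toy rows below are hypothetical and are not the real MIB2 data.

```python
from collections import defaultdict


def distinct_utrs_per_gene(rows):
    """Count distinct, non-empty 3' UTR sequences per gene.

    rows is an iterable of (gene_id, transcript_id, utr_sequence) tuples.
    """
    utrs_by_gene = defaultdict(set)
    for gene_id, _transcript_id, utr in rows:
        if utr:  # skip transcripts with no annotated 3' UTR
            utrs_by_gene[gene_id].add(utr.upper())
    return {gene: len(utrs) for gene, utrs in utrs_by_gene.items()}


# Toy rows (hypothetical identifiers and sequences).
rows = [
    ("geneA", "tx1", "ACGT"),
    ("geneA", "tx2", "ACGTTT"),
    ("geneA", "tx3", "acgt"),   # same UTR as tx1, different letter case
    ("geneB", "tx4", "GGCC"),
    ("geneB", "tx5", ""),       # no 3' UTR annotated
]
counts = distinct_utrs_per_gene(rows)
multi_utr_genes = sorted(gene for gene, n in counts.items() if n > 1)
print(counts, multi_utr_genes)
```

Running the same count on the Ensembl export and on the Entrez-derived set would make differences like the one above visible gene by gene.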

-monstercoccus-