How can I find all exons and introns for a specific gene - (Jun/06/2014)

Hi,

How can I find all the exons and introns of a specific gene, for example the PRSS1 gene?

-medibio-

What is so special about that gene that you can't find it on NCBI Gene (the reference mRNA sequence has the beginnings and ends of exons marked) or Ensembl (the complete intron sequence can be enabled in the "Configure this page" menu on the left)?
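Once the exon starts and ends have been read off either record, the introns are simply the gaps between consecutive exons. A minimal sketch of that step, assuming 1-based inclusive coordinates; the function name and the example numbers are made up for illustration and are not the real PRSS1 exons:

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) pairs.

    Coordinates are assumed to be 1-based and inclusive.
    """
    ordered = sorted(exons)
    introns = []
    # Walk over neighbouring exon pairs and record the bases between them.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Placeholder coordinates for illustration only (NOT the real PRSS1 exons).
example_exons = [(100, 180), (400, 560), (900, 1150)]
example_introns = introns_from_exons(example_exons)
print(example_introns)
```

With real data, the exon list would be copied from the exon table of the chosen transcript.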

-Trof-

I can find it via Ensembl: http://www.ensembl.org/Homo_sapiens/Gene/Sequence?db=core;g=ENSG00000204983;r=7:142457319-142460923;t=ENST00000311737

I can also find it via the UCSC Genome Browser: http://genome-euro.ucsc.edu/cgi-bin/hgc?hgsid=198084058_by6FDA5NAY6QNEQQkTaDqttYaBZQ&g=htcCdnaAli&i=NM_002769&c=chr7&l=142457318&r=142460927&o=142457318&aliTable=refSeqAli&table=refGene

But their results are different?

I don't want to make a mistake at this step. Once I have the whole sequence, I will design primers to sequence the entire gene region (all exons and introns).

-medibio-

Did you check the Exon view from Ensembl I posted? The exons there are identical to those at UCSC.

While the DNA sequence is more or less fixed, a gene can have different transcripts, or the transcripts may vary in UTR length. Records are also often corrected and validated over time.

So there may be minor differences between NCBI RefSeq and Ensembl for certain genes; in that case I go with RefSeq.
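The coordinates in the two links quoted above can be compared directly. This is only a sketch: it assumes that the `l` value in the UCSC URL is a 0-based start (with `r` as the end of a half-open interval) and that the Ensembl `r=` range is 1-based and inclusive. The variable and function names are made up for illustration.

```python
def span_length(start, end, zero_based_start=False):
    """Length in bases of a genomic interval.

    1-based inclusive:   end - start + 1
    0-based half-open:   end - start
    """
    return end - start if zero_based_start else end - start + 1


# Values copied from the two URLs in the thread.
ensembl = (142457319, 142460923)  # r=7:142457319-142460923
ucsc = (142457318, 142460927)     # l=142457318, r=142460927

ensembl_len = span_length(*ensembl)
# ASSUMPTION: the UCSC start is 0-based, so no +1 is needed here.
ucsc_len = span_length(*ucsc, zero_based_start=True)
ucsc_start_1based = ucsc[0] + 1

print("Ensembl span:", ensembl_len)
print("UCSC span:", ucsc_len)
print("Same first base:", ucsc_start_1based == ensembl[0])
print("End difference:", ucsc[1] - ensembl[1])
```

Under those assumptions both records start at the same base, and the ends differ by only a few bases, which would fit a small difference in annotated UTR length rather than a different gene structure.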

 

Not in this case. As for the Ensembl link you have, I honestly don't know how you got that. I usually go straight to the Exons view of a specific transcript. Several exon-intron boundaries seem to be mistakenly missing in your link. The Exons view is fine.

Anyway, if you're sequencing the whole stretch of DNA where the gene lies, you don't need to care much about introns and exons. It's different when you want to sequence exons only; then you need to know where they lie.

-Trof-

Hi Trof,

Thank you for your reply.

The Ensembl link that I posted in my previous message comes from the "Sequence" view, which appears on the left side after searching for the gene name, and it shows all introns and exons. That is why I used it.

Do you recommend Sanger sequencing for this gene (genomic size is 3609 bp), covering all exons and introns?

Thanks

Regards

-medibio-

I'm not sure why you want to sequence the entire gene, so I can't recommend anything.

Usually, if you are looking for a mutation affecting gene function, you sequence either only the coding sequence (i.e. exons) or known regulatory regions such as promoters, enhancers and so on.

Sequencing the coding sequence of this gene would be easy: there are only 5 exons, so you can PCR them separately and sequence them. If for some reason you also wanted the introns, you would need to divide the region into smaller overlapping PCR products and sequence those.
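Dividing a region into overlapping PCR products is simple interval arithmetic. A minimal sketch, using the 3609 bp genomic size mentioned in this thread; the amplicon size (700 bp) and overlap (100 bp) are illustrative assumptions, not recommendations, and the function name is made up. Positions are relative to the start of the region (1-based, inclusive).

```python
def tile_amplicons(region_length, amplicon_size, overlap):
    """Split a region into overlapping windows of at most amplicon_size bases."""
    if region_length < 1:
        raise ValueError("region_length must be at least 1")
    if not 0 <= overlap < amplicon_size:
        raise ValueError("overlap must be >= 0 and smaller than amplicon_size")

    step = amplicon_size - overlap
    tiles = []
    start = 1
    while True:
        # The last window is clipped at the end of the region.
        end = min(start + amplicon_size - 1, region_length)
        tiles.append((start, end))
        if end == region_length:
            break
        start += step
    return tiles


# ASSUMPTION: 700 bp amplicons with 100 bp overlap, chosen only for illustration.
tiles = tile_amplicons(region_length=3609, amplicon_size=700, overlap=100)
for number, (start, end) in enumerate(tiles, start=1):
    print(f"amplicon {number}: {start}-{end} ({end - start + 1} bp)")
```

This only proposes windows; actual primer positions still depend on the local sequence (repeats, GC content, known variants under the primers), so a primer design tool would have the final say.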

 

But I see no point in doing that unless you have a special reason for it.

It's true that mutations outside the coding sequence can also affect the transcription or translation of the gene, but unless you know where to look, it's like trying to shoot a duck blindfolded.

Also, the actual regulatory regions affecting this gene (unknown at this time) can be many kb away from the gene itself, so you can hardly be sure you'll find all potential variants by sequencing introns.

And if you find any variation outside the coding sequence (which is very likely, since introns are less "stable" and more prone to mutations that don't cause anything unless they affect splice sites), you can never be sure it has the effect you are looking for unless you do a functional study. In contrast, if you find a mutation in the coding sequence that is not a polymorphism, it may well have some effect.

-Trof-

Hi Trof,

We will sequence the whole gene and then investigate the relationship between gene mutations and disease. Maybe we can find new clinical results or new SNPs and mutations. For this study, we are not targeting a particular SNP or mutation in the gene region.

Thanks

Regards

-medibio-

In that case, it's standard to sequence only exons and exon-intron boundaries (that affect splicing) and known regulatory regions. 
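Covering exons together with their exon-intron boundaries amounts to extending each exon by a short flank on both sides. A minimal sketch of that idea; the 20-base flank and the exon coordinates are placeholders made up for illustration (not real PRSS1 values), and the function name is hypothetical.

```python
def exon_targets(exons, flank):
    """Extend each exon by `flank` bases on both sides and merge overlaps.

    Coordinates are assumed to be 1-based and inclusive.
    """
    padded = sorted((max(1, start - flank), end + flank) for start, end in exons)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or touching the previous target: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Placeholder exons for illustration only (NOT the real PRSS1 coordinates).
targets = exon_targets([(100, 180), (200, 260), (900, 1150)], flank=20)
print(targets)
```

Exons separated by a very short intron end up in a single target, which can then be amplified and sequenced as one PCR product.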

Sequencing far too much DNA is not the best approach, since you will find more "variation noise" than something that really matters.

 

(That's the problem with NGS by the way, anyone can do it now, but then they have terabytes of data, find thousands of novel variations in every sample, but can make no sense of it.)

-Trof-

I will try to design primers for Sanger sequencing using some tools. I had considered NGS, but you are right; we also have many clinical samples, so it would be very expensive.

Thanks for your attention

-medibio-