promoter analysis - help!!! (Feb/06/2007 )

Hi, I want to study the promoter of a gene. Does anyone know a program for predicting the promoter region, or one that lets me search for the binding sites of transcription regulation factors (specifically factors that act during the stress response)?
The problem so far is that whenever I submit a sequence for analysis, I get too many possible promoters as a result.
Besides this, do you know how many bases the promoter lies from the first exon, or which segment I should give to a prediction program?
I'm studying a protein whose gene starts close to the end of a contig (I have its location in the genome), and I don't know which part of the sequence I should take into account for the analysis. Apart from that, do the transcription factors act within the promoter region detected by the programs or only near it, and if near it, how near?

can anyone help me??

-biotech!-

What is the organism? For human, you should probably be experimenting with 2000 bp upstream.

You should look at TRANSFAC and MatInspector.

I didn't quite understand your last question. But if your gene is at the end of a contig, you probably want to find the assembled genome and see whether you can extend the sequence further upstream.

-cyberpostdoc-

Thanks for the info.
The gene is from Rattus norv..., though there is a similar gene in the human genome too.
What do you mean by 2000 bp upstream? Upstream of the first exon?

About the contig: I searched for the gene in Ensembl and found that it spans 6 contigs. As I'm studying splicing too, and the whole sequence is too long to handle at once, I decided to analyse the gene step by step, one contig at a time (for the splicing analysis I used a programme called Genscan, do you know it?).
So based on Ensembl I know that the first exon starts in a given contig, but I don't know how many base pairs I should take into account when looking for the promoter. If I analyse the whole contig the programmes give me too many possible promoters, and only a short segment of it is part of the gene of interest. To make a long story short, I'd like to know if I can take only part of it for the analysis (assuming that the promoter will surely be xxxxx base pairs away from the first exon, and no further than that).

Hope you understand my explanation!!

-biotech!-

QUOTE (biotech! @ Feb 7 2007, 07:49 AM)
Thanks for the info.
The gene is from Rattus norv..., though there is a similar gene in the human genome too.
What do you mean by 2000 bp upstream? Upstream of the first exon?


Yes. For rat, you would probably try 2000 bp upstream of the TSS (transcription start site), which should be annotated.
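To make "2000 bp upstream of the TSS" concrete, here is a minimal Python sketch of the coordinate arithmetic. The function name, the 0-based coordinate convention and the example numbers are all assumptions for illustration, not taken from any annotation tool. It also flags the case raised earlier in the thread, where the gene sits near the end of a contig and the full window is not available.

```python
def upstream_window(tss, strand, length, contig_len):
    """Return (start, end, truncated) for the region upstream of a TSS.

    Assumptions (hypothetical conventions for this sketch):
    - tss is the 0-based index of the first transcribed base on the contig
    - coordinates are 0-based, half-open, on the contig's forward strand
    - truncated is True when the contig ends before the full window
    """
    if strand == "+":
        # Gene reads left to right, so upstream is to the left of the TSS.
        start, end = tss - length, tss
    elif strand == "-":
        # Gene reads right to left, so upstream is to the right of the TSS.
        start, end = tss + 1, tss + 1 + length
    else:
        raise ValueError("strand must be '+' or '-'")

    clamped_start = max(start, 0)
    clamped_end = min(end, contig_len)
    truncated = (clamped_start, clamped_end) != (start, end)
    return clamped_start, clamped_end, truncated


if __name__ == "__main__":
    # Made-up example: TSS only 300 bases from the start of a 10 kb contig.
    print(upstream_window(300, "+", 2000, 10000))
```

If the truncated flag comes back True, the contig does not contain the whole upstream window, and it is probably worth going to the assembled genome for more sequence before running any promoter tool.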

QUOTE (biotech! @ Feb 7 2007, 07:49 AM)
About the contig: I searched for the gene in Ensembl and found that it spans 6 contigs. As I'm studying splicing too, and the whole sequence is too long to handle at once, I decided to analyse the gene step by step, one contig at a time (for the splicing analysis I used a programme called Genscan, do you know it?).


Yes, Genscan was developed by the Burge lab. I know it was originally designed for gene prediction and could be used for splicing analysis, but I am not aware of it being used for promoter prediction.

QUOTE (biotech! @ Feb 7 2007, 07:49 AM)
So based on Ensembl I know that the first exon starts in a given contig, but I don't know how many base pairs I should take into account when looking for the promoter. If I analyse the whole contig the programmes give me too many possible promoters, and only a short segment of it is part of the gene of interest. To make a long story short, I'd like to know if I can take only part of it for the analysis (assuming that the promoter will surely be xxxxx base pairs away from the first exon, and no further than that).


Hehe, I don't think there is a sure thing in biology, but to answer your question, I would probably do this:
1. Locate the transcription start site (the start of the first exon is also OK) and retrieve the 500 bp, 1000 bp, 2000 bp and 4000 bp of GENOMIC sequence upstream of it.
2. You now have 4 sequences in hand. Create another set of 4 by appending the sequence of the first exon to each, which gives you 8 sequences.
3. Run whatever promoter analysis tool you are comfortable with on all 8 and compare the results.
4. You might also search for transcription factor binding sites directly in all 8 sequences and compare the results.
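Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genomic sequence is already loaded as a plain string, the TSS index is 0-based, and every function name and the toy motif here are hypothetical. The motif counter is a crude stand-in for step 4, not a substitute for TRANSFAC or MatInspector.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def candidate_sequences(genome, tss, exon1_len, strand="+",
                        lengths=(500, 1000, 2000, 4000)):
    """Build the 8 test sequences: each upstream window, with and
    without the first exon, read 5'->3' on the gene's own strand.

    genome    : contig/chromosome sequence as a string (assumed input)
    tss       : 0-based index of the first transcribed base
    exon1_len : length of the first exon in bases
    """
    out = {}
    for n in lengths:
        if strand == "+":
            upstream = genome[max(tss - n, 0):tss]
            exon1 = genome[tss:tss + exon1_len]
        else:
            # On the minus strand the upstream region lies to the right
            # of the TSS and must be reverse-complemented.
            upstream = reverse_complement(genome[tss + 1:tss + 1 + n])
            exon1 = reverse_complement(
                genome[max(tss - exon1_len + 1, 0):tss + 1])
        out[f"up{n}"] = upstream
        out[f"up{n}+exon1"] = upstream + exon1
    return out


def count_motif(seq, motif):
    """Count (possibly overlapping) exact matches of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i)
               for i in range(len(seq) - len(motif) + 1))


if __name__ == "__main__":
    genome = "ACGT" * 3000          # made-up placeholder sequence
    seqs = candidate_sequences(genome, tss=6000, exon1_len=100)
    for name, seq in seqs.items():
        # "TATAAA" is just a toy consensus used for illustration.
        print(name, len(seq), count_motif(seq, "TATAAA"))
```

Each of the 8 sequences can then be pasted into whichever promoter tool you prefer. Hits that keep appearing across the different window sizes are the ones I would look at first.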


I don't know Ensembl contigs as well as I know NCBI contigs, and I'm not sure they are the same, so I can't say much about the contigs here. But if your gene spans 6 NCBI contigs, it is very long. Are you sure it's 6 contigs? (How many exons?)

-cyberpostdoc-