Promoter sequence - Ensembl vs BLAST (Mar/20/2009)
Hi,
I'm new to working with promoters. I have to clone the promoters of two gene variants.
I located the 1.5 kb upstream sequence of the two promoters using Ensembl. To confirm the sequence I ran a BLAST search at NCBI, and it gave me the correct chromosome contig etc. Below are the results. What does it mean by "Features flanking this part of subject sequence"? I'm cloning the promoter of the DKK3 gene, which is located on chromosome 11.
>ref|NT_009237.17|Hs11_9394  Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
 Features flanking this part of subject sequence:
   789 bp at 5' side: dickkopf homolog 3 precursor
   151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
Score = 2771 bits (1500),  Expect = 0.0
 Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
 Strand=Plus/Minus
2) Also, my two variants differ only in their 5'UTR; they encode the same protein. So should I clone from -1.5 kb to just before the start codon of either variant? I mean, how far past the start codon should I clone?
variant 1 cds: 240bp
variant 2 cds: 226bp
The promoter sequence given by Ensembl is the promoter for the longer isoform of DKK3, which is why NCBI reports that the sequence lies 789 bp from the 5' side of the DKK3 gene. Here NCBI is referring to the shorter isoform.
How far downstream the promoter sequence should extend depends on which isoform is biologically more important and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.
pcrman on Mar 20 2009, 11:17 PM said:
How far downstream the promoter sequence should extend depends on which isoform is biologically more important and is the dominant form. If you are not sure, you can include the sequence all the way to the TSS or start codon of the short isoform.
I'm confused. It says the same thing when I BLAST the 1.5 kb promoter regions of both isoforms. Am I to trust the Ensembl sequence?
I don't understand what you meant. Can you post the sequence here?
Hi,
I'm to clone the promoters of two isoforms, which differ in their 5'UTR. For both isoforms I got the 1.5 kb promoter sequence from Ensembl and then ran a BLAST search of the sequence (NCBI).
Variant 1: 1.5 kb seq from Ensembl
gggcagttcgatatagagagatttttaggttgactctgaaagtcaagacctccagaccgc
atggtagaaggtgtaaggcagaagacaatctcagctggggaaattcctggtctttaagcc
agcaacatgaaggactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcc
tagctcggccccacaatatgcccccacagcaccctccagttcggcattagtttcttccta
atgtccactctgcccgaagtgacaagcgggggcatgtggagactcagctccaggttcctg
gacgggctcagccacccccagaaagctaatgaatgctcaaccagggcttccagatgccca
ggggacagagcaggagatgccggggaatggggctttccttgcagttcaggagggccctgc
cccaggcccagaagtagaagggaaagcggctgttttggcggtaaacagtaatgtggggag
tgctgcagagaaaggcagtcttggggtttcaagctggagagcagtcagctacactcagga
cctctggccatccctgccttcacctgctgtttggcctgatcgtctaacttctctgattct
ccactacccactccttattacgtttttgagacttgtcaaagttttatattagggctaact
gggacgcatacaaatctggtaacttcgccagggcgggaagttaggaaggagcagagctgg
ctgcaggtgtctggtcctgaccactcctctatgccacccttgaggagcttgctgactttc
tcatgacgttctcccattccaggagctgcaagtgcgttatcctggctggagcacggtgtc
aatcacggcagactaaggccagcggtgatggcttgaatgccaggctgggggctgggattt
ttcctgaggatttcacaggacagaggttggcttggaaagaccaaggtgggactgaggaac
attccccctacccccaacctcggtgggctgttgcaagcctggaggccagagaagacgggc
ctgggatgccgcgggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctg
gtcgctgaggcagaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggc
tggccaatggccgggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggg
gctcggggcgggggcggagagggagcctggtgggcgggcggggcgcgtcttgcgggctcc
ctcgggtaccggcgctgccgcaccccgccgcgctcccgcacccgcggcccgcccaccgcg
ccgctcccgcatctgcacccgcagcccggcggcctcccggcgggagcgagcagatccagt
ccggcccgcagcgcaactcggtccagtcggggtgggtgaggggcggcggcgggggagggg
acgactctgctgagctcagcctctcttggtggatgtggggcggggcgctcgagtaggacc
BLAST results:
>ref|NM_013253.4|  Homo sapiens dickkopf homolog 3 (Xenopus laevis) (DKK3), transcript variant 2, mRNA
Length=2755 
This shows that the last 200bp of the 1.5kb variant 1 seq is identical to the variant 2 seq. 
Query  1213  GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG  1272
            
Sbjct  1     GGCGGAGAGGGAGCCTGGTGGGCGGGCGGGGCGCGTCTTGCGGGCTCCCTCGGGTACCGG  60
Query  1273  CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT  1332
             
Sbjct  61    CGCTGCCGCACCCCGCCGCGCTCCCGCACCCGCGGCCCGCCCACCGCGCCGCTCCCGCAT  120
Query  1333  CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC  1392
         
Sbjct  121   CTGCACCCGCAGCCCGGCGGCCTCCCGGCGGGAGCGAGCAGATCCAGTCCGGCCCGCAGC  180
Query  1393  GCAACTCGGTCCAGTCGGGG  1412
                    
Sbjct  181   GCAACTCGGTCCAGTCGGGG  200
GENE ID: 27122 DKK3 | dickkopf homolog 3 (Xenopus laevis) 
(Over 10 PubMed links)
 Score =  370 bits (200),  Expect = 3e-99
 Identities = 200/200 (100%), Gaps = 0/200 (0%)
 Strand=Plus/Plus
>ref|NT_009237.17|Hs11_9394  Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
 Features flanking this part of subject sequence:
   501 bp at 5' side: dickkopf homolog 3 precursor
   151574 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
 Score = 2771 bits (1500),  Expect = 0.0
 Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
Variant 2: 1.5 kb seq from Ensembl
agaccacttatatttgagacctgtagattttcttaccgtttcttctctctccctttcttt
ctttctttctttctttctttctttctttctttctttctttctttctttctttctttcttt
tctttctttctctctttctctttctctctttctttccttcttttctttcctttctttttt
tcgtttgtagtttaacctaataattgaactactgataaattattacatttgggaatacaa
aatgtagactccacacaagaaaacaagcgtccctttgcctgacacttggggcagttcgat
atagagagatttttaggttgactctgaaagtcaagacctccagaccgcatggtagaaggt
gtaaggcagaagacaatctcagctggggaaattcctggtctttaagccagcaacatgaag
gactggaagagcatgatgtgctctgtaaacccgcagcactgcatttcctagctcggcccc
acaatatgcccccacagcaccctccagttcggcattagtttcttcctaatgtccactctg
cccgaagtgacaagcgggggcatgtggagactcagctccaggttcctggacgggctcagc
cacccccagaaagctaatgaatgctcaaccagggcttccagatgcccaggggacagagca
ggagatgccggggaatggggctttccttgcagttcaggagggccctgccccaggcccaga
agtagaagggaaagcggctgttttggcggtaaacagtaatgtggggagtgctgcagagaa
aggcagtcttggggtttcaagctggagagcagtcagctacactcaggacctctggccatc
cctgccttcacctgctgtttggcctgatcgtctaacttctctgattctccactacccact
ccttattacgtttttgagacttgtcaaagttttatattagggctaactgggacgcataca
aatctggtaacttcgccagggcgggaagttaggaaggagcagagctggctgcaggtgtct
ggtcctgaccactcctctatgccacccttgaggagcttgctgactttctcatgacgttct
cccattccaggagctgcaagtgcgttatcctggctggagcacggtgtcaatcacggcaga
ctaaggccagcggtgatggcttgaatgccaggctgggggctgggatttttcctgaggatt
tcacaggacagaggttggcttggaaagaccaaggtgggactgaggaacattccccctacc
cccaacctcggtgggctgttgcaagcctggaggccagagaagacgggcctgggatgccgc
gggcgcaggggcaggcagtgaaggagatggctgccttcggtagagctggtcgctgaggca
gaagaggagggcgtggggcgtggggcgtgaggtggccggcgccccggctggccaatggcc
gggctgcggcccctccgcggggcggggtgggcctggtgggcgggcggggctcggggcggg
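A quick way to see how this sequence relates to the variant 1 sequence posted above is to look for the start of one inside the other. The snippet below is only a minimal sketch: `find_offset` is a hypothetical helper name, and it uses short fragments copied from the two sequences in this post rather than the full 1.5 kb, so paste in the complete sequences for a real check.

```python
def find_offset(longer: str, shorter: str, seed: int = 20) -> int:
    """Return the 0-based index in `longer` where the first `seed`
    bases of `shorter` occur, or -1 if they are not found."""
    return longer.lower().find(shorter.lower()[:seed])


# Bases 241-360 of the variant 2 sequence (two lines copied from this post).
variant2_fragment = (
    "aatgtagactccacacaagaaaacaagcgtccctttgcctgacacttggggcagttcgat"
    "atagagagatttttaggttgactctgaaagtcaagacctccagaccgcatggtagaaggt"
)
# First line of the variant 1 sequence.
variant1_start = "gggcagttcgatatagagagatttttaggttgactctgaaagtcaagacctccagaccgc"

idx = find_offset(variant2_fragment, variant1_start)
# The fragment begins at base 241 of variant 2, i.e. 0-based index 240.
offset_in_full_sequence = 240 + idx
print(offset_in_full_sequence)
```

With these fragments the offset comes out as 288, meaning the variant 1 sequence starts 288 bp into the variant 2 sequence. That matches the difference between the two flanking distances BLAST reports (789 bp vs 501 bp), which suggests the two 1.5 kb sequences are overlapping windows of the same genomic region rather than two unrelated promoters.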
BLAST results:
>ref|NT_009237.17|Hs11_9394  Homo sapiens chromosome 11 genomic contig, reference assembly
Length=49571094
 Features flanking this part of subject sequence:
   789 bp at 5' side: dickkopf homolog 3 precursor
   151286 bp at 3' side: microtubule associated monoxygenase, calponin and LIM dom...
 Score = 2771 bits (1500),  Expect = 0.0
 Identities = 1500/1500 (100%), Gaps = 0/1500 (0%)
 Strand=Plus/Minus
Query  1         AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT  60
                
Sbjct  10819658  AGACCACTTATATTTGAGACCTGTAGATTTTCTTACCGTTTCTTCTCTCTCCCTTTCTTT  10819599
Query  61        CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT  120
                 
Sbjct  10819598  CTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTT  10819539
Query  121       TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT  180
                 
Sbjct  10819538  TCTTTCTTTCTCTCTTTCTCTTTCTCTCTTTCTTTCCTTCTTTTCTTTCCTTTCTTTTTT  10819479
Query  181       TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA  240
                 
Sbjct  10819478  TCGTTTGTAGTTTAACCTAATAATTGAACTACTGATAAATTATTACATTTGGGAATACAA  10819419
Query  241       AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT  300
                
Sbjct  10819418  AATGTAGACTCCACACAAGAAAACAAGCGTCCCTTTGCCTGACACTTGGGGCAGTTCGAT  10819359
Query  301       ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT  360
                 
Sbjct  10819358  ATAGAGAGATTTTTAGGTTGACTCTGAAAGTCAAGACCTCCAGACCGCATGGTAGAAGGT  10819299
Query  361       GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG  420
                
Sbjct  10819298  GTAAGGCAGAAGACAATCTCAGCTGGGGAAATTCCTGGTCTTTAAGCCAGCAACATGAAG  10819239
Query  421       GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC  480
                 
Sbjct  10819238  GACTGGAAGAGCATGATGTGCTCTGTAAACCCGCAGCACTGCATTTCCTAGCTCGGCCCC  10819179
Query  481       ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG  540
                 
Sbjct  10819178  ACAATATGCCCCCACAGCACCCTCCAGTTCGGCATTAGTTTCTTCCTAATGTCCACTCTG  10819119
Query  541       CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC  600
                 
Sbjct  10819118  CCCGAAGTGACAAGCGGGGGCATGTGGAGACTCAGCTCCAGGTTCCTGGACGGGCTCAGC  10819059
Query  601       CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA  660
                 
Sbjct  10819058  CACCCCCAGAAAGCTAATGAATGCTCAACCAGGGCTTCCAGATGCCCAGGGGACAGAGCA  10818999
Query  661       GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA  720
                 
Sbjct  10818998  GGAGATGCCGGGGAATGGGGCTTTCCTTGCAGTTCAGGAGGGCCCTGCCCCAGGCCCAGA  10818939
Query  721       AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA  780
                
Sbjct  10818938  AGTAGAAGGGAAAGCGGCTGTTTTGGCGGTAAACAGTAATGTGGGGAGTGCTGCAGAGAA  10818879
Query  781       AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC  840
                 
Sbjct  10818878  AGGCAGTCTTGGGGTTTCAAGCTGGAGAGCAGTCAGCTACACTCAGGACCTCTGGCCATC  10818819
Query  841       CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT  900
               
Sbjct  10818818  CCTGCCTTCACCTGCTGTTTGGCCTGATCGTCTAACTTCTCTGATTCTCCACTACCCACT  10818759
Query  901       CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA  960
              
Sbjct  10818758  CCTTATTACGTTTTTGAGACTTGTCAAAGTTTTATATTAGGGCTAACTGGGACGCATACA  10818699
Query  961       AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT  1020
                 
Sbjct  10818698  AATCTGGTAACTTCGCCAGGGCGGGAAGTTAGGAAGGAGCAGAGCTGGCTGCAGGTGTCT  10818639
Query  1021      GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT  1080
                 
Sbjct  10818638  GGTCCTGACCACTCCTCTATGCCACCCTTGAGGAGCTTGCTGACTTTCTCATGACGTTCT  10818579
Query  1081      CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA  1140
                 
Sbjct  10818578  CCCATTCCAGGAGCTGCAAGTGCGTTATCCTGGCTGGAGCACGGTGTCAATCACGGCAGA  10818519
Query  1141      CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT  1200
                 
Sbjct  10818518  CTAAGGCCAGCGGTGATGGCTTGAATGCCAGGCTGGGGGCTGGGATTTTTCCTGAGGATT  10818459
Query  1201      TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC  1260
                
Sbjct  10818458  TCACAGGACAGAGGTTGGCTTGGAAAGACCAAGGTGGGACTGAGGAACATTCCCCCTACC  10818399
Query  1261      CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC  1320
             
Sbjct  10818398  CCCAACCTCGGTGGGCTGTTGCAAGCCTGGAGGCCAGAGAAGACGGGCCTGGGATGCCGC  10818339
Query  1321      GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA  1380
                 
Sbjct  10818338  GGGCGCAGGGGCAGGCAGTGAAGGAGATGGCTGCCTTCGGTAGAGCTGGTCGCTGAGGCA  10818279
Query  1381      GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC  1440
                 
Sbjct  10818278  GAAGAGGAGGGCGTGGGGCGTGGGGCGTGAGGTGGCCGGCGCCCCGGCTGGCCAATGGCC  10818219
Query  1441      GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG  1500
              
Sbjct  10818218  GGGCTGCGGCCCCTCCGCGGGGCGGGGTGGGCCTGGTGGGCGGGCGGGGCTCGGGGCGGG  10818159
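For anyone reading the Plus/Minus alignment above: the query runs 1 to 1500 while the Sbjct coordinates count down, because the query matches the reverse strand of the contig. Here is a small sketch of that coordinate arithmetic, assuming an ungapped hit (which holds here, Gaps = 0/1500). The function name `subject_coord` is made up for illustration and is not part of any BLAST tool.

```python
def subject_coord(query_pos: int, query_start: int,
                  subject_start: int, strand: str) -> int:
    """Map a 1-based query position to a subject coordinate for an
    ungapped hit. On the minus strand the subject coordinate decreases
    as the query position increases."""
    step = -1 if strand == "minus" else 1
    return subject_start + step * (query_pos - query_start)


# Values read from the alignment above: Query 1 pairs with Sbjct 10819658.
first_line_end = subject_coord(60, 1, 10819658, "minus")
last_base = subject_coord(1500, 1, 10819658, "minus")
print(first_line_end)  # 10819599, end of the first alignment line
print(last_base)       # 10818159, last Sbjct coordinate in the alignment

# Difference between the two "bp at 5' side" values reported in this thread.
flank_variant2 = 789
flank_variant1 = 501
print(flank_variant2 - flank_variant1)  # 288
```

The 288 bp difference in the flanking distances is consistent with both hits being measured against the same annotated DKK3 feature, with the variant 1 sequence simply sitting 288 bp closer to it. How NCBI defines the exact boundary of that feature is not shown in the output, so treat this as a consistency check rather than a statement about where the TSS is.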