Residue/base frequency - what the!!? (Jan/22/2006 )

Can anyone recommend a program, software package, or web site that can be used to determine residue/base usage frequency?

If you are recommending FREAKN from EMBOSS, please also give me a clue as to how to use the thing!

thanks!!

-janbrisbane-

Well, I would recommend EMBOSS.

Go to http://emboss.sourceforge.net/ or type emboss into Google.

Download the package:

If you are on Linux, run these steps in order:
unzip (gunzip *.gz)
untar (tar -xvf *.tar)
cd into the extracted directory
./configure
make
make install (may need root privileges)
make clean

Windows: download the package and double-click the installer.

Open a terminal (on Windows, type cmd in the Run box or in search). Now type the name of the program you want to use and it should spring to life in the terminal.

See the EMBOSS freak program for a guide to using the package on your own computer

or

LIVE SERVER

Most of the time you should just google something and RTFM.

-DPK-

An alternative is to use Perl to do it.

If you post the kind of file you are trying to read and run the frequency counts on, I can have a look at it for you.
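
A minimal sketch of the scripting approach, written in Python rather than Perl (a swap made here for illustration, not a script posted in this thread). It assumes a single-record FASTA file whose header lines start with ">", and the function names read_fasta_sequence and base_frequencies are hypothetical:

```python
from collections import Counter


def read_fasta_sequence(lines):
    """Join the sequence lines of one FASTA record, skipping '>' header lines.

    Assumes a single record; lines may come from a file or a list of strings.
    """
    return "".join(
        line.strip().upper()
        for line in lines
        if line.strip() and not line.startswith(">")
    )


def base_frequencies(sequence):
    """Return {symbol: (count, fraction)} for every symbol in the sequence."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {base: (n, n / total) for base, n in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up demo record; replace with open("your_file.fasta") for real data.
    example = [">demo record (hypothetical)", "GCGCGCGGAG", "CCAGCGGAGC"]
    seq = read_fasta_sequence(example)
    for base, (n, frac) in base_frequencies(seq).items():
        print(f"{base}\t{n}\t{frac:.3f}")
```

The same function works on protein sequences, since it simply counts whatever symbols are present.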

-DPK-

Thanks a lot DPK, I'll give it a try...

I am trying to determine the frequency of codon usage for valine in a family of ABC genes, i.e. how many times valine is coded for by gtc vs gta vs gtt vs gtg in the different members. So the file I have is just a FASTA gene/mRNA sequence, like:

GCGCGCGGAGCCAGCGGAGCCAGCTGAGCCCGAGCCCAGCCCGCGCCCGCGCCGCCATGCCCCTGGCCTT
CTGCGGCAGCGAGAACCACTCGGCCGCCTACCGGGTGGACCAGGGGGTCCTCAACAACGGCTGCTTTGTG
GACGCGCTCAACGTGGTGCCGCACGTCTTCCTACTCTTCATCACCTTCCCCATCCTCTTCATTGGATGGG
GAAGTCAGAGCTCCAAGGTGCACATCCACCACAGCACATGGCTTCATTTCCCCGGGCACAACCTGCGGTG
GATCCTGACCTTCATGCTGCTCTTCGTCCTGGTGTGTGAGATTGCAGAGGGCATCCTGTCTGATGGGGTG
ACCGAATCCCACCATCTGCACCTGTACATGCCAGCCGGGATGGCGTTCATGGCTGCTGTCACCTCCGTGG
TCTACTATCACAACATCGAGACTTCCAACTTCCCCAAGCTGCTAATTGCCCTGCTGGTGTATTGGACCCT
GGCCTTCATCACCAAGACCATCAAGTTTGTCAAGCTCTTGGACCACGCCATCGGCTTCTCGCAGCTACGC
TTCTGCCTCACAGGGCTGCTGGTGATCCTCTATGGGATGCTGCTCCTCGTGGAGGTCAATGTCATCAGGG
TGAGGAGATACATCTTCTTCAAGACACCGAGGGAGGTGAAGCCTCCCGAGGACCTGCAAGACCTGGGGGT
ACGCTTCCTGCAGCCCTTCGTGAATCTGCCGTCCAAAGGCACCTACTGGTGGATGAACGCCTTCATCAAG
ACTGCCCACAAGAAGCCCATCGACTTGCGAGCCATCGGGAAGCTGCCCATCGTTATGAGGGCCCTCACCA
ACTACCAACGGCTCTGCGAGGCCTTTGACGCCCAGGTGCGGAAGGACATTCAGGGCACTCAAGGTGCCCG
GGCCATCTGGCAGGCACTCAGCCATGCCTTCGGGAGGCGCCTGGTCCTCAGCAGCACTTTCCGCATCTTG
GCCGACCTGCTGGGCTTCGCCGGGCCACTGTGCATCTTTGGGATCGTGGACCACCTTGGGAAGGAGAACG
ACGTCTTCCAGCCCAAGACACAATTTCTCGGGGTTTACTTTGTCTCATCCCAAGAGTTCCTTGCCAATGC
CTACGTCTTAGCTGTGCTTCTGTTCCTTGCCCTCCTACTGCAAAGGACATTTCTGCAAGCATCCTACTAT
GTGGCCATTGAAACTGGAATTAACTTGAGAGGAGCAATACAGACCAAGATTTACAATAAAATTATGCACC
TGTCCACCTCCAACCTGTCCATGGGAGAAATGACTGCTGGACAGATCTGTAATCTGGTTGCCATCGACAC
CAATCAGCTCATGTGGTTTTTCTTCTTGTGCCCAAACCTCTGGGCTATGCCAGTACAGATCATTGTGGGT
GTGATTCTCCTCTACTACATACTCGGAGTCAGTGCCTTAATTGGAGCAGCTGTCATCATTCTACTGGCTC
CTGTCCAGTACTTCGTGGCCACCAAGCTGTCTCAGGCCCAGCGGAGCACACTGGAGTATTCCAATGAGCG
GCTGAAGCAGACCAACGAGATGCTCCGCGGCATCAAGCTGCTGAAGCTGTACGCCTGGGAGAACATCTTC
CGCACGCGGGTGGAGACGACCCGCAGGAAGGAGATGACCAGCCTCAGGGCCTTTGCCATCTATACCTCCA
TCTCCATTTTCATGAACACGGCCATCCCCATTGCAGCTGTCCTCATAACTTTCGTGGGCCATGTCAGCTT
CTTCAAAGAGGCCGACTTCTCGCCCTCCGTGGCCTTTGCCTCCCTCTCCCTCTTCCATATCTTGGTCACA
CCGCTGTTCCTGCTGTCCAGTGTGGTCCGATCTACCGTCAAAGCTCTAGTGAGCGTGCAAAAGCTAAGCG
AGTTCCTGTCCAGTGCAGAGATCCGTGAGGAGCAGTGTGCCCCCCATGAGCCCACACCTCAGGGCCCAGC
CAGCAAGTACCAGGCGGTGCCCCTCAGGGTTGTGAACCGCAAGCGTCCAGCCCGGGAGGATTGTCGGGGC
CTCACCGGCCCACTGCAGAGCCTGGTCCCCAGTGCAGATGGCGATGCTGACAACTGCTGTGTCCAGATCA
TGGGAGGCTACTTCACGTGGACCCCAGATGGAATCCCCACACTGTCCAACATCACCATTCGTATCCCCCG
AGGCCAGCTGACTATGATCGTGGGGCAGGTGGGCTGCGGCAAGTCCTCGCTCCTTCTAGCCGCACTGGGG
GAGATGCAGAAGGTCTCAGGGGCTGTCTTCTGGAGCAGCCTTCCTGACAGCGAGATAGGAGAGGACCCCA
GCCCAGAGCGGGAGACAGCGACCGACTTGGATATCAGGAAGAGAGGCCCCGTGGCCTATGCTTCGCAGAA
ACCATGGCTGCTAAATGCCACTGTGGAGGAGAACATCATCTTTGAGAGTCCCTTCAACAAACAACGGTAC
AAGATGGTCATTGAAGCCTGCTCTCTGCAGCCAGACATCGACATCCTGCCCCATGGAGACCAGACCCAGA
TTGGGGAACGGGGCATCAACCTGTCTGGTGGTCAACGCCAGCGAATCAGTGTGGCCCGAGCCCTCTACCA
GCACGCCAACGTTGTCTTCTTGGATGACCCCTTCTCAGCTCTGGATATCCATCTGAGTGACCACTTAATG
CAGGCCGGCATCCTTGAGCTGCTCCGGGACGACAAGAGGACAGTGGTCTTAGTGACCCACAAGCTACAGT
ACCTGCCCCATGCAGACTGGATCATTGCCATGAAGGATGGGACCATCCAGAGGGAGGGTACCCTCAAGGA
CTTCCAGAGGTCTGAATGCCAGCTCTTTGAGCACTGGAAGACCCTCATGAACCGACAGGACCAAGAGCTG
GAGAAGGAGACTGTCACAGAGAGAAAAGCCACAGAGCCACCCCAGGGCCTATCTCGTGCCATGTCCTCGA
GGGATGGCCTTCTGCAGGATGAGGAAGAGGAGGAAGAGGAGGCAGCTGAGAGCGAGGAGGATGACAACCT
GTCGTCCATGCTGCACCAGCGTGCTGAGATCCCATGGCGAGCCTGCGCCAAGTACCTGTCCTCCGCCGGC
ATCCTGCTCCTGTCGTTGCTGGTCTTCTCACAGCTGCTCAAGCACATGGTCCTGGTGGCCATCGACTACT
GGCTGGCCAAGTGGACCGACAGCGCCCTGACCCTGACCCCTGCAGCCAGGAACTGCTCCCTCAGCCAGGA
GTGCACCCTCGACCAGACTGTCTATGCCATGGTGTTCACGGTGCTCTGCAGCCTGGGCATTGTGCTGTGC
CTCGTCACGTCTGTCACTGTGGAGTGGACAGGGCTGAAGGTGGCCAAGAGACTGCACCGCAGCCTGCTAA
ACCGGATCATCCTAGCCCCCATGAGGTTTTTTGAGACCACGCCCCTTGGGAGCATCCTGAACAGATTTTC
ATCTGACTGTAACACCATCGACCAGCACATCCCATCCACGCTGGAGTGCCTGAGCCGCTCCACCCTGCTC
TGTGTCTCAGCCCTGGCCGTCATCTCCTATGTCACACCTGTGTTCCTCGTGGCCCTCTTGCCCCTCGCAG
TCGTGTGCTACTTCATCCAGAAGTACTTCCGGGTGGCGTCCAGGGACCTGCAGCAGCTGGATGACACCAC
CCAGCTTCCACTTCTCTCACACTTTGCCGAAACCGTAGAAGGACTCACCACCATCCGGGCCTTCAGGTAT
GAGGCCCGGTTCCAGCAGAAGCTTCTCGAATACACAGACTCCAACAACATTGCTTCCCTCTTCCTCACAG
CTGCCAACAGATGGCTGGAAGTCCGAATGGAGTACATCGGTGCATGTGTGGTGCTCATCGCAGCGGTGAC
CTCCATCTCCAACTCCCTGCACAGGGAGCTCTCTGCTGGCCTGGTGGGCCTGGGCCTTACCTACGCCCTA
ATGGTCTCCAACTACCTCAACTGGATGGTGAGGAACCTGGCAGACATGGAGCTCCAGCTGGGGGCTGTGA
AGCGCATCCATGGGCTCCTGAAAACCGAGGCAGAGAGCTACGAGGGGCTCCTGGCACCATCGCTGATCCC
AAAGAACTGGCCAGACCAAGGGAAGATCCAGATCCAGAACCTGAGCGTGCGCTACGACAGCTCCCTGAAG
CCGGTGCTGAAGCACGTCAATGCCCTCATCTCCCCTGGACAGAAGATCGGGATCTGCGGCCGCACCGGCA
GTGGGAAGTCCTCCTTCTCTCTTGCCTTCTTCCGCATGGTGGACACGTTCGAAGGGCACATCATCATTGA
TGGCATTGACATCCGCAAACTGCCGCTGCACACCCTGCCGTCACGCCTCTCCATCATCCTGCAGGACCCC
GTCCTCTTCAGCGGCACCATCCGATTTAACCTGGACCCTGAGAGGAAGTGCTCAGATAGCACACTGTGGG
AGGCCCTGGAAATCGCCCAGCTGAAGCTGGTGGTGAAGGCACTGCCAGGAGGCCTCGATGCCATCATCAC
AGAAGGCGGGGAGAATTTCAGCCAGGGACAGAGGCAGCTGTTCTGCCTGGCCCGGGCCTTCGTGAGGAAG
ACCAGCATCTTCATCATGGACGAGGCCACGGCTTCCATTGACATGGCCACGGAAAACATCCTCCAAAAGG
TGGTGATGACAGCCTTCGCAGACCGCACTGTGGTCACCATCGCGCATCGAGTGCACACCATCCTGAGTGC
AGACCTGGTGATCGTCCTGAAGCGGGGTGCCATCCTTGAGTTCGATAAGCCAGAGAAGCTGCTCAGCCGG
AAGGACAGCGTCTTCGCCTCCTTCGTCCGTGCAGACAAGTGA

-janbrisbane-

This is a typical Perl or Python type of bioinformatics problem (hence the EMBOSS program). I suggest that you check out either of those languages; they are quick to pick up and very useful for this kind of task.
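
As a rough idea of what such a script could look like in Python, here is a minimal sketch that counts the four valine codons in one sequence. It rests on assumptions that may not hold for your data: by default it treats the first ATG as the start codon (an mRNA can have an earlier, out-of-frame ATG in its 5' UTR, so pass the real start position if you know it), and it stops at the first in-frame stop codon. The names coding_codons and valine_usage are hypothetical:

```python
from collections import Counter

VALINE_CODONS = ("GTA", "GTC", "GTG", "GTT")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_codons(sequence, start=None):
    """Split a sequence into in-frame codons from `start` to the first stop codon.

    If `start` is None, the first ATG is used. That is an assumption and
    may not be the true start codon of the gene.
    """
    seq = sequence.upper().replace("U", "T")  # accept RNA as well as DNA
    if start is None:
        start = seq.find("ATG")
        if start == -1:
            return []  # no start codon found
    codons = []
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def valine_usage(sequence, start=None):
    """Return how often each valine codon occurs in the assumed reading frame."""
    counts = Counter(coding_codons(sequence, start))
    return {codon: counts[codon] for codon in VALINE_CODONS}
```

To compare family members, call valine_usage once per gene sequence and put the resulting counts side by side.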

HTH.

-DPK-

Thanks HTH, if they are going to help, then I will get down to wrapping my head around it!

Wish me luck....

-janbrisbane-

Are the tools at http://www.bioinformatics.org/SMS/ any help? See especially Codon Plot and Codon Usage...

If I've misunderstood what information you're looking for, let me know -- I'm pretty good at Perl; I could probably whip you up a script...

-HomeBrew-