percentage similarity among a set of sequences - (Nov/10/2006 )
Hi ppl,
This question is probably a bit silly, but I have never come across a tool that, given a set of 'n' sequences, tells me the percentage similarity that all of these sequences share among themselves.
So how are people able to say that such-and-such families of proteins, or a family of some domain, share a particular percentage similarity?
Thanks a lot.
You can go here, for example:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_auto...a_clustalw.html
Enter the sequences to compare as follows:
>name
attgcgttcag...
>name 2
attgggcgtggtgcc....
>name 3
attgcgcgggtgtg...
and submit.
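If your sequences are sitting in a script rather than a file, something like the following could produce input in the format above. This is only a minimal sketch: the function name `to_fasta`, the line width of 60, and the toy sequences are my own assumptions for illustration, not anything the server requires.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record becomes a '>name' header line followed by the
    sequence, wrapped at `width` characters per line.
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Toy sequences, made up purely for illustration.
records = [
    ("name", "ACGTACGTAC"),
    ("name 2", "ACGTTCGTAC"),
    ("name 3", "ACGAACGTAC"),
]
print(to_fasta(records))
```

Paste the printed text into the form and submit.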
I would avoid using ClustalW because it can be flaky. You should use MUSCLE instead: look for the paper describing it on PubMed, and you will see that it is indeed one of the best alignment tools out there.
Thanks a lot ppl.
However, I see that the EBI website that provides ClustalW does not report the identity and similarity percentage values.
Also, yes, MUSCLE looks like a good one to me.
And then there are so many other alignment programs available, like T-Coffee, MAFFT, and DIALIGN,
each one claiming to be good in its own way.
I need to run my sequences through each of these programs, compare the alignments, and see which one works best for my case.
You might need to write your own code to do the comparison and calculate whatever statistics you want to check.
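As a rough starting point, here is a minimal sketch of such code in Python. It assumes you have already saved the multiple alignment from whichever program you used as aligned FASTA (gaps written as `-`, all rows the same length). The function names and the toy alignment are hypothetical. It also assumes one particular convention: percent identity is the number of identical columns divided by the number of columns where neither sequence has a gap. Other denominators (full alignment length, shorter sequence length) are also in use and give different numbers, so state which one you picked. Note that this is identity only; a similarity percentage would additionally need a substitution matrix such as BLOSUM62 to decide which non-identical residues count as similar.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA text into a dict mapping name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences.

    Assumed convention: columns where either sequence has a gap
    are skipped, and the remaining columns form the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identities(aligned):
    """Percent identity for every pair of sequences in the alignment."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Toy alignment, made up purely for illustration.
aligned_text = """>seq1
ACGT-ACGT
>seq2
ACGTTACGA
>seq3
ACCT-ACGT
"""

aligned = parse_fasta(aligned_text)
scores = pairwise_identities(aligned)
for (n1, n2), pid in scores.items():
    print(f"{n1} vs {n2}: {pid:.1f}% identity")
print(f"mean pairwise identity: {sum(scores.values()) / len(scores):.1f}%")
```

Running the same script on the output of each alignment program (ClustalW, MUSCLE, T-Coffee, MAFFT, and so on) gives you comparable numbers, and the mean over all pairs is one simple way to get a single figure for a family.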