Percentage identity and bootstrap values - correlation? (Dec/27/2008)

Hi all,
I'm trying to construct a phylogenetic tree of highly related sequences (>85% identity, each about 600 bp long) by the neighbor-joining method. The problem is that I get a wide range of bootstrap values, from 35 to 100. With my limited knowledge of phylogenetics, I find it hard to understand why sequences with high sequence identity would have low bootstrap values, i.e. if the sequences are highly related, randomly jumbling them isn't going to make the alignment any different. Does anybody have an explanation?
I also came across reports saying that a branch CAN have a low bootstrap value and still indicate a meaningful phylogeny under certain circumstances, so I may not need to worry about "increasing" bootstrap values (if there is such a way).
Thanks a lot!!

-jangajarn-

The software builds the tree from the similarity between the sequences (or from data derived from them): the sequences that are most similar are clustered together, or the simplest tree consistent with those similarities is built. So every tree is a compromise, and no perfect tree exists. If the program can't find one clearly best solution, bootstrap values can be low.
These values don't tell you whether two or more sequences are more or less identical; they tell you that the branching compromise at that point is not very good and that other solutions would perhaps fit as well.
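
To make that concrete, here is a rough Python sketch, not the output of any real phylogeny package. It assumes a made-up alignment of four taxa (A, B, C, D) and, instead of running full neighbor joining, uses the four-taxon shortcut of picking the split whose two pairs have the smallest summed distance. The function names (`toy_columns`, `favoured_split`, `bootstrap_support`) and all site counts are hypothetical. Note that the bootstrap resamples alignment columns with replacement rather than jumbling the sequences, so the support for a branch depends on how many sites back it versus how many back an alternative, not on overall identity.

```python
import random

# The three possible groupings of four taxa, as index pairs into a column.
SPLITS = {
    "AB|CD": ((0, 1), (2, 3)),
    "AC|BD": ((0, 2), (1, 3)),
    "AD|BC": ((0, 3), (1, 2)),
}

def toy_columns(length, n_ab_cd, n_ac_bd, n_singleton):
    """Alignment columns for taxa A, B, C, D (one 4-character string per site)."""
    cols = ["GGAA"] * n_ab_cd      # sites where A and B share one base, C and D another
    cols += ["GAGA"] * n_ac_bd     # conflicting sites: A groups with C, B with D
    singles = ["GAAA", "AGAA", "AAGA", "AAAG"]
    # Sites where a single taxon differs: they lower identity but favour no split.
    cols += [singles[i % 4] for i in range(n_singleton)]
    cols += ["AAAA"] * (length - len(cols))  # invariant sites fill the rest
    return cols

def favoured_split(cols, rng):
    """Split with the smallest summed within-pair mismatch count; ties broken at random."""
    score = {
        name: sum(col[i] != col[j] for col in cols for i, j in pairs)
        for name, pairs in SPLITS.items()
    }
    best = min(score.values())
    return rng.choice([name for name, s in score.items() if s == best])

def bootstrap_support(cols, replicates=500, seed=1):
    """Percentage of bootstrap replicates in which each split is favoured."""
    rng = random.Random(seed)
    counts = dict.fromkeys(SPLITS, 0)
    for _ in range(replicates):
        resampled = rng.choices(cols, k=len(cols))  # draw sites with replacement
        counts[favoured_split(resampled, rng)] += 1
    return {name: 100.0 * c / replicates for name, c in counts.items()}

# Same length and same number of singleton sites in both toy alignments;
# only the balance of sites supporting each grouping differs.
clean = toy_columns(600, n_ab_cd=30, n_ac_bd=0, n_singleton=60)
conflicted = toy_columns(600, n_ab_cd=4, n_ac_bd=3, n_singleton=60)
print("clean     :", bootstrap_support(clean))
print("conflicted:", bootstrap_support(conflicted))
```

In the second toy alignment the sequences are, if anything, more similar to each other than in the first, yet the AB|CD grouping should come out with clearly lower support, because only a handful of sites favour it and almost as many favour the competing grouping.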

-hobglobin-