
Need help with nucleic acid alignments - (Oct/04/2005)


Hello!
I have the following problem:
I have 50 nucleic acid sequences and I have to compare each one against all the others, pair by pair.
Is there a program or something that can help me do this without much trouble?
It's really painful and time-consuming to compare them with the programs or web pages I know so far.
I hope someone can help me.
Thanks a lot!

-vaitsa-

Do you need the actual alignments, or just some statistical parameters about the comparisons?

-HomeBrew-

Did you try this one?

-Theo22-

Some more Web Sites for Multiple Alignments

-Theo22-

First of all, thanks for your interest.
Secondly, the statistical parameters about the comparisons are good enough for what I need.
But I will give you an example so you understand exactly what I mean:
I have the following nucleic acid sequences:
>1
AAGTCGTTCGGAN
>2
GCCTGTAACGGTT
>3
GTCCGTAAGTCCT
>4
GTCCGAAATGCTG

I need the comparison between 1 and 2, 1 and 3, 1 and 4, 2 and 3, and so on...
But imagine that I have many, many of them...
Thanks so much!

-vaitsa-
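
The request above is an all-against-all comparison: n sequences give n(n-1)/2 unordered pairs, so 4 sequences need 6 comparisons and 50 need 1,225. A minimal Python sketch of that enumeration follows. The helper names (read_fasta, percent_identity, all_pairs) are illustrative and not part of GCG or EMBOSS. The identity figure is a plain position-by-position count with no gaps, which is only meaningful here because the four example sequences have the same length; it is not a substitute for a real alignment.

```python
from itertools import combinations

# The four example sequences from the post above.
FASTA = """\
>1
AAGTCGTTCGGAN
>2
GCCTGTAACGGTT
>3
GTCCGTAAGTCCT
>4
GTCCGAAATGCTG
"""


def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}, keeping file order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def percent_identity(a, b):
    """Ungapped identity between two equal-length sequences.

    Ambiguity codes such as N are compared as ordinary characters,
    so N against T counts as a mismatch.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def all_pairs(records):
    """Every unordered pair of names: 1-2, 1-3, 1-4, 2-3, and so on."""
    return list(combinations(records, 2))


records = read_fasta(FASTA)
for first, second in all_pairs(records):
    identity = percent_identity(records[first], records[second])
    print(f"{first} vs {second}: {identity:.1f}% identical")
```

The same loop scales to 50 sequences unchanged; only the body of the loop would need to call a real alignment program instead of percent_identity.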

If you have access to GCG or EMBOSS (command-line version), you could write a shell or Perl script to churn them all out for you...

Barring that, it'd be rather easy to write a Perl script to use NCBI's Blast Two Sequences page.

-HomeBrew-
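
As a rough illustration of the "script to churn them all out" idea, here is a Python sketch that builds one EMBOSS needle command line per pair of files. It only prints the commands; it does not run them. The input file names and the output naming scheme are assumptions made up for this example, and the gap penalties shown are just the values passed in, not a recommendation.

```python
from itertools import combinations
from pathlib import Path
import shlex


def needle_commands(fasta_files, gapopen=10.0, gapextend=0.5):
    """Return one EMBOSS needle command (as an argument list) per file pair."""
    commands = []
    for a, b in combinations(fasta_files, 2):
        # Hypothetical output name, e.g. 1_vs_2.needle
        outfile = f"{Path(a).stem}_vs_{Path(b).stem}.needle"
        commands.append([
            "needle",
            "-asequence", str(a),
            "-bsequence", str(b),
            "-gapopen", str(gapopen),
            "-gapextend", str(gapextend),
            "-outfile", outfile,
        ])
    return commands


# Assumed layout: one FASTA file per sequence in the current directory.
files = ["1.fasta", "2.fasta", "3.fasta", "4.fasta"]
for command in needle_commands(files):
    print(shlex.join(command))
```

To actually run the alignments, each argument list could be handed to subprocess.run on a machine where EMBOSS is installed. Swapping "needle" for "water" would give local instead of global alignments with the same options.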

I'm not very good at writing scripts, so I'll pass... Thanks anyway!

-vaitsa-

Yes, but we could help...

Do you have command-line access to GCG or EMBOSS?

-HomeBrew-

You could? Really?
Thanks!
I just got access to GCG. Now what?

-vaitsa-

Sure -- it sounds like fun...

Let's get an idea what tools we have available. From the command line, enter "perl -v" (without the quotes, of course) and let's see if we have Perl available to us. You should get something like:

QUOTE
newmbcrr% perl -v

This is perl, v5.8.4 built for sun4-solaris-thread-multi
(with 3 registered patches, see perl -V for more detail)

Copyright 1987-2004, Larry Wall

Binary build 810 provided by ActiveState Corp. http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 3 2004 14:33:50

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

newmbcrr%


Also, give me a little detail about your sequences. I know you said you had about 50 (the precise number doesn't matter), but how long are the sequences? Are they all about the same size -- that is, there aren't some that are 100 bp and others that are 3000 bp, are there?

Can you get each sequence into its own file? What format will that file be in (GCG, FASTA, etc.)?
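
If the sequences start out together in one multi-FASTA file, splitting them into one file per sequence could look like the Python sketch below. The function name split_fasta and the "name.fasta" naming scheme are assumptions for illustration, and the sketch assumes each header's first word is safe to use as a file name. It writes FASTA only; converting to GCG format would be a separate step.

```python
import tempfile
from pathlib import Path

FASTA = """\
>1
AAGTCGTTCGGAN
>2
GCCTGTAACGGTT
>3
GTCCGTAAGTCCT
>4
GTCCGAAATGCTG
"""


def split_fasta(text, out_dir):
    """Write each FASTA record in `text` to its own file under `out_dir`.

    Returns the list of paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = []  # list of (name, [sequence lines])
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            # Fall back to a numbered name if the header is empty.
            name = words[0] if words else f"seq{len(records) + 1}"
            records.append((name, []))
        elif line and records:
            records[-1][1].append(line)

    paths = []
    for name, seq_lines in records:
        path = out_dir / f"{name}.fasta"
        path.write_text(f">{name}\n" + "\n".join(seq_lines) + "\n")
        paths.append(path)
    return paths


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for path in split_fasta(FASTA, tmp):
        print(path.name)
```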

Do you want to use Bestfit (uses the Smith & Waterman algorithm; finds the best local region of similarity) or Gap (uses the Needleman & Wunsch algorithm; finds the best alignment over the full length of both sequences)?
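
To make the local versus global distinction concrete, here is a small Python sketch that computes only the best score under each scheme. It is a teaching toy, not how Bestfit or Gap are implemented: it uses a single linear gap penalty and made-up scoring values (match +1, mismatch -1, gap -2), all of which are assumptions for this example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch style, both sequences aligned end to end.
    local=True:  Smith-Waterman style, best-scoring pair of subsequences.
    """
    # Row 0: a global alignment pays for leading gaps, a local one starts free.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, prev[j] + gap, cur[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


seq1 = "AAGTCGTTCGGAN"
seq2 = "GCCTGTAACGGTT"
print("global score:", alignment_score(seq1, seq2))
print("local score: ", alignment_score(seq1, seq2, local=True))
```

The local score can never be lower than the global one under the same scoring values, because a local alignment is free to drop poorly matching ends. That is why the answer to the size question matters: for sequences of very different lengths, a global alignment is penalized for the unmatched overhang.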

I think you said before that you weren't as interested in the actual alignment as you were in the statistics. Is percent identity and length of match enough? For example, the header section of Bestfit includes these data (in addition to the alignment):

QUOTE
Quality: 325 Length: 349
Ratio: 1.105 Gaps: 10
Percent Similarity: 72.803 Percent Identity: 72.803


Is this enough?

-HomeBrew-
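
If those header values are all that is needed, they could be pulled out of each output file with a few lines of Python. The sketch below assumes the header looks exactly like the "Name: number" pairs quoted in the post above; real output may differ in spacing or contain other fields, and parse_header is a name made up for this example.

```python
import re

# Header text as quoted in the post above.
HEADER = """\
Quality: 325 Length: 349
Ratio: 1.105 Gaps: 10
Percent Similarity: 72.803 Percent Identity: 72.803
"""

# A label made of letters and spaces, a colon, then a number.
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(-?\d+(?:\.\d+)?)")


def parse_header(text):
    """Return the 'Name: number' pairs found in `text` as {name: float}."""
    return {name.strip(): float(value) for name, value in FIELD.findall(text)}


stats = parse_header(HEADER)
print(stats["Percent Identity"], stats["Length"])
```

Collecting one such dictionary per pair would give a table of percent identity and match length for all comparisons without opening each alignment by hand.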
