Decisions in sequence-chromatogram-analysis - how do you decide which FASTA code is the best? (Sep/05/2006 )
HI!
I just started working with sequence analysis, so I have some questions concerning the analysis of chromatograms.
Which FASTA code would you set at the second peak?
Which FASTA code would you set at the second peak?
Or do you always take the highest? Isn't that a loss of information (peaks 5 and 13)?
Which FASTA code would you set at the second peak, and which at the 5th peak?
Do you have a minimum amplitude that is "acceptable", or do you accept every distinct peak (peak 5)?
Are peaks 8 and 9 two peaks for you? Or is it just one A?
Is this 4 or just 3 A's for you?
I'm waiting eagerly for your replies!
CU
Hi!
Difficult decisions...
No. 1 & No. 2:
I can't say... it is possible that the higher peak is the right one... but on the other hand the lower peak could be a shifted one, which in case 1 would mean CTCTT... I would prefer this solution because you have very little background in No. 1 & 2.
Well, it's quite hard to say anyway. Maybe you should do a second sequencing and compare the peaks you get then with the ones you have...
No. 3:
Here it could be the same situation... but to see if it's background you should look at a bigger range of the chromatogram.
No. 4:
Well, it looks like a "clear" double peak to me, that means AA.
No. 5:
It's AAAA. Do you see the four little humps?
Hope my suggestions help you!
Greetings,
Chakchel
Part of the answer to your question, I feel, depends on what you've been sequencing. If, for instance, you're sequencing the genome of an RNA virus (which is very diverse due to the lack of proofreading by reverse transcriptase), you might be seeing a superimposition of 2 different peaks. Just choosing one letter would lose a lot of potential information, and you would then be better off sequencing after cloning to make sure you have single genomes and not mixtures.
Another thing worth taking into consideration is where the ambiguities are found on the electropherogram. If they are around, say, base 100, you'd still expect nice peaks, but after maybe 700 or so (depending on the quality of your template and the reagents/primers you're using in the sequencing) your electropherogram becomes much harder, or almost impossible, to interpret...
If this is direct sequencing of known human DNA:
1. C/T heterozygote (can confirm with reverse r/n). Refer to reference sequence.
2. C/T, A/T, G/T heterozygote - odd! (can confirm with reverse r/n). Refer to reference sequence.
3. A/T, A/G heterozygote (can confirm with reverse r/n). Refer to reference sequence.
4. Clear G, AA.
5. Can't tell, could be; yes if reference sequence is AAAA. Confirm with reverse r/n if need be.
Also take note of quality scores, signal strength and signal to noise ratio.
Also agree with vairus: it depends on what you're sequencing.
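If you do decide to record a double peak instead of picking one base, the usual way is an IUPAC ambiguity code (C/T becomes Y, and so on). Here is a minimal sketch of that lookup in Python; the function name `iupac_code` is just something I made up for illustration, it is not part of any sequencing software.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_code(bases):
    """Return the IUPAC code for the bases seen at one position.

    `bases` is any iterable of single-letter bases, e.g. "CT" or ["a", "t"].
    (Hypothetical helper name, for illustration only.)
    """
    return IUPAC[frozenset(b.upper() for b in bases)]


print(iupac_code("CT"))  # C/T heterozygote
print(iupac_code("AG"))  # A/G heterozygote
```

Whether a double peak deserves an ambiguity code at all is still the judgment call discussed above; this only handles the bookkeeping once you have decided.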
Some of this kind of stuff is due to the sensitivities of the machines. ABIs used to have trouble calling As after Gs, for example, but I'm not sure about the newer models. Sometimes these "machine limitation" miscalls can be overcome by sequencing in the opposite direction (which is why this is done all the time), because the A after the G is then a C after a T, and thus the problem does not exist.
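To see why the G-then-A context turns into T-then-C on the other strand, here is a small sketch of a reverse complement in plain Python (the helper name `reverse_complement` is my own, not from any particular package):

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string to read 5'->3' on the other strand."""
    return seq.translate(COMPLEMENT)[::-1]


# The problematic "A after G" context on the forward read...
print(reverse_complement("GA"))  # ...reads as "C after T" on the reverse read
```

The same function is handy for flipping the reverse read so it can be lined up against the forward read when comparing calls.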
The other stuff takes experience and (importantly) good sequence: low noise, strong peaks, and good signal intensities.
There's a pretty good overview of some of this stuff here.
HI!
These were environmental-samples of bacterial communities, DNA from excised DGGE-bands, so it is possible that there are different sequences.
But: today I read that in BLASTN degenerate nucleotide codes (like K, S, Y) are treated as mismatches in the nucleotide alignment. If there are too many, BLAST will reject the sequence.
What to do now?
And how do other programs like ARB and RDP interpret degenerate nucleotide codes?
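To get a feel for how many degenerate codes a sequence actually carries before submitting it, something like the following rough Python sketch could be used. The function name `ambiguity_fraction` is made up for illustration, and it simply counts every character other than A, C, G, T as ambiguous (so gaps or N would count too); it says nothing about how BLAST, ARB or RDP decide internally.

```python
def ambiguity_fraction(seq):
    """Fraction of positions in `seq` that are not a plain A, C, G or T.

    Assumption: anything else (K, S, Y, N, '-', ...) is treated as ambiguous.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return ambiguous / len(seq)


# Example with two degenerate codes (Y and K) in ten bases.
print(ambiguity_fraction("ACGTYACGKT"))
```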
And another question:
Is there a software that finds not observed peaks in my sequence? (third peak)
CU
Is there a software that finds not observed peaks in my sequence? (third peak)
the software that comes with the sequencer should be able to be configured to include the missing peaks (threshold setting, etc.). if you don't have access to the software then you can manually edit the sequence.
there is a company with a website where you can submit your file and they will improve the data. they only accept abi (7300, i think) data. i don't remember the url, but you may be able to find them with google.
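as a rough illustration of what a threshold setting does (this is only a sketch, not the algorithm of any sequencer software): given the four trace heights at one peak position, keep every base whose height is at least some fraction of the tallest one. the function name `bases_above_threshold`, the example heights and the default ratio of 0.3 are all assumptions made up for this example.

```python
def bases_above_threshold(heights, min_ratio=0.3):
    """Return the bases whose peak height is >= min_ratio * tallest peak.

    `heights` maps base letter -> trace height at one position,
    e.g. {"A": 50, "C": 900, "G": 20, "T": 400}.
    The result is ordered from tallest to smallest peak.
    The 0.3 default is an arbitrary illustrative value, not a recommendation.
    """
    if not heights:
        return []
    top = max(heights.values())
    if top <= 0:
        return []  # no signal at all at this position
    kept = [base for base, h in heights.items() if h >= min_ratio * top]
    return sorted(kept, key=lambda base: -heights[base])


position = {"A": 50, "C": 900, "G": 20, "T": 400}
print(bases_above_threshold(position))                 # lenient threshold keeps the secondary T
print(bases_above_threshold(position, min_ratio=0.5))  # stricter threshold keeps only C
```

lowering the ratio makes small peaks show up as secondary calls, raising it gives a cleaner but less informative sequence, which is the same trade-off as in the original question.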