cloning artifact? - (Apr/02/2007 )
Hello,
I'm having some problems with the sequences from cloned PCR products. I'm cloning a 600 bp fragment of a (supposedly) single-copy gene, using a proofreading enzyme, and I'm sequencing 5-10 clones for each sample. My problem is that almost every clone I sequence is different. For example, for one individual I sequenced more than 10 clones, and all of them were different. There were actually two groups differing by more than 10 bp, and within each group the sequences had 1-2 bp differences. Other, less common issues include some sequences having an extra bit at the 3' end, about 100 bp long, which so far I have only encountered in 2 of about 100 sequences, and two sequences that can be aligned over only half of the gene; the other half is just unalignable.
So, does anyone know of any PCR, cloning or sequencing artifacts that could account for these problems? When you sequence multiple clones from one individual, do you expect to get a small amount of variability between sequences due to error, even if they are the same allele?
I'm using the Stratagene PCR enzyme (Easy-A) and cloning kit (Cat 240205).
thanks,
hans
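One way to make the "two groups, 1-2 bp apart within each group" pattern explicit is to count pairwise differences between the aligned clone sequences and join clones that differ by only a few bases. The following is a minimal sketch, not anything from the original post: the clone names, the toy sequences and the `max_diff` threshold are all made up for illustration, and the sequences are assumed to be already aligned to the same length.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def group_clones(seqs, max_diff=2):
    """Single-linkage grouping: clones end up in the same group if they are
    connected by a chain of pairs differing at no more than max_diff positions."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if hamming(seqs[a], seqs[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy data (20 bp instead of 600 bp), for illustration only.
clones = {
    "c1": "ACGTACGTACGTACGTACGT",
    "c2": "ACGTACGTACGAACGTACGT",  # 1 difference from c1
    "c3": "TCGAACTTACGTAGGTACCT",  # 5 differences from c1
    "c4": "TCGAACTTACGTAGGTACCA",  # 1 difference from c3
}

for a, b in combinations(clones, 2):
    print(a, b, hamming(clones[a], clones[b]))
print(group_clones(clones, max_diff=2))
```

If the groups that come out correspond to the expected two alleles, the remaining within-group differences are the ones that need an explanation such as polymerase error.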
What is the history of this DNA? Is it cDNA?
Are you sure these readings are right? Did the forward and reverse sequencing reads of the same clone give the same sequence? Could you show us the chromatogram traces?
Could this be caused by really poor sequencing data?
Are the 'mistakes' in regions of homonucleotide repeats? Or are the deletions in repeat regions?
Can you confirm that your sequencing reaction conditions work well?
Hi,
Yep, I'm sure the chromatogram readings are correct; they are clean and the signal is strong. Both forward and reverse strands give perfectly matching results. So, nope, it is not poor sequencing data.
The gene is not cDNA; it's a gene fragment that includes two exons with one intron in the middle.
There are no repeats in any of the sequences that I could notice, although I haven't really looked at compositional biases or anything like that in these sequences.
The sequencing reaction, I think, is working well; it's just that some of the clones come up with base differences. It's unlikely that it's sequencing error, since both strands give the same sequence. So, maybe it's the PCR... Right now I'm trying a different enzyme, PfuUltra II (also from Stratagene), which is supposed to have the lowest error rate available. I will see if that makes any difference.
The only other option I can come up with is that it is not a single-copy gene... but I don't want to jump there unless I'm sure nothing else could explain it.
Do you get a single, strong band from your PCR reaction? My guess is that your PCR reaction is picking up pieces of DNA other than your gene of interest. Try raising the annealing temperature to make the reaction more specific. What gene is it? It's not an antibody gene, for example? Where did the primers come from? Are they annealing to common motifs which might be found in multiple genes? What is the GC content of the primers and the template? Very low GC regions can cause confusing things to happen if the extension temperature is too high.
Hi,
Well, I usually do get a single bright band, but my annealing temperature is low (48), so I could try raising it. It's a protein-coding gene involved in the sex determination pathway of bees. I designed the primers, and their GC content is about 40%. But I still don't see how this would explain my results, since I'm getting sequences that are very similar to each other, not some random piece of DNA. But, yes, it could be something with the PCR. Anyway, the problem is why a single individual is producing more than the 2 expected alleles for this gene. My best guess right now is just PCR error, at least for the sequences that only have 1-2 bp differences.
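Whether PCR error alone could plausibly account for the 1-2 bp differences can be checked with a back-of-the-envelope estimate. The sketch below uses the common approximation that the expected number of errors per cloned molecule is error rate x fragment length x number of template doublings, and treats errors as Poisson-distributed. Only the 600 bp length comes from this thread; the error rate and the number of doublings are placeholder assumptions and should be replaced with the manufacturer's figure for the enzyme and an estimate for the actual reaction.

```python
import math


def expected_errors(error_rate, length_bp, doublings):
    """Approximate mean number of polymerase errors per final molecule.

    error_rate is in errors per base per template doubling.
    """
    return error_rate * length_bp * doublings


def fraction_with_error(error_rate, length_bp, doublings):
    """Poisson estimate of the fraction of clones carrying at least one error."""
    return 1.0 - math.exp(-expected_errors(error_rate, length_bp, doublings))


# ASSUMED placeholder values, not taken from the thread or any datasheet.
ASSUMED_ERROR_RATE = 1e-5   # errors per base per doubling (hypothetical)
ASSUMED_DOUBLINGS = 30      # hypothetical; real doublings are often fewer than cycles
FRAGMENT_LENGTH = 600       # bp, from the original post

mean_errors = expected_errors(ASSUMED_ERROR_RATE, FRAGMENT_LENGTH, ASSUMED_DOUBLINGS)
frac = fraction_with_error(ASSUMED_ERROR_RATE, FRAGMENT_LENGTH, ASSUMED_DOUBLINGS)
print(f"mean errors per clone: {mean_errors:.3f}")
print(f"fraction of clones with >= 1 error: {frac:.3f}")
```

If the estimated fraction of clones with an error is far below what is observed (here, nearly every clone differing), polymerase error by itself becomes a less convincing explanation and the other possibilities raised in the thread deserve a closer look.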