How to identify an unknown sequence? No BLAST identity! (Nov/05/2008)
Hi guys! (and gals!)
I've got a cDNA sequence (several, actually) from differential-display RT-PCR. I purified it, cloned it into a TA vector, and sequenced it. When I run BLASTn against any of the existing C. albicans databases, no significant similarity to any gene comes up; in fact, there is no C. albicans match at all! BLASTn gave only low maximum identity to other organisms, and so did BLASTx.
Does this mean my cDNA is not from C. albicans at all? (Although a BLAST search against the genome did return whole-genome shotgun sequences for some of my unidentified cDNAs.) I'll design primers based on my cDNA sequence, amplify total DNA, and sequence the product to see what I get. But if there's really no BLAST identity, is it a novel gene?
Since the C. albicans genome has been sequenced, is it even possible for this to happen? Is my cDNA finding anything significant? What can I do next? Use the cDNA as a labeled probe against a C. albicans (CA) genomic library?
Thanks much,
Chris with a headache
Hi,
If BLAST doesn't return a hit, the sequence is probably not in the database. That does not mean your cDNA is not from C. albicans.
It might be a novel gene (then again, other labs might be working on it too and have decided not to submit or publish their findings).
That's for you to decide.
As suggested, you can do Southern analysis on genomic DNA isolated from CA to confirm. Next, with the sequence in hand, you can run in silico tests to guess what your cDNA is, e.g. look for motifs, domains, etc. that are similar to known proteins. That's the cheapest and easiest thing you can do. If your sequence is too short, you might want to consider getting the full-length sequence.
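Before searching for motifs or domains, it helps to know which reading frame, if any, is open. This minimal sketch translates a DNA fragment in all six frames and reports the longest stop-free stretch in each (a crude ORF candidate, since no start codon is required). It assumes an input string of A/C/G/T; other characters become X. It also includes one C. albicans-specific detail: that species reads CTG as serine rather than leucine (NCBI genetic code table 12), so a translation using the standard code puts leucine at those positions. The function names are illustrative, not part of any particular tool.

```python
BASES = "TCAG"
# Amino acids of the standard code (NCBI table 1), codons ordered with
# bases T, C, A, G in the first, second and third positions; '*' = stop.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(ctg_is_serine=True):
    """Build a codon -> amino acid dict; optionally apply the CTG -> Ser change."""
    table = {}
    for i, b1 in enumerate(BASES):
        for j, b2 in enumerate(BASES):
            for k, b3 in enumerate(BASES):
                table[b1 + b2 + b3] = STANDARD_AA[16 * i + 4 * j + k]
    if ctg_is_serine:
        table["CTG"] = "S"  # CTG read as serine (NCBI table 12, e.g. C. albicans)
    return table


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq, table):
    """Translate codon by codon; unknown codons give 'X', trailing bases are ignored."""
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq, ctg_is_serine=True):
    """Return {'+1', '+2', '+3', '-1', '-2', '-3': protein string}."""
    seq = seq.upper().replace("U", "T")
    table = codon_table(ctg_is_serine)
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:], table)
        frames[f"-{offset + 1}"] = translate(rc[offset:], table)
    return frames


def longest_stop_free(protein):
    """Longest run of residues between stop codons: a crude ORF candidate."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    # Hypothetical toy fragment; replace with the real ~400 bp sequence.
    for frame, protein in six_frames("ATGAAACTGTTTTAGGC").items():
        print(frame, protein, len(longest_stop_free(protein)))
```

A frame with one long stop-free stretch is the natural candidate to paste into a domain or motif search; fragments from DDRT-PCR are often 3'-biased, so an open frame running off the 5' end is not unusual.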
Best of luck
How much sequence do you have? If it's a very short sequence, there may not be any hits that reach statistical significance, and thus none are returned. What happens if you just BLAST it against the entire GenBank database?
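The significance point can be made concrete with the Karlin-Altschul statistics BLAST uses: E = K·m·n·e^(−λS), which in normalized bit-score form S' becomes E = m·n·2^(−S'), where m is the query length and n the database length. This sketch uses that simplified form and ignores BLAST's effective-length corrections, so its numbers are illustrative only; the database size and cutoff below are hypothetical.

```python
import math


def expected_hits(bit_score, query_len, db_len):
    # Simplified Karlin-Altschul E-value: E = m * n * 2^(-S'),
    # with no effective-length correction (illustrative only).
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(query_len, db_len, e_cutoff):
    # Solve E = m * n * 2^(-S') for S' at a chosen E-value cutoff.
    return math.log2(query_len * db_len / e_cutoff)


# Hypothetical numbers: a 400-nt vs a 100-nt query against a
# 1e9-letter database, asking for E <= 1e-5.
for m in (400, 100):
    print(m, round(bits_needed(m, 1e9, 1e-5), 1))
```

The required bit score differs by only log2(4) = 2 bits between the two query lengths, whereas the score an alignment can reach is capped by how long it is. A short or weakly similar fragment can therefore fail to clear the cutoff even when a related sequence is in the database.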
Hi Homebrew and MSG lover,
My cDNA is about 400 bp long. Is that considered too short? I think it's pretty good compared to my other cDNA products from DDRT-PCR (only 100-300 bp).
Homebrew, when I run BLASTn against the whole GenBank database, it returns only low-identity matches (<50%) to a few unrelated organisms, such as mouse. The same happens with BLASTx.
There's one thing I don't quite understand: if the whole genome has been sequenced and deposited in the database, why can't I get an identity match with BLASTn? I also have some sequences that match ORFs encoding hypothetical proteins. Those I understand, because the gene function isn't known yet.
MSG lover, what in silico program would you recommend? We don't have any experience with this kind of bioinformatics.
Thanks a lot,
Chris
You can try Expasy (http://www.expasy.ch/tools/), then decide what you want to know about your sequence from there. You can also browse the C. albicans website and use the analysis tools there (http://www.candidagenome.org/). Have you run BLAST from there? I'm not sure how well synced their database is with NCBI's, but it's worth trying their BLAST (http://www.candidagenome.org/cgi-bin/compute/blast-sgd.pl).
Best,
How does the sequencing trace look? Are the signal strengths adequate? You may have a sequencing artifact, like a mixed sequence, an n-1 sequence, etc. (see examples of artifactual sequences here).
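If per-base quality values are available (for example, Phred scores exported alongside the trace), a quick numeric check can complement eyeballing the chromatogram. A Phred score Q corresponds to an error probability P = 10^(−Q/10), so Q20 means roughly a 1% chance that the base is wrong. This sketch assumes the scores are already loaded as a non-empty list of integers; the Q20 threshold is a common convention, not a rule, and a mixed template may not always show up as low scores, so the trace itself remains the real test.

```python
def phred_to_error(q):
    """Phred quality Q -> probability the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def summarize_quality(quals, threshold=20):
    """Return (expected number of miscalled bases, fraction of bases >= threshold)."""
    expected_errors = sum(phred_to_error(q) for q in quals)
    frac_good = sum(q >= threshold for q in quals) / len(quals)
    return expected_errors, frac_good


def longest_good_run(quals, threshold=20):
    """Half-open (start, end) of the longest stretch where every Q >= threshold."""
    best = (0, 0)
    start = None
    for i, q in enumerate(quals):
        if q >= threshold:
            if start is None:
                start = i
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)
        else:
            start = None
    return best
```

Restricting BLAST queries to the longest high-quality stretch avoids letting noisy read ends drag down identity.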
Hi Chris,
I am also working on differential display, but I started recently. Which kit did you use for differential display?
Thanks & Regards
Hi nova,
I used the GeneHunter kit. Good luck to you.
MSG and Homebrew, I'll check my sequencing data and the bioinformatics tools at the CGD (Candida Genome Database) website first. Thanks a lot, guys!
Chris
Hi people,
Referring back to my first post in this thread: I checked the original DDRT sequence, and it's clean and good. I then designed primers based on that sequence, amplified C. albicans DNA, cloned the product into the pGEM-T vector, and sequenced it. The DDRT product and the cloned fragment have the same sequence, so I know my unknown fragment really is part of the C. albicans genome. However, when I BLAST both sequences against NCBI, the Candida Genome Database, and Candida DB, the similarity to any CA gene is poor. So I amplified DNA with the primers I designed, PCR-labeled it with biotin, and used it as a probe for colony hybridization against the CA genomic library. The one positive colony plasmid that I sequenced (size >4 kb) also did not give satisfactory similarity to any of the databases above.
I would like to know whether all this points to an uncharacterized CA gene. When I BLAST against the supercontig database of the Candida Genome Database, fairly good similarity comes up, so can I say my fragment really is an unknown CA gene?
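A few of the sequence checks in this workflow are easy to script. This is a rough sketch, not a primer-design tool: it uses the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude estimate for short oligos (dedicated software uses nearest-neighbor thermodynamics). It also shows why reverse complements matter twice here: the reverse primer is the reverse complement of the 3' end of the target, and because TA cloning into a vector such as pGEM-T is non-directional, a re-sequenced insert may read on the opposite strand from the original DDRT product. The helper names and the 20-nt default are hypothetical choices.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G + C in a primer or fragment."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primers_from_ends(template, length=20):
    """Hypothetical helper: forward primer = first `length` nt of the template,
    reverse primer = reverse complement of the last `length` nt."""
    template = template.upper()
    return template[:length], reverse_complement(template[-length:])


def same_insert(seq_a, seq_b):
    """True if two reads match on either strand (TA cloning is non-directional)."""
    a, b = seq_a.upper(), seq_b.upper()
    return a == b or a == reverse_complement(b)


if __name__ == "__main__":
    # Hypothetical toy template; a real design would use the DDRT sequence.
    fwd, rev = primers_from_ends("ATGCCGTTAGCAGGTCAAGT", length=8)
    for name, p in (("fwd", fwd), ("rev", rev)):
        print(name, p, f"GC={gc_content(p):.0%}", f"Tm~{wallace_tm(p)} C")
```

Comparing the DDRT read and the cloned read with `same_insert` rather than plain string equality avoids a false "mismatch" when the insert went in backwards.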
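One way to judge a "pretty good" supercontig hit is to look at percent identity, E-value, and how much of the query the alignment actually covers, since high identity over a short stretch says less than a match across the whole fragment. This sketch assumes the results were saved in BLAST's 12-column tabular format (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+); whether a particular web BLAST server offers that format is an assumption, and the E-value and coverage cutoffs are arbitrary illustrative values.

```python
import csv
import io

# Standard column order of BLAST's 12-column tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOATS = {"pident", "evalue", "bitscore"}
INTS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}


def parse_tabular(text):
    """Parse tab-separated BLAST hits, skipping blank and '#' comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = {}
        for name, value in zip(FIELDS, row):
            if name in FLOATS:
                hit[name] = float(value)
            elif name in INTS:
                hit[name] = int(value)
            else:
                hit[name] = value
        hits.append(hit)
    return hits


def query_coverage(hit, query_len):
    """Fraction of the query spanned by this alignment (1-based, inclusive coordinates)."""
    lo, hi = sorted((hit["qstart"], hit["qend"]))
    return (hi - lo + 1) / query_len


def convincing_hits(hits, query_len, max_evalue=1e-10, min_coverage=0.8):
    """Hits that are both significant and span most of the query, best bit score first."""
    good = [h for h in hits
            if h["evalue"] <= max_evalue and query_coverage(h, query_len) >= min_coverage]
    return sorted(good, key=lambda h: h["bitscore"], reverse=True)
```

A near-identical, full-length hit to a supercontig that overlaps no annotated gene would fit the "unannotated region" idea; a hit covering only part of the fragment would not, by itself, settle the question.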
Thanks for any comments.
Chris
What does it look like if you BLAST it against the whole GenBank database? It doesn't necessarily have to be an "unknown" gene; it could be an externally acquired gene (like a viral gene, a transposon, or an integrated plasmid) or one that resides in a region that varies between strains (as capsule polysaccharide genes do in bacteria).
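One crude but commonly used hint that a stretch of DNA was acquired from elsewhere is base composition that departs from the host genome's average. This sketch computes GC fraction in sliding windows and flags windows that differ from a genome-wide value you supply; the window size, step, and 0.1 tolerance are arbitrary illustrative choices, and atypical GC content is suggestive at best, not proof of foreign origin.

```python
def gc_fraction(seq):
    """GC fraction counted over unambiguous bases (A/C/G/T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def sliding_gc(seq, window=100, step=50):
    """(start, GC fraction) for each window; a short sequence gives one window."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_fraction(seq[i:i + window])) for i in range(0, last_start + 1, step)]


def atypical_windows(seq, genome_gc, delta=0.1, window=100, step=50):
    """Windows whose GC fraction differs from genome_gc by more than delta."""
    return [(i, gc) for i, gc in sliding_gc(seq, window, step)
            if abs(gc - genome_gc) > delta]
```

The genome-wide GC value should be taken from the C. albicans assembly itself rather than assumed; with a fragment of a few hundred bases, only a handful of windows exist, so the result is a rough indication at most.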