Track BLAST result changes over time - useful? (Jun/04/2006)
Hi, I'm in a bioinformatics support team at a university and one of the biologists has suggested that the following would be really useful:
+ Periodically run the same BLAST query and email me if the results change
That is, notify him/her when updates to the NCBI databases produce new, relevant hits.
I need some outside perspective before we build a solution. Any quick responses to the following would be really helpful:
* Is this something that other biologists would find useful?
* Would it be useful if it only worked against NCBI? Or would other databases be necessary?
* Is there any reason why we couldn't use the NCBI QBLAST interface to do this? Is there a better way?
thanks in advance for any comments!
Hi, it's a trivial problem. I would use the NCBI interface with Perl's WWW::Mechanize, or a dirty script using lynx! Then all you have to do is sync your runs with the nr updates (or whatever database you are using) and run a diff on the two output files: if they differ, send mail (e.g. with sendmail); otherwise do nothing.
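A minimal sketch of the diff-then-mail step, written in Python rather than Perl and assuming the previous and current reports have already been saved as text. The function names and the sender address are hypothetical. The code only builds the message with the standard library's email package; actual delivery (sendmail or a local SMTP relay) is left as a comment.

```python
from __future__ import annotations

import difflib
from email.message import EmailMessage


def diff_reports(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two saved reports (empty list if identical)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous_run",
            tofile="current_run",
            lineterm="",
        )
    )


def build_notification(diff_lines: list[str], recipient: str) -> EmailMessage | None:
    """Build a notification email if the reports differ; otherwise return None."""
    if not diff_lines:
        return None  # identical output: do nothing
    msg = EmailMessage()
    msg["Subject"] = "BLAST results changed"
    msg["From"] = "blast-watch@localhost"  # hypothetical sender address
    msg["To"] = recipient
    msg.set_content("\n".join(diff_lines))
    return msg


if __name__ == "__main__":
    previous = "hit_A\t1e-50\nhit_B\t1e-20\n"
    current = "hit_A\t1e-50\nhit_B\t1e-20\nhit_C\t1e-10\n"
    message = build_notification(diff_reports(previous, current), "biologist@example.org")
    print(message["Subject"] if message else "no change")
    # To actually send, hand the message to the local mail system, e.g.:
    # smtplib.SMTP("localhost").send_message(message)
```

Diffing whole reports is crude. The report header records the database release, and e-values shift between releases (as a later reply notes), so this would fire on nearly every run unless only the parsed hit lines are compared.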
You could make it into something bigger by giving the user the option to BLAST against various databases, but since the data is pretty much the same, it hardly seems worth it.
Seems like a reasonable request to me, but I guess it also depends on how many BLAST jobs you want to run. If there are many thousands, it will take serious time and require more thought. I would do this locally rather than sending data openly over the internet.
Note that every time you BLAST a sequence against an updated database, the e-values will change, because an e-value depends on the size of the database searched. You will need to parse the output, determine whether the hits (accessions) are the same, and work out whether there are new hits.
D
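The parse-and-compare step can be sketched roughly as follows. The sketch assumes tabular output (blastall/blastcl3 `-m 8`, 12 tab-separated columns with the subject ID in column 2). The function names and the two example runs are made up for illustration. Comparing subject IDs instead of raw reports sidesteps the shifting e-values.

```python
from __future__ import annotations


def subject_ids(tabular_text: str) -> set[str]:
    """Collect subject IDs (column 2) from BLAST tabular output, skipping comments."""
    ids: set[str] = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or '#' comment line
        ids.add(line.split("\t")[1])
    return ids


def compare_runs(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (IDs hit only in the new run, IDs hit only in the old run).

    E-values are deliberately ignored: they shift with database size even
    when the same subject is hit.
    """
    old_ids, new_ids = subject_ids(old_text), subject_ids(new_text)
    return sorted(new_ids - old_ids), sorted(old_ids - new_ids)


if __name__ == "__main__":
    # Made-up example rows; only column 2 (subject ID) is used here.
    old_run = "q1\tgi|111\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    new_run = (
        "q1\tgi|111\t98.0\t100\t2\t0\t1\t100\t1\t100\t3e-50\t180\n"
        "q1\tgi|222\t85.0\t90\t13\t1\t5\t94\t10\t99\t1e-20\t95.1\n"
    )
    added, dropped = compare_runs(old_run, new_run)
    print("new hits:", added)         # new hits: ['gi|222']
    print("no longer hit:", dropped)  # no longer hit: []
```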
Yes, if it were popular, we'd definitely have to take it local, but it's simpler to start with NCBI and see how much use it gets.
If it ever became that popular, that would definitely fit in the 'good problem to have' category :-)
Yes. Actually, we'd like to support both NCBI and apidb.org (though we'll need some custom parsing for the latter), since our local researchers work in that field.
Thank you both for your comments.
Anyone else think this would be useful for others, or not?
thanks! michael
Have a look here:
http://bioweb.pasteur.fr/docs/doc-gensoft/...EADME.DBWatcher
and
http://bioweb.pasteur.fr/docs/doc-gensoft/minore/readme.txt
d'oh - why reinvent the wheel?
It is a good programming exercise! The question is, do you like using other people's software?
Be aware that with some databases (commonly genomic ones, but others too), entries may be removed or modified in newer updates as more data becomes available and/or sequences are finished. This may change the alignments, scores and e-values of future BLAST hits (if the subject has changed), or make some of your earlier hits disappear from the database (if the subject has been removed). A biologist looking at the hits for a sequence, however, should spot this themselves and interpret such results carefully.
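One way to flag the revised-entry case is to compare bit scores for subjects that appear in both runs. A bit score is a normalized score that does not depend on database size. The e-value, by contrast, is roughly E ≈ m·n·2^(-S') for query length m, database length n and bit score S', so it drifts as the database grows. A changed bit score for the same subject therefore suggests the subject sequence (or the search settings, such as the BLAST version or parameters) changed. The sketch below assumes `-m 8` tabular output. The 0.5-bit tolerance and the function names are hypothetical choices, and any flagged hit still needs the biologist's judgement.

```python
from __future__ import annotations


def best_bit_scores(tabular_text: str) -> dict[str, float]:
    """Map each subject ID to its best bit score (column 12) in -m 8 style output."""
    best: dict[str, float] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or '#' comment line
        fields = line.split("\t")
        subject, bits = fields[1], float(fields[11])
        if bits > best.get(subject, float("-inf")):
            best[subject] = bits
    return best


def changed_subjects(
    old_text: str, new_text: str, tolerance: float = 0.5
) -> dict[str, tuple[float, float]]:
    """Subjects hit in both runs whose best bit score moved by more than `tolerance`.

    Subjects that vanished entirely (e.g. removed from the database) are not
    reported here; they show up as IDs present only in the old run.
    """
    old, new = best_bit_scores(old_text), best_bit_scores(new_text)
    return {
        s: (old[s], new[s])
        for s in sorted(old.keys() & new.keys())
        if abs(old[s] - new[s]) > tolerance
    }


if __name__ == "__main__":
    # Made-up rows: gi|111 was revised (lower score), gi|333 was removed.
    old_run = (
        "q1\tgi|111\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
        "q1\tgi|333\t70.0\t60\t18\t0\t20\t79\t5\t64\t1e-8\t60.1\n"
    )
    new_run = "q1\tgi|111\t91.0\t85\t8\t0\t1\t85\t1\t85\t4e-40\t150\n"
    print(changed_subjects(old_run, new_run))  # {'gi|111': (180.0, 150.0)}
```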
Actually, I hate reinventing the wheel! Thanks, Jaknight, for those links. With them as a starting point, I was finally able to tease some good hits out of Google for this type of software. For example, both of these searches work well:
http://www.google.com/search?q=DBWatcher+blast
http://www.google.com/search?q=periodic+blast
Related tools include SEALS and ReHAB; the ReHAB paper has yet more useful leads:
http://www.biomedcentral.com/1471-2105/6/23
You can actually run ReHAB from here: http://athena.bioc.uvic.ca/workbench.php?tool=rehab&db=
DBWatcher itself is quite old, but it is still available and still appears to work OK. We'd like a web version (a couple already exist), but it is great to have a starting point that solves some of the matching problems.
And, my original question has been answered - there does appear to be a more general need for a tool like this, at least based on the existence of similar tools.
Thanks everyone for your help and comments!
This is not directly relevant to this question, but I don't think it is worth starting a new thread, given that few people here seem up for coding...
There is a new site for searching the web for code snippets called Krugle. It is currently in beta but should open to the public soon; for those who code, it could be handy.
I tried to install both of the programs I listed. Both install after a bit of hacking, but neither works.
Minore seems to hang for no apparent reason, and DBWatcher also fails: it queries NCBI and then reports no matches. There is probably a fix, but I hate reading other people's code.
Odds are you'd be best off rigging something up yourself: a simple script using blastcl3 to query NCBI, then a Perl program to parse the results. I'm actually tempted to write this, as it wouldn't take more than a day, but I'm lacking the willpower.
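A minimal sketch of that glue, written in Python rather than Perl. This is not the poster's code. It builds a blastcl3 command line and keeps a small JSON snapshot of previously seen subject IDs, so each run reports only new ones. The flags mirror the legacy blastall options that blastcl3 accepted (-p program, -d database, -i query file, -o output file, -m 8 for tabular output), and they should be checked against the installed client. The snapshot file and all function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def blastcl3_command(query_fasta: str, out_path: str,
                     program: str = "blastn", database: str = "nr") -> list[str]:
    """Build a blastcl3 argument list requesting tabular (-m 8) output.

    Flags assume the legacy blastall-style options; verify them locally.
    """
    return ["blastcl3", "-p", program, "-d", database,
            "-i", query_fasta, "-o", out_path, "-m", "8"]


def run_blast(query_fasta: str, out_path: str) -> None:
    """Run the remote search; raises CalledProcessError if blastcl3 fails."""
    subprocess.run(blastcl3_command(query_fasta, out_path), check=True)


def new_since_snapshot(current_ids: set[str], snapshot: Path) -> list[str]:
    """Return IDs not recorded in the snapshot, then overwrite it with current_ids.

    The first run (no snapshot yet) only records a baseline and reports nothing.
    """
    previous = set(json.loads(snapshot.read_text())) if snapshot.exists() else None
    snapshot.write_text(json.dumps(sorted(current_ids)))
    if previous is None:
        return []  # first run: baseline only
    return sorted(current_ids - previous)
```

Run from cron (for example, shortly after each nr update), a script like this would call run_blast, pull the subject IDs out of the tabular file, and mail the list returned by new_since_snapshot whenever it is non-empty.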