SRS - Swissprot_blastp - (Mar/24/2007 )
Hello,
I need some help with BLASTing a sequence (LPLNITYATWAGLGLVLTTV) and doing a multiple sequence alignment (MSA). I need to get a reliable set of sequences to build the MSA.
I need to BLAST the sequence LPLNITYATWAGLGLVLTTV against Swissprot using the SRS site.
I don't know where to find this specific blastp function within SRS. I go to the analysis tools > expand Similarity Search Tools > NCBI BLASTP. I don't know whether this is the correct blastp or not, and the database I BLAST against is UniProtKB/Swiss-Prot. Is there any other blastp function, or only NCBI BLASTP?
The problem I run into later is that NCBI BLASTP reveals only the part of the hit sequence that matches the query. To build a proper MSA I need to retrieve the full sequences from Swiss-Prot, and I don't know how to do this!
To do this I have to go to the SRS Swiss-Prot entry for this sequence, and I don't know whether http://srs.ebi.ac.uk/ is the right site or whether there is another one. Friends have told me there should be a sidebar on the left with an option to launch an analysis tool, and from there I can pick BlastP from a drop-down. I also want to retrieve the full sequences from Swiss-Prot so that I can build a proper MSA. This is supposed to be really easy with SRS, but I don't know how.
Sincerely
M
Are you looking for something like this?
temp_job1_uniprot_2_QACC_STAAU LPLNITYATWAGLGLVLTTV
temp_job1_uniprot_3_QACG_STAS9 LPLNITYATWAGLGLVLTTI
temp_job1_uniprot_4_QACH_STASA LPLNITYASWAGLGLVLTTI
temp_job1_uniprot_5_EBRA_BACSU IPLSLSYATWSGAGTVLTTV
temp_job1_uniprot_6_EBRA_BACLD IPLSLSYATWSGAGTVLTAI
temp_job1_uniprot_7_QACF_ENTAE IPVGIAYAVWAGLGIVL---
temp_job1_uniprot_8_EBRB_BACLD LPLSAAYATWAGTGTALT--
temp_job1_uniprot_9_QACE_KLEAE IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_10_QACE_ECOL IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_11_EBR_SALTY IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_12_EBR_PSEAE IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_13_EBR_ECOLI IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_14_YVAE_BACS IDVSVAYAVWSGMGIVLITV
: :. :** *:* * .:
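For readers unsure how to read the last line of the alignment: each `*` marks a column where every sequence carries the same residue. The sketch below reproduces only that part of the ClustalW consensus line; the `:` and `.` marks for similar residues are deliberately left out, and the function name `identity_marks` is made up for this illustration. The three rows are copied from the alignment above.

```python
def identity_marks(rows):
    """Return a string with '*' under every fully conserved column.

    Simplified on purpose: ClustalW also prints ':' and '.' for groups of
    similar residues, which this sketch does not attempt to reproduce.
    """
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    marks = []
    for column in zip(*rows):
        residues = set(column)
        # Conserved means: one single residue in the column, and no gap.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


rows = [
    "LPLNITYATWAGLGLVLTTV",  # QACC_STAAU
    "IPLSLSYATWSGAGTVLTTV",  # EBRA_BACSU
    "IPVGVAYAVWSGLGVVIIT-",  # QACE_KLEAE
]
marks = identity_marks(rows)
for row in rows:
    print(row)
print(marks)
```

With only three rows more columns come out conserved than in the full 13-row alignment, which is expected: every added sequence can only remove stars, never add them.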
It's so simple.
1. Go to Tools > Packages Information > NCBI BLAST > NCBI BLASTP - Launch.
2. On the BLASTP page, paste your sequence as the query string, select UniProtKB/Swiss-Prot, and push the red button named "Launch". You will be redirected to a web page with text like this:
Tool command:
Use Batch job status page to view the results
3. At the results page you will see
Job Name Status Start Date Results from Result Set Queue Name
temp_blastp_1 24-Mar-2007 18:39 BLASTP Q1: 14 extsrv_interactive -R 'ncbib order[p_ncbib:r15s:pg]' -L /bin/sh(batch)
4. Select all the hits, then in the left menu choose Result Options > Launch analysis tool > ClustalW > Launch. You will be redirected to the ClustalW page; just click Launch to run ClustalW, then retrieve your results the same way as for BLAST.
Hi,
I know this, and it is simple! But the point is that BLAST has an annoying habit of revealing only the part of the hit sequence that matches the query, so in order to build a proper MSA I need to retrieve the full sequences from Swiss-Prot. This is supposed to be really easy with SRS, but I don't know how :(
M.
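To make the complaint concrete, here is a minimal sketch that checks how much of a full-length entry the 20-residue query actually covers. It uses the QACC_STAAU sequence quoted later in this thread. The helper name `locate_match` is hypothetical, and the exact substring search is only a stand-in: BLAST performs a local alignment and would also report inexact hits.

```python
def locate_match(query, full_sequence):
    """Return the 1-based (start, end) of an exact match, or None.

    Assumption: exact substring search only. This is NOT what BLAST does;
    it is just enough to show where the matched region sits.
    """
    start = full_sequence.find(query)
    if start == -1:
        return None
    return start + 1, start + len(query)


QUERY = "LPLNITYATWAGLGLVLTTV"

# Full-length QACC_STAAU (P14319), copied from the FASTA listing in this thread.
QACC_STAAU = (
    "MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA"
    "TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH"
)

span = locate_match(QUERY, QACC_STAAU)
if span is None:
    print("query not found")
else:
    start, end = span
    print(f"query covers residues {start}-{end} of {len(QACC_STAAU)}")
```

The residues outside that range are exactly what is lost when only the matched part of each hit goes into the alignment.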
So you need this:
MIAGYIFLLIAILSEAAAAAMLKISDGFARWQPSVLVVIGYGLAFYMMSLTLQVIPLSLS
YATWSGAGTVLTAIIGVLWFQEKLNRRNIAGIICLVSGVVLINLS
>sw|O31792|EBRA_BACSU Multidrug resistance protein ebrA.
MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS
YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP
>sw|Q65JB2|EBRB_BACLD Multidrug resistance protein ebrB.
MKGMIFLAAAILSEVFGSTMLKLSEGFSAPLPAAGVIIGFAASFTFLSFSLKTLPLSAAY
ATWAGTGTALTAAIGHFIFQEPFNLKTLIGLTLIIGGVFLLNSKRTEAADQKAQLTIEI
>sw|P0AA22|EBR_ECOLI Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P0AA24|EBR_PSEAE Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P0AA23|EBR_SALTY Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P14319|QACC_STAAU Quaternary ammonium compound-resistance protein qacC (Quaternary ammonium determinant C) (Ethidium bromide resistance protein) (Multidrug resistance protein).
MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA
TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH
>sw|Q55339|QACC_STASS Quaternary ammonium compound-resistance protein qacC (Quaternary ammonium determinant C).
MPYIYLIISISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA
TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH
>sw|P0AGC9|QACE_ECOLI Quaternary ammonium compound-resistance protein qacE (Quaternary ammonium determinant E).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH
>sw|P0AGD0|QACE_KLEAE Quaternary ammonium compound-resistance protein qacE (Quaternary ammonium determinant E).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH
>sw|Q9X2N9|QACF_ENTAE Quaternary ammonium compound-resistance protein qacF (Quaternary ammonium determinant F).
MKNWIFLAVSIFGEVIATSALKSSHGFTRLVPSVVVVAGYGLAFYFLSLALKSIPVGIAY
AVWAGLGIVLVAAIAWIFHGQKLDFWAFIGMGLIVSGVAVLNLLSKVSAH
>sw|O87866|QACG_STAS9 Quaternary ammonium compound-resistance protein qacG (Quaternary ammonium determinant G).
MHYLYLFISIATEIIGTSFLKTSEGFTKLWPTLGTLLSFGICFYFLSLTIKFLPLNITYA
TWAGLGLVLTTIISVIVFKENVNLISIISIGLIVIGVVLLNVFGESH
>sw|O87868|QACH_STASA Quaternary ammonium compound-resistance protein qacH (Quaternary ammonium determinant H).
MPYLYLLLSIVSEVIGSAFLKSSDGFSKLYPTITTIISFLICFYFLSKTMQHLPLNITYA
SWAGLGLVLTTIVSVLIFKEQINLISIISIILIIFGVVLLNTFGSSH
>sw|O32227|YVAE_BACSU Hypothetical protein yvaE.
MNWVFLCLAILFEVAGTVSMKLSSGFTKLIPSLLLIFFYGGSLFFLTLTLKSIDVSVAYA
VWSGMGIVLITVVGFLFFQEHVSVMKVISIGLIIAGVVSLNLIEHVAVSEPVHKSGQYK
EBRA_BACLD MIAGYIFLLIAILSEAAAAAMLKISDGFARWQPSVLVVIGYGLAFYMMSLTLQVIPLSLS
EBRA_BACSU MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS
EBRB_BACLD -MKGMIFLAAAILSEVFGSTMLKLSEGFSAPLPAAGVIIGFAASFTFLSFSLKTLPLSAA
EBR_ECOLI -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
EBR_PSEAE -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
EBR_SALTY -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACC_STAAU -MP-YIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNIT
QACC_STASS -MP-YIYLIISISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNIT
QACE_ECOLI -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACE_KLEAE -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACF_ENTAE -MKNWIFLAVSIFGEVIATSALKSSHGFTRLVPSVVVVAGYGLAFYFLSLALKSIPVGIA
QACG_STAS9 -MH-YLYLFISIATEIIGTSFLKTSEGFTKLWPTLGTLLSFGICFYFLSLTIKFLPLNIT
QACH_STASA -MP-YLYLLLSIVSEVIGSAFLKSSDGFSKLYPTITTIISFLICFYFLSKTMQHLPLNIT
YVAE_BACSU -MN-WVFLCLAILFEVAGTVSMKLSSGFTKLIPSLLLIFFYGGSLFFLTLTLKSIDVSVA
: ::* :* * .: :* * ** *: : : .: ::: :: : :. :
EBRA_BACLD YATWSGAGTVLTAIIGVLWFQEKLNRRNIAGIICLVSGVVLINLS---------------
EBRA_BACSU YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP---------------
EBRB_BACLD YATWAGTGTALTAAIGHFIFQEPFNLKTLIGLTLIIGGVFLLNSKRTEAADQKAQLTIEI
EBR_ECOLI YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
EBR_PSEAE YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
EBR_SALTY YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
QACC_STAAU YATWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH-----------
QACC_STASS YATWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH-----------
QACE_ECOLI YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH---------
QACE_KLEAE YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH---------
QACF_ENTAE YAVWAGLGIVLVAAIAWIFHGQKLDFWAFIGMGLIVSGVAVLNLLSKVSAH---------
QACG_STAS9 YATWAGLGLVLTTIISVIVFKENVNLISIISIGLIVIGVVLLNVFGESH-----------
QACH_STASA YASWAGLGLVLTTIVSVLIFKEQINLISIISIILIIFGVVLLNTFGSSH-----------
YVAE_BACSU YAVWSGMGIVLITVVGFLFFQEHVSVMKVISIGLIIAGVVSLNLIEHVAVSEPVHKSGQY
** *:* * .: : :. . : .. . .: :: .. .
EBRA_BACLD -
EBRA_BACSU -
EBRB_BACLD -
EBR_ECOLI -
EBR_PSEAE -
EBR_SALTY -
QACC_STAAU -
QACC_STASS -
QACE_ECOLI -
QACE_KLEAE -
QACF_ENTAE -
QACG_STAS9 -
QACH_STASA -
YVAE_BACSU K
To get all the full sequences, start from the BLAST results page:
1. In the left menu, under "Apply Options to", check "unselected results only".
2. Under "Result Options", choose "Link to related information:". You will be redirected to a search page titled "Find entries related to current query: in other databanks".
3. There, select "UniProt Universal Protein Resource" > "UniProtKB/Swiss-Prot".
4. In the left menu, under "Link Options", choose "Find related entries" and click the red Search button. (If you select "Refine Query - show only results with related entries" instead, you get only the target sequences, like the BLAST results.)
You will then be redirected to a page listing the complete protein entries for all the BLAST hits.
Then you can launch ClustalW to align the sequences, or download them in FASTA format to store and align locally.
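If you go the local route, a small script is enough to tidy the saved FASTA text before handing it to an alignment program. The sketch below assumes headers in the `>sw|ACCESSION|ENTRY_NAME description` style shown earlier in this thread; the function names (`parse_fasta`, `entry_name`, `format_fasta`) are made up for the example, and the two sample records are copied from the listing above.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def entry_name(header):
    """Extract ENTRY_NAME from 'sw|ACCESSION|ENTRY_NAME description'.

    Assumption: the header uses the pipe-separated style seen in this
    thread; otherwise the first word of the header is returned.
    """
    words = header.split(None, 1)
    if not words:
        return ""
    parts = words[0].split("|")
    return parts[2] if len(parts) >= 3 else parts[0]


def format_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


SAMPLE = """\
>sw|O31792|EBRA_BACSU Multidrug resistance protein ebrA.
MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS
YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP
>sw|P14319|QACC_STAAU Quaternary ammonium compound-resistance protein qacC.
MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA
TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH
"""

records = parse_fasta(SAMPLE)
for header, sequence in records:
    print(entry_name(header), len(sequence))
```

Writing `format_fasta(records)` to a file gives an input that a locally installed alignment program should accept, since plain FASTA is widely supported.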
I hope it helps.
Thanx!
I am new at this and thanx for the help:)