SRS - Swissprot_blastp - (Mar/24/2007 )

Hello,

I need some help with BLASTing a sequence (LPLNITYATWAGLGLVLTTV) and building a multiple sequence alignment (MSA) from the hits. I need a reliable set of sequences to build the MSA.
I need to BLAST the sequence LPLNITYATWAGLGLVLTTV against Swissprot using the SRS site.
I don't know where to find this specific blastp function within SRS. I go to the analysis tools > expand Similarity Search Tools > NCBI BLASTP. I don't know whether this is the correct blastp or not, and the database I search against is UniProtKB/Swiss-Prot. Is there any other blastp function besides NCBI BLASTP?

The problem I run into later is that NCBI BLASTP reveals only the part of the hit sequence that matches the query. To build a proper MSA I need to retrieve the full sequences from Swissprot, and I don't know how to do that!

To do this I have to go to the SRS Swissprot entry for this sequence, and I don't know whether http://srs.ebi.ac.uk/ is the right site or whether there is another one. Friends have told me there should be a sidebar on the left with an option to launch an analysis tool, and from there I can pick BlastP from a drop-down. I also want to retrieve the full sequences from Swissprot to build a proper MSA. This is supposed to be really easy when using SRS, but I don't know how.

Sincerely

M

-milestone-

Are you looking for something like this?

CODE
temp_job1_uniprot_1_QACC_STASS      LPLNITYATWAGLGLVLTTV
temp_job1_uniprot_2_QACC_STAAU      LPLNITYATWAGLGLVLTTV
temp_job1_uniprot_3_QACG_STAS9      LPLNITYATWAGLGLVLTTI
temp_job1_uniprot_4_QACH_STASA      LPLNITYASWAGLGLVLTTI
temp_job1_uniprot_5_EBRA_BACSU      IPLSLSYATWSGAGTVLTTV
temp_job1_uniprot_6_EBRA_BACLD      IPLSLSYATWSGAGTVLTAI
temp_job1_uniprot_7_QACF_ENTAE      IPVGIAYAVWAGLGIVL---
temp_job1_uniprot_8_EBRB_BACLD      LPLSAAYATWAGTGTALT--
temp_job1_uniprot_9_QACE_KLEAE      IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_10_QACE_ECOL      IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_11_EBR_SALTY      IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_12_EBR_PSEAE      IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_13_EBR_ECOLI      IPVGVAYAVWSGLGVVIIT-
temp_job1_uniprot_14_YVAE_BACS      IDVSVAYAVWSGMGIVLITV
                                    : :. :** *:* * .:

It's so simple.

1. Go to Tools > Packages Information > NCBI BLAST > NCBI BLASTP - Launch.

2. On the BLASTP page, paste your sequence as the query string, select UniProtKB/Swiss-Prot, and push the red button named "Launch". You will be redirected to a page with text like this:
QUOTE
Tool was submitted to Queue:extsrv_interactive -R 'ncbib order[p_ncbib:r15s:pg]' -L /bin/sh(batch).
Tool command:

Use Batch job status page to view the results
You need to wait until the status image at the top right corner of this page changes. Then you can click the link results or click results at the top menu of the page.

3. On the results page you will see:
QUOTE
List of Batch Jobs
Job Name Status Start Date Results from Result Set Queue Name
temp_blastp_1 24-Mar-2007 18:39 BLASTP Q1: 14 extsrv_interactive -R 'ncbib order[p_ncbib:r15s:pg]' -L /bin/sh(batch)
Click temp_blastp_1 and you will be redirected to a page with all the hits from BLAST.

4. Select all the hits and, in the left menu, go to Result Options > Launch analysis tool > ClustalW > Launch. You will be redirected to the ClustalW page; just click Launch to run ClustalW, then proceed as with BLAST to retrieve your results.
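
As a side note on reading the ClustalW output shown at the top of this post: the last line under the alignment marks conservation, and "*" means every sequence has the same residue in that column. Here is a minimal Python sketch that reproduces only the "*" marks for the 14 aligned segments. The function name is made up for illustration, and the ":" and "." marks (similar residues) are not computed because they depend on ClustalW's residue groups.

```python
def conserved_columns(rows):
    """Return a string with '*' under each column where all rows share one residue.

    Gap characters ('-') and rows that are too short never count as conserved.
    """
    width = max(len(row) for row in rows)
    marks = []
    for i in range(width):
        column = {row[i] if i < len(row) else "-" for row in rows}
        fully_conserved = len(column) == 1 and "-" not in column
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


# The 14 aligned segments from the alignment above, in the same order.
segments = [
    "LPLNITYATWAGLGLVLTTV",
    "LPLNITYATWAGLGLVLTTV",
    "LPLNITYATWAGLGLVLTTI",
    "LPLNITYASWAGLGLVLTTI",
    "IPLSLSYATWSGAGTVLTTV",
    "IPLSLSYATWSGAGTVLTAI",
    "IPVGIAYAVWAGLGIVL---",
    "LPLSAAYATWAGTGTALT--",
    "IPVGVAYAVWSGLGVVIIT-",
    "IPVGVAYAVWSGLGVVIIT-",
    "IPVGVAYAVWSGLGVVIIT-",
    "IPVGVAYAVWSGLGVVIIT-",
    "IPVGVAYAVWSGLGVVIIT-",
    "IDVSVAYAVWSGMGIVLITV",
]

print(segments[0])
print(conserved_columns(segments))
```

The stars land under the Y, A, W, G and G columns, which matches the "*" positions in the ClustalW line.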

-djv0022-

Hi,
I know this, and it is simple! But the point is that BLAST has an annoying habit of only revealing the part of the hit sequence that matches the query, so in order to build a proper MSA I need to retrieve the full sequences from Swissprot. This is supposed to be really easy when using SRS, but I don't know how :(

M.

-milestone-

So you need this:

CODE
>sw|Q65JB1|EBRA_BACLD Multidrug resistance protein ebrA.
MIAGYIFLLIAILSEAAAAAMLKISDGFARWQPSVLVVIGYGLAFYMMSLTLQVIPLSLS
YATWSGAGTVLTAIIGVLWFQEKLNRRNIAGIICLVSGVVLINLS
>sw|O31792|EBRA_BACSU Multidrug resistance protein ebrA.
MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS
YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP
>sw|Q65JB2|EBRB_BACLD Multidrug resistance protein ebrB.
MKGMIFLAAAILSEVFGSTMLKLSEGFSAPLPAAGVIIGFAASFTFLSFSLKTLPLSAAY
ATWAGTGTALTAAIGHFIFQEPFNLKTLIGLTLIIGGVFLLNSKRTEAADQKAQLTIEI
>sw|P0AA22|EBR_ECOLI Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P0AA24|EBR_PSEAE Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P0AA23|EBR_SALTY Putative ethidium bromide resistance protein (E1 protein).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW
>sw|P14319|QACC_STAAU Quaternary ammonium compound-resistance protein qacC (Quaternary ammonium determinant C) (Ethidium bromide resistance protein) (Multidrug resistance protein).
MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA
TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH
>sw|Q55339|QACC_STASS Quaternary ammonium compound-resistance protein qacC (Quaternary ammonium determinant C).
MPYIYLIISISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA
TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH
>sw|P0AGC9|QACE_ECOLI Quaternary ammonium compound-resistance protein qacE (Quaternary ammonium determinant E).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH
>sw|P0AGD0|QACE_KLEAE Quaternary ammonium compound-resistance protein qacE (Quaternary ammonium determinant E).
MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVAY
AVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH
>sw|Q9X2N9|QACF_ENTAE Quaternary ammonium compound-resistance protein qacF (Quaternary ammonium determinant F).
MKNWIFLAVSIFGEVIATSALKSSHGFTRLVPSVVVVAGYGLAFYFLSLALKSIPVGIAY
AVWAGLGIVLVAAIAWIFHGQKLDFWAFIGMGLIVSGVAVLNLLSKVSAH
>sw|O87866|QACG_STAS9 Quaternary ammonium compound-resistance protein qacG (Quaternary ammonium determinant G).
MHYLYLFISIATEIIGTSFLKTSEGFTKLWPTLGTLLSFGICFYFLSLTIKFLPLNITYA
TWAGLGLVLTTIISVIVFKENVNLISIISIGLIVIGVVLLNVFGESH
>sw|O87868|QACH_STASA Quaternary ammonium compound-resistance protein qacH (Quaternary ammonium determinant H).
MPYLYLLLSIVSEVIGSAFLKSSDGFSKLYPTITTIISFLICFYFLSKTMQHLPLNITYA
SWAGLGLVLTTIVSVLIFKEQINLISIISIILIIFGVVLLNTFGSSH
>sw|O32227|YVAE_BACSU Hypothetical protein yvaE.
MNWVFLCLAILFEVAGTVSMKLSSGFTKLIPSLLLIFFYGGSLFFLTLTLKSIDVSVAYA
VWSGMGIVLITVVGFLFFQEHVSVMKVISIGLIIAGVVSLNLIEHVAVSEPVHKSGQYK
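
To make the difference between a BLAST hit region and a full entry concrete, here is a small Python sketch that locates the 20-residue query inside two of the full-length sequences listed above. It is only an illustration: `best_window` is a made-up helper that slides the query along the target without gaps and counts identical residues, which is far simpler than what BLAST actually does.

```python
QUERY = "LPLNITYATWAGLGLVLTTV"

# Two full-length entries copied from the FASTA listing above.
FULL = {
    "QACC_STAAU": (
        "MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA"
        "TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH"
    ),
    "EBRA_BACSU": (
        "MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS"
        "YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP"
    ),
}


def best_window(query, target):
    """Return (start, matches) for the best ungapped placement of query on target.

    start is 0-based; (-1, -1) is returned if the target is shorter than the query.
    """
    best_start, best_matches = -1, -1
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        if matches > best_matches:
            best_start, best_matches = start, matches
    return best_start, best_matches


for name, seq in FULL.items():
    start, matches = best_window(QUERY, seq)
    # Report 1-based coordinates, as sequence databases usually do.
    print(f"{name}: residues {start + 1}-{start + len(QUERY)} of {len(seq)}, "
          f"{matches}/{len(QUERY)} identical")
    print(f"  region: {seq[start:start + len(QUERY)]}")
```

The matching region is only a small slice of each protein, which is why the alignment built from full entries is much longer than the one built from BLAST hit regions.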


CODE
EBRA_BACLD MIAGYIFLLIAILSEAAAAAMLKISDGFARWQPSVLVVIGYGLAFYMMSLTLQVIPLSLS
EBRA_BACSU MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS
EBRB_BACLD -MKGMIFLAAAILSEVFGSTMLKLSEGFSAPLPAAGVIIGFAASFTFLSFSLKTLPLSAA
EBR_ECOLI -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
EBR_PSEAE -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
EBR_SALTY -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACC_STAAU -MP-YIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNIT
QACC_STASS -MP-YIYLIISISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNIT
QACE_ECOLI -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACE_KLEAE -MKGWLFLVIAIVGEVIATSALKSSEGFTKLAPSAVVIIGYGIAFYFLSLVLKSIPVGVA
QACF_ENTAE -MKNWIFLAVSIFGEVIATSALKSSHGFTRLVPSVVVVAGYGLAFYFLSLALKSIPVGIA
QACG_STAS9 -MH-YLYLFISIATEIIGTSFLKTSEGFTKLWPTLGTLLSFGICFYFLSLTIKFLPLNIT
QACH_STASA -MP-YLYLLLSIVSEVIGSAFLKSSDGFSKLYPTITTIISFLICFYFLSKTMQHLPLNIT
YVAE_BACSU -MN-WVFLCLAILFEVAGTVSMKLSSGFTKLIPSLLLIFFYGGSLFFLTLTLKSIDVSVA
: ::* :* * .: :* * ** *: : : .: ::: :: : :. :

EBRA_BACLD YATWSGAGTVLTAIIGVLWFQEKLNRRNIAGIICLVSGVVLINLS---------------
EBRA_BACSU YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP---------------
EBRB_BACLD YATWAGTGTALTAAIGHFIFQEPFNLKTLIGLTLIIGGVFLLNSKRTEAADQKAQLTIEI
EBR_ECOLI YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
EBR_PSEAE YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
EBR_SALTY YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIIAAFLLARSPSWKSLRRPTPW----
QACC_STAAU YATWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH-----------
QACC_STASS YATWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH-----------
QACE_ECOLI YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH---------
QACE_KLEAE YAVWSGLGVVIITAIAWLLHGQKLDAWGFVGMGLIVSGVVVLNLLSKASAH---------
QACF_ENTAE YAVWAGLGIVLVAAIAWIFHGQKLDFWAFIGMGLIVSGVAVLNLLSKVSAH---------
QACG_STAS9 YATWAGLGLVLTTIISVIVFKENVNLISIISIGLIVIGVVLLNVFGESH-----------
QACH_STASA YASWAGLGLVLTTIVSVLIFKEQINLISIISIILIIFGVVLLNTFGSSH-----------
YVAE_BACSU YAVWSGMGIVLITVVGFLFFQEHVSVMKVISIGLIIAGVVSLNLIEHVAVSEPVHKSGQY
** *:* * .: : :. . : .. . .: :: .. .

EBRA_BACLD -
EBRA_BACSU -
EBRB_BACLD -
EBR_ECOLI -
EBR_PSEAE -
EBR_SALTY -
QACC_STAAU -
QACC_STASS -
QACE_ECOLI -
QACE_KLEAE -
QACF_ENTAE -
QACG_STAS9 -
QACH_STASA -
YVAE_BACSU K


To get all the full sequences, start at the BLAST results page:

1. In the left menu, under "Apply Options to", check "unselected results only".

2. Under "Result Options", choose "Link to related information:". You will be redirected to a search page titled "Find entries related to current query: in other databanks".

3. There, select "UniProt Universal Protein Resource" > "UniProtKB/Swiss-Prot".

4. In the left menu, under "Link Options", choose "Find related entries" and click the red search button. (If you select "Refine Query - show only results with related entries" instead, you get only the target sequences, like the BLAST results.)

You will be redirected to a page listing the complete protein entries for all the BLAST hits.

Then you can launch ClustalW to align the sequences, or get them in FASTA format to store and align locally.
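
If you go the local route, something like the following Python sketch could be a starting point. It writes records to a FASTA file and builds a ClustalW command line. The executable name `clustalw2` and the file names are assumptions about your local installation, and the sketch only runs ClustalW if that executable is found on your PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to path in FASTA format, wrapped at width."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(f">{header}\n")
            for i in range(0, len(sequence), width):
                handle.write(sequence[i:i + width] + "\n")


def clustalw_command(infile, outfile, executable="clustalw2"):
    """Build a ClustalW command line (the executable name is an assumption)."""
    return [executable, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]


# Two of the full-length entries from this thread, as example input.
records = [
    ("sw|P14319|QACC_STAAU",
     "MPYIYLIIAISTEVIGSAFLKSSEGFSKFIPSLGTIISFGICFYFLSKTMQHLPLNITYA"
     "TWAGLGLVLTTVVSIIIFKEQINLITIVSIVLIIVGVVSLNIFGTSH"),
    ("sw|O31792|EBRA_BACSU",
     "MLIGYIFLTIAICSESIGAAMLKVSDGFKKWKPSALVVIGYSLAFYMLSLTLNHIPLSLS"
     "YATWSGAGTVLTTVIGVKWFKEDLNAKGLIGILLLLSGVVLLNWP"),
]

workdir = Path(tempfile.mkdtemp())
fasta_path = workdir / "hits.fasta"   # hypothetical file name
aln_path = workdir / "hits.aln"       # hypothetical file name
write_fasta(records, fasta_path)

command = clustalw_command(fasta_path, aln_path)
print(" ".join(command))

# Only attempt the alignment when ClustalW is actually installed.
if shutil.which(command[0]):
    subprocess.run(command, check=True)
```

In practice you would save all the hits from the SRS results page into the FASTA file instead of the two example records.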

I hope it helps.

-djv0022-

Thanx!
I am new at this and thanx for the help:)

-milestone-