Sequence Search and Analysis [email protected] SPE 1653 (703) 308-2923.
Sequence Search and Analysis
[email protected] SPE 1653 (703) 308-2923
Biosequence Patent Search
Mission Impossible - ?
Mission Difficult - ?
Sample Searchable Public Databases
National Center for Biotechnology Information (NCBI) Entrez
– www.ncbi.nlm.nih.gov
European Bioinformatics Institute (EBI)
– www.ebi.ac.uk
DNA DataBank of Japan (DDBJ)
– www.ddbj.nig.ac.jp
SwissProt, PIR, etc. do not cover patents
NCBI Entrez
NCBI GenBank
– In collaboration with EMBL and DDBJ
Databases from other producers
– SwissProt, TrEMBL, PDB, PIR, etc.
Bibliographic databases
– E.g., PubMed (MEDLINE)
NCBI BLAST® sequence searching
EMBL-EBI on the Web
EMBL databases
– EMBL Nucleotide Database (i.e. GenBank)
– Translated EMBL (TrEMBL)
Databases from other producers
– SwissProt, PDB, etc.
Many sequence search options: FASTA, NCBI-BLAST, WU-Blast, Smith-Waterman
DDBJ via the Web
DDBJ databases
– DNA DataBank of Japan (i.e. GenBank)
– Protein Mutant Database (PMD)
Databases from other producers
– Protein Databank (PDB)
Several sequence search options: FASTA, BLAST, Smith-Waterman
USPTO
Nucleic Acid Databases
– GenEmbl (GenBank)
– N-Genseq
– ESTs
Protein Databases
– Protein Databank (PDB)
– SwissProt
– A-Genseq
Searched Sequence
HIV protease
PQITLWQAPLVTIKIGGQLKEALLDT
GADDTVLEEMNLPGRWKPKMIGGIG
GFIKVAQYDQILIEICGHKAIGTVLVG
PTPVNIIGANLLTQIGCT
Default parameters selected
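Web search forms such as NCBI BLAST generally accept a query in FASTA format. The sketch below shows one way to wrap the 96-residue query from this slide into a FASTA record; the header text `query_hiv_protease` and the helper name `to_fasta` are illustrative assumptions, not part of the original presentation.

```python
# Query sequence as shown on the slide (96 residues).
QUERY = (
    "PQITLWQAPLVTIKIGGQLKEALLDT"
    "GADDTVLEEMNLPGRWKPKMIGGIG"
    "GFIKVAQYDQILIEICGHKAIGTVLVG"
    "PTPVNIIGANLLTQIGCT"
)


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record with the sequence wrapped at `width` columns."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(chunks) + "\n"


# The header is a hypothetical label chosen for this example.
fasta_record = to_fasta("query_hiv_protease", QUERY)
print(fasta_record)
```

The resulting text can be pasted into a search form or saved to a file for a command-line search tool.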
Searched Sequence – Results - A
Database: Protein sequences derived from the Patent division of GenBank
78 Hits
gb|AAN27487.1| Sequence 17 from patent US 6440730
Length = 1003
Score = 191 bits (486), Expect = 4e-50
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query: 1   PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 57  PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD 116

Query: 61  QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 117 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 152
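The "Identities = 93/96 (96%)" figure can be checked directly from the two aligned rows. The following is a minimal sketch, assuming an ungapped alignment as shown on the slide; the function name `count_identities` is made up for illustration, and the integer division is an assumption that happens to reproduce the 96% printed in the report.

```python
# Aligned query and subject rows from Results A (no gaps in this alignment).
QUERY = (
    "PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT"
)
SBJCT = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT"
)


def count_identities(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) for an ungapped alignment."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    matches = sum(q == s for q, s in zip(query, sbjct))
    return matches, len(query)


identical, length = count_identities(QUERY, SBJCT)
# 1-based query positions where the two rows differ.
mismatch_positions = [i + 1 for i, (q, s) in enumerate(zip(QUERY, SBJCT)) if q != s]
percent = 100 * identical // length  # integer percentage, as in the report

print(f"Identities = {identical}/{length} ({percent}%)")
print("Mismatches at query positions:", mismatch_positions)
```

The three mismatch positions correspond to the alanine residues in the query that face blanks in the middle line of the alignment.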
Searched Sequence – Results - B
Database: Protein Data Bank (PDB)
75 Hits
gi|230577|pdb|2HVP| HIV-1 Protease
Length = 99
Score = 172 bits (437), Expect = 1e-44
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query: 1   PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 1   PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Query: 61  QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 61  QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 96
Searched Sequence – Results - C
Title: TITLE OF YOUR APPLICATION GOES HERE
Perfect score: 521
Sequence: 1 PQITLWQRPLVTIKIGGQLK..........TPVNIIGRNLLTQIGCTLNF 99
Scoring table: BLOSUM62
               Gapop 10.0 , Gapext 0.5
Searched: 908470 seqs, 133250620 residues
Total number of hits satisfying chosen parameters: 908470
Minimum DB seq length: 0
Maximum DB seq length: 2000000000
Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
Database : A_Geneseq_101002:*
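The "Perfect score: 521" line is consistent with the score the query would receive when aligned against itself under BLOSUM62. The sketch below works that out, assuming that interpretation of "perfect score"; the diagonal values are the standard BLOSUM62 self-substitution scores, and the full 99-residue query is taken from the alignment shown in Results E. The name `self_score` is illustrative.

```python
# Diagonal (self-substitution) entries of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9,
    "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7,
    "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Full 99-residue query, as printed in the Results E alignment.
QUERY_99 = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def self_score(sequence: str) -> int:
    """Sum the BLOSUM62 diagonal score of every residue in the sequence."""
    return sum(BLOSUM62_DIAGONAL[residue] for residue in sequence)


perfect_score = self_score(QUERY_99)
print(len(QUERY_99), perfect_score)
```

Under this reading, any database hit scoring 521 matches the query exactly over its full length, with no gap penalties applied.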
Searched Sequence – Results - D
RESULT 1
ID   AAU77767 standard; Protein; 99 AA.
AC   AAU77767;
DT   05-JUN-2002 (first entry)
DE   Human immunodeficiency virus type 1 (HIV-1) related protein #1.
KW   Human immunodeficiency virus type 1; HIV-1; protease.
OS   Unidentified.
PN   KR98066681-A.
PD   15-OCT-1998.
PF   28-JAN-1997; 97KR-0002361.
PR   28-JAN-1997; 97KR-0002361.
PA   (GLDS ) LG CHEM LTD.
PI   Kwon YD, Lee TG;
DR   WPI; 1999-598487/51.
PT   Mutated human immunodeficiency virus type 1 (HIV-1) protease
PT   and process for preparing the same -
PS   Example 3; Page 10; 18pp; Korean.
CC   The invention relates to a mutated human immunodeficiency
CC   virus type 1 (HIV-1) protease and a process for preparing the
CC   mutants. This sequence represents a human immunodeficiency
CC   virus associated protein described in the invention.
SQ   Sequence 99 AA;
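Each line of the database record begins with a two-letter code, so the record can be split into fields mechanically. The following is a small sketch of such a parser, run on an excerpt of the record above; `parse_record` is a hypothetical helper, and reading PN, PD, and PA as patent number, publication date, and assignee is an assumption based on the values shown.

```python
from collections import defaultdict

# Excerpt of the record shown on the slide.
RECORD = """\
ID   AAU77767 standard; Protein; 99 AA.
AC   AAU77767;
DT   05-JUN-2002 (first entry)
DE   Human immunodeficiency virus type 1 (HIV-1) related protein #1.
PN   KR98066681-A.
PD   15-OCT-1998.
PA   (GLDS ) LG CHEM LTD.
PT   Mutated human immunodeficiency virus type 1 (HIV-1) protease
PT   and process for preparing the same -
"""


def parse_record(text: str) -> dict[str, str]:
    """Group lines by their leading two-letter code and join continuation lines."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    return {code: " ".join(values) for code, values in fields.items()}


record = parse_record(RECORD)
print(record["PN"], record["PD"])
print(record["PT"])
```

Pulling out the publication date this way makes it easier to compare a hit against the effective filing date of the application being examined.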
Searched Sequence – Results - E
Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.

SUMMARIES
Result         % Query
No.     Score  Match    Length  DB  ID        Description
-------------------------------------------------------------
1       521    100.0    99      20  AAU77767  Human immunodefici
15      516    99.0     177     11  AAR05744  HIV-1 protease gen

RESULT 1
SQ   Sequence 99 AA;
Query Match 100.0%; Score 521; DB 20; Length 99;
Best Local Similarity 100.0%; Pred. No. 2.6e-58;
Matches 99; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Qy   61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
        |||||||||||||||||||||||||||||||||||||||
Db   61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
Searched Sequence – Results - F
Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.

SUMMARIES
Result         % Query
No.     Score  Match    Length  DB  ID        Description
-------------------------------------------------------------
1       521    100.0    99      20  AAU77767  Human immunodefici
15      516    99.0     177     11  AAR05744  HIV-1 protease gen

RESULT 15
SQ   Sequence 177 AA;
Query Match 99.0%; Score 516; DB 11; Length 177;
Best Local Similarity 99.0%; Pred. No. 2.3e-57;
Matches 98; Conservative 1; Mismatches 0; Indels 0; Gaps 0;

Qy    1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
        ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
Db   56 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD 115

Qy   61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
        |||||||||||||||||||||||||||||||||||||||
Db  116 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 154
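The numbers reported for Result 15 can be reproduced from the single conservative substitution in the alignment (N in the query, S in the database sequence). This is a worked sketch, assuming that "Query Match" is the hit score divided by the perfect score of 521; the two matrix entries are standard BLOSUM62 values, and the variable names are illustrative.

```python
PERFECT_SCORE = 521  # "Perfect score" reported for the 99-residue query

# Standard BLOSUM62 entries involved in the one substitution.
SCORE_N_N = 6  # asparagine aligned with asparagine
SCORE_N_S = 1  # asparagine aligned with serine

# An ungapped alignment with one N->S change loses the N/N score
# and gains the N/S score instead.
score_result_15 = PERFECT_SCORE - SCORE_N_N + SCORE_N_S

# Assumed definition: percentage of the perfect score achieved by the hit.
query_match = round(100 * score_result_15 / PERFECT_SCORE, 1)

print(f"Score {score_result_15}; Query Match {query_match}%")
```

Both values agree with the report (Score 516, Query Match 99.0%), which supports reading the ":" in the middle line as a positively scoring but non-identical pair.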
Sample Claims
A polypeptide having HIV protease activity
An isolated polypeptide having HIV protease activity
An isolated polypeptide comprising SEQ ID NO: 1
An isolated polypeptide consisting essentially of SEQ ID NO: 1
An isolated polypeptide consisting of SEQ ID NO: 1
A peptide fragment having HIV protease activity
A peptide fragment of SEQ ID NO: 1 with HIV protease activity
An epitope of ten amino acids in length of SEQ ID NO: 1 capable of binding to an antibody to SEQ ID NO: 1
An isolated polypeptide or fragment thereof of SEQ ID NO: 1 wherein one or more amino acid residues have been substituted, deleted, or inserted and which polypeptide retains HIV protease enzymatic activity
Acknowledgements
STIC / Toby Port and David Schreiber
TC 1600 / James Martinell