Sequence Search and Analysis Christopher.low@uspto.gov SPE 1653 (703) 308-2923.

Page 1:

Sequence Search and Analysis

Christopher.low@uspto.gov
SPE 1653 (703) 308-2923

Page 2:

Biosequence Patent Search

Mission Impossible - ?

Mission Difficult - ?

Page 3:

Sample Searchable Public Databases

National Center for Biotechnology Information (NCBI) Entrez
– www.ncbi.nlm.nih.gov

European Bioinformatics Institute (EBI)
– www.ebi.ac.uk

DNA DataBank of Japan (DDBJ)
– www.ddbj.nig.ac.jp

SwissProt, PIR, etc. do not cover patents

Page 4:

NCBI Entrez

NCBI GenBank
– In collaboration with EMBL and DDBJ

Databases from other producers
– SwissProt, TrEMBL, PDB, PIR, etc.

Bibliographic databases
– E.g., PubMed (MEDLINE)

NCBI BLAST® sequence searching

Page 5:

EMBL-EBI on the Web

EMBL databases
– EMBL Nucleotide Database (i.e. GenBank)
– Translated EMBL (TrEMBL)

Databases from other producers
– SwissProt, PDB, etc.

Many sequence search options: FASTA, NCBI-BLAST, WU-Blast, Smith-Waterman

Page 6:

DDBJ via the Web

DDBJ databases
– DNA DataBank of Japan (i.e. GenBank)
– Protein Mutant Database (PMD)

Databases from other producers
– Protein Databank (PDB)

Several sequence search options: FASTA, BLAST, Smith-Waterman

Page 7:

USPTO

Nucleic Acid Databases
– GenEmbl (GenBank)
– N-Genseq
– ESTs

Protein Databases
– Protein Databank (PDB)
– SwissProt
– A-Genseq

Page 8:

Searched Sequence

HIV protease

PQITLWQAPLVTIKIGGQLKEALLDT
GADDTVLEEMNLPGRWKPKMIGGIG
GFIKVAQYDQILIEICGHKAIGTVLVG
PTPVNIIGANLLTQIGCT

Default parameters selected

Page 9:

Searched Sequence – Results - A

Database: Protein sequences derived from the Patent division of GenBank

78 Hits

|gb|AAN27487.1| Sequence 17 from patent US 6440730
Length = 1003
Score = 191 bits (486), Expect = 4e-50
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query1:    PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 57  PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD 116

Query1:    QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 117 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 152
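The identity count reported for this hit can be checked by comparing the two aligned rows position by position. The following is a minimal sketch, assuming the ungapped Query and Sbjct rows exactly as they appear on the slide. The helper names `count_identities` and `mismatch_positions` are hypothetical, and the integer-truncated percentage is an assumption chosen because it reproduces the 96% displayed.

```python
# Aligned rows copied from the slide (ungapped, 96 positions each).
QUERY = (
    "PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT"
)
SBJCT = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVGQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT"
)


def count_identities(query, sbjct):
    """Return (identical positions, aligned length) for two ungapped rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    matches = sum(q == s for q, s in zip(query, sbjct))
    return matches, len(query)


def mismatch_positions(query, sbjct):
    """Return (1-based query position, query residue, subject residue) per mismatch."""
    return [
        (i, q, s)
        for i, (q, s) in enumerate(zip(query, sbjct), start=1)
        if q != s
    ]


matches, length = count_identities(QUERY, SBJCT)
# Integer truncation is an assumption that matches the percentage on the slide.
print(f"Identities = {matches}/{length} ({100 * matches // length}%)")
print(mismatch_positions(QUERY, SBJCT))
```

The three mismatching positions are the ones left blank in the middle row of the alignment.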

Page 10:

Searched Sequence – Results - B

Database: Protein Data Bank (PDB)

75 Hits

gi|230577|pdb|2HVP| HIV-1 Protease
Length = 99
Score = 172 bits (437), Expect = 1e-44
Identities = 93/96 (96%), Positives = 93/96 (96%)

Query1:    PQITLWQAPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVAQYD 60
           PQITLWQ PLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKV QYD
Sbjct: 1   PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Query1:    QILIEICGHKAIGTVLVGPTPVNIIGANLLTQIGCT 96
           QILIEICGHKAIGTVLVGPTPVNIIG NLLTQIGCT
Sbjct: 61  QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCT 96

Page 11:

Searched Sequence – Results - C

Title: TITLE OF YOUR APPLICATION GOES HERE
Perfect score: 521
Sequence: 1 PQITLWQRPLVTIKIGGQLK..........TPVNIIGRNLLTQIGCTLNF 99
Scoring table: BLOSUM62
               Gapop 10.0 , Gapext 0.5
Searched: 908470 seqs, 133250620 residues
Total number of hits satisfying chosen parameters: 908470
Minimum DB seq length: 0
Maximum DB seq length: 2000000000
Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
Database : A_Geneseq_101002:*
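The "Perfect score" of 521 is consistent with scoring the query against itself under BLOSUM62, that is, summing the matrix's diagonal entry for every residue. The sketch below illustrates this under two assumptions: the diagonal values are those of the standard BLOSUM62 matrix, and the full 99-residue query is the one printed in the Qy rows of the Result 1 alignment (the parameter listing itself elides the middle of the sequence). The function name `perfect_score` is hypothetical.

```python
# Diagonal (self-match) entries of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

# Assumption: full query as shown in the Qy rows of the Result 1 alignment.
QUERY_99 = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def perfect_score(seq):
    """Score of a sequence aligned to itself: sum of diagonal matrix entries."""
    return sum(BLOSUM62_DIAGONAL[residue] for residue in seq)


print(len(QUERY_99), perfect_score(QUERY_99))  # 99 521
```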

Page 12:

Searched Sequence – Results - D

RESULT 1
ID   AAU77767 standard; Protein; 99 AA.
AC   AAU77767;
DT   05-JUN-2002 (first entry)
DE   Human immunodeficiency virus type 1 (HIV-1) related protein #1.
KW   Human immunodeficiency virus type 1; HIV-1; protease.
OS   Unidentified.
PN   KR98066681-A.
PD   15-OCT-1998.
PF   28-JAN-1997; 97KR-0002361.
PR   28-JAN-1997; 97KR-0002361.
PA   (GLDS ) LG CHEM LTD.
PI   Kwon YD, Lee TG;
DR   WPI; 1999-598487/51.
PT   Mutated human immunodeficiency virus type 1 (HIV-1) protease
PT   and process for preparing the same -
PS   Example 3; Page 10; 18pp; Korean.
CC   The invention relates to a mutated human immunodeficiency
CC   virus type 1 (HIV-1) protease and a process for preparing the
CC   mutants. This sequence represents a human immunodeficiency
CC   virus associated protein described in the invention.
SQ   Sequence 99 AA;

Page 13:

Searched Sequence – Results - E

Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.

SUMMARIES

                 %
Result         Query
No.     Score  Match  Length  DB  ID        Description
--------------------------------------------------------------
1         521  100.0      99  20  AAU77767  Human immunodefici
15        516   99.0     177  11  AAR05744  HIV-1 protease gen

SQ   Sequence 99 AA;
RESULT 1

Query Match 100.0%; Score 521; DB 20; Length 99;
Best Local Similarity 100.0%; Pred. No. 2.6e-58;
Matches 99; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60

Qy  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
       |||||||||||||||||||||||||||||||||||||||
Db  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99

Page 14:

Searched Sequence – Results - F

Pred. No. is the number of results predicted by chance to have a score greater than or equal to the score of the result being printed, and is derived by analysis of the total score distribution.

SUMMARIES

                 %
Result         Query
No.     Score  Match  Length  DB  ID        Description
--------------------------------------------------------------
1         521  100.0      99  20  AAU77767  Human immunodefici
15        516   99.0     177  11  AAR05744  HIV-1 protease gen

SQ   Sequence 177 AA;
RESULT 15

Query Match 99.0%; Score 516; DB 11; Length 177;
Best Local Similarity 99.0%; Pred. No. 2.3e-57;
Matches 98; Conservative 1; Mismatches 0; Indels 0; Gaps 0;

Qy   1 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD 60
       ||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||
Db  56 PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD 115

Qy  61 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 99
       |||||||||||||||||||||||||||||||||||||||
Db 116 QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF 154
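The numbers reported for Result 15 can be related to those for Result 1 with a little arithmetic. The sketch below assumes that the score is a plain BLOSUM62 sum over the ungapped alignment and that "Query Match" is the hit score as a percentage of the perfect score; both are assumptions that happen to reproduce the values shown, not a description of the search software's internals. Only the two standard BLOSUM62 entries needed here (N/N = 6, N/S = 1) are included, and the helper names are hypothetical.

```python
# Qy and Db rows of the Result 15 alignment, copied from the slide.
QY = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)
DB = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)

PERFECT_SCORE = 521  # "Perfect score" from the search parameters

# Only the standard BLOSUM62 entries this example needs.
BLOSUM62_PAIRS = {("N", "N"): 6, ("N", "S"): 1}


def differing_positions(qy, db):
    """Return (1-based query position, query residue, db residue) per difference."""
    return [(i, q, d) for i, (q, d) in enumerate(zip(qy, db), start=1) if q != d]


def rescore(perfect, diffs, pairs):
    """Swap the self-match score for the pair score at each differing position."""
    score = perfect
    for _, q, d in diffs:
        score += pairs[(q, d)] - pairs[(q, q)]
    return score


def query_match_percent(score, perfect):
    """Hit score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect, 1)


diffs = differing_positions(QY, DB)
score = rescore(PERFECT_SCORE, diffs, BLOSUM62_PAIRS)
print(diffs)                                      # the single N -> S substitution
print(score, query_match_percent(score, PERFECT_SCORE))
```

The single differing position is the one marked `:` in the alignment, which the summary counts as conservative rather than as a mismatch.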

Page 15:

Sample Claims

A polypeptide having HIV protease activity

An isolated polypeptide having HIV protease activity

An isolated polypeptide comprising SEQ ID NO: 1

An isolated polypeptide consisting essentially of SEQ ID NO: 1

An isolated polypeptide consisting of SEQ ID NO: 1

A peptide fragment having HIV protease activity

A peptide fragment of SEQ ID NO: 1 with HIV protease activity

An epitope of ten amino acids in length of SEQ ID NO: 1 capable of binding to an antibody to SEQ ID NO: 1

An isolated polypeptide or fragment thereof of SEQ ID NO: 1 wherein one or more amino acid residues have been substituted, deleted, or inserted and which polypeptide retains HIV protease enzymatic activity
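To give a feel for how many distinct queries a fragment-type claim can imply, the sketch below enumerates every contiguous ten-residue window of a sequence. It assumes, purely for illustration, that SEQ ID NO: 1 is the 99-residue query used in the Geneseq search above; the slides do not state this. The function name `windows` is hypothetical, and the code says nothing about which windows would actually bind an antibody.

```python
# Assumption for illustration only: SEQ ID NO: 1 is the 99-residue query.
SEQ_ID_NO_1 = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYD"
    "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def windows(seq, size):
    """Return every contiguous subsequence of the given length, in order."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


decamers = windows(SEQ_ID_NO_1, 10)
print(len(decamers))             # 99 - 10 + 1 candidate fragments
print(decamers[0], decamers[-1])
```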

Page 16:

Acknowledgements

STIC / Toby Port and David Schreiber

TC 1600 / James Martinell