Biopython
![Page 1: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/1.jpg)
The 8th annual Bioinformatics Open Source Conference (BOSC 2007)
18th July, Vienna, Austria
Peter Cock, MOAC Doctoral Training Centre, University of Warwick, UK
Biopython Project Update
![Page 2: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/2.jpg)
The 8th annual Bioinformatics Open Source Conference
Biopython Project Update @ BOSC 2007, Vienna, Austria

Talk Outline
- What is Python?
- What is Biopython?
- Short history
- Project organisation
- What can you do with it?
- How can you contribute?
- Acknowledgements
![Page 3: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/3.jpg)
What is Python?
- High level programming language
- Object orientated
- Open Source, free ($$$)
- Cross platform: Linux, Windows, Mac OS X, …
- Extensible in C, C++, …
![Page 4: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/4.jpg)
What is Biopython?
- Set of libraries for computational biology
- Open Source, free ($$$)
- Cross platform: Linux, Windows, Mac OS X, …
- Sibling project to BioPerl, BioRuby, BioJava, …
![Page 5: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/5.jpg)
Popularity by Google Hits

| Language | Hits | Project | Hits |
|---|---|---|---|
| Python | 98 million | Biopython | 252,000 |
| Perl | 101 million | BioPerl | 610,000 |
| Ruby | 101 million | BioRuby | 122,000 |
| Java | 289 million | BioJava | 185,000 |

- Both Perl and Python are strong at text
- Python may have the edge for numerical work (with the Numerical python libraries)
![Page 6: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/6.jpg)
Biopython history
- 1999: Started by Jeff Chang & Andrew Dalke
- 2000: Biopython 0.90, first release
- 2001: Biopython 1.00, “semi-complete”
- 2002: Biopython 1.10, “semi-stable”
- 2003: Biopython 1.20, 1.21, 1.22 and 1.23
- 2004: Biopython 1.24 and 1.30
- 2005: Biopython 1.40 and 1.41
- 2006: Biopython 1.42
- 2007: Biopython 1.43
![Page 7: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/7.jpg)
Biopython Project Organisation
- Releases:
  - No fixed schedule
  - Currently once or twice a year
  - Work from a stable CVS base
- Bugs:
  - Online bugzilla
  - Some small changes handled on mailing list
- Tests:
  - Many based on unittest python library
  - Also simple scripts where output is verified
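
The two testing styles named above can be sketched with the standard library alone. This is an illustration in Python 3, not a test taken from the Biopython suite; `count_bases`, `demo_script` and `EXPECTED_OUTPUT` are hypothetical names invented for the example.

```python
import io
import unittest
from contextlib import redirect_stdout


def count_bases(sequence):
    """Hypothetical function under test: count each letter in a sequence."""
    counts = {}
    for letter in sequence:
        counts[letter] = counts.get(letter, 0) + 1
    return counts


class CountBasesTest(unittest.TestCase):
    """Style 1: assertions written with the unittest library."""

    def test_simple_dna(self):
        self.assertEqual(count_bases("AACG"), {"A": 2, "C": 1, "G": 1})

    def test_empty_sequence(self):
        self.assertEqual(count_bases(""), {})


def demo_script():
    """Style 2: a plain script whose printed output is compared to a saved copy."""
    for letter, number in sorted(count_bases("AACG").items()):
        print(letter, number)


# The saved "known good" output that the script is checked against.
EXPECTED_OUTPUT = "A 2\nC 1\nG 1\n"

# Run the unittest-style checks without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountBasesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# Run the script-style check by capturing what the script prints.
captured = io.StringIO()
with redirect_stdout(captured):
    demo_script()
output_matches = captured.getvalue() == EXPECTED_OUTPUT

print("unittest passed:", result.wasSuccessful())
print("script output verified:", output_matches)
```

The second style needs no test framework, which suits short example scripts, but it only reports that the output changed, not which assertion failed.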
![Page 8: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/8.jpg)
What can you do with Biopython?
- Read, write & manipulate sequences
- Restriction enzymes
- BLAST (local and online)
- Web databases (e.g. NCBI’s EUtils)
- Call command line tools (e.g. clustalw)
- Clustering (Bio.Cluster)
- Phylogenetics (Bio.Nexus)
- Protein Structures (Bio.PDB)
![Page 9: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/9.jpg)
Manipulating Sequences
- Use Biopython’s Seq object, holds:
  - Sequence data (string like)
  - Alphabet (can include list of letters)
- Alphabet allows type checking, preventing errors like appending DNA to Protein
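
The type-checking idea can be illustrated without Biopython. The sketch below is Python 3 and is not the Seq implementation; `TypedSeq` is a hypothetical class, and its alphabet is assumed to be a plain label rather than an alphabet object.

```python
class TypedSeq:
    """Toy stand-in for a sequence object: string-like data plus an alphabet."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet  # assumption: a simple label such as "dna"

    def __add__(self, other):
        # Refuse to join sequences of different kinds.
        if self.alphabet != other.alphabet:
            raise TypeError(
                "cannot add %s sequence to %s sequence"
                % (other.alphabet, self.alphabet)
            )
        return TypedSeq(self.data + other.data, self.alphabet)

    def __len__(self):
        return len(self.data)

    def __str__(self):
        return self.data


dna = TypedSeq("CTAAACATCCTTCAT", "dna")
more_dna = TypedSeq("ATG", "dna")
protein = TypedSeq("MKDV", "protein")

print(dna + more_dna)  # same alphabet: allowed
try:
    dna + protein  # mixed alphabets: rejected
except TypeError as error:
    print("Rejected:", error)
```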
![Page 10: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/10.jpg)
Manipulating Sequences

    from Bio.Seq import Seq
    from Bio.Alphabet.IUPAC import unambiguous_dna
    my_dna = Seq('CTAAACATCCTTCAT', unambiguous_dna)
    print 'Original:'
    print my_dna
    print 'Reverse complement:'
    print my_dna.reverse_complement()

Output:

    Original:
    Seq('CTAAACATCCTTCAT', IUPACUnambiguousDNA())
    Reverse complement:
    Seq('ATGAAGGATGTTTAG', IUPACUnambiguousDNA())
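
For readers without Biopython to hand, the reverse-complement operation itself is small enough to sketch on a plain string. This is a Python 3 illustration for unambiguous DNA only, not the library's method; `COMPLEMENT` and `reverse_complement` are names chosen for the example.

```python
# Map each unambiguous DNA letter to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    # Complement every letter, then reverse the result.
    return dna.translate(COMPLEMENT)[::-1]


my_dna = "CTAAACATCCTTCAT"
print(reverse_complement(my_dna))  # prints ATGAAGGATGTTTAG
```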
![Page 11: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/11.jpg)
Translating Sequences

    from Bio import Translate
    bact_trans = Translate.unambiguous_dna_by_id[11]
    print 'Forward translation'
    print bact_trans.translate(my_dna)
    print 'Reverse complement translation'
    print bact_trans.translate( \
        my_dna.reverse_complement())

Output:

    Forward translation
    Seq('LNILH', HasStopCodon(IUPACProtein(), '*'))
    Reverse complement translation
    Seq('MKDV*', HasStopCodon(IUPACProtein(), '*'))
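
Translation is a codon-by-codon table lookup, which the following Python 3 sketch spells out on plain strings. It is not the `Bio.Translate` code; the names are illustrative. It relies on the fact that NCBI table 11 assigns the same amino acids to codons as the standard table (the two differ in which codons may start a protein), and it ignores start-codon handling and ambiguous letters entirely.

```python
from itertools import product

# Codons in TCAG order, paired with the amino acids of the standard table.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate unambiguous DNA, reading whole codons from position 0."""
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("CTAAACATCCTTCAT"))  # forward strand: LNILH
print(translate("ATGAAGGATGTTTAG"))  # reverse complement: MKDV*
```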
![Page 12: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/12.jpg)
Sequence Input/Output
- Bio.SeqIO is new in Biopython 1.43
- Inspired by BioPerl’s SeqIO
- Works with SeqRecord objects (not format specific representations)
- Builds on existing Biopython parsers
![Page 13: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/13.jpg)
SeqIO – Sequence Input

    from Bio import SeqIO
    handle = open('ls_orchid.fasta')
    format = 'fasta'
    for rec in SeqIO.parse(handle, format) :
        print "%s, len %i" % (rec.id, len(rec.seq))
        print rec.seq[:40].tostring() + "..."
    handle.close()

Output:

    gi|2765658|emb|Z78533.1|CIZ78533, len 740
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...
    gi|2765657|emb|Z78532.1|CCZ78532, len 753
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...
    ...
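
To show what iterating over records involves, here is a minimal standard-library FASTA reader in Python 3. It is a sketch, not the Bio.SeqIO parser: `parse_fasta` is a hypothetical function that yields plain `(identifier, sequence)` tuples instead of SeqRecord objects, and the two records are made up rather than taken from `ls_orchid.fasta`.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted file handle."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if identifier is not None:
                yield identifier, "".join(chunks)
            words = line[1:].split(None, 1)
            identifier = words[0] if words else ""  # first word after ">"
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# Made-up example data standing in for a real FASTA file.
example = io.StringIO(
    ">seq1 first toy record\nCGTAACAAGG\nTTTCCGTAGG\n"
    ">seq2 second toy record\nACGT\n"
)
for name, seq in parse_fasta(example):
    print("%s, len %i" % (name, len(seq)))
    print(seq[:40] + "...")
```

Writing the reader as a generator means records are produced one at a time, so a large file never has to be held in memory.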
![Page 14: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/14.jpg)
SeqIO – Sequence Input

    from Bio import SeqIO
    handle = open('ls_orchid.gbk')
    format = 'genbank'
    for rec in SeqIO.parse(handle, format) :
        print "%s, len %i" % (rec.id, len(rec.seq))
        print rec.seq[:40].tostring() + "..."
    handle.close()

Output:

    Z78533.1, len 740
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...
    Z78532.1, len 753
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCAT...
    ...
![Page 15: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/15.jpg)
SeqIO – Extracting Data

    from Bio import SeqIO
    handle = open('ls_orchid.gbk')
    format = 'genbank'
    from sets import Set
    print Set([rec.annotations['organism'] \
               for rec in SeqIO.parse(handle, format)])
    handle.close()

Output:

    Set(['Cypripedium acaule', 'Paphiopedilum primulinum', 'Phragmipedium lindenii', 'Paphiopedilum papuanum', 'Paphiopedilum stonei', 'Paphiopedilum urbanianum', 'Paphiopedilum dianthum', ...])
![Page 16: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/16.jpg)
SeqIO – Filtering Output

    from Bio import SeqIO
    i_handle = open('ls_orchid.gbk')
    o_handle = open('small_orchid.faa', 'w')
    SeqIO.write([rec for rec in \
                 SeqIO.parse(i_handle, 'genbank') \
                 if len(rec.seq) < 600], o_handle, 'fasta')
    i_handle.close()
    o_handle.close()

Output:

    >Z78481.1 P.insigne 5.8S rRNA gene and ITS1 ...
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT...
    >Z78480.1 P.gratrixianum 5.8S rRNA gene and ...
    CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTT...
    ...
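
The filter-then-write pattern can be sketched with the standard library as well. The Python 3 example below is not the Bio.SeqIO writer: `write_fasta` is a hypothetical helper, records are assumed to be plain `(identifier, description, sequence)` tuples, and the data is invented.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, description, sequence) tuples as FASTA; return the count."""
    count = 0
    for identifier, description, sequence in records:
        handle.write(">%s %s\n" % (identifier, description))
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


# Made-up records; only those shorter than 600 letters should be kept.
records = [
    ("toy1", "short toy record", "ACGT" * 5),
    ("toy2", "long toy record", "ACGT" * 200),
    ("toy3", "another short record", "GATTACA"),
]

output = io.StringIO()
written = write_fasta((rec for rec in records if len(rec[2]) < 600), output)
print(written, "records written")
print(output.getvalue())
```

Passing a generator expression rather than a list keeps the filter lazy, so each record is written as soon as it has been read and tested.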
![Page 17: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/17.jpg)
Phylogenetics
- Bio.Nexus was added in Biopython 1.30 by Frank Kauff and Cymon Cox
- Reads Nexus alignments and trees
- Also parses Newick format trees
![Page 18: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/18.jpg)
Newick Tree Parsing

Input tree:

    (Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,(Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939,Mouse:1.21460):0.10;

Code:

    from Bio.Nexus.Trees import Tree
    tree_str = open("simple.tree").read()
    tree_obj = Tree(tree_str)
    print tree_obj

Output:

    tree a_tree = (Bovine,(Gibbon,(Orang,(Gorilla,(Chimp,Human)))),Mouse);
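
Newick is a nested-parentheses format, so a small recursive-descent parser is enough to show how it is read. The Python 3 sketch below is not the Bio.Nexus parser; `parse_newick` and `topology` are hypothetical names, nodes are plain `(name, branch_length, children)` tuples, and quoted labels and comments are not supported.

```python
def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    # Simplification: drop all whitespace and the trailing semicolon.
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # step over "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character %r" % text[pos])
        # Optional node label.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Optional branch length after a colon.
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    return parse_node()


def topology(node):
    """Return the tree as a Newick string without branch lengths."""
    name, _length, children = node
    if not children:
        return name
    return "(" + ",".join(topology(child) for child in children) + ")" + name


tree = parse_newick(
    "(Bovine:0.69395,(Gibbon:0.36079,(Orang:0.33636,(Gorilla:0.17147,"
    "(Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939,"
    "Mouse:1.21460):0.10;"
)
print(topology(tree) + ";")
```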
![Page 19: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/19.jpg)
Newick Tree Parsing

    tree_obj.display()

Output:

    #  taxon    prev  succ      brlen    blen (sum)  support
    0  -        None  [1,2,11]  0.0      0.0         -
    1  Bovine   0     []        0.69395  0.69395     -
    2  -        0     [3,4]     0.54939  0.54939     -
    3  Gibbon   2     []        0.36079  0.91018     -
    4  -        2     [5,6]     0.15057  0.69996     -
    5  Orang    4     []        0.33636  1.03632     -
    6  -        4     [7,8]     0.06124  0.7612      -
    7  Gorilla  6     []        0.17147  0.93267     -
    8  -        6     [9,10]    0.08386  0.84506     -
    9  Chimp    8     []        0.19268  1.03774     -
    10 Human    8     []        0.11927  0.96433     -
    11 Mouse    0     []        1.2146   1.2146      -

    Root: 0
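
The "blen (sum)" column can be reproduced from the "prev" and "brlen" columns alone, assuming "prev" is the id of a node's parent and the sum runs from the node up to the root. The Python 3 sketch below is not Bio.Nexus code; `NODES` and `distance_from_root` are illustrative names, with the values copied from the table above.

```python
# node id -> (taxon, parent id, branch length), copied from the table above
NODES = {
    0: (None, None, 0.0),
    1: ("Bovine", 0, 0.69395),
    2: (None, 0, 0.54939),
    3: ("Gibbon", 2, 0.36079),
    4: (None, 2, 0.15057),
    5: ("Orang", 4, 0.33636),
    6: (None, 4, 0.06124),
    7: ("Gorilla", 6, 0.17147),
    8: (None, 6, 0.08386),
    9: ("Chimp", 8, 0.19268),
    10: ("Human", 8, 0.11927),
    11: ("Mouse", 0, 1.2146),
}


def distance_from_root(node_id, nodes=NODES):
    """Sum the branch lengths on the path from a node up to the root."""
    total = 0.0
    while node_id is not None:
        _taxon, parent, length = nodes[node_id]
        total += length
        node_id = parent
    return total


# Print the summed branch length for every named (leaf) node.
for node_id, (taxon, _parent, _length) in NODES.items():
    if taxon is not None:
        print("%-8s %.5f" % (taxon, distance_from_root(node_id)))
```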
![Page 20: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/20.jpg)
3D Structures
- Bio.PDB was added in Biopython 1.24 by Thomas Hamelryck
- Reads PDB and CIF format files
![Page 21: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/21.jpg)
Working with 3D Structures
http://www.warwick.ac.uk/go/peter_cock/python/protein_superposition/
[Figure: two panels, "Before" and "After"]
This example (online) uses Bio.PDB to align 21 alternative X-Ray crystal structures for PDB structure 1JOY.
![Page 22: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/22.jpg)
Population Genetics (planned)
- Tiago Antão (with Ralph Haygood) plans to start a Population Genetics module
- See also: PyPop: Python for Population Genetics, Alex Lancaster et al. (2003), www.pypop.org
![Page 23: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/23.jpg)
Areas for Improvement
- Documentation!
- I’m interested in sequences & alignments:
  - Seq objects – more like strings?
  - Alignment objects – more like arrays?
  - SeqIO – support for more formats
  - AlignIO? – alignment equivalent to SeqIO
- Move from Numeric to NumPy
- Move from CVS to SVN?
![Page 24: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/24.jpg)
How can you Contribute?
- Users:
  - Discussions on the mailing list
  - Report bugs
  - Documentation improvement
- Coders:
  - Suggest bug fixes
  - New/extended test cases
  - Adopt modules with no current ‘owner’
  - New modules
![Page 25: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/25.jpg)
Biopython Acknowledgements
- Open Bioinformatics Foundation or O|B|F for web hosting, CVS servers, mailing list
- Biopython developers, including: Jeff Chang, Andrew Dalke, Brad Chapman, Iddo Friedberg, Michiel de Hoon, Frank Kauff, Cymon Cox, Thomas Hamelryck, me
- Contributors who report bugs & join in the mailing list discussions
![Page 26: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/26.jpg)
Personal Acknowledgements
- Everyone for listening
- Open Bioinformatics Foundation or O|B|F for the BOSC 2007 invitation
- Iddo Friedberg & Michiel de Hoon for their encouragement
- The EPSRC for my PhD funding via the MOAC Doctoral Training Centre, Warwick
- http://www.warwick.ac.uk/go/moac
![Page 27: Biopython](https://reader035.fdocuments.in/reader035/viewer/2022070315/5550128eb4c905af648b49c4/html5/thumbnails/27.jpg)
Questions?