Chapman bosc2010 biopython

Community Integration Democratization Biopython: challenges Brad Chapman Peter Cock Biopython contributors http://biopython.org 10 July 2010

Transcript of Chapman bosc2010 biopython

Page 1: Chapman bosc2010 biopython

Community Integration Democratization

Biopython: challenges

Brad Chapman

Peter Cock

Biopython contributors

http://biopython.org

10 July 2010

Page 2: Chapman bosc2010 biopython

3 challenges for successful open source projects

Community

Integration

Democratization

Page 3: Chapman bosc2010 biopython

Distributed code access

Page 4: Chapman bosc2010 biopython

Recruiting and training

Google Summer of Code

2009 Eric Talevich

phyloXML; Bio.Phylo

Nick Matzke

Biogeographical Phylogenetics

2010 Joao Rodrigues

Structural biology; Bio.PDB

Page 5: Chapman bosc2010 biopython

Answering questions better

Page 6: Chapman bosc2010 biopython

Recognizing contributions

Page 7: Chapman bosc2010 biopython

Diversity of Python bioinformatics

Page 8: Chapman bosc2010 biopython

Interoperability

Avoid re-implementation

Convert core objects

Document workflows with multiple libraries

Communicate better

Page 9: Chapman bosc2010 biopython

Wrapping external tools

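import subprocess
from Bio.Blast.Applications import NcbiblastxCommandline

# Build the blastx command line; outfmt=5 requests XML output.
cl = NcbiblastxCommandline(query="opuntia.fasta", db="nr",
                           evalue=0.001, outfmt=5, out="opuntia.xml")
# str(cl) is one command string, so it has to be run through the shell.
subprocess.call(str(cl), shell=True)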
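
The wrapper idea on this slide can be sketched without Biopython. The class below is a hypothetical, simplified stand-in (SimpleCommandline is not a Biopython name) that only assumes each keyword argument maps to a "-name value" pair; the real Bio.Blast.Applications wrappers validate options and do considerably more.

```python
import subprocess


class SimpleCommandline:
    """Hypothetical minimal wrapper: keyword arguments become -name value pairs."""

    def __init__(self, program, **options):
        self.program = program
        self.options = options

    def as_args(self):
        # Build an argument list, which subprocess can run without a shell.
        args = [self.program]
        for name, value in sorted(self.options.items()):
            args.extend(["-" + name, str(value)])
        return args

    def __str__(self):
        # Human-readable form, handy for logging the exact command.
        return " ".join(self.as_args())

    def run(self):
        # Returns the exit code of the external program.
        return subprocess.call(self.as_args())


cl = SimpleCommandline("blastx", query="opuntia.fasta", db="nr",
                       evalue=0.001, outfmt=5, out="opuntia.xml")
command_args = cl.as_args()
command_text = str(cl)
# cl.run() would execute the tool; it is left out here because it
# needs BLAST+ installed and the input files in place.
```

Passing a list of arguments rather than one string avoids shell quoting problems when file names contain spaces.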

Page 10: Chapman bosc2010 biopython

Documenting standards

Page 11: Chapman bosc2010 biopython

Making code easier to use

>>> from Bio import SeqIO
>>> memory_dict = SeqIO.index("in.gb", "genbank")
>>> memory_dict.keys()
['Z78484.1', ... 'Z78471.1']
>>> seq_record = memory_dict["Z78475.1"]
>>> print(seq_record.description)
P.supardii 5.8S rRNA gene and ITS1 and ITS2 DNA
>>> seq_record.seq
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGG...GGT', IUPACAmbiguousDNA())
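
The dictionary-like access shown on this slide works without loading every record into memory. One way to get that behaviour is to scan the file once, remember where each record starts, and seek to a record only when it is requested. The sketch below illustrates that general technique for FASTA using only the standard library. The function names and the sample data are made up for illustration, and this is not claimed to be how SeqIO.index is implemented.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each FASTA record ID to the byte offset of its header line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            position = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                record_id = line[1:].split()[0].decode("ascii")
                offsets[record_id] = position
    return offsets


def fetch_sequence(path, offsets, record_id):
    """Seek to one record and read only its sequence lines."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(offsets[record_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            chunks.append(line.strip().decode("ascii"))
    return "".join(chunks)


# Tiny made-up FASTA file, purely for illustration.
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">seq1 first example\nACGT\nACGT\n>seq2 second example\nGGCC\n")
    fasta_path = tmp.name

offsets = build_offset_index(fasta_path)
seq2 = fetch_sequence(fasta_path, offsets, "seq2")
seq1 = fetch_sequence(fasta_path, offsets, "seq1")
os.remove(fasta_path)
```

Only the ID-to-offset mapping stays in memory, so the cost grows with the number of records rather than with the total file size.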

Page 12: Chapman bosc2010 biopython

Challenges of big data

Page 13: Chapman bosc2010 biopython

Cloud: easier to distribute

On-demand computational resources like Amazon EC2

Provide ready-to-go images

Biopython and many associated bioinformatics libraries

Biological data

http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/

Page 14: Chapman bosc2010 biopython

Following up

Home http://biopython.org

Code http://github.com/biopython

BOSC Talk to Eric, Tiago or myself