Rice Emboss Bosc2009


Page 1: Rice Emboss Bosc2009

EBI is an Outstation of the European Molecular Biology Laboratory.

EMBOSS: European Molecular Biology Open Software Suite

Peter Rice [email protected]

Page 2: Rice Emboss Bosc2009

BOSC: EMBOSS 2009 · 12.04.23

A quick introduction

• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 200+ applications
• 100+ third party applications in 15 associated packages
• Project started 1996 at Sanger and HGMP
• Now based at EBI
• Release 6.1.0 15th July 2009
• Funded by UK-BBSRC and EMBL-EBI

Page 3: Rice Emboss Bosc2009

A near death experience

• April 2004: The UK Medical Research Council decided to close the UK Human Genome Mapping Project Resource Centre (now the Rosalind Franklin Centre for Genomics Research)
• That was where all the EMBOSS developers worked
• We announced the potential end of EMBOSS development to our user community
• HGMP closed in July 2005
• The developers moved to EBI, with interim funding to April 2006
• Funding was secured in May 2006 (BBSRC)
• … and again in May 2009 (BBSRC)
• As far as we are aware, all our academic and industry users continued running EMBOSS … with no risk
• That is a huge advantage of open source licensing

Page 4: Rice Emboss Bosc2009

Who do we serve?

• Expert software developers
  • Bioinformaticians
  • Computer scientists
• Expert users
  • Biology research community
  • Industry
• Scientific users
  • Biology research community
  • Industry

Page 5: Rice Emboss Bosc2009

EMBOSS World Wide

We have users on every continent, and a picture to prove it. This is British Antarctica. We are promised another photo from the frozen North.

The first EMBOSS course was in Beijing, April 1999.

The wEMBOSS interface is from Canada, Argentina and Belgium

Page 6: Rice Emboss Bosc2009

EMBOSS command line interface

• EMBOSS applications run from the command line
• This is not the only interface
  • There are over 100 interfaces and packaged systems available
• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface

Page 7: Rice Emboss Bosc2009

EMBOSS command line example

% antigenic

Input protein sequence(s): uniprot:actb1_fugru

Minimum length of antigenic region [6]:

Output report [actb1_fugru.antigenic]:

% antigenic uniprot:actb1_fugru -auto
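The two invocations above (interactive prompts versus `-auto`) can also be driven from a script. The following is a minimal sketch, not part of the original slides: the helper names `build_antigenic_command` and `run_antigenic` are hypothetical, and the qualifier names `-sequence`, `-minlen` and `-outfile` are assumed to match the names declared in the application's ACD file. The command is only executed if an `antigenic` binary is found on the PATH.

```python
import shutil
import subprocess


def build_antigenic_command(sequence, minlen=6, outfile=None):
    """Build an argument list for a non-interactive antigenic run.

    Qualifier names are assumed to follow the ACD definition
    (sequence, minlen, outfile); -auto suppresses the prompts.
    """
    cmd = ["antigenic", "-sequence", sequence, "-minlen", str(minlen)]
    if outfile is not None:
        cmd += ["-outfile", outfile]
    cmd.append("-auto")
    return cmd


def run_antigenic(sequence, minlen=6, outfile=None):
    """Run antigenic if EMBOSS is installed; return None otherwise."""
    if shutil.which("antigenic") is None:
        return None  # EMBOSS not available on this machine
    cmd = build_antigenic_command(sequence, minlen, outfile)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Show the command that would be run for the slide's example entry.
    print(" ".join(build_antigenic_command("uniprot:actb1_fugru")))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact command line before anything is run.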

Page 8: Rice Emboss Bosc2009

EMBOSS ACD File

application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]

section: input [
  information: "Input section"
  type: "page"
]

  seqall: sequence [
    parameter: "Y"
    type: "PureProtein"
  ]

endsection: input

section: required [
  information: "Required section"
  type: "page"
]

  integer: minlen [
    standard: "Y"
    minimum: "1"
    maximum: "50"
    default: "6"
    information: "Minimum length of antigenic region"
  ]

endsection: required

section: output [
  information: "Output section"
  type: "page"
]

  report: outfile [
    parameter: "Y"
    rformat: "motif"
    multiple: "Y"
    taglist: "int:pos=Max_score_pos"
  ]

endsection: output
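Because the ACD file is plain text, another interface can read it to discover an application's options. The sketch below is an illustration only, not the EMBOSS ACD parser: it uses two regular expressions that are sufficient for the fragment shown on this slide and assumes that attribute values are double-quoted and contain no `]` character. The function names `parse_acd` and `list_qualifiers` are hypothetical.

```python
import re

# The ACD definitions from the slide, reproduced as sample input.
ACD_TEXT = '''
application: antigenic [ documentation: "Finds antigenic sites in proteins" groups: "Protein:Motifs" ]
section: input [ information: "Input section" type: "page" ]
seqall: sequence [ parameter: "Y" type: "PureProtein" ]
endsection: input
section: required [ information: "Required section" type: "page" ]
integer: minlen [ standard: "Y" minimum: "1" maximum: "50" default: "6" information: "Minimum length of antigenic region" ]
endsection: required
section: output [ information: "Output section" type: "page" ]
report: outfile [ parameter: "Y" rformat: "motif" multiple: "Y" taglist: "int:pos=Max_score_pos" ]
endsection: output
'''

# "kind: name [ ... ]" blocks; endsection lines have no bracket and are skipped.
DEFINITION = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
# attribute: "value" pairs inside a block.
ATTRIBUTE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of {kind, name, attributes} dicts, in file order."""
    entries = []
    for kind, name, body in DEFINITION.findall(text):
        entries.append({
            "kind": kind,
            "name": name,
            "attributes": dict(ATTRIBUTE.findall(body)),
        })
    return entries


def list_qualifiers(entries):
    """Keep only the entries that describe inputs, outputs and options."""
    return [e for e in entries if e["kind"] not in ("application", "section")]


if __name__ == "__main__":
    for entry in list_qualifiers(parse_acd(ACD_TEXT)):
        attrs = entry["attributes"]
        print(entry["name"], entry["kind"], attrs.get("default", ""))
```

A web form or workflow wrapper could use the resulting list to generate one input field per qualifier, using the `information` and `default` attributes as the label and initial value.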

Page 9: Rice Emboss Bosc2009

EMBOSS makes things easy

• ACD files define sequence input
  • Sequence type for DNA/protein, possible ambiguity codes, gaps
• Sequences in files
  • 40+ formats supported - auto detection
• Sequence databases
  • Remote servers
    • SRS, Entrez, MRS
    • User-specified URL
  • Locally indexed - using the original data files
  • Local script utilities

Page 10: Rice Emboss Bosc2009

EMBOSS Web Interface

http://emboss.ch.embnet.org/wEMBOSS/

Page 11: Rice Emboss Bosc2009

EMBOSS SoapLab Service

MyGrid/EMBRACE projects: for use by Taverna Workflows

Page 12: Rice Emboss Bosc2009

EMBOSS User Survey

Page 13: Rice Emboss Bosc2009

EMBOSS Update

• Release 6.1.0 as usual on 15th July 2009
• New EMBL and UniProt formats
  • With full set of cross-references
• FASTQ short read formats
• Jemboss GUI included as standard
• Further profiling for enhanced efficiency
• 2000+ QA tests (more needed)
• Updated Phylip 3.68 … and file format variants
• Services for EMBRACE/SoapLab2
• DAS testing

Page 14: Rice Emboss Bosc2009

Example Dasty screen:

Page 15: Rice Emboss Bosc2009

Example Ensembl screen:

Page 16: Rice Emboss Bosc2009

EMBOSS Future plans

• Three open source books: users, developers, admin
  • Cambridge University Press
  • Original text can be freely reused
• New areas of interest
  • Metadata and ontologies (EDAM, taxonomy, GO, SO, …)
  • (all) public data resources
  • Coordinate systems (ensembl, gene/protein input/results)
  • Project-based working
  • Next-generation sequence data – used by ordinary biologists
  • 100+ new applications
• Database index updates
• Scientific advisory board
• Developer courses: anywhere, any time

Page 17: Rice Emboss Bosc2009

The EMBOSS Team

Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag

Mon 12:15 Technology Track

Mon 17:45 Poster U43

Wed 13:00 Birds of a Feather

Page 18: Rice Emboss Bosc2009

Acknowledgements

• EBI: Peter Rice, Alan Bleasby, Jon Ison, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam

• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop

• LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold

• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley

• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina

• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux...

• IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Accelrys, Cambridge University Press

• Open-Bio Foundation, Sourceforge

• ... And the British Antarctic Survey

http://emboss.sourceforge.net

http://emboss.open-bio.org/wiki