PORTING HMMER AND INTERPROSCAN TO THE GRID

Post on 02-Feb-2016



Daniel Alberto Burbano Sefair (dburbano@uniandes.edu.co)

Michael Angel Pérez Cabarcas (mic-pere@uniandes.edu.co)

University of the Andes

Information Technology Division

Colombia

November 2008

Topics

• Introduction

• HMMER

• InterProScan

• What do we have?

• What do we want with your help?

• Questions

INTRODUCTION

• Our users, from the Biology department, want an easy way to use HMMER and InterProScan that also saves processing time.

– A graphical user interface instead of a command-line interface.

– There are few users, but they submit many jobs (1000 - 3000).

– Submit jobs with files larger than 10 MB.

– Reduce the processing time by using other computers.

– Depending on the job, the run time can be from 1 h to 12 h.

– Some InterProScan jobs fail and must be submitted again.
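One way to spread a large input over other computers is to cut a multi-sequence FASTA file into smaller pieces and submit each piece as its own job. The following is only a minimal sketch of that idea, not part of the presented wrapper; the function names and the chunk size are illustrative assumptions.

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header line plus its sequence lines) at a time."""
    record: List[str] = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        if line.strip():  # skip blank lines
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_into_chunks(lines: Iterable[str], records_per_chunk: int) -> List[str]:
    """Group FASTA records into chunks of at most records_per_chunk records.

    Each returned string is a complete FASTA text that could be written to
    its own file and submitted as a separate job.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunks: List[str] = []
    current: List[str] = []
    for record in read_fasta_records(lines):
        current.append(record)
        if len(current) == records_per_chunk:
            chunks.append("".join(current))
            current = []
    if current:  # last, possibly smaller, chunk
        chunks.append("".join(current))
    return chunks
```

Each chunk keeps whole records, so the per-chunk outputs can be concatenated afterwards. Note that E-values depend on the number of sequences searched, so values computed per chunk may differ from those computed on the whole file.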

1. What is HMMER?

- “HMMER is a sequence analysis tool using profile Hidden Markov Models”.

- It is a set of 9 applications used from the command line:

hmmpfam, hmmsearch, hmmalign, hmmbuild, hmmconvert, hmmcalibrate, hmmemit, hmmindex, hmmfetch.

The above definition is taken from: ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf

Home page: http://hmmer.janelia.org/

HMMER: Profile Hidden Markov Models

2. How can I use HMMER from the command line, PBS, and JDL?

HMMER is a command-line application. This is an example:

hmmsearch file.hmm MySequence.fasta >> output
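The slide shows only the plain command. As a hedged illustration of the PBS and JDL cases, the sketch below generates a PBS script and a gLite-style JDL description for the same command. The job name, the 12 h walltime, the wrapper script name run_hmmsearch.sh, and the std.out / std.err file names are assumptions made for the example, and hmmsearch is assumed to be installed on the execution node.

```python
import shlex
from typing import List, Sequence


def hmmsearch_command(hmm_file: str, sequence_file: str, output_file: str) -> str:
    """Return the shell command from the slide, with its arguments quoted."""
    return "hmmsearch {} {} >> {}".format(
        shlex.quote(hmm_file), shlex.quote(sequence_file), shlex.quote(output_file)
    )


def pbs_script(job_name: str, command: str, walltime: str = "12:00:00") -> str:
    """Return the text of a PBS job script that runs the command."""
    return "\n".join([
        "#!/bin/sh",
        "#PBS -N " + job_name,            # job name (assumed)
        "#PBS -l walltime=" + walltime,   # upper bound on run time (assumed)
        "#PBS -j oe",                     # merge stdout and stderr
        'cd "$PBS_O_WORKDIR"',            # run where the job was submitted
        command,
        "",
    ])


def jdl_description(wrapper_script: str, input_files: Sequence[str],
                    output_file: str) -> str:
    """Return a JDL text that ships a wrapper script and its input files."""
    def quoted_list(names: List[str]) -> str:
        return "{" + ", ".join('"{}"'.format(name) for name in names) + "}"

    lines = [
        'Executable = "/bin/sh";',
        'Arguments = "{}";'.format(wrapper_script),
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = " + quoted_list([wrapper_script] + list(input_files)) + ";",
        "OutputSandbox = " + quoted_list(["std.out", "std.err", output_file]) + ";",
    ]
    return "\n".join(lines) + "\n"
```

For example, `pbs_script("hmmsearch_job", hmmsearch_command("file.hmm", "MySequence.fasta", "output"))` gives a script that could be passed to qsub, and the same command could be the body of the hypothetical run_hmmsearch.sh named in the JDL. For input files of more than 10 MB the sandbox may not be suitable, and staging through grid storage would probably be needed instead.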


1. What is InterProScan?

The following definition is taken from the European Bioinformatics Institute: http://www.ebi.ac.uk/2can/tutorials/function/InterProScan.html

“InterProScan is a tool that combines different protein recognition methods into one resource. It scans a given protein sequence against the protein signatures of the InterPro member databases (PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs).”

Home Page: http://www.ebi.ac.uk/Tools/InterProScan/

InterProScan

2. How does InterProScan work?

1. The user submits a protein sequence.

2. Protein sequence applications are launched and search against specific databases.

3. Each application returns a list of hits.

4. The results are combined.

5. The information is returned to the user.
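The five steps above can be sketched as a small fan-out and merge. This is a toy model only: the scanners below are simple substring matchers that stand in for the real signature-recognition applications, and all names (Hit, find_motif, scan_sequence, the signature labels) are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class Hit(NamedTuple):
    application: str  # which application reported the hit
    signature: str    # label of the matched signature (made up here)
    start: int        # 1-based start position in the sequence
    end: int          # 1-based end position in the sequence


Scanner = Callable[[str], List[Hit]]


def find_motif(application: str, signature: str, motif: str) -> Scanner:
    """Build a toy scanner that reports every occurrence of a fixed motif."""
    def scan(sequence: str) -> List[Hit]:
        hits: List[Hit] = []
        position = sequence.find(motif)
        while position != -1:
            hits.append(Hit(application, signature,
                            position + 1, position + len(motif)))
            position = sequence.find(motif, position + 1)
        return hits
    return scan


def scan_sequence(sequence: str, scanners: Dict[str, Scanner]) -> List[Hit]:
    """Steps 2 to 4: launch every scanner, collect its hits, combine them."""
    combined: List[Hit] = []
    for name in scanners:                            # step 2: launch each application
        combined.extend(scanners[name](sequence))    # step 3: its list of hits
    # Step 4: one combined list, ordered by position on the sequence.
    return sorted(combined, key=lambda hit: (hit.start, hit.end, hit.application))


# Step 1 (the submitted sequence) and step 5 (results returned to the user).
scanners = {
    "toyA": find_motif("toyA", "SIG_A", "MKT"),
    "toyB": find_motif("toyB", "SIG_B", "GG"),
}
for hit in scan_sequence("MKTAYGGSMKT", scanners):
    print(hit.application, hit.signature, hit.start, hit.end)
```

Because each scanner is independent of the others, step 2 is the natural place to run the applications in parallel on different machines.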

[Figure: InterProScan workflow schema; the numbers 1 to 4 mark the steps listed above.]

Information and schema are taken from: http://www.ebi.ac.uk/2can/tutorials/images/scan_schema.gif

3. How can I use InterProScan from the command line, PBS, and JDL?

InterProScan is a command-line application. This is an example:

iprscan -cli -i input.seq -o test.out -format raw -goterms -iprlookup
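Since some InterProScan jobs fail and have to be submitted again, a wrapper could repeat the command a limited number of times. The sketch below is one possible way to do this, with the input option written as `-i`. It assumes that a failed run can be detected from a non-zero exit code, which may not hold for every failure; the function names and the limit of three attempts are illustrative.

```python
import subprocess
from typing import Callable, List, Sequence


def iprscan_command(input_file: str, output_file: str) -> List[str]:
    """Return the example command as an argument list."""
    return ["iprscan", "-cli", "-i", input_file, "-o", output_file,
            "-format", "raw", "-goterms", "-iprlookup"]


def run_once(command: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_with_retries(command: Sequence[str], max_attempts: int = 3,
                     runner: Callable[[Sequence[str]], int] = run_once) -> int:
    """Run the command until it succeeds; return the successful attempt number.

    Assumption: exit code 0 means success. Raises RuntimeError when every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if runner(command) == 0:
            return attempt
    raise RuntimeError("command failed after {} attempts".format(max_attempts))
```

A call such as `run_with_retries(iprscan_command("input.seq", "test.out"))` would run the example up to three times. The `runner` parameter lets the same logic wrap a PBS or grid submission instead of a local process.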


What do we have?

• A Bioinformatic Grid Wrapper (BGW) for HMMER and InterProScan, which is a command-line interface (CLI).

What do we want with your help?

Architecture

Thanks

?

• “Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis.”