Galaxy History: Genome Informatics 2008

Galaxy http://galaxy-project.org James Taylor, Emory University

description

Talk on Galaxy at Genome Informatics 2008. I was a session chair, so no oversight on this one. We were obsessed with authorization that year, and this talk is probably the most detailed ever on roles, groups, and dataset security in Galaxy. Another classic team slide.

Transcript of Galaxy History: Genome Informatics 2008

  • Galaxy http://galaxy-project.org James Taylor, Emory University
  • Galaxy?
  • Galaxy goals Making large-scale computational analysis more accessible Facilitating transparent analysis Ensuring that analyses are reproducible
  • What Galaxy provides An open-source framework for integrating various computational tools and databases into a cohesive workspace A web-based service we provide, integrating many popular tools and resources for comparative genomics A completely self-contained application for building your own Galaxy style sites
  • So, what about all this data?
  • Tool suites
  • What is a Galaxy Tool? The basic unit of analysis in Galaxy A program, script, external web resource, whatever... Adapted to a standard structured interface Parameters, data inputs, data outputs
  • Short read sequence analysis Analyzing read quality and filtering Genomic analysis Mapping against assembled genomes Coverage, polymorphism, ... Metagenomic analysis Mapping against sequence databases Taxonomy analysis, visualization, ...
  • Statistical Genetics Quality control and filtering Estimating ancestry and correction Case control analysis ...
  • Data and analysis management
  • The Galaxy History
  • Beyond the history
  • Beyond the History I Workflows
  • Galaxy workflows Abstract description of an analysis procedure Essentially: what tools to run, and the flow of data between tools
  • Beyond the History II Data Libraries
  • Galaxy Data Libraries Mechanism for storing and organizing shared datasets in a Galaxy instance An instance can have many libraries, each containing datasets organized using folders as well as tags Full type specific metadata like any other dataset in Galaxy
  • Driving use cases Large shared datasets Genotype data Sequencing reads Direct from the instrument! Data management for distributed projects
  • What about protected data?
  • Galaxy dataset security Fine grained access controls for Galaxy datasets Different actions on datasets require different permissions Users and groups are granted these permissions Enforced throughout Galaxy e.g. a History can still be shared, but access to individual datasets in the history is controlled
  • Security customization Authentication mechanism can be replaced, or can leverage a single sign-on mechanism (e.g. through a proxying web server) Authorization provider can be customized or replaced
  • Completely integrated with analysis Dataset restrictions propagate through an analysis Analyses that combine datasets also combine their restrictions
  • Up next... Libraries: sequencer integration versioning tagging and annotation automatic workflow triggering Security configurable adapters to different authorization providers (e.g. directory services)
  • Acknowledgements Data and browser connections UCSC Biomart GMOD Intermine Funding National Science Foundation Huck Institutes, Pennsylvania Dept. of Health
  • The Galaxy Team Guru Ananda | Penn State Dan Blankenberg | Penn State Wen-Yu Chung | Penn State Nate Coraor | Penn State Greg Von Kuster | Penn State Sergei Kosakovsky | UCSD Ross Lazarus | Harvard MS Anton Nekrutenko | Penn State
  • p.s. I have job openings for people who like to do cool stuff: [email protected]
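
The dataset security slides say that restrictions propagate through an analysis and that analyses combining datasets also combine their restrictions. The sketch below is one way to picture that idea; it is not Galaxy's actual implementation. The names `Dataset`, `Restriction`, `can_access`, and `run_tool` are hypothetical, and the model (each restriction is a set of roles, a user must satisfy every restriction on a dataset) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# Assumption: a restriction is a set of roles; holding any one role satisfies it.
Restriction = FrozenSet[str]


@dataclass(frozen=True)
class Dataset:
    name: str
    # An empty set of restrictions means the dataset is unrestricted.
    restrictions: FrozenSet[Restriction] = frozenset()


def can_access(dataset: Dataset, user_roles: Iterable[str]) -> bool:
    """A user may access a dataset only if every restriction is satisfied."""
    roles = set(user_roles)
    return all(roles & restriction for restriction in dataset.restrictions)


def run_tool(output_name: str, inputs: Iterable[Dataset]) -> Dataset:
    """Hypothetical tool run: the output inherits the union of input restrictions."""
    combined = frozenset().union(*(d.restrictions for d in inputs))
    return Dataset(output_name, combined)


# Two protected inputs and one public input, combined by a single analysis step.
genotypes = Dataset("genotypes", frozenset({frozenset({"study_a"})}))
reads = Dataset("reads", frozenset({frozenset({"lab_b"})}))
public = Dataset("reference")
joined = run_tool("joined", [genotypes, reads, public])

print(can_access(joined, {"study_a"}))           # False
print(can_access(joined, {"study_a", "lab_b"}))  # True
```

Under this assumed model, the output of a combining analysis is never easier to reach than any of its inputs, which matches the behavior the slides describe for shared histories containing protected datasets.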