Sierra Taylor-Moxon Database Administrator/Designer [email protected] .

34
Sierra Taylor-Moxon Database Administrator/Designer [email protected] http://zfin.org
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    222
  • download

    1

Transcript of Sierra Taylor-Moxon Database Administrator/Designer [email protected] .

Sierra Taylor-Moxon

Database Administrator/Designer

[email protected]

http://zfin.org

My Background

• Database Administrator/Designer: ZFIN

• Database Analyst/Programmer: EPS

• Product Development• Agricultural Research

• Graduated 1999 Whitman College (Biology/Environmental Science)– I wasn’t a CS major!--but I have bioinformatics experience

What is Bioinformatics?

The intersection of computation and biology

• Found in both the private and public sector

• Genomics, Proteomics & Computational biology• Laboratory management and pipeline development.

– High throughput, gene patents, digital lab notebooks

• Modeling and imaging– Drug discovery

• Websites and databases (“Bioinformation”)

People at ZFIN

Our purpose

• Provide a repository of Zebrafish data.

• Design software that interfaces with our database, and allows users to select, insert, update and delete via a web page.– Some users are savvy, others are not.

Why Zebrafish?

• A popular pet that’s easy to care for

• A freshwater fish

• A model organism– Short time between birth and reproduction– Developing embryo is transparent– Vertebrate – Regenerative body parts

ZFIN technical architecture

• Front end:– Apache web server– Webdatablade, Perl, Java, C, SQL (SPL)

• Back end:– Unix (Sun/Solaris) Platform– Informix Relational Database Management

System (RDBMS)

Databases

• MS Access– Filemaker Pro– MS Excel

• MySQL• PostgreSQL• DB2, Oracle, Informix, SQL Server

Why choose Informix over MS Access?

What’s the difference?

MS Access• File based system

– Or ODBC (open database connectivity) to RDBMS

– Files are hosted on a PC, opened directly

• Works great for small number of concurrent users

• Front end tools included• Web server or PC crashes

cause corruption

Informix (RDBMS)– Server manages concurrent

users– Security built into database

server– Data is not corrupted by

web server failure

• Some come with front end tools, others do not

• Support triggers and other kinds of ‘scheduled’ jobs

• Platform independent

What does a DBA at ZFIN do?

1) Doctor

2) Architect

3) Janitor

4) Security guard

5) Liaisonit depends…

1) Keep the database and web servers up.

Doctor

“I think paranoia can be instructive in the right doses. Paranoia is a skill.”

-John Shirley

(Science Fiction novelist)

How?

• Monitor it

• Coordinate backups and test them

• Create and manage development environments

• Plan disk usage

• Plan for upgrades

• Document problems and their solutions.

Interpreting symptoms

• Can’t log into the system, can’t run system commands, or the website is slow.

• Number of users is much higher than usual, some queries will not return data, some will.

• Disappearing data

Stay cool under pressure!

Architect

• There is more than one way to do it.

• Be prepared to do it again.– Biology is the science of

exceptions.

• Do what works.

• There is always a legacy

2) Create the logical and physical data model

A little ZFIN history

• Very few foreign key or primary key constraints

• Test and production on the same machine

• Only two developers

• xfig

ERStudio at ZFIN

http://zfin.org/DataModel

Reverse Engineeringcreate table vocabulary(

voc_zdb_id varchar(50),voc_otype varchar(10),voc_ont_id integer

) in tbldbs1 extent size 16 next size 16 ;

create unique index voc_primary_key_index on vocabulary (voc_zdb_id) using btree in idxdbs2 ;

alter table vocabulary add constraint primary key (voc_zdb_id) constraint voc_primary_key ;

alter table vocabulary add constraint (foreign key (voc_type) references ontology_type constraint voc_name_foreign_key) ;

ontology_type

voc_zdb_id---------------voc_otype voc_ont_id

Logical Model Development

Physical Design

• Column and table definitions

• Placement of data on disk– Data fragmentation

• Performance– Indexes

– “Materialized Views”

create table vocabulary(voc_zdb_id varchar(50),voc_otype varchar(10),voc_ont_id integer

) in tbldbs1 extent size 16 next size 16 ;

create unique index voc_primary_key_index on vocabulary (voc_zdb_id) using btree in idxdbs2 ;

alter table vocabulary add constraint primary key (voc_zdb_id) constraint voc_primary_key ;

alter table vocabulary add constraint (foreign key (voc_type) references ontology_type constraint voc_name_foreign_key) ;

Responsibilities

3) Talk to people• Upgrades• Data model changes• New data• Business logic• New users

BLAST

• Basic Local Alignment Search Tool– Based on the Smith-Waterman algorithm

• Uses substitution matrix (based on the probability that one protein will turn into another protein).

– Performance is increased by matching small areas and building outward

Zebrafish Gene

Human Gene

SmileProtein

SmileProtein

Humans Animal models

Mutant Gene

Mutant or missing Protein

Mutant Phenotype (disease)

Mutant Gene

Mutant or missing Protein

Mutant Phenotype

(disease model)

SHH -/+ SHH -/-Holo-

Prosencephaly(NODAL)

shh -/+ shh-/- oep-/-

(nodal)

1996 1998 2000 2002

mutants discovered

Nodal as cause ofholoprosencephaly

Humans

Zebrafish

genesidentified

Recording and Storing Phenotypes at ZFIN

• Gene matching is easy because all DNA is made up of the same 4 characters.

• Describing the way something looks is subjective.

Vocabulary

Animal

Digestive System

Organ System

Stomach

Gut

Visceral organ system

Ontology

Animal

Organ system

Visceral organ system

Digestive system

Gut Stomach

shh-/-

eye placement abnormal+ +

Phenotype = entity value attribute+ +

brain size small+ +

kidney size hypertrophied+ +

Humanprosencephaly

Zebrafishshh

Zebrafishoep

Human gene:SHH;OMIM:600725 Zebrafish gene:shh Zebrafish gene:oep Ref: OMIM:142945 Ref: ZDB-GENE-980526-166 Ref: ZDB-GENE-990415-198 Entity:prosencephalon development Attribute:process Value:arrested

Entity:prosencephalon development Attribute:process Value:reduced

Entity:prosencephalon development Attribute:process Value:arrested

Entity:brain Attribute:size Value:small

Entity:brain Attribute:size Value:small

Entity:brain Attribute:size Value:small

Entity:brain ventricle Attribute:number Value:single

Entity:brain ventricle Attribute:number Value:single

Entity:midface Attribute:structure Value:hypoplastic

Entity:midface Attribute:structure Value:hypoplastic

Entity:midface Attribute:structure Value:hypoplastic

Entity:eye Attribute:morphology Value:abnormal

Entity:eye Attribute:morphology Value:abnormal

Entity:eye Attribute:morphology Value:abnormal

Entity:eye Attribute:number Value:single

Entity:eye Attribute:number Value:single

Entity:eye Attribute:placement Value:mislocalized

Entity:eye Attribute:placement Value:mislocalized

Entity:eye Attribute:placement Value:mislocalized

Entity:nose Attribute:morphology Value:abnormal

Entity:nose Attribute:morphology Value:abnormal

Entity:nostril Attribute:number Value:single

Entity:nostril Attribute:number Value:single

Entity:kidney Attribute:rel_size Value:hypertrophied

Entity:kidney Attribute:rel_size Value:hypertrophied

Entity:kidney Attribute:rel_size Value:hypertrophied

Data type Total

Total ZFIN genes 16,811

Genes with assigned human orthologs 1,615

Total ZFIN mutants 2,827

ZFIN mutants with potential human references

528

More information

• Genomics, proteomics– http://www.ncbi.nlm.nih.gov/gquery.fgi

• Algorithm Development– http://blast.wustl.edu

• Imaging– http://genex.hgu.mrc.ac.uk/Atlas/intro.html

• Websites and databases– http://zfin.org