Bioinformatics from a drug discovery perspective
EMBRACE Workshop, 22-23 March 2007
Niclas Jareborg
AstraZeneca R&D Södertälje
AstraZeneca Drug Discovery
• Research Areas (RAs)
– CV/GI (Cardiovascular/Gastrointestinal), RIRA (Respiratory/Inflammation), CNS/Pain, Cancer, Infection
• Discovery Sites
– UK: Charnwood (RIRA), Alderley Park (Cancer, CV/GI, RIRA)
– North America: Boston (Cancer, Infection), Wilmington (CNS/Pain), Montreal (CNS/Pain)
– Sweden: Lund (RIRA), Mölndal (CV/GI), Södertälje (CNS/Pain)
– India: Bangalore (Infection)
• Bioinformatics
– All RAs have their own bioinformatics teams
– Infrastructure at Alderley Park (databases, large Linux clusters), run by the IS organisation
A target is defined as…
• ... a biological target protein on which a chemical entity (e.g. a drug molecule) exerts its action
• A drug target must be associated with a disease
Drug discovery process
• Target identification
• Target validation
• Hit identification (HTS)
• Hit to lead (Lead identification)
• Lead optimisation
• Candidate drug
• Clinical trials
[Figure labels: Genes, Protein, Assay, Compound library, Hit; axis: Effort]
Target Definition
• Alternative Splicing
• Identify pharmacologically relevant target variant(s)
• Sequence variation
• Function
– Target
– Metabolizing enzyme
• Binding of substance
• Identify most common variant
– Might differ in different populations!
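The point about the most common variant differing between populations can be illustrated with a small tally. This is only a sketch: the population labels, variant names and observations below are invented for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical observations: (population, variant) pairs for one target gene.
observations = [
    ("pop_A", "V1"), ("pop_A", "V1"), ("pop_A", "V2"),
    ("pop_B", "V2"), ("pop_B", "V2"), ("pop_B", "V1"),
    ("pop_B", "V2"),
]

def most_common_variant(obs):
    """Most frequent variant when all populations are pooled."""
    counts = Counter(variant for _, variant in obs)
    return counts.most_common(1)[0][0]

def most_common_by_population(obs):
    """Most frequent variant within each population separately."""
    per_pop = {}
    for pop, variant in obs:
        per_pop.setdefault(pop, Counter())[variant] += 1
    return {pop: counts.most_common(1)[0][0] for pop, counts in per_pop.items()}

pooled = most_common_variant(observations)
by_population = most_common_by_population(observations)
```

With these made-up counts the pooled answer and the per-population answers disagree for one population, which is the situation the slide warns about.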
Target Definition
• Expression
• Is the target expressed in a relevant human tissue?
• Databases
– Microarrays
– Immunohistochemistry
– In situ hybridization
– Proteomics
• Literature
Target Definition
• Selectivity
• How similar are related proteins?
• Do similar proteins have functions that we do not want to affect?
• Animal models
• Orthologous genes
– Same family size?
• Splice variants
– Same as in human?
• Polymorphisms
– Differences between inbred strains
• Tissue expression
– Overlap human?
• Available transgenes or knock-outs
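One simple way to put a number on "how similar are related proteins?" is percent identity over an existing pairwise alignment. The sketch below assumes the two sequences are already aligned to equal length with "-" as the gap character; the function name and the toy sequences are illustrative only, and real selectivity work would use a proper alignment tool and consider binding-site residues rather than whole-sequence identity.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy aligned fragments (made up): 6 ungapped columns, 5 identical.
identity = percent_identity("MKT-LLV", "MKSALLV")
```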
Bioinformatics input to the drug discovery process
[Figure: pipeline spanning Research, Development and Commercialisation, with milestones MS1 to MS5. Stage labels: Target Identification, Hit Identification, Lead Identification, Lead Optimisation, CD Pre-nomination, Development for Launch, Registration, Launch, Sales.]
Genetics & Bioinformatics input:
• Support target identification
• Flag up population variants in target
• Primary screening: identify polymorphic and splice variants
• Selectivity screening: identify paralogues
• Support biomarker identification
• Support choice of model organism(s)
In-house generated gene centric information resource
• Gene symbol, synonyms
• DNA and protein sequence
• Similarity to other species
• Splice variants
• Genetic mutations
• Tissue expression
• Pathways
• Patents
• Literature
• Functional motifs
Target identification
Targets come from different experimental approaches and are validated using different technologies
[Figure labels: EST sequencing campaigns; Microarrays (Affymetrix, glass etc.); Proteomics; Genetics/genome information; Differential biology; In silico; Literature; Specificity / selectivity; Target Candidates; Validation (in silico, lab bench); Validation as potential targets]
Target identification
~30000 human genes
1 potential target
What?
Where?
Novel?
Link to disease?
The human genome offers many potential drug targets
Samuel Svensson, PhD
AstraZeneca R&D Södertälje
Current Drug Targets - few target classes
Based on 483 drugs in Goodman and Gilman's "The Pharmacological Basis of Therapeutics"
• Receptors (GPCRs): 45%
• Enzymes: 28%
• Hormones & factors: 11%
• Unknown: 7%
• Ion channels: 5%
• Nuclear receptors: 2%
• DNA: 2%
~30000 human genes
< 5000 targets for small molecule drugs
~2000-3000 druggable targets
Only a fraction of gene products play a direct role in disease pathophysiology
Druggable genome: ~2000-3000 genes; 500 GPCRs, 50 NHRs, >200 ion channels, >1000 enzymes (e.g. 450 proteases, 500 kinases, >200 others)
Pathogen & commensal gut bacteria genes
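As a quick arithmetic check on the figures quoted above, the family counts can be summed and compared with the ~30000 gene total. The numbers are the slide's own; treating each ">" figure as a lower bound is an assumption made here for the calculation.

```python
# Lower bounds taken from the slide (">200" and ">1000" treated as 200 and 1000).
family_lower_bounds = {
    "GPCRs": 500,
    "NHRs": 50,
    "ion channels": 200,
    "enzymes": 1000,
}
human_genes = 30000

druggable_lower_bound = sum(family_lower_bounds.values())
fraction_of_genome = druggable_lower_bound / human_genes
percent_of_genome = round(100 * fraction_of_genome, 1)
```

The listed families alone account for at least 1750 genes, a little under 6% of ~30000, which is consistent with the quoted druggable-genome range once the ">" margins are allowed for.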
Number of druggable targets smaller than expected?
Updating the (shrinking?) “Targetome”
Down to 22K? (see PMID: 15174140)
Some of the 120 InterPro domains are unpromising; many potential targets are still functional orphans
– realistically nearer 2000?
OMIM still only at 1900, and only low numbers of “robust” genetic association results
Current trends
• “Blue sky genomics” -> literature
• Finding “unknown” targets -> prioritizing the lists
• Moving from single target focus
• Comparing and ranking of target candidates
– Integration of relevant but disparate data sources
• Better understanding of the target “neighbourhood”
– Disease mechanism
– Biomarkers
– Toxicology
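Comparing and ranking target candidates across disparate data sources can be pictured as a weighted evidence score. The sketch below is one possible scheme, not the method used at AstraZeneca: the evidence categories, weights, gene names and scores are all hypothetical.

```python
# Hypothetical weights per evidence source (sum to 1.0 for readability).
WEIGHTS = {"expression": 0.4, "genetics": 0.4, "literature": 0.2}

# Hypothetical normalised evidence scores (0..1) per candidate.
candidates = {
    "GENE_A": {"expression": 0.9, "genetics": 0.2, "literature": 0.5},
    "GENE_B": {"expression": 0.6, "genetics": 0.8, "literature": 0.4},
    "GENE_C": {"expression": 0.3, "genetics": 0.1},  # no literature evidence
}

def score(evidence):
    """Weighted sum of evidence; a missing source counts as 0."""
    return sum(weight * evidence.get(source, 0.0) for source, weight in WEIGHTS.items())

def rank(cands):
    """Candidate names ordered from highest to lowest score."""
    return sorted(cands, key=lambda name: score(cands[name]), reverse=True)

ranking = rank(candidates)
```

Counting a missing source as zero penalises poorly studied genes; whether that is desirable depends on whether the goal is well-supported targets or novel ones.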
Sources of Contextual Information
• Structured (20%): Mature Technology
– Internal Chemical Dbs
– Internal Biological Dbs
– External, Commercial Dbs: GVK Bio, Ingenuity IPA…
– External Public Dbs: EMBL, PDB, dbSNP, etc
• Unstructured (80%): Emerging Technology
– Internal Docs: Tox Reports, Clinical Trial Reports
– External Docs: Patents (USPTO, WIPO, EP, etc); Literature (Medline, Embase); Press Releases (competitor, supplier, collaborator, academic, etc); Government Agencies; Conference Proceedings; News Feeds
Current approach to retrieving information from unstructured sources is through manual extraction, i.e. finding documents and reading them!
Dissecting the Decision Making Process
• Finding
– Locating relevant documents and information
– Retrieving them in a useable format
• Extracting
– Reading information
– Locating the facts within documents
• Integrating
– Understanding what it means
– Putting the information into context
– Turning information into knowledge
• Creating
– Developing new hypotheses
– Input into decision making
Issues with the Manual Approach
• Finding
– Difficult to capture breadth
– Chance to miss things
– “White space” in failing to find things
• Extracting
– Limited time to read things
– Focus on reviews and summaries
• Integrating
– Based on individual scientists' own knowledge
– Narrow
– Biased
• Creating
– Hypotheses are “per project”
– Reactive not proactive
Text mining
• Sources
– Literature
– Patents
– In-house reports
• Information
– Protein-protein interactions
– Tissue expression
– Pharmacological differences: splice variants, polymorphisms, species
– Toxicology
– etc
Emerging Systems: Text Mining
• Extraction of facts from unstructured data sources
• Natural Language Processing, Ontologies
• Linguamatics I2E
• Knowledgebase generation
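The simplest form of fact extraction from unstructured text is dictionary-based co-occurrence: count how often two known entity names appear in the same sentence. The sketch below is far cruder than NLP systems such as Linguamatics I2E and is not based on them; the gene dictionary, the sentence splitter and the example text are illustrative assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary of gene symbols to look for.
GENE_DICTIONARY = {"CASP3", "CASP8", "CASP9", "BCL2", "PARP"}

def cooccurrences(text):
    """Count gene pairs mentioned together in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.upper()))
        found = sorted(tokens & GENE_DICTIONARY)
        pairs.update(combinations(found, 2))  # each unordered pair once per sentence
    return pairs

# Made-up text for illustration; not quoted from any real abstract.
abstract = (
    "CASP9 activates CASP3 during apoptosis. "
    "BCL2 binds CASP9. "
    "CASP3 cleaves PARP, and CASP9 is upstream of CASP3."
)
pair_counts = cooccurrences(abstract)
```

Co-occurrence only says that two names were mentioned together; recovering the verb between them (activates, binds, inhibits) is what the linguistic processing adds.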
Biomedical Entity-Relationship Data
[Figure: network linking genes (CASP9, PARP, CASP3, CASP8, BCL2, MTPN, TNF), diseases (Neoplasia, Hyperplasia), a metabolite (ADP-ribose) and a drug (Thalidomide).]
Relationship layers shown:
• Co-Published Information
• Gene:Gene Semantic Relationships
• Gene:Disease Semantic Relationships
• Gene:Metabolite Semantic Relationships
• Gene:Chemical/Drug Semantic Relationships
Edge labels: Co-published, Activates, Binds, Inactivates, Increases, Inc Expression, Associated with, Synthesizes, Inhibits
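Typed relationships of this kind are commonly held as subject-predicate-object records that can be filtered by entity and by relationship layer. The sketch below is a minimal in-memory version. The entity names and predicates are the labels that appear in the figure, but which entity is connected to which is an assumption made for illustration, not something read off the figure.

```python
from collections import namedtuple

Relation = namedtuple("Relation", ["subject", "predicate", "obj", "kind"])

# Illustrative edges only: the pairings below are assumed, not taken from the slide.
RELATIONS = [
    Relation("CASP9", "activates", "CASP3", "gene:gene"),
    Relation("BCL2", "binds", "CASP9", "gene:gene"),
    Relation("CASP3", "inactivates", "PARP", "gene:gene"),
    Relation("PARP", "synthesizes", "ADP-ribose", "gene:metabolite"),
    Relation("Thalidomide", "inhibits", "TNF", "gene:chemical"),
    Relation("BCL2", "associated with", "Neoplasia", "gene:disease"),
]

def relations_for(entity, kind=None):
    """All relations touching an entity, optionally restricted to one layer."""
    return [
        r for r in RELATIONS
        if entity in (r.subject, r.obj) and (kind is None or r.kind == kind)
    ]

casp9_edges = relations_for("CASP9")
parp_metabolites = relations_for("PARP", kind="gene:metabolite")
```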
Pilot Systems: Pathway Analysis: Ingenuity IPA
www.ingenuity.com
BER System in Action
[Figure: data types (Gene Expression, Proteomic, Metabonomic, Genetic) yield a Significant Biological Entity List (Gene List, Protein List, Metabolite List), which is passed to the ER System (Gene/Metabolite Knowledgebase).]
Question: What is the underlying biology, pathology, physiology etc. associated with this list of entities? What is it telling me?
• Biological environment of the list
• Canonical pathways associated with the list
• Diseases, biological processes associated with the list
• Literature evidence trail
• Hypothesis generation
Structuring the Knowledge
Delivers facts as networks of information: Knowledge Bases
[Figure: GI Tox Knowledge Map. Node types: Compound; Genes; Cellular Processes; Pathology (GI toxicity, GI pathology); Clinical Observations (Diarrhoea, Vomiting, Loose Stools, Bloating, Nausea, etc.); Species (Human, Rat, Dog, etc.). Edge labels: Affects, Involved in, Is a, Linked with, Observed in.]
Data source integration
[Figure: architecture in three layers (Extraction, Representation, Visualisation). Labels: data sources (Genes, Expression, Targets, Chem, Literature, Patent, CI); Automated ETL engines; Focused NLP Extraction; Ontologies; ETL (biz rules, scoring); Disease/Target KB; DataMart; CIRA TSR Interface; CVGI TSR Interface; DiseaseKB Interface; Complex Data Query; Direct Project Queries.]
Workflow technology
• Enables scientists to use, modify and implement solutions that specialist groups help them put in place; removes (in principle) the need for extensive IS projects for new data types.
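The idea behind workflow technology, small reusable steps that a scientist can rearrange without a new IS project, can be sketched as plain function composition. This is not a model of any particular workflow product; the step names, the gene list and the "known" set are hypothetical.

```python
from functools import reduce

def make_workflow(*steps):
    """Chain steps so that each one receives the previous step's output."""
    def run(data):
        return reduce(lambda current, step: step(current), steps, data)
    return run

def normalise_symbols(genes):
    return [g.strip().upper() for g in genes]

def drop_duplicates(genes):
    return list(dict.fromkeys(genes))  # keeps first occurrence, preserves order

def keep_known(genes, known=frozenset({"CASP3", "BCL2"})):
    # Hypothetical filter: keep only symbols present in a reference set.
    return [g for g in genes if g in known]

workflow = make_workflow(normalise_symbols, drop_duplicates, keep_known)
result = workflow([" casp3", "BCL2", "casp3 ", "tnf"])
```

Swapping, removing or adding a step changes only the `make_workflow(...)` call, which is the kind of modification the slide has in mind.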
The Knowledge Technology Ziggurat
[Figure: stepped structure in which each layer builds on the previous one, aligned with the decision making process (Find, Extract, Integrate, Create).]
• Content Licensing & Access
• Document Retrieval and Storage
• Fact Extraction (Text Mining)
• Information Structuring / Knowledge Structuring
• Modelling
[Annotations: Unstructured Information; Developing semantic relationships; KNOWLEDGE BASES; Current focus; Systems biology]
“Bio” and “Chemo” Informatics Joins to Aid Target Selection
[Figure labels, spanning Biology, Chemistry, Structures and Clinical Practice:]
• Sequences
• Patented inhibitors
• Literature inhibitors and PDB ligands
• HTS, focused screens & project SAR data
• Docking & virtual screening
• Fingerprint structure search
• Competitor compounds
• Library and fragment data
• AZ protein and ligand structures
• Links to endogenous ligands & modulators
• Expression data, gene structure, SNPs & splicing
• Families of known targets
• Sequence alignment, structure homology modelling
• Cross-species (orthology) comparisons
• Sequences, gene names, disease literature links
• Functional genomics: mouse, fish, yeast
• Linking non-homologs with analogous mechanisms and binding pockets
What do we need to do?
[Figure: Candidate Compound, Proteins and Testicular Degeneration, connected by:]
• Ligand-Protein Association via Experimental & Virtual Methods
• Term Association via Text Mining
Hypothesis Generation Using Informatics/Modelling
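The hypothesis-generation step suggested by this figure amounts to joining two association sets: proteins a compound is linked to, and proteins a toxicity term is linked to. The sketch below shows that join as a set intersection. The compound identifier, protein names and associations are all hypothetical placeholders.

```python
# Hypothetical ligand-protein associations (experimental or virtual screening).
compound_binds = {
    "CMPD-1": {"PROT_A", "PROT_B", "PROT_C"},
}

# Hypothetical term-protein associations (from text mining).
term_associated = {
    "testicular degeneration": {"PROT_B", "PROT_D"},
}

def hypotheses(compound, term):
    """Proteins linked both to the compound and to the term, sorted by name."""
    shared = compound_binds.get(compound, set()) & term_associated.get(term, set())
    return sorted(shared)

candidates_to_test = hypotheses("CMPD-1", "testicular degeneration")
```

Each shared protein is only a lead for follow-up: it suggests a possible mechanism linking the compound to the observation, not evidence that the link is causal.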
A multidimensional jigsaw puzzle
• Target - Biological mechanisms - Disease
• Target/Off-target - Biological mechanisms - Toxicology
• Polymorphisms
• Splice variants
• Interaction partners
• Tissues
• Compounds
• Animal models
• etc etc etc…
Current needs
• Pathways / Systems biology
• Mining of unstructured data
• Connect biology and chemistry informatics domains
• System / data integration
• Ontologies!
• Workflow technology
AZ - EBI
• AZ member of the Industry programme
• Training and Education
• Network meetings
• Research, Standards