Which Drug Did You Mean?
[1]
Which Drug Did You Mean?
Resolving the linkage spaghetti between semantic names, structures, bioactivity and mixtures
Christopher Southan
ChrisDS Consulting, Göteborg, Sweden
Prepared for BioIT, Boston, April 2012, Track 14, Tuesday
See also
http://cdsouthan.blogspot.se/2012/06/will-real-bosinhib-please-stand-up-take.html
[2]
History of Drug Names
Approximate timelines
(Timeline figure: each entry below was drawn as a bar, listed here in approximate order of onset)
• Compound registration system: structure and ID
• Patent: IUPAC name or image
• Internal code name(s), externally blinded
• Code name(s) > structure declared externally
• Journal papers
• International Nonproprietary Name (INN)
• INN indexed in MeSH
• USAN, BAN, JAN
• Brand name(s)
• Combination brand
[3]
History of Atorvastatin
• 1985: (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-(propan-2-yl)-1H-pyrrol-1-yl]-3,5-dihydroxyheptanoic acid (IUPAC name)
• ~1987: Parke-Davis internal code number CI-981
• ~1995: Atorvastatin [INN:BAN], Atorvastatin calcium [USAN], Atorvastatin calcium trihydrate INN (error?), Atorvastatina (Spain)
• 1997: Lipitor (brand name), Faboxim (Argentina), Zurinel (Chile), etc.
• 2004: Caduet (brand name), combining Norvasc (amlodipine besylate) and Lipitor (atorvastatin calcium)
• 2012: atorvastatin calcium (generic, Ranbaxy)
• 2012: amlodipine besylate and atorvastatin calcium (generic, Ranbaxy)
[4]
Causes of Drug Linkage Spaghetti (I)
• Tautomer/stereo multiplexing and structure interconversion differences (e.g. complex antibiotics)
• Popular structures > 100s of submitters > many vendors > more noise
• Opaque ecosystem of primary submitters, secondary linkers, declared circularity, cryptic circularity, and submitters having independent portals with different rules
• Older drugs accumulate 100s of synonyms and database x-refs, with errors
• Accumulated wet assay results depend on how long the drug has been in which public screening collection
• Deprecated structures are not always refreshed between databases globally
• Pro-drugs, metabolites or tested combinations rarely have explicit x-refs
[5]
Causes of Drug Linkage Spaghetti (II)
• Literature extractions flowing into drug databases (including MeSH) can have
– Author errors and paucity of standards in the primary report
– No quality filtration at the result level
– Curation errors and different annotation rules
– No discrimination of independent de-novo checking from annotation recycling
• Large-scale patent extraction feeds into databases bring in
– Forests of analogues with no data links
– High redundancy for drugs and leads
– Structural differences between pipeline outputs
– Opportunistic permutations of salts and mixtures
– Opportunistic virtual deuteration of all best-selling drugs
• Drug discovery operations use many drugs as reference compounds in their internal screening collections. This means
– Name > structure cross-mapping across internal, public and commercial sources
– Integration of internal and external data across the same drugs
[6]
Atorvastatin
• The scale of links provides a good cross section of problems
• Relationship cross-mappings and the PubChem tool-box facilitate navigation through the links
• External submissions each get a substance ID (SID); these are merged into compound records (CID) via chemistry rules (see PubChem documentation)
• This drug has accumulated years of submissions from different sources, BioAssay entries and pharmacology literature links
• The parent CID 60823 has
– 99 synonyms
– 6 stereo forms
– 70 canonically-related structures
– 449 substance records
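The SID-to-CID merge described above can be pictured with a minimal sketch. Everything in it is illustrative: the SIDs, submitter names and structure keys are invented, and the real PubChem standardisation rules are far more involved than an exact key match.

```python
from collections import defaultdict

# Hypothetical substance records: (SID, submitter, structure key).
# All values are invented for illustration; a real pipeline would derive
# the key from chemistry standardisation rules, not from a label.
SUBSTANCES = [
    (1001, "vendor_a", "KEY-ATORVASTATIN-3R5R"),
    (1002, "vendor_b", "KEY-ATORVASTATIN-3R5R"),
    (1003, "patent_feed", "KEY-ATORVASTATIN-NOSTEREO"),
    (1004, "vendor_c", "KEY-ATORVASTATIN-3R5R"),
]

def merge_substances(substances):
    """Group SIDs that share a structure key into one compound record."""
    compounds = defaultdict(list)
    for sid, _submitter, key in substances:
        compounds[key].append(sid)
    return dict(compounds)

merged = merge_substances(SUBSTANCES)
for key, sids in merged.items():
    print(key, sids)
```

Note how the submission without stereo ends up as its own compound record: this is one way a single drug fans out into several CIDs.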
[7]
What is Atorvastatin? - for Patients
[8]
Atorvastatin - for Informaticians
PubChem CID 60823
Wikipedia
ChemSpider 54810
DrugBank APRD00055
CHEMBL1487
CAS 134523-00-5
PubChem submissions include:
(3R,5R) CID 60823
(5R) CID 51052072
(3R) CID 21029434
(3S,5R) CID 6093359
(3S,5S) CID 62976
No stereo CID 2250
Query: Same, Isotopes for PubChem Compound (Select 60823)
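For readers who prefer a programmatic route to the same records, the sketch below builds request URLs for PubChem's PUG REST service. It is not something the slides mention. The helper function names are hypothetical, and mapping the "Same, Isotopes" query to `identity_type=same_isotope` is an assumption that should be checked against the PUG REST documentation. The code only constructs URLs and makes no network calls.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def synonyms_url(cid):
    """URL listing the depositor-supplied synonyms for one CID."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"

def name_to_cids_url(name):
    """URL resolving a drug name to the CIDs that carry it."""
    return f"{PUG_REST}/compound/name/{quote(name)}/cids/JSON"

def identity_url(cid, identity_type="same_isotope"):
    """URL for an identity search around a CID (identity_type is an assumption)."""
    return (f"{PUG_REST}/compound/fastidentity/cid/{cid}/cids/JSON"
            f"?identity_type={identity_type}")

print(synonyms_url(60823))
print(name_to_cids_url("atorvastatin calcium"))
print(identity_url(60823))
```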
[9]
Name Retrieval Specificity (I)
[10]
Name Retrieval Specificity (II)
”atorvastin” in DailyMed link not synonyms
[11]
Drug BioAssay Data: Splitting by Submitted Structure Differences
AIDs 406848-53 in ChEMBL – (antimalarial assay specified salt)
Mainly uHTS and counterscreens from Scripps & Burnham
ChEMBL Antimalarial strain assays (also specified salt), in vivo plus three target links
Mainly qHTS from NCGC, no hits
[12]
Pharmacological Activity in vivo is ~70% Active Metabolites i.e. not Atorvastatin
CID 9851106
CID 9808225
CID 60823
Hazardous Substances Data Bank x-ref in the CID, but no direct links to the metabolites (yet). Only one in-vitro assay result for 9808225
[13]
Salt Confusion (I) Atorvastatin Calcium
CID 60822, Mw 1155, CAS 134523-03-8
CID 656846, Mw 1209, CAS 344423-98-9
CID 11227182, Mw 598
INN = atorvastatin
USAN/BAN = atorvastatin calcium
FDA package insert label: hemicalcium trihydrate
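The three molecular weights can be sanity-checked with simple arithmetic. The sketch below assumes the atorvastatin free acid is C33H35FN2O5 and uses rounded standard atomic masses. Reading the Mw 598 record as one atorvastatin anion drawn against one calcium is an assumption, not something the slide states.

```python
# Rounded standard atomic masses; enough for a sanity check.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "F": 18.998,
               "N": 14.007, "O": 15.999, "Ca": 40.078}

def mol_weight(formula):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())

ACID = {"C": 33, "H": 35, "F": 1, "N": 2, "O": 5}   # atorvastatin free acid
ANION = {**ACID, "H": 34}                            # deprotonated carboxylate
WATER = {"H": 2, "O": 1}

free_acid = mol_weight(ACID)
calcium_salt = 2 * mol_weight(ANION) + ATOMIC_MASS["Ca"]   # two anions per Ca
trihydrate = calcium_salt + 3 * mol_weight(WATER)
one_to_one = mol_weight(ANION) + ATOMIC_MASS["Ca"]         # assumed 1:1 depiction

for label, mw in [("free acid", free_acid), ("calcium salt", calcium_salt),
                  ("trihydrate", trihydrate), ("1:1 depiction", one_to_one)]:
    print(f"{label}: {mw:.1f}")
```

Under these assumptions the 2:1 salt comes out near 1155 and the trihydrate near 1209, matching the two CAS-registered records, while a 1:1 depiction lands near 598.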
[14]
Salt Confusion (II): What gets to Patients
CID 53252956
CID 656846
CID 23665101
No INNs, USANs or clinical trials entries for these salts
[15]
Mixtures: Problematic all Round
• Atorvastatin parent (CID 60823) has 379 mixture SIDs and 147 mixture CIDs permutated from 122 component CIDs
• Of the 122 components, 58 have a MeSH pharmacology tag, 92 have BioAssay results, 70 are in DrugBank, 101 are in ChEMBL, and 47 are below 200 mw (and thus probably salts not drugs)
• Of the 147 mixture CIDs, only the 2 atorvastatin dimers have assay results or pharmacology, so none of the drug mixtures have direct data links
• None are in DrugBank and only atorvastatin calcium is in ChEMBL
• 138 of the 147 have been extracted from patents by Derwent/Thomson and are unlikely to get data links
• The small number of important drug combinations that do have data and/or trial results are difficult to identify
• Tested drug mixtures rarely get public code names; some get trade names but never INNs
• Chemistry rules may split mixtures and synonyms in databases
• PubMed "Drug Combinations"[MeSH Term] = 54,186 hits, but no SID or CID links
• Mixture components can be designated with space, / , + or ”co”
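The last point, on how mixture components are written, can be illustrated with a small name splitter. This is a sketch under stated assumptions: the function name is hypothetical, the word "and" is added as a separator because the generic label earlier in the deck uses it, and whitespace alone is deliberately not treated as a separator because it cannot be told apart from salt names such as "atorvastatin calcium".

```python
import re

# Separators: "/", "+", and the word "and" (the last is an added assumption).
# A bare space is skipped on purpose: it is ambiguous with salt names.
_SEPARATORS = re.compile(r"\s*(?:/|\+|\band\b)\s*", flags=re.IGNORECASE)

def split_combination(name):
    """Return (components, co_prefixed) for a combination-style drug name."""
    text = name.strip()
    co_prefixed = text.lower().startswith("co-")
    if co_prefixed:
        text = text[3:]  # drop the "co-" prefix; components stay implicit
    parts = [p.strip().lower() for p in _SEPARATORS.split(text) if p.strip()]
    return parts, co_prefixed

print(split_combination("amlodipine besylate / atorvastatin calcium"))
print(split_combination("Amlodipine+Atorvastatin"))
print(split_combination("co-amoxiclav"))
```

A "co" name carries no component names at all, which is why the function can only flag it; resolving it still needs a curated lookup.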
[16]
The Famous Polypill: A Fuzzy term
CID 44602839 Thomson Pharma 18 clinicaltrials.gov entries, but only partial component links
aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg (polypill) PMID: 21647425: Australian New Zealand Clinical Trials Registry ACTRN12607000099426
DrugBank and TTD negative
[17]
Caduet: an Approved Combination
http://clinicaltrials.gov/ct2/show/NCT01107743
DrugBank, Wikipedia
[18]
Submitter Synonym Noise in PubChem
[19]
A more Recent Combination
But, QA149 is negative in PubChem, DrugBank and TTD
[20]
Spaghetti is Resolvable but Errors are Tough: Will the Real LX4211 Please Stand Up?
http://cenblog.org/the-haystack/2012/03/liveblogging-first-time-disclosures-from-acssandiego/
See also: http://cdsouthan.blogspot.se/2012/03/live-chemical-structure-blogging-but.html
[21]
Summary
• You can navigate the linkage spaghetti in name, synonym, structure, bioactivity and mixture space, but this needs perspicacity and circumspection
• The current drug information ecosystem, with multiple stakeholders, seems destined to remain "fuzzy"
• Beyond the informatics challenges, the consequences, particularly from frank errors, could be more serious
• WHO INNs and naming stems play a key positive role, but:
– No open authoritative database, only 7000 PDF entries (!)
– No transparent coordination between USAN, FDA, MeSH, national offices, or clinical trials registries
– Susceptible to commercial flanking tactics
• Drug combinations have a bright pharmacological future but a difficult informatics one
• The fuzz includes scientific challenges (e.g. complex structures, dynamic tautomerism, active metabolites, formulation differences, paucity of standardised and comparable activity data)
• Efforts are being made to improve the situation, including from the databases represented in this Workshop session
[22]
Questions Welcome
ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710, Skype: cdsouthan
Email: [email protected]
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIN: http://www.linkedin.com/in/cdsouthan
Website: http://www.cdsouthan.info/CDS_prof.htm
Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
Presentations: http://www.slideshare.net/cdsouthan
FYI: A short piece on identifying the names and molecular details of drugs in clinicaltrials.gov
http://www.samedanltd.com/magazine/13/issue/166/article/3152