The pragmatic text miner: From literature to electronic health records

Post on 10-May-2015

Lars Juhl Jensen

The pragmatic text miner

From literature to electronic health records

why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary-based approach

identification required

dictionary

cyclin dependent kinase 1

CDC2

expansion rules

CDC2

hCdc2

flexible matching

cyclin dependent kinase 1

cyclin-dependent kinase 1

“black list”

SDS
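The dictionary-based steps above (dictionary, expansion rules, flexible matching, black list) can be sketched in a few lines. This is only an illustration, not the tagger used in the talk: the identifiers `CDK1` and `SDS_GENE`, the "h" prefix rule, and all function names are assumptions made for the example.

```python
import re

# Hypothetical toy dictionary: identifier -> names from the dictionary.
DICTIONARY = {
    "CDK1": ["cyclin dependent kinase 1", "CDC2"],
    "SDS_GENE": ["SDS"],  # also a common detergent abbreviation, so ambiguous
}
# Black list: matched strings that should never be tagged.
BLACK_LIST = {"sds"}


def expand(name):
    """Expansion rule (assumed): single-word symbols may carry an 'h' prefix."""
    variants = {name}
    if " " not in name:
        variants.add("h" + name)
    return variants


def to_pattern(name):
    """Flexible matching: ignore case, let spaces and hyphens be interchangeable."""
    tokens = re.split(r"[\s\-]+", name)
    body = r"[\s\-]?".join(re.escape(token) for token in tokens)
    # The lookarounds stop a name from matching inside a longer word.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE)


def build_matchers(dictionary):
    """Compile one pattern per expanded name."""
    matchers = []
    for identifier, names in dictionary.items():
        for name in names:
            for variant in expand(name):
                matchers.append((identifier, to_pattern(variant)))
    return matchers


def tag(text, matchers, black_list):
    """Return sorted (start, end, matched text, identifier) tuples."""
    hits = []
    for identifier, pattern in matchers:
        for match in pattern.finditer(text):
            if match.group(0).lower() in black_list:
                continue  # black-listed string, skip it
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)
```

Calling `tag("hCdc2 is cyclin-dependent kinase 1, run on an SDS gel.", build_matchers(DICTIONARY), BLACK_LIST)` tags the first two names and skips the black-listed one.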

>10 km

<10 hours

the formal way

benchmark

manually annotated corpus

automatic tagging

compare

quality metrics

precision

recall

F-score
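As a minimal sketch of the comparison step, the three quality metrics can be computed from the set of manually annotated tags and the set of automatically produced tags. The representation of a tag as a `(start, end, identifier)` tuple and the function name `evaluate` are assumptions for this example; the F-score here is the balanced F1.

```python
def evaluate(gold, predicted):
    """Return (precision, recall, F1) for predicted tags against gold tags."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Precision: fraction of predicted tags that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold tags that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

With four gold tags of which the tagger finds two and nothing else, precision is 1.0, recall is 0.5, and the F-score is 2/3.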

manually annotated corpus

use existing corpus

not new

make new corpus

hard work

natural language processing

part-of-speech tagging

semantic tagging

sentence parsing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

handle negations

directionality

high precision

poor recall

highly domain specific

the pragmatic way

benchmark light™

requires fewer calories

non-annotated corpus

automatic tagging

random sampling

manual inspection

precision

no recall

relative recall

compare methods
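The "benchmark light" idea can be sketched as follows, under stated assumptions. Precision is estimated from a manually inspected random sample of tags. Absolute recall is unknown without an annotated corpus, so two methods run on the same corpus are compared by the ratio of their estimated numbers of correct tags. All function names are hypothetical, and this definition of relative recall is an assumption rather than something stated in the slides.

```python
import random


def sample_for_inspection(tags, k, seed=0):
    """Draw up to k tags at random for manual inspection."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(list(tags), min(k, len(tags)))


def estimated_precision(judgements):
    """Fraction of inspected tags judged correct (True) by the annotator."""
    return sum(judgements) / len(judgements)


def relative_recall(n_tags, precision, n_tags_ref, precision_ref):
    """Estimated correct tags of a method relative to a reference method."""
    return (n_tags * precision) / (n_tags_ref * precision_ref)
```

For instance, a method producing 1000 tags at an estimated precision of 0.9 has a relative recall of 0.75 against a reference producing 1500 tags at 0.8.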

co-mentioning

within documents

within paragraphs

within sentences

weighted score
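A weighted co-mention score can be sketched like this. The corpus layout (documents as lists of paragraphs, paragraphs as lists of sentences, sentences as sets of already tagged identifiers) and the weight values are assumptions for illustration; the slides do not give the actual weights or formula.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions count more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def pairs(entities):
    """All unordered pairs of entities, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(corpus, weights=WEIGHTS):
    """Sum weighted co-mentions over all documents; each level counts once per document."""
    scores = defaultdict(float)
    for document in corpus:
        sentence_pairs, paragraph_pairs = set(), set()
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                sentence_pairs |= pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in pairs(document_entities):
            scores[pair] += weights["document"]
            if pair in paragraph_pairs:
                scores[pair] += weights["paragraph"]
            if pair in sentence_pairs:
                scores[pair] += weights["sentence"]
    return dict(scores)
```

With these weights, a pair mentioned in the same sentence of one document scores 7.0, a pair sharing only a paragraph scores 3.0, and a pair sharing only the document scores 1.0.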

benchmark

associations good?

tagging good enough

unifying text & data

web resources

text mining

curated knowledge

experimental data

computational predictions

common identifiers

quality scores

proteins

STRING

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

small molecules

Kuhn et al., Nucleic Acids Research, 2012

compartments

compartments.jensenlab.org

tissues

tissues.jensenlab.org

diseases

environments

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

structured data

Jensen et al., Nature Reviews Genetics, 2012

unstructured data

clinical narrative

comorbidity

Jensen et al., Nature Reviews Genetics, 2012

Roque et al., PLoS Computational Biology, 2011

in Danish

by busy doctors

confounding factors

age and gender

reporting bias
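One common way to account for age and gender when scoring comorbidity is to stratify: compare the observed number of patients with both diagnoses to the number expected if the diagnoses were independent within each age and gender stratum. The sketch below illustrates that general idea and is not necessarily the method of the cited work; the record fields `age_group`, `sex`, and `diagnoses` are hypothetical.

```python
from collections import defaultdict


def stratified_relative_risk(patients, diagnosis_a, diagnosis_b):
    """Observed / expected co-occurrence, with expectation summed over strata."""
    # Per stratum: [patients, with A, with B, with both].
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for patient in patients:
        counts = strata[(patient["age_group"], patient["sex"])]
        has_a = diagnosis_a in patient["diagnoses"]
        has_b = diagnosis_b in patient["diagnoses"]
        counts[0] += 1
        counts[1] += has_a
        counts[2] += has_b
        counts[3] += has_a and has_b
    observed = sum(counts[3] for counts in strata.values())
    # Expected under independence within each stratum: n * P(A) * P(B).
    expected = sum(counts[1] * counts[2] / counts[0] for counts in strata.values())
    return observed / expected if expected else float("nan")
```

A value well above 1 suggests the two diagnoses co-occur more often than age and gender alone would explain; reporting bias is not addressed by this sketch.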

temporal correlation

diagnosis trajectories

Jensen et al., in preparation, 2013

pharmacovigilance

adverse drug reactions

Eriksson et al., in preparation, 2013

ADR profiles

Eriksson et al., in preparation, 2013

ADR frequencies

Eriksson et al., in preparation, 2013

Acknowledgments

STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork

Text mining: Sune Frankild, Evangelos Pafilis, Alberto Santos, Kalliopi Tsafou, Janos Binder, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Julia Schnetzer, Aikaterini Vasileiadou, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhard Schneider, Sean O’Donoghue

EHR mining: Robert Eriksson, Peter Bjødstrup Jensen, Anders Boeck Jensen, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak