GRIHO Research Group, INSPIRES Research Centre, Universitat de Lleida
Roberto García, Josep Maria Brunetti*, Rosa Gil, Jordi Virgili, Toni Granollers
Multilingual Ontology for Plant Health Threats
Media Monitoring (A Smart Data Approach)
Media Monitoring for New and (Re)Emerging Plant Health Threats
• Project: development and testing of the media monitoring tool MedISys for the early identification and reporting of existing and emerging plant health threats
• Timing (duration): January 2014 – June 2016 (2.5 years)
• Funding: EFSA
• Coordination: Universitat de Lleida (UdL)
• Partners: IRTA and UdL
• Other participants: Joint Research Centre (European Commission)
• Objectives:
  – Collate new and appropriate media information sources
  – Build a multilingual ontology for the global identification of new and emerging plant health threats, to be added to MedISys (English, Spanish, Italian, French, Dutch, German, Portuguese, Russian, Chinese and Arabic)
  – Develop and test strategies to monitor re-emerging plant health threats on a global and regional scale
  – Analyse and test approaches to report identified signals to EFSA units and experts through MedISys
Approach
• Ontology: key component of the developed system, structuring and providing knowledge about plant health threats
• Knowledge captured from existing sources and experts
• The ontology guides applications for:
  – Knowledge capture
  – Indirect sources search
  – Term translation
  – Media monitoring category generation
Ontology Skeleton
• Collected 140 pests/diseases from EPPO Alerts, 2000/29-1-A-1 and EU Emergency Control Measures
• 117 linked to UniProt Taxonomy:
  – Taxonomical information, scientific/common/other names,…
• 47 also linked to Wikipedia:
  – Common names in multiple languages
Plant Health Threats Ontology
• Enrich the ontology with affected crops, hosts, vectors, symptom expressions…
Plant Health Threats Ontology
• All concepts linked to labels in different languages
• Labels extracted as keywords for MedISys categories, Web search filters,…
• Example: “Maladie de Pierce” OR (“grapevine” AND “sharpshooter”)
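A minimal sketch of how an expression like this might be assembled from ontology labels. The helpers `quote` and `build_query` are hypothetical, and the combination rule (any threat name, or any crop together with any vector) is an assumption modelled on the example, not the project's actual generator.

```python
def quote(term: str) -> str:
    """Wrap a keyword in double quotes, as in the example expression."""
    return f'"{term}"'


def build_query(names: list[str], crops: list[str], vectors: list[str]) -> str:
    """Hypothetical rule: any threat name, OR any (crop AND vector) pair."""
    alternatives = [quote(name) for name in names]
    for crop in crops:
        for vector in vectors:
            alternatives.append(f"({quote(crop)} AND {quote(vector)})")
    return " OR ".join(alternatives)


print(build_query(["Maladie de Pierce"], ["grapevine"], ["sharpshooter"]))
# prints: "Maladie de Pierce" OR ("grapevine" AND "sharpshooter")
```

Passing labels in several languages for `names`, `crops` and `vectors` would yield one multilingual expression per threat.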
Example (diagram): Xylella fastidiosa
• subClassOf: Gammaproteobacteria
• host: Nerium oleander, Prunus salicina, Medicago sp., Sorghum halepense,…
• vector: Homalodisca coagulata, Graphocephala sp., Oncometopia sp., Draeculacephala sp.,…
• crop: Grapevine, Citrus, Olive, Almond, Peach, Coffee,…
• Threat labels: “Pierce's disease”, “Citrus variegated chlorosis” (en); “Maladie de Pierce” (fr); “葉緣焦枯病菌” (zh)
• Vector labels: “Glassy-winged sharpshooter”, “Spittlebugs”, “Froghoppers”, “Planthoppers”,… (en)
• Crop labels: “vite” (it),…
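One possible in-memory representation of the Xylella fastidiosa example, assuming a plain list of triples plus language-tagged labels. The property names `host` and `crop`, the helper functions and the Italian-keyword use case are illustrative assumptions, not the ontology's actual vocabulary; only a few of the hosts, vectors and crops are included.

```python
from collections import defaultdict

# Hypothetical encoding of part of the diagram as (subject, property, object) triples.
TRIPLES = [
    ("Xylella fastidiosa", "subClassOf", "Gammaproteobacteria"),
    ("Xylella fastidiosa", "host", "Nerium oleander"),
    ("Xylella fastidiosa", "vector", "Homalodisca coagulata"),
    ("Xylella fastidiosa", "crop", "Grapevine"),
    ("Xylella fastidiosa", "crop", "Olive"),
]

# Labels as (text, language code) pairs, keyed by concept.
LABELS = {
    "Xylella fastidiosa": [
        ("Pierce's disease", "en"),
        ("Citrus variegated chlorosis", "en"),
        ("Maladie de Pierce", "fr"),
    ],
    "Grapevine": [("vite", "it")],
}


def objects(subject: str, prop: str) -> list[str]:
    """All objects linked to `subject` through `prop`, in insertion order."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def labels_by_language(concept: str) -> dict[str, list[str]]:
    """Group a concept's labels by language code."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for text, lang in LABELS.get(concept, []):
        grouped[lang].append(text)
    return dict(grouped)


# Italian keywords for the affected crops, e.g. for an Italian-language category.
italian_crop_terms = [
    term
    for crop in objects("Xylella fastidiosa", "crop")
    for term in labels_by_language(crop).get("it", [])
]
print(italian_crop_terms)  # ['vite']
```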
Ontology Editor
• Assist experts during the knowledge capture process
http://indagus.udl.cat/medisys/editor/
Ontology Editor – forms with assistance
Ontology Editor - autocomplete
Ontology Editor - symptoms form
Semi-automatic Translation
Multilingual Ontology
• Threat names: 1609 terms in 27 languages

Language         Terms   Share
Not available      617     38%
Latin              375     23%
English            262     16%
French              81      5%
German              68      4%
Spanish             65      4%
Japanese            21      1%
Dutch               17      1%
Italian             16      1%
Portuguese          15      1%
Finnish              8      0%
Chinese              7      0%
Russian              6      0%
Other               51      3%
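The shares in the table can be re-derived from the raw counts. This quick check confirms that the counts add up to the 1609 terms reported and that rounding each share to a whole percent reproduces the table.

```python
# Threat-name terms per language, as listed in the table.
TERM_COUNTS = {
    "Not available": 617, "Latin": 375, "English": 262, "French": 81,
    "German": 68, "Spanish": 65, "Japanese": 21, "Dutch": 17,
    "Italian": 16, "Portuguese": 15, "Finnish": 8, "Chinese": 7,
    "Russian": 6, "Other": 51,
}


def shares(counts: dict[str, int]) -> tuple[int, dict[str, int]]:
    """Total number of terms and each language's share rounded to a whole percent."""
    total = sum(counts.values())
    return total, {lang: round(100 * n / total) for lang, n in counts.items()}


total, pct = shares(TERM_COUNTS)
print(total)         # 1609
print(pct["Latin"])  # 23
```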
Ontology - symptom expression
• Symptom Expression = Symptom + Plant Part
• Set of symptoms and plant parts from CABI form and Plant Ontology
• 37 symptoms:
  – abnormal fall, premature fall
  – abnormal patterns, chlorotic rings
  – abnormal shape, malformation, distortion
  – boring, drilling, internal feeding, mining, tunnelling
  – canker
  – chlorosis
  – colour inversion
  – curling, curl
  – dieback
  – discoloration, discolouration
  – dwarfing
  – early senescence, premature senescence
  – empty
  – feeding
  – frass
  – gummosis
  – lesion, lesions
  – mottled, mottle
  – mummification, wrinkled, hard skin
  – dead, death, necrosis
  – odour
  – premature drop
  – premature ripening
  – reddening
  – reduced size, smaller
  – resinosis
  – roll, rolling
  – rosetting
  – rot, rotting
  – burn, scorch
  – splitting
  – stunting
  – thicker
  – fallen, toppled, falling
  – rooted out, uprooted
  – wilt, wilting
  – yellowing
• 356 terms for symptoms
Ontology - symptom expression
• Symptom Expression = Symptom + Plant Part
• 6 Plant Parts:
  – fruit
  – plant, tree, whole plant
  – bud, sprout
  – stem
  – seed, seeds
  – leaf, leaves
• Examples:
  – Whole Plant Dwarfing
  – Leaf Scorch
  – Stems Stunting
  – Leaf Reddening
  – Fruit Premature Drop
  – Seeds Discoloration
  – Leaf Mottle
• 96 plant part terms
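Since a symptom expression pairs a symptom with a plant part, and both come as synonym groups, one way to enumerate candidate search terms is a Cartesian product at both levels. This sketch assumes terms are formed as "part symptom" strings and uses only a small subset of the groups above. With all 6 plant parts and 37 symptoms there would be at most 6 × 37 = 222 concept-level expressions.

```python
from itertools import product

# Small, illustrative subset of the synonym groups listed above.
SYMPTOMS = {"scorch": ["burn", "scorch"], "wilting": ["wilt", "wilting"]}
PLANT_PARTS = {"leaf": ["leaf", "leaves"], "whole plant": ["plant", "tree", "whole plant"]}


def symptom_expressions(symptoms, parts):
    """Map each (plant part, symptom) concept pair to its 'part symptom' term variants."""
    expressions = {}
    for (part, part_terms), (symptom, symptom_terms) in product(parts.items(), symptoms.items()):
        expressions[(part, symptom)] = [f"{p} {s}" for p, s in product(part_terms, symptom_terms)]
    return expressions


exprs = symptom_expressions(SYMPTOMS, PLANT_PARTS)
print(len(exprs))                 # 4 concept-level expressions (2 parts x 2 symptoms)
print(exprs[("leaf", "scorch")])  # ['leaf burn', 'leaf scorch', 'leaves burn', 'leaves scorch']
```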
Ontology Browser
• Complex queries
• Example: “all threats with symptoms affecting the leaves”
http://indagus.udl.cat/plantHealthThreats/
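The example query can be pictured as a filter over threat-to-symptom-expression assertions. This sketch uses made-up threats ("Threat A" etc.) and a plain dictionary in place of whatever store the browser actually queries.

```python
# Hypothetical threat -> symptom expression assertions, as (plant part, symptom) pairs.
THREAT_SYMPTOMS = {
    "Threat A": [("leaf", "scorch"), ("whole plant", "dwarfing")],
    "Threat B": [("fruit", "premature drop")],
    "Threat C": [("leaf", "mottle")],
}


def threats_affecting(part: str, assertions: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Threats with at least one symptom expression on the given plant part."""
    return sorted(
        threat
        for threat, expressions in assertions.items()
        if any(p == part for p, _ in expressions)
    )


print(threats_affecting("leaf", THREAT_SYMPTOMS))  # ['Threat A', 'Threat C']
```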
Identification of Information Sources to Monitor
• Objective: collect relevant information sources to be monitored by MedISys
• Methodology:
  – Direct sources: identify information sources already known by experts, from previous research projects, official sources such as EPPO, journals,…
  – Indirect sources: identify web information sources (newspapers, blogs, websites, etc.) not previously known, discovered using search engines and ontology terms
  – Analyse and evaluate all collected sources using an Information Quality measure; first, filter out duplicate, irrelevant and non-monitorable sources, etc.
Methodology: Plant Health Threats Sources Inventory (diagram)
• Known sources: reference resources (expert knowledge), existing projects related to pest and food/feed risks (EFSA), MedISys sources (JRC) → filtering and evaluation process → list of relevant sources
• Web search: ontology → search mechanisms (query process) → filtering process (avoid duplicates & evaluation) → list of relevant sources
• Final list
• 1956 sources (72 known + 1884 web search)
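The duplicate-filtering step could look roughly like the sketch below. The URL normalisation (lower-cased host, no `www.` prefix, no trailing slash) and the 0–1 quality score with a 0.5 cut-off are assumptions for illustration; the Information Quality measure used in the project is not specified here.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so trivial variants count as duplicates."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def filter_sources(urls: list[str], quality: dict[str, float], min_quality: float = 0.5) -> list[str]:
    """Drop duplicates, then keep sources whose (hypothetical) quality score passes."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        key = normalise(url)
        if key in seen:
            continue  # duplicate of a source already considered
        seen.add(key)
        if quality.get(key, 0.0) >= min_quality:
            kept.append(url)
    return kept


urls = ["https://www.eppo.int/", "http://eppo.int", "https://example.org/blog"]
scores = {"eppo.int": 0.9, "example.org/blog": 0.2}  # made-up scores
print(filter_sources(urls, scores))  # ['https://www.eppo.int/']
```

`str.removeprefix` needs Python 3.9 or later.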
Monitor Known Threats
• Known threats: explicit mention of the threat name
• Categories generated automatically from the ontology
• One MedISys category per threat, with a list of keywords (terms) and a threshold
• 117 categories for known threats:
  – Bacteria: Xylella fastidiosa, Acidovorax citrulli,… (6)
  – Fungi: Ceratocystis fagacearum, Diplocarpon mali,… (18)
  – Insects: Agrilus coxalis auroguttatus, Agrilus planipennis,… (54)
  – Mollusks: Pomacea (1)
  – Nematodes: Bursaphelenchus xylophilus, Nacobbus aberrans,… (7)
  – Oomycetes: Phytophthora ramorum (1)
  – Phytoplasmas: Elm yellows phytoplasma, Candidatus Phytoplasma pruni,… (7)
  – Viroids: Tomato apical stunt viroid, Potato spindle tuber viroid (2)
  – Viruses: Andean potato latent virus, Andean potato mottle virus,… (21)
http://medisys.newsbrief.eu/medisys/groupedition/en/PlantHealthAll.html
Keyword source                   Threshold
Scientific names                 100
Common names (all languages)     100
Other names                      100
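A sketch of how a known-threat category might be derived from the ontology names and evaluated against a news item. It assumes, as a simplification, that the value 100 acts as each keyword's weight, that matched weights are summed, and that the category fires once the total reaches a threshold of 100, so a single mention of any name suffices. `make_category`, `score` and `matches` are hypothetical helpers, not MedISys functions.

```python
import re


def make_category(scientific, common, other, weight=100):
    """Hypothetical keyword -> weight map for one known threat."""
    return {term.lower(): weight for term in [*scientific, *common, *other]}


def score(text, category):
    """Sum the weights of the keywords that occur as whole words in `text`."""
    lowered = text.lower()
    return sum(
        weight
        for term, weight in category.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def matches(text, category, threshold=100):
    """Assumed rule: the category fires when the score reaches the threshold."""
    return score(text, category) >= threshold


xylella = make_category(["Xylella fastidiosa"], ["Pierce's disease", "Maladie de Pierce"], [])
print(matches("New Xylella fastidiosa outbreak in olive groves", xylella))  # True
print(matches("Olive harvest expected to rise this year", xylella))         # False
```

Word-boundary matching suits languages written with spaces; names in scripts such as Chinese would need a different matching strategy.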
Monitor Unknown Threats
• Unknown threats: name not explicitly mentioned
• Approach 1: manual generation of MedISys categories by experts
http://medisys.newsbrief.eu/medisys/filteredition/en/EFSAUnknownPestFilteredEmailAlert.html
Combination of combinations (proximity: 15)
• at least one of: alien, danger, dangerous, deadly…
• and at least one of: agricultural, agriculture, almond…
• and at least one of: bacteria, bacterial, crop+failure,…
• but none of: allergies, allergy, animal+abuse,…
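The filter semantics can be approximated in a few lines. This sketch assumes that proximity means one keyword from every required group must fall within a window of 15 tokens, that `+` joins the words of a phrase (as in `crop+failure`), and that an excluded word anywhere in the text rejects it. The keyword lists are truncated as in the slide.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def positions(toks: list[str], keyword: str) -> list[int]:
    """Start positions of a keyword; '+' joins the words of a phrase (crop+failure)."""
    words = keyword.lower().split("+")
    n = len(words)
    return [i for i in range(len(toks) - n + 1) if toks[i:i + n] == words]


def passes_filter(text, groups, excluded, proximity=15):
    """One keyword from every group within `proximity` tokens, and no excluded word."""
    toks = tokens(text)
    if any(positions(toks, word) for word in excluded):
        return False
    hits = [[p for kw in group for p in positions(toks, kw)] for group in groups]
    if not all(hits):
        return False  # some group has no keyword at all
    # A valid window can always start at the earliest hit it contains.
    for start in sorted({p for group_hits in hits for p in group_hits}):
        if all(any(start <= p < start + proximity for p in group_hits) for group_hits in hits):
            return True
    return False


GROUPS = [
    ["alien", "danger", "dangerous", "deadly"],
    ["agricultural", "agriculture", "almond"],
    ["bacteria", "bacterial", "crop+failure"],
]
EXCLUDED = ["allergies", "allergy", "animal+abuse"]

print(passes_filter("A dangerous pest threatens almond orchards and may cause crop failure",
                    GROUPS, EXCLUDED))  # True
```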
Monitor Unknown Threats
• Approach 2: automatic generation from the ontology (multilingual)
  – Concepts associated with the threats (but not their names): affected crops, vectors, hosts, symptoms, plant parts,...
• Currently, the ontology models the symptoms for just 7 threats: Phytophthora ramorum, Anoplophora glabripennis, Bactrocera tryoni, Agrilus planipennis, Xylella fastidiosa, Candidatus Liberibacter and Rhynchophorus ferrugineus
  – http://medisys.newsbrief.eu/medisys/alertedition/en/AgrilusPlanipennis-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/AnoplophoraGlabripennis-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/BactroceraTryoni-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/CandidatusLiberibacter-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/PhytophthoraRamorum-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/RhynchophorusFerrugineus-PHT-Symptoms.html
  – http://medisys.newsbrief.eu/medisys/alertedition/en/XylellaFastidiosa-PHT-Symptoms.html
Combinations tree (proximity: 10), example:
• Affected crop AND Symptom AND Plant Part: “walnut” AND “necrosis” AND “tree”
• OR Affected crop AND Vectors: “lime” AND “asian citrus psyllid”
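A tree of this shape can be generated mechanically from the concepts attached to a threat. In this hypothetical sketch each argument is a list of keyword alternatives, and the proximity constraint is left out. The number of branches grows as crops × symptoms × plant parts, which is one plausible source of the noise reported in the results.

```python
from itertools import product


def quote(term: str) -> str:
    """Wrap a keyword in double quotes."""
    return f'"{term}"'


def combination_query(crops, symptoms, parts, vectors):
    """Hypothetical generator: (crop AND symptom AND part) OR (crop AND vector)."""
    branches = [
        f"({quote(c)} AND {quote(s)} AND {quote(p)})"
        for c, s, p in product(crops, symptoms, parts)
    ]
    branches += [f"({quote(c)} AND {quote(v)})" for c, v in product(crops, vectors)]
    return " OR ".join(branches)


print(combination_query(["walnut"], ["necrosis"], ["tree"], []))
# ("walnut" AND "necrosis" AND "tree")
print(combination_query(["lime"], [], [], ["asian citrus psyllid"]))
# ("lime" AND "asian citrus psyllid")
```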
Results
• Known threats
  – MedISys categories using threat names as keywords are very effective
  – Example, Xylella fastidiosa: 5078 relevant news items selected from February 2015 to May 2016 (16 months)
  – However, they miss items that do not explicitly mention the threat
• Unknown threats
  – Categories manually defined by experts: 80% of items relevant, 10 items per day
  – Categories generated automatically from symptoms, crops, vectors…: 60% of items relevant, but just 7 items per week
  – Much noise due to term ambiguity; adding negative words filtered out false positives but increased false negatives
  – In any case, this is preliminary work (just 7 threats modelled)…
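To compare the two unknown-threat approaches on a common footing, the reported rates can be converted into expected relevant items per week. This reads "10 items per day" and "7 items per week" as items delivered before relevance assessment, which is an assumption.

```python
def relevant_per_week(items: float, period_days: float, precision: float) -> float:
    """Expected relevant items per week, given items delivered per period and precision."""
    return items * 7 / period_days * precision


manual = relevant_per_week(10, 1, 0.80)    # expert-defined categories: 10 items/day, 80% relevant
automatic = relevant_per_week(7, 7, 0.60)  # ontology-generated: 7 items/week, 60% relevant
print(round(manual, 1), round(automatic, 1))  # 56.0 4.2

# Known threats, for scale: Xylella fastidiosa items per month over 16 months.
print(5078 / 16)  # 317.375
```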
Future work
• Build a disease-symptom network, as has been done for human health?
Zhou, X., Menche, J., Barabási, A.-L., & Sharma, A. (2014). Human symptoms–disease network. Nature Communications, 5.
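As a starting point for such a network, threats could be linked by how similar their symptom profiles are. This sketch is loosely in the spirit of the cited human symptoms–disease network. It uses binary symptom-expression sets and cosine similarity as simplifying assumptions, and the threat names and symptom sets are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary symptom vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def threat_network(profiles: dict[str, set[str]], min_similarity: float = 0.0) -> dict[tuple[str, str], float]:
    """Weighted edges between threats whose symptom profiles overlap."""
    edges = {}
    for (t1, s1), (t2, s2) in combinations(sorted(profiles.items()), 2):
        similarity = cosine(s1, s2)
        if similarity > min_similarity:
            edges[(t1, t2)] = similarity
    return edges


# Hypothetical symptom-expression profiles, for illustration only.
PROFILES = {
    "Threat A": {"leaf scorch", "leaf yellowing"},
    "Threat B": {"leaf scorch", "stem canker"},
    "Threat C": {"fruit premature drop"},
}
print(threat_network(PROFILES))  # {('Threat A', 'Threat B'): 0.5}
```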
Thank you very much for your attention
Questions?
Roberto García
http://rhizomik.net/~roberto/