Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data


Amit Sheth, "Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data," WSU & AFRL Window-on-Science Seminar on Data Mining, August 05, 2009. http://wiki.knoesis.org/index.php/Seminar_on_Data_Mining#Semantics_empowered_Understanding.2C_Analysis_and_Mining_of_Nontraditional_and_Unstructured_Data

Transcript of Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Page 1: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

1

Page 2: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

WSU & AFRL Window-on-Science Seminar on Data Mining

Amit P. Sheth, LexisNexis Ohio Eminent Scholar

Director, Kno.e.sis Center, Wright State University

knoesis.org

Thanks: K. Gomadam, M. Nagarajan, C. Thomas, C. Henson, C. Ramakrishnan, P. Jain and Kno.e.sis Researchers

Page 3: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Data & Knowledge Ecosystem

3

Data Mining

Knowledge Discovery

Understanding & Perception

Integration

Search

Analysis (e.g., Patterns)

Browsing

Insight

Situational Awareness

Decision Support

Transactional Data

Observational Data

Multimedia Data

Experimental Data

Textual Data: Scientific Literature, Web Pages, News, Blogs, Reports, Wiki, Forums, Comments, Tweets

Structured, Semistructured, Unstructured Data

Page 4: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Some examples of R&D we have done

• Semantic Search & Ranking of Stories and Reports – connecting the dots applications (insider threat, financial risk analysis)

• Mining of biomedical (scientific) literature (extraction of entities and relationships) – discovering hidden public knowledge

• Semantic Integration, Analysis and Decision Support over Sensor Data

• Extracting taxonomy/domain model from Wikipedia

• Discovering Hidden Relationships (insights) in Community Created Content (Wikipedia)

Page 5: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

• Understanding User Generated Content (on Social Networking Sites)*
– What are people talking about
– How people write
– Why people write

With application to
- Artist Popularity Ranking
- Advertisement on Social Media
- Identifying Social Signals – spatio-temporal-thematic analysis of Citizen Sensor Data

* Meena Nagarajan

Page 6: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Text

Multimedia Content and Web data

Metadata Extraction

Patterns / Inference / Reasoning

Domain Models

Meta data / Semantic Annotations

Relationship Web

Search, Integration, Analysis, Discovery, Question Answering, Situational Awareness

Sensor Data

RDB

Structured and Semi-structured data

Page 7: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Insider threat demo (semantic search/querying, ranking, …)

7

Page 8: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Knowledge Discovery from Scientific Literature

Cartic Ramakrishnan

Page 9: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

What Knowledge Discovery is NOT

• Search
– Keyword-in-document-out
– Keywords are fully specified features of expected outcome
– Searching for prospective mining sites

• Mining
– Know where to look
– Underspecified characteristics of what is sought are available
– Patterns

Cartic Ramakrishnan

Page 10: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

10

What is knowledge discovery?

• “knowledge discovery is more like sifting through a warehouse filled with small gears, levers, etc., none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts.” – James Caruther

• “discovery is often described as more opportunistic search in a less well-defined space, leading to a psychological element of surprise” – James Buchanan

• Opportunistic search over an ill-defined space leading to surprising but useful emergent knowledge

Cartic Ramakrishnan

Page 11: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

11

Element of surprise – Swanson’s discoveries

Magnesium

Migraine

PubMed

?Stress

Spreading Cortical Depression

Calcium Channel Blockers

Swanson’s Discoveries

Associations discovered based on keyword searches followed by manual analysis of text to establish possible relevant relationships

11 possible associations found

Page 12: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

12

Knowledge Discovery over text

Text

Extraction of Semantics from text

Semantic Metadata Guided

Knowledge Explorations

Assigning interpretation to text

Semantic Metadata Guided

Knowledge Discovery

Triple-based Semantic Search

Semantic browser

Subgraph discovery

Semantic metadata in the form of semi-structured data

Cartic Ramakrishnan

Page 13: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

13

Information Extraction via Ontology assisted text mining – Relationship extraction

Biologically active substance

Lipid

Disease or Syndrome

affects

causes

affectscauses

complicates

Fish Oils Raynaud’s Disease???????

instance_of instance_of

UMLS Semantic Network

MeSH

PubMed

9284 documents

4733 documents

5 documents

Cartic Ramakrishnan

Page 14: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

14

Background knowledge and Data used

• UMLS – A high-level schema of the biomedical domain
– 136 classes and 49 relationships
– Synonyms of all relationships – using variant lookup (tools from NLM)
– 49 relationships + their synonyms = ~350 verbs

• MeSH
– 22,000+ topics organized as a forest of 16 trees
– Used to query PubMed

• PubMed
– Over 16 million abstracts
– Abstracts annotated with one or more MeSH terms

Page 15: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

15

Method – Parse Sentences in PubMed

SS-Tagger (University of Tokyo)

SS-Parser (University of Tokyo)

(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )

• Entities (MeSH terms) in sentences occur in modified forms
  • “adenomatous” modifies “hyperplasia”
  • “An excessive endogenous or exogenous stimulation” modifies “estrogen”

• Entities can also occur as composites of 2 or more other entities
  • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium”

Cartic Ramakrishnan
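
As a rough illustration of how modifier relations might be read off a bracketed parse such as the one on this slide, the sketch below parses the string with a small hand-written reader and lists adjective/noun pairs inside each noun phrase. The helper names `read_tree` and `modifier_pairs` are hypothetical, and the single rule used here (a JJ child modifies the last NN child of the same NP) is a simplifying assumption, not the rule set used in the talk.

```python
# The parse string is copied from the slide; everything else is illustrative.
PARSE = (
    "(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
    "(JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) "
    "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) "
    "(PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )"
)


def read_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def modifier_pairs(tree):
    """Return (adjective, head noun) pairs found inside NP nodes.

    Assumption: within one NP, each direct JJ child modifies the last
    direct NN* child. Adjectives nested in an ADJP are ignored here.
    """
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    pairs = []
    if label == "NP":
        nouns = [c[1][0] for c in subtrees if c[0].startswith("NN")]
        if nouns:
            head = nouns[-1]
            pairs += [(c[1][0], head) for c in subtrees if c[0] == "JJ"]
    for sub in subtrees:
        pairs += modifier_pairs(sub)
    return pairs


PAIRS = modifier_pairs(read_tree(PARSE))
```

With this rule, `PAIRS` contains the pair ("adenomatous", "hyperplasia") mentioned on the slide, plus ("excessive", "stimulation").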

Page 16: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

18

Preliminary Results

• Swanson’s discoveries – Associations between Migraine and Magnesium [Hearst99]
  • stress is associated with migraines
  • stress can lead to loss of magnesium
  • calcium channel blockers prevent some migraines
  • magnesium is a natural calcium channel blocker
  • spreading cortical depression (SCD) is implicated in some migraines
  • high levels of magnesium inhibit SCD
  • migraine patients have high platelet aggregability
  • magnesium can suppress platelet aggregability

• Data sets generated using these entities (marked in red on the slide) as boolean keyword queries against PubMed

• Bidirectional breadth-first search used to find paths in the resulting RDF
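
The path-finding step can be sketched as follows. This is a minimal, generic bidirectional breadth-first search over an undirected graph built from triples; it is not the implementation used in the work. The triples are hand-written paraphrases of four of the bullets above, the predicate names are invented for illustration, and the function returns a connecting path without claiming it is the only one.

```python
from collections import deque

# Illustrative triples only; predicate names are assumptions.
TRIPLES = [
    ("stress", "associated_with", "migraine"),
    ("stress", "leads_to_loss_of", "magnesium"),
    ("calcium channel blockers", "prevent", "migraine"),
    ("magnesium", "isa", "calcium channel blockers"),
]


def build_graph(triples):
    """Build an undirected adjacency map, ignoring predicate labels."""
    graph = {}
    for subj, _pred, obj in triples:
        graph.setdefault(subj, set()).add(obj)
        graph.setdefault(obj, set()).add(subj)
    return graph


def join_paths(parents, meet):
    """Stitch the forward and backward search trees together at `meet`."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = parents[0][node]
    path.reverse()
    node = parents[1][meet]
    while node is not None:
        path.append(node)
        node = parents[1][node]
    return path


def bidirectional_bfs(graph, start, goal):
    """Search from both ends at once; return a connecting path or None."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return [start]
    parents = ({start: None}, {goal: None})  # forward tree, backward tree
    frontiers = (deque([start]), deque([goal]))
    while frontiers[0] and frontiers[1]:
        # Expand the smaller frontier by one full level.
        side = 0 if len(frontiers[0]) <= len(frontiers[1]) else 1
        for _ in range(len(frontiers[side])):
            node = frontiers[side].popleft()
            for nxt in sorted(graph[node]):
                if nxt in parents[side]:
                    continue
                parents[side][nxt] = node
                if nxt in parents[1 - side]:
                    return join_paths(parents, nxt)
                frontiers[side].append(nxt)
    return None


GRAPH = build_graph(TRIPLES)
PATH = bidirectional_bfs(GRAPH, "migraine", "magnesium")
```

Searching from both endpoints keeps each frontier small, which matters when the graph is extracted from thousands of abstracts rather than four triples.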

Page 17: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

19

Paths between Migraine and Magnesium

Paths are considered interesting if they contain one or more named relationships other than hasPart or hasModifiers

Cartic Ramakrishnan
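
The interestingness criterion on this slide amounts to a simple filter over the predicate labels along a path. A minimal sketch, assuming a path is represented as a list of predicate names (the representation and the function name `is_interesting` are assumptions):

```python
# Structural relationships that do not make a path interesting on their own.
STRUCTURAL = {"hasPart", "hasModifiers"}


def is_interesting(predicates):
    """True if at least one predicate on the path is a named relationship
    other than hasPart or hasModifiers."""
    return any(p not in STRUCTURAL for p in predicates)


EXAMPLES = {
    "named": is_interesting(["hasPart", "caused_by", "hasPart"]),
    "structural_only": is_interesting(["hasPart", "hasModifiers"]),
}
```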

Page 18: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

20

An example of such a path

platelet(D001792)

collagen(D003094)

migraine(D008881)

magnesium(D008274)

me_3142by_a_primary_abnormality_of_platelet_behavior

me_2286_13%_and_17%_adp_and_collagen_induced_platelet_aggregation

caused_by

hasPart

hasPart

stimulated

stimulated

hasPart

CONCLUSION

• Rules over parse trees are able to extract structure from sentences

• Our definitions of compound and modified entities are critical for identifying both implicit and explicit relationships

• Swanson’s discovery can be automated if recall can be improved. What hurts recall?

Page 19: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Unsupervised Joint Extraction of Compound Entities and Relationship

Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang and Amit P. Sheth, "Unsupervised Discovery of Compound Entities for Relationship Extraction," EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management: Knowledge Patterns

Page 20: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

22

Joint Extraction approach

•Dependency parse – Stanford Parser

governor

dependent

amod = adjectival modifier

nsubjpass = nominal subject in passive voice
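
To make the governor/dependent idea concrete, here is a small sketch of how typed dependencies could be combined into a compound entity and a triple. The edges are hand-written for a made-up passive sentence ("Adenomatous hyperplasia is induced by estrogen") and are not parser output. The `agent` relation and the function names are assumptions for illustration, and this is not the unsupervised algorithm from the paper.

```python
# Each edge is (relation, governor, dependent); written by hand for illustration.
EDGES = [
    ("amod", "hyperplasia", "adenomatous"),
    ("nsubjpass", "induced", "hyperplasia"),
    ("auxpass", "induced", "is"),
    ("agent", "induced", "estrogen"),
]


def compound_entity(head, edges):
    """Attach adjectival modifiers (amod dependents) to their head noun."""
    modifiers = [dep for rel, gov, dep in edges if rel == "amod" and gov == head]
    return " ".join(modifiers + [head])


def passive_triples(edges):
    """Build (subject, relationship, object) triples from passive clauses.

    In a passive clause the nsubjpass dependent is the thing acted upon,
    so it becomes the object; the agent becomes the subject.
    """
    triples = []
    for rel, verb, patient in edges:
        if rel != "nsubjpass":
            continue
        agents = [d for r, g, d in edges if r == "agent" and g == verb]
        for agent in agents:
            triples.append(
                (compound_entity(agent, edges), verb, compound_entity(patient, edges))
            )
    return triples


TRIPLES = passive_triples(EDGES)
```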

Page 21: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

23

Algorithm

Relationship head

Subject head

Object head Object head

Cartic Ramakrishnan

Page 22: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

24

Preliminary results

Cartic Ramakrishnan

Page 23: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

25

Extracted Triples

Page 24: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Semantic Metadata Guided Knowledge Explorations and Discovery

Page 25: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

27

Results

Cartic Ramakrishnan

Page 26: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

28

Hypothesis Driven retrieval of Scientific Literature

PubMed

Complex Query

Supporting document sets retrieved

Migraine

Stress

Patient

affects

isa

Magnesium

Calcium Channel Blockers

inhibit

Keyword query: Migraine[MH] + Magnesium[MH]

Page 27: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

29

Applications

• Triple-based semantic search
• Semantic Browser

Page 28: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

30

Knowledge Discovery = Extraction + Heuristic Aggregation

Leonardo Da Vinci

The Da Vinci code

The Louvre

Victor Hugo

The Vitruvian man

Santa Maria delle Grazie

Et in Arcadia Ego

Holy Blood, Holy Grail

Harry Potter

The Last Supper

Nicolas Poussin

Priory of Sion

The Hunchback of Notre Dame

The Mona Lisa

Nicolas Flamel

painted_by

painted_by

painted_by

painted_by

member_of

member_of

member_of

written_by

mentioned_in

mentioned_in

displayed_at

displayed_at

cryptic_motto_of

displayed_at

mentioned_in

mentioned_in

Undiscovered Public Knowledge

Page 29: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Understanding, Analyzing, Mining

Social Media

Meena Nagarajan, Karthik Gomadam

Page 30: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

mumbai, india

Page 31: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

november 26, 2008

Page 32: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

another chapter in the war against civilization

Page 33: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

and

Page 34: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data
Page 35: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data
Page 36: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

the world saw it

Through the eyes of the people

Page 37: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

the world read it

Through the words of the people

Page 38: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

PEOPLE told their stories to PEOPLE

Page 39: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

A powerful new era in Information dissemination had taken firm ground

Page 40: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Making it possible for us to create a global network of citizens

Citizen Sensors – Citizens observing, processing, transmitting, reporting

Page 41: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Image Metadata

latitude: 18° 54′ 59.46″ N, longitude: 72° 49′ 39.65″ E

Geocoder (Reverse Geo-coding)

Address to location database

18 Hormusji Street, Colaba

Nariman House

Identify and extract information from tweets

Spatio-Temporal Analysis

Structured Meta Extraction

Income Tax Office

Vasant Vihar
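
Before image metadata like the latitude and longitude on this slide can be passed to a reverse geocoder, the degrees/minutes/seconds values usually need to be converted to decimal degrees. A minimal sketch of that conversion follows; the function name `dms_to_decimal` is an assumption, and the reverse-geocoding lookup itself is not shown.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West hemispheres are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# The coordinates from the image metadata on this slide.
LATITUDE = dms_to_decimal(18, 54, 59.46, "N")
LONGITUDE = dms_to_decimal(72, 49, 39.65, "E")
```

The results agree, to within rounding, with the decimal coordinates 18.916517°N 72.827682°E used in the spatial query slides.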

Page 42: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Research Challenge #1

• Spatio-Temporal and Thematic analysis
– What else happened “near” this event location?
– What events occurred “before” and “after” this event?
– Any message about “causes” for this event?

Page 43: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Spatial Analysis…

Which tweets originated from an address near 18.916517°N 72.827682°E?

Page 44: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Which tweets originated on Nov 27th 2008, from 11 PM to 12 PM?

Page 45: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Giving us

Which tweets originated from an address near 18.916517°N, 72.827682°E on 27th Nov 2008, between 11 PM and 12 PM?
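
A combined spatial and temporal query of this kind can be sketched as a great-circle distance test plus a time-window test. Everything below is illustrative: the sample tweets, the 1 km radius, and the reading of the time window as the hour starting at 23:00 on 27 Nov 2008 are assumptions, and a real system would more likely use a spatial index than a linear scan.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def tweets_near(tweets, lat, lon, radius_km, start, end):
    """Keep tweets posted within radius_km of (lat, lon) during [start, end]."""
    return [
        t for t in tweets
        if start <= t["time"] <= end
        and haversine_km(lat, lon, t["lat"], t["lon"]) <= radius_km
    ]


# Hypothetical sample data: one nearby tweet in the window, one far away,
# and one nearby but outside the window.
SAMPLE_TWEETS = [
    {"id": 1, "lat": 18.9166, "lon": 72.8277, "time": datetime(2008, 11, 27, 23, 30)},
    {"id": 2, "lat": 28.6139, "lon": 77.2090, "time": datetime(2008, 11, 27, 23, 30)},
    {"id": 3, "lat": 18.9166, "lon": 72.8277, "time": datetime(2008, 11, 27, 10, 0)},
]

MATCHES = tweets_near(
    SAMPLE_TWEETS,
    lat=18.916517, lon=72.827682, radius_km=1.0,
    start=datetime(2008, 11, 27, 23, 0),
    end=datetime(2008, 11, 27, 23, 59, 59),
)
```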

Page 46: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Research Challenge #2: Understanding and Analyzing Casual Text

• Casual text
– Microblogs are often written in SMS style language
– Slang, abbreviations

Page 47: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Understanding Casual Text

• Not the same as news articles or scientific literature
– Grammatical errors
  • Implications on NL parser results
– Inconsistent writing style
  • Implications on learning algorithms that generalize from corpus

Page 48: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Nature of Microblogs

• Additional constraint of limited context
– Max. of x chars in a microblog
– Context often provided by the discourse

• Entity identification and disambiguation

• Pre-requisite to other sophisticated information analytics

Page 49: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

NL understanding is hard to begin with..

• Not so hard
– “commando raid appears to be nigh at Oberoi now”
  • Oberoi = Oberoi Hotel, Nigh = high

• Challenging
– new wing, live fire @ taj 2nd floor on iDesi TV stream
  • Fire on the second floor of the Taj hotel, not on iDesi TV

Page 50: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Social Context surrounding content

• Social context in which a message appears is also an added valuable resource

• Post 1:
– “Hareemane House hostages said by eyewitnesses to be Jews. 7 Gunshots heard by reporters at Taj”

• Follow up post
– that is Nariman House, not (Hareemane)

Page 51: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Understanding content … informal text

• I say: “Your music is wicked”

• What I really mean: “Your music is good”

54

Page 52: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Structured text (biomedical literature)

Multimedia Content and Web data

Web Services

Semantic Metadata:
Smile is a Track
Lil transliterates to Lilly Allen
Lilly Allen is an Artist

Informal Text (Social Network chatter)

Your smile rocks Lil

Urban Dictionary

MusicBrainz Taxonomy

Artist: Lilly Allen
Track: Smile

Sentiment expression: Rocks
Transliterates to: cool, good
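
The annotation flow on this slide can be sketched as a set of dictionary lookups. The three small dictionaries below are hypothetical stand-ins for a slang dictionary and a music taxonomy; they do not call Urban Dictionary or MusicBrainz, and real entity spotting in informal text needs disambiguation that this sketch leaves out.

```python
# Hypothetical stand-ins for the background knowledge sources on the slide.
SLANG_SENTIMENT = {"rocks": "good", "wicked": "good"}
ARTIST_ALIASES = {"lil": "Lilly Allen"}
TRACKS = {"smile": "Smile"}


def annotate(comment):
    """Return semantic metadata spotted in an informal comment."""
    annotations = {}
    for raw in comment.lower().split():
        word = raw.strip(".,!?\"'")
        if word in SLANG_SENTIMENT:
            annotations["sentiment"] = SLANG_SENTIMENT[word]
        if word in ARTIST_ALIASES:
            annotations["artist"] = ARTIST_ALIASES[word]
        if word in TRACKS:
            annotations["track"] = TRACKS[word]
    return annotations


METADATA = annotate("Your smile rocks Lil")
```

For the slide's example comment, `METADATA` records the track "Smile", the artist "Lilly Allen", and a positive sentiment.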

Page 53: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Example: Pulse of a Community

• Imagine millions of such informal opinions
– Individual expressions to mass opinions

• “Popular artists” lists from MySpace comments

Lilly Allen

Lady Sovereign

Amy Winehouse

Gorillaz

Coldplay

Placebo

Sting

Kean

Joss Stone

Page 54: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

What Drives the Spatio-Temporal-Thematic Analysis and Casual Text Understanding

Semantics with the help of

1. Domain Models
2. Domain Models
3. Domain Models

(ontologies, folksonomies)

Page 55: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Domain Knowledge: A key driver

• Places that are nearby ‘Nariman house’
– Spatial query

• Messages originated around this place
– Temporal analysis

• Messages about related events / places
– Thematic analysis

Page 56: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Research Challenge #3

But where does the Domain Knowledge come from?

• Expert and committee based ontology creation … works in some domains (e.g., biomedicine, health care, …)

• Community driven knowledge extraction
– How to create models that are “socially scalable”?
– How to organically grow and maintain this model?

Page 57: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Building models…seed word to hierarchy creation using WIKIPEDIA

Seed Query

BWikipedia

Fulltext Concept Search

Wikigraph-Based expansion

Graph Search

Graph Search

Graph Search

Hierarchy Creation

Query: “cognition”

Page 58: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Identifying relationships: Hard, harder than many hard things

But NOT that Hard, When WE do it

Page 59: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Games with a purpose

• Get humans to give their solitaire time
– Solve real hard computational problems
– Image tagging, Identifying part of an image
– Tag a tune, Squigl, Verbosity, and Matchin
– Pioneered by Luis von Ahn

Page 60: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

OntoLablr

• Relationship Identification Game

• leads to
• causes

Explosion Traffic congestion

Page 61: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

• How do you get comprehensive situational awareness by merging “human sensing” and “machine sensing”?

64

Page 62: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Research Challenge #4: Semantic Sensor Web

Page 63: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Semantically Annotated O&M

<swe:component name="time">
  <swe:Time definition="urn:ogc:def:phenomenon:time" uom="urn:ogc:def:unit:date-time">
    <sa:swe rdfa:about="?time" rdfa:instanceof="time:Instant">
      <sa:sml rdfa:property="xs:date-time"/>
    </sa:swe>
  </swe:Time>
</swe:component>
<swe:component name="measured_air_temperature">
  <swe:Quantity definition="urn:ogc:def:phenomenon:temperature" uom="urn:ogc:def:unit:fahrenheit">
    <sa:swe rdfa:about="?measured_air_temperature" rdfa:instanceof="senso:TemperatureObservation">
      <sa:swe rdfa:property="weather:fahrenheit"/>
      <sa:swe rdfa:rel="senso:occurred_when" resource="?time"/>
      <sa:swe rdfa:rel="senso:observed_by" resource="senso:buckeye_sensor"/>
    </sa:swe>
  </swe:Quantity>
</swe:component>

<swe:value name="weather-data">
  2008-03-08T05:00:00,29.1
</swe:value>

Page 64: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Semantic Sensor ML – Adding Ontological Metadata

67

Person

Company

Coordinates

Coordinate System

Time Units

Timezone

SpatialOntology

DomainOntology

TemporalOntology

Mike Botts, "SensorML and Sensor Web Enablement," Earth System Science Center, UAB Huntsville

Page 65: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

68

Semantic Query

• Semantic Temporal Query

• Model-references from SML to OWL-Time ontology concepts provide the ability to perform semantic temporal queries

• Supported semantic query operators include:
– contains: user-specified interval falls wholly within a sensor reading interval (also called inside)
– within: sensor reading interval falls wholly within the user-specified interval (inverse of contains or inside)
– overlaps: user-specified interval overlaps the sensor reading interval

• Example SPARQL query defining the temporal operator ‘within’
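
The SPARQL query itself is not reproduced in this transcript. As a stand-in, the three operators can be sketched as plain interval comparisons, following the definitions in the bullets above. This is ordinary Python, not SPARQL or OWL-Time; representing an interval as a (start, end) tuple of datetimes is an assumption, as are the sample intervals.

```python
from datetime import datetime


def contains(user, reading):
    """User-specified interval falls wholly within the sensor reading interval."""
    return reading[0] <= user[0] and user[1] <= reading[1]


def within(user, reading):
    """Sensor reading interval falls wholly within the user-specified interval."""
    return user[0] <= reading[0] and reading[1] <= user[1]


def overlaps(user, reading):
    """The two intervals share at least one instant."""
    return user[0] <= reading[1] and reading[0] <= user[1]


# Hypothetical intervals on the date used in the O&M example.
USER = (datetime(2008, 3, 8, 4, 0), datetime(2008, 3, 8, 6, 0))
READING = (datetime(2008, 3, 8, 5, 0), datetime(2008, 3, 8, 5, 30))

RESULTS = {
    "contains": contains(USER, READING),
    "within": within(USER, READING),
    "overlaps": overlaps(USER, READING),
}
```

Here the reading interval lies inside the user's interval, so `within` and `overlaps` hold while `contains` does not.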

Page 66: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Kno.e.sis’ Semantic Sensor Web

69

Page 67: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Semantic Sensor Web demo (online)

Semantic Sensor Web demo (local)

70

Page 68: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Synthetic but realistic scenario

• an image taken from a raw satellite feed

71

Page 69: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

• an image taken by a camera phone with an associated label, “explosion.”

Synthetic but realistic scenario

72

Page 70: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

• Textual messages (such as tweets) using STT analysis

Synthetic but realistic scenario

73

Page 71: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

• Correlating to get

Synthetic but realistic scenario

Page 72: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Create better views (smart mashups)

Page 73: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Extracting Social Signals

• what are the important topics of discussions and concerns in different parts of the world on a particular day

• how different cultures or countries are reacting to the same event or situation (eg Mumbai Attack)

• how a situation such as financial crisis is evolving over a period of time in terms of key topics of discussion and issues of concern (eg subprime mortgages and foreclosures, followed by troubled banks and credit freeze, followed by massive government intervention and borrowing, and so on).

Twitris Demo

76

Page 74: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

A few more things

• Use of background knowledge

• Event extraction from text
– time and location extraction
  • Such information may not be present
  • Someone from Washington DC can tweet about Mumbai

• Scalable semantic analytics
– Subgraph and pattern discovery
  • Meaningful subgraphs like relevant and interesting paths
  • Ranking paths

Page 75: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

The Sum of the Parts

Spatio-Temporal analysis
– Find out where and when

+ Thematic
– What and how

+ Semantic Extraction from text, multimedia and sensor data
– tags, time, location, concepts, events

+ Semantic models & background knowledge
– Making better sense of STT
– Integration

+ Semantic Sensor Web
– The platform

= Situational Awareness

Page 76: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

KNO.E.SIS as a case study of world class research based higher education environment

http://knoesis.org

79

Page 77: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Kno.e.sis Center Labs (3rd Floor, Joshi)

Amit Sheth
• Semantic Science Lab
• Semantic Web Lab
• Service Research Lab

TK Prasad
• Metadata and Languages Lab

Shaojun Wang
• Statistical Machine Learning

Pascal Hitzler
• Formal Semantics & Reasoning Lab

Michael Raymer
• Bioinformatics Lab

Guozhu Dong
• Data Mining Lab

Keke Chen
• Data Intensive Analysis and Computing Lab

Page 78: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

KNO.E.SIS MEMBERS – A SUBSET

Page 79: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Exceptional students

• Six of the senior PhD students: 84 papers, 43 program committees, contributed to winning NIH and NSF grants.

• Successfully competed with two Stanford PhDs, 1000+ citations in 2 years of his graduation.

• “BTW, Meena is an absolute find.  If all of your other students are as talented, you are very lucky.  …  I’d definitely like to work with more interns of her caliber, ... ”[Dr. Kevin Haas, Director of Search at Yahoo!]

• “It has been a few years since I visited Dayton (Wright AFB). However, it is clear that Wright State has transformed itself. Congratulations on your success with the Knoesis Center.” [Dr. Alpers Caglayan – looking to hire Kno.e.sis grads]

Page 80: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Funding, Collaboration, etc

• UGA, Stanford, CCHMC, SAIC, HP, IBM, Yahoo!

• NIH, NSF, AFRL-HE, AFRL-Sensor, HP, IBM, Microsoft, Google

• 70% Federal, 19% State, 11% Industry

• Students intern at the best industry labs & national labs

• Graduates very successful

83

Page 81: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

Interested in more background?

• Semantics-Empowered Social Computing
• Semantic Sensor Web
• Traveling the Semantic Web through Space, Theme and Time
• Relationship Web: Blazing Semantic Trails between Web Resources
• Text Mining, Workflow Management, Semantic Web Services, Cloud Computing with application to healthcare, biomedicine, defense/intelligence, energy

Contact/more details: amit @ knoesis.org

Special thanks: Karthik Gomadam, Meena Nagarajan, Christopher Thomas

Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research and IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).

Page 82: Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data