Frank van Harmelen, Vrije Universiteit Amsterdam: The Information Universe of the (Near) Future


Page 1:

Frank van Harmelen
Vrije Universiteit Amsterdam

The Information Universe of the (Near) Future

Creative Commons License: allowed to share & remix, but must attribute & non-commercial

Page 2:

What it will look like

Why it needs infinite scalability

and how to achieve this with the Large Knowledge Collider

Page 3:

The Current Information Universe

linked web-pages, written by people, written for people, used only by people...

Many of these pages already come from data, usable by computers! But we can’t link the data...

The Future Information Universe

linked data, usable by computers! useful for people!

Page 4:

How far away is this?

Not very far away!

Already many billions of facts & rules in the rapidly growing Linked Open Data cloud:

• Encyclopedia
• Geographic names (millions)
• Names of artists & art works (10,000s)
• Scientific bibliographies
• Hierarchical dictionaries (UK, FR, NL)
• Life-science databases
• Any CD ever recorded (almost)
• Every book sold by Amazon
• Basic facts on every country on the planet
• Common sense rules & facts (100,000s)

It gets bigger every month

Page 5:

Full Web-style decoupling: re-usability, independence

• All identifiers are URLs (= on the Web)
  – Allows total decoupling of
    • data
    • vocabulary
    • meta-data

Example: [<x> IsOfType <T>] (e.g. T = <person>)

different owners & locations
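The decoupling on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every URL below is a hypothetical placeholder, and the `Triple`, `merge` and `instances_of` names are invented for this sketch rather than taken from the talk or from any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical URLs: each could be owned and hosted by a different party.
IS_OF_TYPE = "http://vocab.example.org/IsOfType"
PERSON = "http://ontology.example.org/person"
FRANK = "http://data.example.com/people/frank"

# Three independently published sources: data, vocabulary, meta-data.
data_source = {Triple(FRANK, "http://vocab.example.org/name", "Frank")}
vocabulary_source = {Triple(PERSON, "http://vocab.example.org/label", "person")}
metadata_source = {Triple(FRANK, IS_OF_TYPE, PERSON)}


def merge(*sources):
    """Combine sources by set union; shared URLs make the facts line up."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def instances_of(graph, type_url):
    """Return every x for which the graph contains [<x> IsOfType <type_url>]."""
    return sorted(
        t.subject for t in graph
        if t.predicate == IS_OF_TYPE and t.obj == type_url
    )


graph = merge(data_source, vocabulary_source, metadata_source)
people = instances_of(graph, PERSON)
```

Because the identifiers are global, none of the three publishers needs to know about the others for the merged graph to answer the query.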

Page 6:

For the first time ever, it is now possible:

to re-use somebody else's knowledge base

• without having to talk to them first (syntax, semantics)

• without having to make copies

Rapid growth: "billion triple challenge" (= machine-reason with a billion facts and rules)

• 2006: “where do we get a billion facts from?”

• 2008: “which billion shall we choose!”

Page 7:

What to do when success is becoming a problem?

The Large Knowledge Collider: a platform for infinitely scalable reasoning on the data-web

Page 8:

Infinite scalability?

parallelisation
• cluster computing

distribution
• “Thinking@home”, “self-computing semantic Web”

approximation
• “almost” is often good enough
• gets better with more resources

Page 9:

First result: MaRVIN

[Diagram: several Nodes, each with InputPool, Reasoning, Routing and OutputPool components, plus DataPreparation, ResultStorage, and statistics & visualisation]

MaRVIN scales by:
• distribution (over many nodes)
• approximation (sound but incomplete)
• anytime convergence (more complete over time)

brain the size of a planet
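The three scaling properties can be illustrated with a toy simulation. This is a minimal sketch, not MaRVIN's actual implementation: the single transitivity rule, the random routing strategy, and all function names are assumptions made for this example.

```python
import random


def derive(pool):
    """One sound reasoning step: transitivity over (a, b) 'subclass-of' pairs."""
    return {(a, d) for (a, b) in pool for (c, d) in pool if b == c} - pool


def full_closure(facts):
    """Reference answer: the complete transitive closure on a single machine."""
    closure = set(facts)
    while True:
        new = derive(closure)
        if not new:
            return closure
        closure |= new


def run_nodes(facts, n_nodes, rounds, seed=0):
    """Simulate nodes that reason only over their local pool, then route triples."""
    rng = random.Random(seed)
    pools = [set() for _ in range(n_nodes)]
    for fact in sorted(facts):                  # data preparation: spread the input
        pools[rng.randrange(n_nodes)].add(fact)
    results = set(facts)                        # result storage
    sizes = []
    for _ in range(rounds):
        outgoing = []
        for pool in pools:
            pool |= derive(pool)                # reasoning on local data only
            results |= pool
            for triple in sorted(pool):         # routing: copy to a random node
                outgoing.append((rng.randrange(n_nodes), triple))
        for target, triple in outgoing:
            pools[target].add(triple)
        sizes.append(len(results))              # statistics: results found so far
    return results, sizes


facts = {(f"c{i}", f"c{i + 1}") for i in range(7)}   # a chain c0 < c1 < ... < c7
results, sizes = run_nodes(facts, n_nodes=4, rounds=10)
```

At any point the stored results are a subset of the full closure (sound but possibly incomplete), and `sizes` never decreases, so stopping early yields a usable partial answer that improves with more rounds.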

Page 10:

Use case: Drug Discovery

• Problem: pharmaceutical R&D in early clinical development is stagnating

FDA white paper Innovation or Stagnation (March 2004):

“developers have no choice but to use the tools of the last century to assess this century's candidate solutions.”

“industry scientists often lack cross-cutting information about an entire product area, or information about techniques that may be used in areas other than theirs”

“Show me any potential liver toxicity associated with the compound’s drug class, target, structure and disease.”

• Q1 (Genetics): “Show me all liver toxicity associated with the target or the pathway.”
• Q2 (Chemistry): “Show me all liver toxicity associated with compounds with similar structure”
• Q3 (Literature): “Show me all liver toxicity from the public literature and internal reports that are related to the drug class, disease and patient population”

Current NCBI: linking but no inference
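The gap between linking and inference can be shown with a toy example. Everything here is hypothetical: the compound, class and report names are placeholders, and the two functions are invented for this sketch rather than describing NCBI or LarKC.

```python
# Hypothetical knowledge: class hierarchy (child -> parent), class membership,
# and toxicity reports filed against a drug class.
SUBCLASS_OF = {"class_A": "class_B", "class_B": "class_C"}
MEMBER_OF = {"compound_X": "class_A"}
TOX_REPORTS = {"class_B": ["report_1"], "class_C": ["report_2"]}


def linked_reports(compound):
    """Linking only: follow the explicit link from the compound to its class."""
    return list(TOX_REPORTS.get(MEMBER_OF.get(compound), []))


def inferred_reports(compound):
    """Inference: also walk up the class hierarchy to inherited reports."""
    reports = []
    seen = set()
    cls = MEMBER_OF.get(compound)
    while cls is not None and cls not in seen:
        seen.add(cls)
        reports.extend(TOX_REPORTS.get(cls, []))
        cls = SUBCLASS_OF.get(cls)
    return reports


direct = linked_reports("compound_X")
inherited = inferred_reports("compound_X")
```

Following explicit links finds nothing for `compound_X`, while reasoning over the subclass hierarchy surfaces the reports attached to its broader drug classes.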

Page 11:

Use Case: City on-line

• Our cities face many challenges
• Urban Computing is the ICT way to address them

• Where is traffic moving?
• Where are people concentrating?
• Which landmarks attract more people?
• Which location attracts most people right now?
• Is public transportation where the people are?
• Is public transportation where people will be?

improve the quality of life

Page 12:

Is anybody doing this for real?

• OpenCalais:
  – enrich text (news items) with semantic meta-data
  – recognise people, places, events, organisations, ...
  – useful for searching, selecting, personalising, aggregating, summarising, etc.

• From early ’09:
  – identify “people, places, events, organisations, ...” by linking to the Open Data cloud
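The idea of enriching text by linking recognised entities to data-cloud identifiers can be sketched as a dictionary lookup. This is not the OpenCalais API: the `GAZETTEER` table, its URLs and the `annotate` function are hypothetical, and a real service would use far more sophisticated recognition.

```python
import re

# Hypothetical gazetteer: surface form -> (entity kind, placeholder URL).
GAZETTEER = {
    "Amsterdam": ("place", "http://data.example.org/place/Amsterdam"),
    "Frank van Harmelen": ("person", "http://data.example.org/person/Frank_van_Harmelen"),
    "Vrije Universiteit": ("organisation", "http://data.example.org/org/Vrije_Universiteit"),
}


def annotate(text):
    """Return (offset, name, kind, url) for every gazetteer entry found in text."""
    annotations = []
    for name, (kind, url) in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            annotations.append((match.start(), name, kind, url))
    return sorted(annotations)


text = "Frank van Harmelen works at the Vrije Universiteit in Amsterdam."
annotations = annotate(text)
```

Each annotation carries a URL rather than a plain string, so downstream tools can look up further facts about the entity instead of only matching its name.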

Page 13:

Summarising

The Information Universe of the Future will be a Web of Data

• This Web of Data is rapidly taking shape

• There are compelling use-cases

• Industrial take-up is beginning to happen

• We are building new infrastructure to deal with required scale

Page 14:

Contact Info

[email protected]
http://www.larkc.eu

Want to ask questions?
Want to play with LarKC?
Want to contribute plugins?
Want to run a use-case?