Man vs Machine (Manvsmachinewithnotes)

64 slides. Main theme: Web 2.0 is as much about machine-consumable data as human-consumable data.

Description

Web 2.0 is not only about making sites easier for people to interact with; it is also about creating webs of data that machines can interact with. These slides look at a few examples of technologies that can help weave the data web and show some example applications, with a focus on science.

Transcript of Manvsmachinewithnotes

Page 1: Manvsmachinewithnotes

Man vs Machine

Main theme: Web 2.0 is as much about machine-consumable data as human-consumable data.

Page 2: Manvsmachinewithnotes

Web 1.0 → Web 2.0
DoubleClick → Google AdSense
Ofoto → Flickr
Akamai → BitTorrent
mp3.com → Napster
Britannica Online → Wikipedia
personal websites → blogging
evite → upcoming.org and EVDB
domain name speculation → search engine optimization
page views → cost per click
screen scraping → web services
publishing → participation
CMS → wikis
directories (taxonomy) → tagging (folksonomy)
stickiness → syndication

The Web 2.0 meme was influenced by comparing pre-dot-com-bubble companies with post-dot-com-bubble companies.

What is the difference between the list on the left and the list on the right?

Let’s take the example of Britannica vs Wikipedia.

The information in Britannica is centrally controlled. It has a relatively small number of contributors, and the workload per contributor is high.

Wikipedia is open to contributions from anyone. A collaboration of thousands can produce a work of equal quality to one made by a more centrally controlled method.

Britannica’s revenues decreased from 650M to 50M over a 10-year period!

The new sites make it easy to add information and to use that information to answer questions or solve problems for people.

Page 3: Manvsmachinewithnotes

[Diagram: contributing marked "easy" (three times); mining marked "hard"]

Two key parts of Web 2.0 are the easy addition of information into the system (user-generated content), followed by ways of mining that information.

One of the theses we are following in working in this context is that by understanding how information flows, and which ways of mining it are available, we can create useful solutions to real problems.

Companies that find ways to do this should succeed.

Page 4: Manvsmachinewithnotes

[Diagram: contributing marked "easy"; mining marked "hard"; labels: semantic web]

Page 5: Manvsmachinewithnotes

[Diagram: contributing marked "easy"; mining marked "hard"; labels: semantic web; plain text, emails]

Page 6: Manvsmachinewithnotes

[Diagram: contributing marked "easy"; mining marked "hard"; labels: semantic web; plain text, emails; hyperlinks; tags; views; citations?]

Page 7: Manvsmachinewithnotes

[Diagram: contributing marked "easy"; mining marked "hard"; labels: semantic web; plain text, emails; academic papers; hyperlinks; tags; views; citations?]

Page 8: Manvsmachinewithnotes

[Diagram: contributing marked "easy"; mining marked "hard"; labels: semantic web; plain text, emails; academic papers; microformats; hyperlinks; tags; views; citations?]

Page 9: Manvsmachinewithnotes

The kind of information that we can capture with Connotea is typical of many sites. For Connotea we have:
- citation information
- usage patterns (when did an item get added to our DB, and how many times has it been added?)
- user-generated metadata such as tags
- potentially, social network information (how many of my friends have added this item?)
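The kinds of information listed above can be pictured as one record per bookmarked item. A minimal sketch, assuming a simple in-memory model; the `Bookmark` class and its field names are hypothetical and are not Connotea's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Bookmark:
    """Hypothetical record holding the four kinds of information listed above."""
    citation: dict                                # citation information: title, authors, DOI...
    first_added: datetime                         # usage pattern: when the item entered the DB
    added_by: set = field(default_factory=set)    # usage pattern: which users have saved it
    tags: dict = field(default_factory=dict)      # user-generated metadata: user -> set of tags

    def add(self, user, tags=()):
        """Record that `user` bookmarked this item, merging any tags they applied."""
        self.added_by.add(user)
        self.tags.setdefault(user, set()).update(tags)

    @property
    def times_added(self):
        """Usage pattern: how many distinct users have added the item."""
        return len(self.added_by)

    def friends_who_added(self, friends):
        """Social information: which of `friends` have also added this item."""
        return self.added_by & set(friends)
```

Keeping the set of users per item makes both the usage count and the "how many of my friends" question a cheap set operation.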

Page 14: Manvsmachinewithnotes

[Diagram labels: del.icio.us; Gathering; Trusting; Integrating; Analyzing; Triangles]

Many Web 2.0 sites have created islands of data. Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth. RFID and Fire Eagle point the way to merging these islands with the real world.

Page 15: Manvsmachinewithnotes

What's the process?

• Gathering the data

• Trusting the data

• Integrating / disambiguating

• Understanding and analyzing the data
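The four steps can be read as a small pipeline. This sketch assumes each gathered record is a dict with a `doi` and a list of `tags`, that trust is decided per source from a hand-picked allow-list, and that disambiguation means merging records that share a DOI; all function names and source names are hypothetical.

```python
from collections import Counter


def gather(*sources):
    """Gathering: yield (source_name, record) pairs from several (name, records) sources."""
    for name, records in sources:
        for record in records:
            yield name, record


def trust(pairs, trusted_sources):
    """Trusting: keep only records that come from sources we have chosen to trust."""
    return [(name, record) for name, record in pairs if name in trusted_sources]


def integrate(pairs):
    """Integrating / disambiguating: merge records that share a DOI (compared case-insensitively)."""
    merged = {}
    for _, record in pairs:
        key = record["doi"].lower()
        entry = merged.setdefault(key, {"doi": key, "tags": []})
        entry["tags"].extend(record.get("tags", []))
    return list(merged.values())


def analyze(records):
    """Understanding / analyzing: count how often each tag is applied across merged records."""
    return Counter(tag for record in records for tag in record["tags"])
```

Usage: `analyze(integrate(trust(gather(*sources), {"connotea", "delicious"})))`. Each stage is a plain function, so any one of them can be swapped for something smarter without touching the others.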

Page 16: Manvsmachinewithnotes

DOI

Some key technologies for bridging these islands include Fire Eagle, OpenID and OAuth. In the publishing world, DOIs are a key technology.
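A DOI makes a better merge key than a URL because the same paper can be reached through many URLs. A minimal sketch of normalising DOI strings so that equivalent forms compare equal, assuming only the `doi:` prefix and the doi.org / dx.doi.org resolver forms need handling. DOI names are case-insensitive, so the sketch lower-cases them. The example DOI is made up.

```python
import re
from urllib.parse import unquote

# Prefixes commonly wrapped around a bare DOI (assumed list, not exhaustive)
_PREFIXES = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a canonical lower-case DOI, or None if the string does not look like one."""
    doi = _PREFIXES.sub("", raw.strip())
    doi = unquote(doi)  # resolver URLs may percent-encode characters such as '/'
    if not doi.startswith("10.") or "/" not in doi:
        return None
    return doi.lower()  # DOI names are case-insensitive
```

With this, `doi:10.1000/XYZ123` and `http://dx.doi.org/10.1000/xyz123` map to the same key.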

Page 17: Manvsmachinewithnotes

OpenID cf. OAuth

[Diagram labels: Internet; Site or Application; Site]

OpenID allows a single person to interact with multiple web sites using one log-in mechanism. OAuth allows both desktop and web applications to access a user's data on other sites through one standard authorization mechanism, without the user having to hand over their password.
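To make the OAuth side concrete: in OAuth 1.0 (later published as RFC 5849) an application signs each request with its consumer secret and token secret instead of sending a password. The sketch below builds the signature base string and the HMAC-SHA1 signature. It assumes the caller passes an already-normalised base URL (lower-case scheme and host, no query string) and that no parameter name is repeated; the example URL and secrets are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """RFC 3986 encoding as OAuth 1.0 requires: only A-Z a-z 0-9 - . _ ~ stay unescaped."""
    return quote(str(value), safe="")


def signature_base_string(method, base_url, params):
    """Join the HTTP method, encoded URL and sorted, encoded parameters with '&'."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), percent_encode(base_url), percent_encode(normalized)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """Sign with HMAC-SHA1; the key is the two encoded secrets joined by '&'."""
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The server, which knows the same secrets, recomputes the signature from the request it receives; a match shows the request came from the authorized application without any password crossing the wire.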

Page 18: Manvsmachinewithnotes

[Figure: tags attached to items rated 5/5 and 1/5, including Alien, Futuristic, Blockbuster, Time-Travel, War, Space, Spacecraft, Artificial-Intelligence, Soldier, Redemption, Android, Based-on-Novel, Based-on-Play, Famous-Score, Melodrama, Broken-Heart, Hero, Love, Hope, Racism, Refugee]

Once you merge the data, you have to understand it.

The tags that a person uses across different services can give you a more holistic picture of their interests.
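One simple way to get that holistic picture is to pool the tags a person uses on each service into a single normalised profile. This sketch assumes the tags have already been fetched per service; the service names and tags in the test data are made up, and folding case and whitespace is the only normalisation applied.

```python
from collections import Counter


def interest_profile(tags_by_service):
    """Merge one person's tags from several services into tag -> share of all tag uses.

    `tags_by_service` maps a service name to the list of tags used there.
    'Science' and 'science ' are folded into one tag. Most-used tags come first.
    """
    counts = Counter()
    for tags in tags_by_service.values():
        counts.update(t.strip().lower() for t in tags if t.strip())
    total = sum(counts.values())
    if not total:
        return {}
    return {tag: n / total for tag, n in counts.most_common()}
```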

Page 19: Manvsmachinewithnotes

However, tags can be ambiguous.

Semantic web technologies are addressing this; look at projects such as:
- Tagora http://www.tagora-project.eu/
- DBpedia http://dbpedia.org/
- SIOC http://sioc-project.org/
- FOAF http://www.foaf-project.org/
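A very simple illustration of the ambiguity problem: choose the meaning of an ambiguous tag by looking at the tags it appears alongside. This is a plain word-overlap heuristic, not the method used by any of the projects above. The hand-written sense inventory is hypothetical, though the URIs follow DBpedia's `http://dbpedia.org/resource/<Wikipedia title>` naming.

```python
# Hypothetical sense inventory: ambiguous tag -> {DBpedia resource URI: context words}.
# A real system would derive these context words from data rather than hand-write them.
SENSES = {
    "java": {
        "http://dbpedia.org/resource/Java_(programming_language)": {"programming", "code", "jvm", "software"},
        "http://dbpedia.org/resource/Java": {"indonesia", "island", "travel", "jakarta"},
    },
}


def disambiguate(tag, co_tags, senses=SENSES):
    """Return the sense URI whose context words overlap most with the co-occurring tags.

    Returns None if the tag is unknown or no sense shares any word with the context.
    """
    candidates = senses.get(tag.lower())
    if not candidates:
        return None
    context = {t.lower() for t in co_tags}
    best_uri, best_score = None, 0
    for uri, words in candidates.items():
        score = len(words & context)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri
```

Mapping a tag to a URI like this is what lets two services agree that they mean the same thing.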

Page 20: Manvsmachinewithnotes

[Diagram: Open Science, Web 2.0 and Semantic Web]

Though not exactly the same, Web 2.0, open science and the semantic web work well together, and they share some common traits, namely sharing, openness and minability of information.

Page 21: Manvsmachinewithnotes

Growth in submissions to the arXiv demonstrates growth in scientific output, or at least growth in the amount of data available online in electronic form. There is some discussion about whether there is information overload, as the main journals are still the important ones, but reading habits have changed.

Page 22: Manvsmachinewithnotes

Discussion groups and mailing lists contain a huge amount of information, from snippets of computer code to long discussions about topics.

MarkMail, from MarkLogic, is a site that mines this information. Here we see a comparison of a search for FORTRAN vs a search for Java.

At the moment these kinds of archives are mainly relevant in the computer science area, but these kinds of conversations are going on all the time in every field.
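The FORTRAN vs Java comparison boils down to counting mentions of a term over time. A minimal sketch using Python's standard `email` package, assuming the archive is available as parsed `email.message.Message` objects (for example via `mailbox.mbox`); it counts each message at most once and only looks at `text/plain` parts. This is not how MarkMail works internally.

```python
import email.utils
import re
from collections import Counter


def mentions_by_year(messages, term):
    """Count messages per year whose plain-text body mentions `term` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for msg in messages:
        try:
            year = email.utils.parsedate_to_datetime(msg["Date"]).year
        except (TypeError, ValueError):
            continue  # missing or unparseable Date header
        for part in msg.walk():
            if part.get_content_type() != "text/plain":
                continue
            payload = part.get_payload(decode=True) or b""
            if pattern.search(payload.decode("utf-8", errors="replace")):
                counts[year] += 1
                break  # count each message at most once
    return dict(sorted(counts.items()))
```

Running it once for "fortran" and once for "java" over the same archive gives the two series to compare.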

http://markmail.org/

Page 23: Manvsmachinewithnotes

Amazon uses page views and a database of user purchases to find things you might like.

Again, here they are using data that they get for free from people using their site.
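The "customers who bought this also bought" idea can be illustrated with plain co-occurrence counting over past orders. This is a deliberately simple sketch, assuming each order is a collection of item IDs; it is not Amazon's actual recommendation algorithm.

```python
from collections import Counter


def also_bought(orders, item, top_n=3):
    """Rank other items by how many orders they share with `item`."""
    counts = Counter()
    for basket in orders:
        basket = set(basket)
        if item in basket:
            counts.update(basket - {item})
    return [other for other, _ in counts.most_common(top_n)]
```

The only input is purchase history the site already collects, which is the point made above.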

Google's PageRank is another canonical example.
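PageRank likewise ranks pages using data that comes for free, in this case the link structure of the web. A minimal power-iteration sketch of the classic formulation, assuming the conventional damping factor of 0.85 and a fixed number of iterations rather than a convergence test.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # every page gets the "random jump" share; link shares are added on top
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # dangling page with no out-links: spread its rank evenly over all pages
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank
```

Ranks always sum to 1, so they can be read as the probability that a random surfer is on each page.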

Page 24: Manvsmachinewithnotes

Crystal Eye

Social/Knowledge Networking

Two examples of different types of use in science:

CrystalEye http://wwmm.ch.cam.ac.uk/crystaleye/
Example bond lengths for a structure: http://wwmm.ch.cam.ac.uk/crystaleye/bondlengths/H-Rb.svg

Nature Network: human-human interaction

Page 25: Manvsmachinewithnotes

Nature Web Publishing Group

OTMI

The main products that we have developed so far are:
- database gateways
- OTMI (Open Text Mining Interface)
- podcasts
- Scintilla
- Nature Network
- Nature Precedings
- Connotea

Page 26: Manvsmachinewithnotes

There are also other tools out there that are doing the same kind of thing, but I’m partial.

Page 32: Manvsmachinewithnotes

Repository

Discuss how social silos can act as interchange points between repositories, and also between repositories and applications that might be built on top of the social silos.

Page 37: Manvsmachinewithnotes

[Diagram: five Repository boxes]

Page 39: Manvsmachinewithnotes

[Diagram labels: Activity Listing; PubMed Integration; Citation Management; five Repository boxes]

Page 40: Manvsmachinewithnotes

Connotea citation parsing modules

This model was quick and easy to implement, but it uses the URL as the unique key, so the same article reached through different URLs can end up stored as different items.

Page 41: Manvsmachinewithnotes

Amazon.pm BibTeX.pm Blackwell.pm BmcPdf.pm DOI.pm Dlib.pm Highwire.pm Hubmed.pm LivingReviews.pm NASA.pm NPG.pm OUP.pm PLoS.pm PMC.pm PNAS.pm Pubmed.pm RIS.pm Scitation.pm Self.pm Simple.pm SpamDNSBL.pm Springer.pm Wiley.pm arXiv.pm autodiscovery.pm blog.pm ePrints.pm

We have a number of citation modules.

They currently have to be written in Perl, and this is a problem: there is nothing similar to the Scaffold infrastructure that Zotero has.
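Each `.pm` module above handles one source. The same plug-in idea is sketched here in Python rather than Perl: every module says which URLs it understands and extracts what it can. The class names, the `understands`/`citation` method pair and the URL patterns are hypothetical, not Connotea's actual module interface, and the sketch only pulls identifiers out of the URL instead of fetching metadata.

```python
import re


class CitationModule:
    """Base class: a module claims the URLs it understands and extracts citation data."""
    url_pattern = None  # compiled regex, set by each subclass

    def understands(self, url):
        return bool(self.url_pattern and self.url_pattern.search(url))

    def citation(self, url):
        raise NotImplementedError


class ArxivModule(CitationModule):
    url_pattern = re.compile(r"^https?://arxiv\.org/abs/(?P<id>[^/?#]+)")

    def citation(self, url):
        # A real module would fetch the metadata; this sketch only extracts the identifier.
        return {"arxiv_id": self.url_pattern.search(url).group("id")}


class DoiModule(CitationModule):
    url_pattern = re.compile(r"^https?://(?:dx\.)?doi\.org/(?P<doi>10\.\S+)")

    def citation(self, url):
        return {"doi": self.url_pattern.search(url).group("doi")}


MODULES = [ArxivModule(), DoiModule()]


def parse_citation(url, modules=MODULES):
    """Ask each module in turn; the first one that understands the URL wins."""
    for module in modules:
        if module.understands(url):
            return module.citation(url)
    return None
```

Adding support for a new publisher then means adding one small class to the list, which is the kind of contribution a lighter-weight infrastructure would make easier.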

Page 42: Manvsmachinewithnotes
Page 43: Manvsmachinewithnotes
Page 44: Manvsmachinewithnotes

Title

Page 46: Manvsmachinewithnotes

Title

Date

Page 48: Manvsmachinewithnotes

Title

Date

Author

Page 50: Manvsmachinewithnotes

Title

Date

Author

PMID/DOI
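The fields highlighted on these slides (title, date, author, PMID/DOI) can often be read straight from an article page. A sketch using the standard-library `html.parser`, assuming the page exposes Highwire-style `citation_*` meta tags, which many publishers include for Google Scholar; this is not a description of how Connotea's modules work, and the sample HTML in the test is made up.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_..." content="..."> tags from an article page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        if name.startswith("citation_") and content:
            # citation_author can appear once per author, so every field keeps a list
            self.fields.setdefault(name, []).append(content.strip())


def extract_citation(html):
    """Return title, date, authors, DOI and PMID from citation_* meta tags (None if absent)."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    fields = parser.fields

    def first(key):
        return fields.get(key, [None])[0]

    return {
        "title": first("citation_title"),
        "date": first("citation_publication_date") or first("citation_date"),
        "authors": fields.get("citation_author", []),
        "doi": first("citation_doi"),
        "pmid": first("citation_pmid"),
    }
```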

Page 51: Manvsmachinewithnotes

Getting data in, part 2

The metadata from the paper has been captured.

When you begin to add tags, suggested tags are presented based on tags you have already used.
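Suggesting tags from a user's own history can be as simple as prefix matching ranked by how often each tag was used before. A minimal sketch, assuming the user's past tags are available as a flat list; the function name and ranking rule are illustrative choices.

```python
from collections import Counter


def suggest_tags(previous_tags, prefix, limit=5):
    """Suggest tags from the user's history that start with `prefix`, most-used first.

    Ties in frequency are broken alphabetically so the order is stable.
    """
    counts = Counter(t.lower() for t in previous_tags)
    prefix = prefix.lower()
    matches = [t for t in counts if t.startswith(prefix)]
    matches.sort(key=lambda t: (-counts[t], t))
    return matches[:limit]
```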

A paper by Huberman et al. shows that displaying all tags drives tag-onomies to a stable state (Polya-Renyi urn model). You need to display the full community tags, which we don’t do ... yet.
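The urn intuition can be simulated directly. The sketch below is the classic Pólya urn, used as a simplified stand-in for the model mentioned above: each step copies an existing tag with probability proportional to its current count and never introduces new tags. In this urn the tag proportions converge as steps accumulate, which is the "stable state" behaviour. Tag names, step counts and the seed are arbitrary.

```python
import random


def polya_urn(initial, steps, seed=None, checkpoints=()):
    """Simulate a Pólya urn over tags.

    Each step picks a tag with probability proportional to its count and adds one more
    copy. Returns the final counts plus tag proportions recorded at each checkpoint step.
    """
    rng = random.Random(seed)
    counts = dict(initial)
    tags = list(counts)
    snapshots = {}
    for step in range(1, steps + 1):
        chosen = rng.choices(tags, weights=[counts[t] for t in tags])[0]
        counts[chosen] += 1
        if step in checkpoints:
            total = sum(counts.values())
            snapshots[step] = {t: counts[t] / total for t in tags}
    return counts, snapshots
```

Comparing snapshots at, say, 1,000 and 10,000 steps shows the proportions changing less and less over time.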

Page 54: Manvsmachinewithnotes

User home page: toolbox on the right, with user tags, related tags, related users and groups.

Page 57: Manvsmachinewithnotes

Getting data out

Open data is important.

Export only gets out the citation data, and not the extra metadata that the user has added, such as comments or tags.

Formats: plain text, RDF, BibTeX, RIS, EndNote; an API??
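As an illustration of one of these formats, a minimal RIS writer for a journal-article record. The record's dict keys are assumptions; the RIS tags used (TY, AU, TI, JO, PY, DO, UR, KW, ER) are standard. Because RIS has a keyword field (KW), user tags could be carried along in exports, which would address the limitation noted above; that mapping is a suggestion, not current Connotea behaviour.

```python
def to_ris(record):
    """Render one citation record (a dict) as an RIS entry of type JOUR."""
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    # optional single-valued fields: (RIS tag, assumed dict key)
    for tag, key in [("TI", "title"), ("JO", "journal"), ("PY", "year"), ("DO", "doi"), ("UR", "url")]:
        if record.get(key):
            lines.append(f"{tag}  - {record[key]}")
    # user tags exported as RIS keywords
    for keyword in record.get("tags", []):
        lines.append(f"KW  - {keyword}")
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"
```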

Page 59: Manvsmachinewithnotes

Perl

mod_perl

Template Toolkit

MySQL

Open Source, GPL2.5 v 1.8.1

web1.75 application

Discuss reasons for open source (OS); discuss web1.8.1.
- Hope for community involvement.
- The code is not MVC-structured, and this has led to some problems with adoption.
- We do have some people running their own instances, and we get some feedback, but we would like to eventually make the code easier to work with.
- Why not port it? That’s a big can of worms, and someone needs to convince me of the benefits.
- If for some reason we choose to no longer support Connotea, the data and the code could be hosted by someone else.
- Someone asked me how they would know that we don’t cheat and preferentially return NPG articles in searches. Well, the code is open, so if you are that paranoid you can run an instance yourself and check up on us.

Page 60: Manvsmachinewithnotes

http://www.connotea.org/user/IanMulvany

http://www.connotea.org/users/tag/scifoo

http://www.connotea.org/user/IanMulvany/tag/scifoo

http://www.connotea.org/user/IanMulvany/tag/science2.0+citation

http://www.connotea.org/user/IanMulvany/tag/science

Examples of calls to query the data (HTML output)

Page 61: Manvsmachinewithnotes

http://www.connotea.org/data/user/IanMulvany

http://www.connotea.org/data/users/tag/scifoo

http://www.connotea.org/data/user/IanMulvany/tag/scifoo

http://www.connotea.org/data/user/IanMulvany/tag/science2.0+citation

http://www.connotea.org/data/user/IanMulvany/tag/science

Examples of API calls (you don’t have to type them in green when making the call)

Page 62: Manvsmachinewithnotes

http://www.connotea.org/rss/user/IanMulvany

http://www.connotea.org/rss/users/tag/scifoo

http://www.connotea.org/rss/user/IanMulvany/tag/scifoo

http://www.connotea.org/rss/user/IanMulvany/tag/science2.0+citation

http://www.connotea.org/rss/user/IanMulvany/tag/science

Examples of RSS calls (you don’t have to type them in green when making the call)

We create an RSS feed of everything.
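The three URL families on these slides differ only in a prefix: none for the HTML pages, `data` for the API and `rss` for feeds. A small helper that reproduces those patterns; the function name is hypothetical, and it only covers the user/users and tag forms shown in the examples, joining several tags with `+` as they do.

```python
from urllib.parse import quote

BASE = "http://www.connotea.org"


def connotea_url(view="", user=None, tags=()):
    """Build a query URL following the example patterns.

    view: "" for the HTML page, "data" for the API, "rss" for the feed.
    user: restrict to one user's library; omit to query all users.
    tags: zero or more tags, joined with '+'.
    """
    parts = [BASE]
    if view:
        parts.append(view)
    parts += ["user", quote(user, safe="")] if user else ["users"]
    if tags:
        parts += ["tag", "+".join(quote(t, safe="") for t in tags)]
    return "/".join(parts)
```

For example, `connotea_url("rss", tags=["scifoo"])` yields the all-users scifoo feed listed above.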

Page 63: Manvsmachinewithnotes

[Chart: Bookmark Growth in Connotea. Entries in all libraries (thousands, axis 0 to 600), January 2005 to March 2008]

Growth in Connotea bookmarks

Page 64: Manvsmachinewithnotes

Mirko Gontek at the University of Cologne: information visualization of links in Connotea

These social links can create networks of information on top of the basic information.

This is what we want to use to start building collaborative intelligence into these systems.
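One way to derive such social links from bookmark data is to connect users who saved the same items, weighting each link by the number of shared items. A minimal sketch, assuming each user's library is available as a collection of item keys such as DOIs; this is not necessarily how the visualization above was built.

```python
from collections import Counter
from itertools import combinations


def user_links(libraries):
    """Link users who bookmarked the same items; edge weight = number of shared items.

    `libraries` maps each user to the item keys they saved. Edges are (user_a, user_b)
    tuples with user_a < user_b.
    """
    savers = {}
    for user, items in libraries.items():
        for item in set(items):
            savers.setdefault(item, set()).add(user)
    edges = Counter()
    for users in savers.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return edges
```

The resulting weighted graph is the kind of network layer on top of the basic bookmark data that could feed recommendations or visualizations.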