Peter Fox

Xinformatics 4400/6400

Week 10, April 9, 2013

Information management, workflow and discovery

/check-in for project definitions

Review of reading

• Information Integration
– Social issues in information discovery and sharing
– Information integration in geo-informatics
– http://cseweb.ucsd.edu/~goguen/projs/data.html
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839387/
• Information Life Cycle
– MSDN Information Life Cycle
– Information Life Cycle definition and context
– http://www.computerworld.com/s/article/79885/The_new_buzzwords_Information_lifecycle_management
– http://www.databasejournal.com/sqletc/article.php/3340301/Database-Archiving-A-Critical-Component-of-Information-Lifecycle-Management.htm
– http://en.wikipedia.org/wiki/Information_Lifecycle_Management
– http://msdn.microsoft.com/en-us/library/bb288451.aspx
• Information Visualization
– http://mastersofmedia.hum.uva.nl/2011/04/18/the-simple-ways-of-information-visualization/comment-page-1/
– http://www.siggraph.org/education/materials/HyperVis/domik/folien.html
– http://www.visual-literacy.org/periodic_table/periodic_table.html
• Information model development and visualization
– http://www.acm.org/crossroads/xrds7-3/smeva.html
• Outside the current box
– Peter Fox and James Hendler, 2011, Changing the Equation on Scientific Data Visualization, Science, Vol. 331 no. 6018, pp. 705-708, DOI: 10.1126/science.1197654, online at http://www.sciencemag.org/content/331/6018/705.full or see: http://escience.rpi.edu/publications/visualization/fox_hendler_science2011.html

Logical Collections

• The primary goal of a Management system is to abstract the physical collection into logical collections. The resulting view is a uniform homogeneous collection.
• Note the analogy with logical models and information integration: so EARLY ON
– Identifying naming conventions and organization
– Aligning cataloguing and naming to facilitate search, access, use (who uses?)
– Provision of **contextual** information

Physical Handling

• Map between physical and logical.
• Where and who does it come from?
– Is there a transfer into a physical form?
– Is it backed-up, archived, cached? …
– What formats?
– Naming conventions – do they change?

• Note analogy to physical models
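The mapping between logical names and physical copies can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the class names, the `role` values, and the paths are all hypothetical, chosen to show one logical item resolving to several physical copies (primary, archive, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCopy:
    """One stored copy of an item (all values used below are hypothetical)."""
    location: str  # a path or URL
    fmt: str       # file format
    role: str      # "primary", "backup", "archive", or "cache"


@dataclass
class LogicalCollection:
    """A uniform view over items, independent of where the copies live."""
    name: str
    items: dict[str, list[PhysicalCopy]] = field(default_factory=dict)

    def register(self, logical_name: str, copy: PhysicalCopy) -> None:
        # Several physical copies may sit behind one logical name.
        self.items.setdefault(logical_name, []).append(copy)

    def resolve(self, logical_name: str, role: str = "primary") -> PhysicalCopy:
        # Return the first registered copy that plays the requested role.
        for copy in self.items.get(logical_name, []):
            if copy.role == role:
                return copy
        raise KeyError(f"no {role} copy registered for {logical_name!r}")


collection = LogicalCollection("sea-surface-temperature")
collection.register("sst/2013-04", PhysicalCopy("/data/raw/a1.nc", "netCDF", "primary"))
collection.register("sst/2013-04", PhysicalCopy("/archive/tape7/a1.nc", "netCDF", "archive"))
```

Users ask for `sst/2013-04`; where the bytes actually sit, and in which format, stays a detail of the management system.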

Interoperability Support

Security

• Access authorization and change verification. This is the basis of trusting your information.
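As a minimal sketch of the two ideas named above, the Python below pairs a simple access check with a content fingerprint for change verification. The access table and user names are hypothetical, and SHA-256 from the standard library's `hashlib` is one reasonable choice of fingerprint, not something the slides prescribe.

```python
import hashlib

# Hypothetical access list: user name -> actions that user may perform.
ACCESS = {"alice": {"read", "write"}, "bob": {"read"}}


def is_authorized(user: str, action: str) -> bool:
    """Access authorization: unknown users get no permissions."""
    return action in ACCESS.get(user, set())


def fingerprint(content: bytes) -> str:
    """Compute a SHA-256 digest to record alongside the stored content."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, recorded: str) -> bool:
    """Change verification: compare the current digest with the recorded one."""
    return fingerprint(content) == recorded


original = b"station,temp\nA,12.5\n"
recorded = fingerprint(original)
```

Any later edit to the bytes, however small, yields a different digest, so a mismatch signals that the content is no longer what was registered.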

Ownership

• Who is responsible for quality and meaning

Metadata

• Recall metadata are data about data.

• Metainformation?

Persistence

• Deployment of mechanisms to counteract technology obsolescence.

Discovery

• Ability to identify useful relations and information inside the collection
• More on this later in this class

Dissemination

• Mechanisms to make interested parties aware of changes and additions to the collections.

• Do you rely on information retrieval? The Web?

Summary of Information Management

• Creation of logical collections

• Physical handling

• Interoperability support

• Security support

• Ownership

• Metadata collection, management and access.

• Persistence

• Knowledge and information discovery

• Dissemination and publication

Note for your project writeup!

• Information management! Cover the 9 areas.

Information Workflow

• What is a workflow?

• Why would you use it?

• Key considerations for information, cf. data

• Some pointers to workflow systems

What is a workflow?

• General definition: “series of tasks performed to produce a final outcome” (taxes?)
• Information workflow – involves people, but you may want to
– Automate jobs that a person traditionally performed manually
– Process large volumes of information faster than one could do by hand
• NB difference from data workflows – it reaches out to encompass the user (e.g. ‘unrecorded actions’)

Background: Business Workflows

• Example: planning a trip
• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.
• Each task may depend on outcome of previous task
– Days you reserve the hotel depend on days of the flight
– If hotel has shuttle service, may not need to rent a car

• Prior information, experience, preferences…
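The trip example above is a small dependency graph, and one way to sketch it is with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `hotel_has_shuttle` flag are hypothetical stand-ins for the slide's wording; a real workflow system would decide the conditional step from the outcome of the hotel task rather than from a parameter.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose outcome it depends on.
dependencies = {
    "book_flight": set(),
    "reserve_hotel": {"book_flight"},   # hotel days depend on flight days
    "arrange_car": {"reserve_hotel"},   # a car is only needed without a shuttle
}


def plan_trip(hotel_has_shuttle: bool) -> list[str]:
    """Return the tasks in an order that respects the dependencies."""
    order = list(TopologicalSorter(dependencies).static_order())
    if hotel_has_shuttle:
        # Conditional step: the hotel's shuttle makes the rental car unnecessary.
        order.remove("arrange_car")
    return order
```

Calling `plan_trip(False)` gives the full three-step plan; `plan_trip(True)` drops the rental car while keeping the flight before the hotel.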

Tripit.com?

What about information workflows?

• Perform a set of transformations/operations on information source(s)
• Examples
– Generating images from raw data
– Identifying areas of interest from a large information source (e.g. word cloud)
– Classifying a set of objects
– Querying a web service for more information on a set of objects
– Many others…
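A set of transformations applied in sequence can be sketched as a list of functions, each consuming the previous step's output. The example below computes the term counts behind a word cloud; the step names, the stop-word list, and the input sentence are all illustrative assumptions.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}


def tokenize(text: str) -> list[str]:
    """Step 1: split the source text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def drop_stop_words(words: list[str]) -> list[str]:
    """Step 2: remove words that carry little meaning."""
    return [w for w in words if w not in STOP_WORDS]


def count_terms(words: list[str]) -> Counter:
    """Step 3: count occurrences, the input a word cloud would be drawn from."""
    return Counter(words)


def run_workflow(source, steps):
    """Feed the source through each step in order."""
    return reduce(lambda value, step: step(value), steps, source)


counts = run_workflow(
    "The flow of information and the management of information",
    [tokenize, drop_stop_words, count_terms],
)
```

Because the workflow is just an ordered list of steps, a step can be swapped or reused in another workflow without touching the others.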

More on Workflows

• Can process many information types:
– Archives
– Web pages
– Streaming/real time
– Images
– Semiotic systems

• Robust workflows depend on formal (conceptual and logical) models of the flow of information among components

• May be simple and linear or very complex

Challenges

• Questions:

– What are some challenges for users in implementing workflows?

– What are some challenges to executing these workflows?

– What are limitations of writing a program?

• Mastering a programming language

• Visualizing workflow

• Sharing/exchanging workflow

• Formatting issues

• Locating datasets, services, or functions

Workflow Management Systems

Benefits of Workflows

• Documentation of aspects of analysis

• Visual communication of analytical steps

• Ease of testing/debugging

• Reproducibility

• Reuse of part or all of workflow in a different project

Additional Benefits

• Integration of and between multiple computing environments

• ‘Automated’ access to distributed resources via other architectural components, e.g. web services and Grid technologies

• System functionality to assist with information integration of heterogeneous components and sources

Why not just use a script?

• Script does not specify low-level task scheduling and communication
• May be platform-dependent
• Can’t be easily reused
• May not have sufficient documentation to be adapted for another purpose

Why can a GUI be useful?

• No need to learn a programming language

• Visual representation of what workflow does

• Allows you to monitor workflow execution

• Enables user interaction (though not necessarily collaboration)

• Facilitates sharing of workflows

Some workflow systems

• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
– Windows Workflow Foundation
– Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08TutorialSlides/
• See reading for this week

Discovery

• How does someone find your information?
• How would you provide discovery of
– collections
– files
– ‘bits’

• How would you find ->

Discovery

o Search (Federated Search)
o Helped by
  o Folksonomies (user contributed)
  o Intelligent Agents
  o Search Engines
  o Taxonomies
o Find photos of Kim
  o Boy or girl?

Use cases

• Find a sound recording of a swallow.

• Excuse me?

Use cases

• Find a sound recording of an African Swallow

• Find a sound recording of a bird that sounds like an African Swallow

• Media types – how can you discover them?

Use cases

• Find the movie that Jeanne Tripplehorn first starred in / that was her most successful / in which she was lead actress?

• Has anyone gene sequenced a mouse?

• Find images of primary productivity in the North Atlantic

• Discovery can often involve information integration (or is it *almost always*?)

Three level ‘metadata’ solution for DATA

• Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
• Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products
• Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
• Ontology based Data Integration using scientific workflows
• Earth Sciences Virtual Database: a Data Warehouse where the schema heterogeneity problem is solved; schema based integration
• Diagram labels: Data Discovery, Data Integration

A.K. Sinha, Virginia Tech, 2006

Three level ‘metadata’ solution?

• Level 1: Registration at the Discovery Level, e.g. find the upper level entry point to a source
• Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization
• Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging
• Integration using mapping management
• Catalog/Index; schema based integration
• Diagram labels: Information Discovery, Information Integration

A.K. Sinha, Virginia Tech, 2006
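One way to picture the three registration levels is as a nested structure that is queried one level at a time. The sketch below is an assumption-laden illustration: the source, dataset, and item names and their tags are invented, and a plain dictionary stands in for whatever catalog a real system would use.

```python
# Hypothetical registry; every name and tag below is made up for illustration.
registry = {
    "volcano-observations": {                                # Level 1 entry point
        "description": "volcano location and activity",
        "inventory": {                                       # Level 2 datasets
            "so2-flux-2005": {                               # Level 3 items with tags
                "rec-001": ["so2", "eruption"],
                "rec-002": ["so2"],
            },
            "seismicity-2005": {
                "rec-101": ["tremor"],
            },
        },
    },
}


def discover(keyword: str) -> list[str]:
    """Level 1: find entry points whose description mentions the keyword."""
    return [name for name, source in registry.items()
            if keyword.lower() in source["description"].lower()]


def list_datasets(source: str) -> list[str]:
    """Level 2: list the datasets registered under one source."""
    return sorted(registry[source]["inventory"])


def find_items(source: str, dataset: str, tag: str) -> list[str]:
    """Level 3: find individual items carrying a given annotation."""
    items = registry[source]["inventory"][dataset]
    return sorted(item for item, tags in items.items() if tag in tags)
```

A search starts broad with `discover("volcano")`, narrows with `list_datasets(...)`, and ends at tagged items with `find_items(...)`.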

Information discovery

• What makes discovery work?
– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– It turns out that file types are a key enabler or inhibitor to discovery
– Result ranking using *tuned* algorithm
• What does not work?
– Result ranking algorithms that depend on unconventional information types (icon, index, symbol)

Federated search

• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)
• Libraries have been doing this for a long time (Z39.50, ISO 23950)
• Key is consistent search metadata fields (keywords)
• E.g. Geospatial One Stop http://www.geodata.gov
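The point about consistent metadata fields can be made concrete with a toy federated search. In the sketch below, two in-memory "libraries" stand in for remote databases, and both expose the same `keywords` field; the source names, records, and keywords are hypothetical, and threads from the standard library merely illustrate the "simultaneous" part. It does not implement Z39.50 or any real protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources; each record uses the same "keywords" metadata field.
SOURCES = {
    "library_a": [
        {"title": "Swallow song recordings", "keywords": {"bird", "audio"}},
        {"title": "Atlantic chlorophyll maps", "keywords": {"ocean", "image"}},
    ],
    "library_b": [
        {"title": "Field guide to swallows", "keywords": {"bird", "text"}},
    ],
}


def search_source(name: str, keyword: str) -> list[dict]:
    """Search one source; a real system would call a remote service here."""
    return [{"source": name, "title": record["title"]}
            for record in SOURCES[name] if keyword in record["keywords"]]


def federated_search(keyword: str) -> list[dict]:
    """Query every source concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, keyword), SOURCES))
    return [hit for hits in per_source for hit in hits]
```

The merge only works because every source answers the same question about the same field; if one library called it `subjects` instead, a mapping step would be needed first.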

Smart search

• Semantically aware search, e.g. http://noesis.itsc.uah.edu , http://eie.cos.gmu.edu (Water -> Semantic Search)
• Faceted search, e.g. mspace (http://mspace.fm ), exhibit (MIT), S2S (RPI; http://aquarius.tw.rpi.edu/s2s )
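Faceted search narrows a result set by selecting values of metadata fields and shows how many results remain under each remaining value. The following is a generic sketch of that idea, not how mspace, Exhibit, or S2S work internally; the records and facet names are invented.

```python
from collections import Counter

# Hypothetical records with two facets: "type" and "region".
RECORDS = [
    {"title": "Ocean colour 2010", "type": "image", "region": "North Atlantic"},
    {"title": "Swallow call", "type": "audio", "region": "Africa"},
    {"title": "Chlorophyll map", "type": "image", "region": "North Atlantic"},
    {"title": "Savanna survey", "type": "image", "region": "Africa"},
]


def apply_facets(records: list[dict], **selected: str) -> list[dict]:
    """Keep only the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


def facet_counts(records: list[dict], facet: str) -> Counter:
    """Count the remaining records under each value of one facet."""
    return Counter(r[facet] for r in records)


images = apply_facets(RECORDS, type="image")
by_region = facet_counts(images, "region")
```

After choosing `type="image"`, the counts in `by_region` tell the user how many images each region would yield before they commit to the next selection.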

NOESIS

Faceted search

logd.tw.rpi.edu

Summary - discovery

• Useful to write a few discovery use cases to drive how your design is developed

• Evolution of your role in facilitating discovery and what/ how others implement access to your information

Reading for this week

• Is retrospective

Check in for Project Assignment

• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment

• Or a new use case, development, etc.

What is next

• April 16 – Information Audit

• April 23 –

• April 30 –

• May 6 – final project presentations