1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 9/10, April 13, 2010 Information...

Post on 28-Dec-2015


1

Peter Fox

Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01

Week 9/10, April 13, 2010

Information life-cycle and visualization and check-in for project definitions

Contents

• Review of last class, reading

• Information life-cycle

• Information visualization

• Checking in for project definitions

• Discussion of reading

• Next class

2

And yet only one part of the life cycle of data

Definitions

• Life-cycle elements

– Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)

– Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future (http://www.dcc.ac.uk/FAQs/data-curator)

– Preservation: Process of retaining usability of data in some source form for intended and unintended use

– Stewardship: Process of maintaining integrity across acquisition, curation and preservation

4

Definitions ctd.

• Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

5

The nature of the challenge

• To architect information systems today

– You may play many roles

– You may not get all the metadata or information you need even if you get the data

– You will need skills that you were not taught

• To work with end-users today

– You may have lots of technical experience

– You will need new skills in addressing the changing use of data and information

– One ‘size’ does not fit all

6

Many views of the Information life-cycle

7

Acquisition

• Learn / read what you can about the developer of the means of acquisition

– Documents may not be easy to find

– Remember bias!!!

• Document things as you go

• Have a checklist (the Management list) and review it often

8

Curation (partial)

• Consider the organization and presentation of the data

• Document what has been (and has not been) done

• Consider and address the provenance to date; you are now THE next person

• Be as technology-neutral as possible

• Look to add metainformation

9

Preservation

• Usually refers to the full life cycle

• Archiving is a component

• Stewardship is the act of preservation

• Intent is that ‘you can open it any time in the future’ and that ‘it will be there’

• This involves steps that may not be conventionally thought of

• Think 10, 20, 50, 200 years… Looking historically gives some guide to future considerations

10

Remember

• The life cycle applies within and before and after your use case…

• So, let’s look in a little more detail

11

How the information is created

• Systemic

• Environmental

• Trial-and-error (or ad-hoc)

12

How the information is delivered?

• One-to-many presentation

• White paper

• Web site FAQ

• Web site informational

• Web site directed (link sent with e-mail, and so on) to a specific Web site

• Application-based delivery via managed expert system

• One-to-one presentation:

– Word of mouth

– Ad-hoc communication

13

How the information is managed

• Complexity of the information

• Complexity of the creation process

• Complexity of the management system

• Financial impact of IP/IC creation

14

Type of information created

• Tacit (created and stored informally):

– Human memory

– Local hard drive of the computer

– Expert system (moving tacit information into a formalized structure)

• Explicit (created and stored formally):

– Network share

– Network Web site/intranet

– Informal knowledge-management system

– Document-management system

– Formal KM system


15

Value of the source

• Age of the information

• Proximity of the information to the consumer

• Source of the information, and previous interactions with that specific source

16

Mostly Technical Issues

• Data Preservation

o Bit-level integrity

o Data readability

• Documentation

• Metadata

• Semantics

• Persistent Identifiers

• Virtual Data Products

• Lineage Persistence

• Required ancillary data

• Applicable standards
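Bit-level integrity is commonly checked by recording a checksum when a file enters the archive and recomputing it later. The following is a minimal sketch, assuming SHA-256 as the digest; the function names `sha256_of_file` and `verify_fixity` are illustrative and do not come from any particular archive system.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of_file(path) == recorded_digest
```

A changed digest shows only that the bits differ; it says nothing about whether the data is still readable, which is the separate 'data readability' issue.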

Mostly Non-Technical Issues

• Policy (constrained by money…)

• Front end of the lifecycle

o Long-term planning, data formats, documentation...

• Governance and policy

• Legal requirements

• Archive to archive transitions

• Money (intertwined with policy)

• Cost-benefit trades

• Long-term needs of programs

• User input

o Identifying likely users

• Levels of service

• Funding source and mechanism

Life cycle is a complex issue

• Must be managed

• Documented

• As part of the use case, but also outside it

19

Information Visualization

• Questions to keep in mind

– What is the improvement in the understanding as compared to the situation without visualization?

– Which visualization techniques are suitable for one's data/information?

20

Why visualization?

• Reducing amount of data, quantization

• Patterns

• Features

• Events

• Trends

• Irregularities

• Exit points for analysis

• Leading to presentation of data

• Recall – cognitive science and the mental representation??!!??

21

Types of visualization

• Color coding (including false color)

• Classification of techniques is based on

– Dimensionality

– Information being sought, i.e. purpose

• Line plots

• Contours

• Surface rendering techniques

• Volume rendering techniques

• Animation techniques

• Non-realistic, including ‘cartoon/artist’ style

22

Image (aka Raster) file formats

• CGM, the Computer Graphics Metafile, has been an ISO standard since 1987. It has the capability to encompass both graphical and image data.

• PostScript, or more specifically Encapsulated PostScript Format (EPSF), is a page description language with sophisticated text facilities. For graphics, as compared to CGM, it tends to be expensive in terms of storage.

23

Image file formats

• TIFF, the Tagged Image File Format, encompasses a range of different formats, originally designed for interchange between electronic publishing packages.

• GIF, the Graphics Interchange Format, is quite widespread and can encode a number of separate images of different sizes and colors.

• PNG, the Portable Network Graphics format

24

Image file formats

• RGB, the Red Green Blue format of Silicon Graphics, is used by most visualization software packages as the internal image format. The format consists of a header containing the dimensions of the image, followed by the actual image data.

• The image data is stored as a 2D array of tuples. Each tuple is a vector with 3 components: R, G, and B. The RGB components determine the color of every pixel (picture element) in the image.

25
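The 'header plus 2D array of tuples' idea can be sketched in a few lines. This is a conceptual illustration only, not the byte layout of the Silicon Graphics file format; the names `make_rgb_image` and `set_pixel` are hypothetical.

```python
def make_rgb_image(width, height, fill=(0, 0, 0)):
    """Return (header, pixels): the dimensions plus a 2D array of (R, G, B) tuples."""
    header = {"width": width, "height": height}
    # pixels[y][x] is one tuple; rows run top to bottom in this sketch.
    pixels = [[fill for _ in range(width)] for _ in range(height)]
    return header, pixels


def set_pixel(pixels, x, y, rgb):
    """Set the color of the pixel in column x of row y."""
    pixels[y][x] = rgb


header, pixels = make_rgb_image(3, 2)   # 3 pixels wide, 2 pixels high, all black
set_pixel(pixels, 2, 1, (255, 0, 0))    # make the bottom-right pixel pure red
```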

Image file formats

• PPM, the Portable Pixmap Format (24 bits per pixel), PGM, the Portable Graymap Format (8 bits per pixel), and PBM, the Portable Bitmap Format (1 bit per pixel), are pixel-based formats distributed with the X Window System (version 11.4).
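To make the '24 bits per pixel' figure concrete, here is a minimal sketch of a binary PPM writer, assuming the common 'P6' variant with a maximum channel value of 255 (one byte per channel, three channels per pixel). The function name `encode_ppm` is illustrative.

```python
def encode_ppm(pixels, max_value=255):
    """Encode a 2D array of (R, G, B) tuples as binary PPM ('P6') bytes."""
    height = len(pixels)
    width = len(pixels[0])
    # Text header: magic number, dimensions, maximum channel value.
    header = f"P6\n{width} {height}\n{max_value}\n".encode("ascii")
    # Raster: three bytes (24 bits) per pixel, row by row.
    body = bytes(channel for row in pixels for pixel in row for channel in pixel)
    return header + body


# One row with a red pixel and a green pixel.
ppm_bytes = encode_ppm([[(255, 0, 0), (0, 255, 0)]])
```

Writing `ppm_bytes` to a file opened in binary mode should give an image that most viewers can open; the raster here is 2 pixels × 3 bytes = 6 bytes after the header.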

26

Image file formats

• XBM is the X-Window one Bit image file format, which has been standardized by the MIT X-consortium.

• A major constraint on the use of images is the large data volume which has to be dealt with.

• Large sets of image data can have severe implications for storage, memory, and transmission costs.

• Therefore, compression techniques are very important.

• Compression techniques fall into two categories, based on whether or not the initial picture can be reconstructed exactly after compression.

27

Compression (any format)

• Lossless compression methods are methods for which the original, uncompressed data can be recovered exactly. Examples in this category are Run Length Encoding and the Lempel-Ziv-Welch (LZW) algorithm.

• Lossy compression methods, in contrast, are methods for which the original data cannot be recovered exactly after compression. An example in this category is the Color Cell Compression method.

• Lossy compression techniques can reach reduction rates of 0.9, whereas lossless compression techniques normally have a maximum reduction rate of 0.5.
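Run Length Encoding is simple enough to sketch directly. The version below is one possible byte-oriented variant (a count byte followed by a value byte, runs capped at 255), not a standard file format. `reduction_rate` assumes that 'reduction rate' means the fraction of the original size that is removed; the slides do not define the term.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode bytes as (run length, value) pairs; each run is at most 255 long."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode exactly (lossless)."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


def reduction_rate(original_size: int, compressed_size: int) -> float:
    """Fraction of the original size removed (assumed meaning of 'reduction rate')."""
    return 1 - compressed_size / original_size
```

With this scheme, data without repeated bytes doubles in size, so the reduction rate can be negative; how well lossless compression does depends heavily on the data.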

28

Vector formats

• PostScript

• PDF

• SVG

• ‘Shape files’

• CGM (also)

• …

29

Animation formats

• MPEG

• AVI

• QT

• WMV

• Animated GIF

30

Remember - metadata

• Many of these formats already contain metadata or fields for metadata; use them!
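As one small illustration, the Netpbm formats (PBM/PGM/PPM) allow '#' comment lines in the header, which can carry simple provenance notes. The sketch below is a hedged example: the 'key: value' convention, the metadata keys, and the function names are assumptions made for illustration, and the reader only handles headers where each comment sits on its own line, as the writer here produces.

```python
def encode_pgm_with_metadata(rows, metadata):
    """Encode 8-bit grey rows as binary PGM ('P5') with '# key: value' comments."""
    lines = ["P5"]
    for key, value in metadata.items():
        text = f"{key}: {value}"
        if "\n" in text or "\r" in text:
            raise ValueError("each metadata entry must fit on one line")
        lines.append("# " + text)
    lines.append(f"{len(rows[0])} {len(rows)}")   # width height
    lines.append("255")                           # maximum grey value
    header = ("\n".join(lines) + "\n").encode("ascii")
    return header + bytes(value for row in rows for value in row)


def read_pgm_comments(data):
    """Return the header comment lines, stopping before the raster."""
    comments, tokens_seen, pos = [], 0, 0
    while tokens_seen < 4:                        # magic, width, height, maxval
        end = data.index(b"\n", pos)
        line, pos = data[pos:end], end + 1
        if line.startswith(b"#"):
            comments.append(line[1:].strip().decode("ascii"))
        else:
            tokens_seen += len(line.split())
    return comments


# Hypothetical metadata values, for illustration only.
pgm = encode_pgm_with_metadata(
    [[10, 35, 255]], {"source": "hypothetical sensor A", "created": "2010-04-13"}
)
notes = read_pgm_comments(pgm)
```

Richer formats such as TIFF and PNG have dedicated metadata fields that are usually a better home than comments, when the tools in use support them.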

31

Tools

• Conversion

– Imtools

– GraphicConverter

– Gnu convert

– Many more

• Combination/Visualization

– IDV

– Gnuplot

– http://disc.sci.gsfc.nasa.gov/giovanni

32

Visualization

34

Managing visualization products

• The importance of a ‘self-describing’ product

• Visualization products are not just consumed by people

• How many images, graphics files do you have on your computer for which the origin, purpose, use is still known?

• How are these logically organized?

35

Discovery of visualizations

• When represented as images:

– Image-based type free text search?

– Referred to in publications (articles, books, web pages)

• Vector graphics:

– PostScript or PDF

– SVG

– Others?

• What makes this easy or hard or impossible?

36

Discussion

• About life-cycle in general?

• Visualization?

37

Reading for this week

• Is retrospective

38

Check in for Project Assignment

• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment

39

What is next

• Week 11 – Information and Workflow Management

• Week 12 – Information Discovery, Information Integration

40