Usability and Integration

H. V. Jagadish

Many Sources of Data

• Text

• XML/semi-structured

• Experimental measurements

• Public databases

• Some data may have time/space variation

• Need to make sense of this big mess

Find Patterns in Data

• Conventional data mining seeks patterns that can be mathematically specified over (usually) global extents.

• Typically assume simple data structure.

• Need new approaches to find patterns in messy data.

Human in the Loop

• Hard for a machine to tell an interesting pattern apart from one that is not.

• Problem exacerbated when we seek smaller/localized patterns, or work with large vocabularies of possible patterns.

• Need human in the loop to make this judgment.

Computer-Assisted (Human) Analytics

• Patterns found by human and not by computer.

• Job of computer is to make patterns easy to find.

• So computer system must effectively support queries and display results.

• E.g., visual analytics

Organize Data for Analysis

• Join multiple complex temporal data streams into a “windowed” model suitable for efficient analysis. [Manish Singh]

• Permit organic change to schema as information needs evolve. [Eric Qian]

• Provide a spreadsheet interface for direct manipulation of complex and large data. Choose small sets of representatives effectively. [Ben Liu]
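
As an illustration of the "windowed" model in the first bullet, here is a minimal sketch of a tumbling-window join over two timestamped streams. It is not taken from the cited work: the function name `windowed_join`, the stream names, and the sample readings are all hypothetical, and a fixed-width tumbling window is only one of several possible window definitions.

```python
from collections import defaultdict


def windowed_join(stream_a, stream_b, width):
    """Pair events from two streams that fall in the same tumbling window.

    Each stream is an iterable of (timestamp, value) tuples; `width` is the
    window length in the same units as the timestamps.
    """
    # window index -> (values from stream_a, values from stream_b)
    buckets = defaultdict(lambda: ([], []))
    for ts, value in stream_a:
        buckets[int(ts // width)][0].append(value)
    for ts, value in stream_b:
        buckets[int(ts // width)][1].append(value)
    # Keep only windows in which both streams contributed at least one event.
    return {
        window: (a_vals, b_vals)
        for window, (a_vals, b_vals) in sorted(buckets.items())
        if a_vals and b_vals
    }


# Hypothetical sample data: (timestamp in seconds, reading).
temps = [(0.5, 72), (1.2, 75), (2.7, 90)]
pressures = [(0.9, 118), (2.1, 131), (5.0, 120)]
joined = windowed_join(temps, pressures, width=1.0)
```

Grouping by window index turns two irregular streams into one table keyed by window, which is the kind of shape that downstream analysis can scan efficiently.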

Access Data for Analysis

• Under-specified queries, particularly keyword queries. Derive “qunit” as response unit, mined from observed query logs. [Arnab Nandi]

• Visual manipulation algebra for analyzing large time-varying graphs with data on nodes and edges. [Anna Shaverdian]
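
To make the idea of mining response units from query logs more concrete, the sketch below counts which schema elements co-occur in logged keyword queries and proposes the most frequent combinations as candidate units. This is an assumed simplification for illustration, not the qunit derivation from the cited work; the lookup table `KEYWORD_TYPES`, the function names, and the sample log are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup from a keyword to the schema element it names.
KEYWORD_TYPES = {
    "smith": "person.name",
    "jones": "person.name",
    "casablanca": "movie.title",
    "cast": "movie.cast",
    "awards": "person.awards",
}


def type_signature(query):
    """Map a keyword query to the sorted tuple of schema elements it touches."""
    words = query.lower().split()
    types = {KEYWORD_TYPES[w] for w in words if w in KEYWORD_TYPES}
    return tuple(sorted(types))


def candidate_units(query_log, top_k=2):
    """Return the top_k most frequent type signatures seen in the log."""
    counts = Counter(type_signature(q) for q in query_log)
    counts.pop((), None)  # ignore queries with no recognized keyword
    return [signature for signature, _ in counts.most_common(top_k)]


# Hypothetical query log.
log = ["smith awards", "jones awards", "casablanca cast", "smith awards", "weather"]
units = candidate_units(log)
```

Each returned signature suggests a bundle of attributes that users tend to ask about together, so an under-specified query such as "smith" could be answered with the whole bundle rather than a single row.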

Scientific Data Analysis

• Explain analysis results in terms of source data, even when the source may have been updated since. [Jing Zhang]

• Analyze gene expression microarray data, and electronic health record data, in light of known biomedical knowledge. [Fernando Farfan]
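
For the first bullet, one simple way to explain a result in terms of source data that may have changed is to record, alongside the result, which source rows and versions contributed to it. The sketch below assumes each source row carries an `id` and a `version` field; those field names, the function names, and the sample rows are hypothetical and are not drawn from the cited work.

```python
def total_with_lineage(rows):
    """Sum the `value` field and record which (id, version) pairs were used."""
    total = sum(row["value"] for row in rows)
    lineage = {row["id"]: row["version"] for row in rows}
    return total, lineage


def stale_inputs(lineage, current_rows):
    """Return ids whose current version differs from the one used, or that are gone."""
    current = {row["id"]: row["version"] for row in current_rows}
    return sorted(i for i, version in lineage.items() if current.get(i) != version)


# Hypothetical source rows at analysis time.
source_v1 = [
    {"id": "a", "version": 1, "value": 10},
    {"id": "b", "version": 1, "value": 5},
]
total, lineage = total_with_lineage(source_v1)

# Later, row "b" has been updated in the source.
source_v2 = [
    {"id": "a", "version": 1, "value": 10},
    {"id": "b", "version": 2, "value": 7},
]
changed = stale_inputs(lineage, source_v2)
```

Keeping the version with the lineage lets the system say both which rows produced a result and which of them no longer match the current source.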