Interactive Generation of Integrated Schemas

Interactive Generation of Integrated Schemas Laura Chiticariu et al. Presented by: Meher Talat Shaikh

Transcript of Interactive Generation of Integrated Schemas

Page 1: Interactive Generation of Integrated Schemas

Interactive Generation of Integrated Schemas

Laura Chiticariu et al.

Presented by: Meher Talat Shaikh

Page 2: Interactive Generation of Integrated Schemas

Objectives:

Create a unified schema from a set of existing source schemas.

Provide a standard representation of data that comes from heterogeneous sources.

Applications:

A unified schema provides a single access point against which queries are posed.

Consolidation of the databases of merged organizations.

Page 3: Interactive Generation of Integrated Schemas

Overview

Convert each schema into a graph of concepts with Has-A relationships.

Identify matching concepts between different graphs.

For every pair of matching concepts, either merge the two concepts or keep them separate.

Allow the user to specify constraints on the merging process.

The result is an adaptive and interactive enumeration method.

Page 4: Interactive Generation of Integrated Schemas

Overview

Page 5: Interactive Generation of Integrated Schemas

Correspondences

Signify “semantically equivalent” elements in two schemas.

Bidirectional.

Can be specified by the user or discovered by schema matching techniques.

The approach considers correspondences between attributes.

Page 6: Interactive Generation of Integrated Schemas

Correspondences

Page 7: Interactive Generation of Integrated Schemas

Schema Integration

The integrated target schema T is capable of representing all the attributes in the source schemas.

Every attribute in T must represent some attribute of the input source schemas.

Data under the source schemas is transformed into data under T via a mapping M.

In short, all basic relationships in the source schemas are preserved in T.
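
The two attribute-level requirements above can be checked mechanically. The following is a minimal sketch, assuming attributes are identified by qualified names and that a hypothetical dictionary `represents` records which attribute of T stands for each source attribute; none of these names come from the paper.

```python
def check_integration(source_attrs, target_attrs, represents):
    """Return (unrepresented source attributes, unjustified target attributes).

    represents: dict mapping each source attribute to the attribute of T
    that stands for it (a hypothetical encoding, not the paper's notation).
    """
    # Requirement 1: T must be able to represent every source attribute.
    unrepresented = set(source_attrs) - set(represents)
    # Requirement 2: every attribute in T must represent some source attribute.
    unjustified = set(target_attrs) - set(represents.values())
    return unrepresented, unjustified


# Hypothetical example data.
source_attrs = {"S1.emp.ename", "S2.employee.name", "S1.org.oname"}
target_attrs = {"T.emp.name", "T.org.oname"}
represents = {
    "S1.emp.ename": "T.emp.name",
    "S2.employee.name": "T.emp.name",
    "S1.org.oname": "T.org.oname",
}

unrepresented, unjustified = check_integration(source_attrs, target_attrs, represents)
```

Both returned sets are empty exactly when the two requirements hold. The sketch covers attributes only and does not check that relationships are preserved.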

Page 8: Interactive Generation of Integrated Schemas

Graph of Concepts

Each schema is converted into a logical view (graph of concepts).

Each concept is a relation name with an associated set of attributes.

Concepts in a schema may have references to other concepts and these references are captured by Has-A edges.

A concept graph is a pair (V, HasA), where V is the set of concepts and HasA is the set of Has-A edges between them.
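
One possible in-memory representation of a concept graph is sketched below. The class names, the attribute names, and the edge direction (an edge (A, B) is read here as "A has a B") are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A relation name with its associated set of attributes."""
    name: str
    attributes: frozenset


@dataclass
class ConceptGraph:
    """A pair (V, HasA): concepts plus directed Has-A edges between them."""
    concepts: dict = field(default_factory=dict)  # V: name -> Concept
    has_a: set = field(default_factory=set)       # HasA: (from_name, to_name)

    def add_concept(self, name, attributes):
        self.concepts[name] = Concept(name, frozenset(attributes))

    def add_has_a(self, source, target):
        # Both endpoints must already be concepts of this graph.
        for endpoint in (source, target):
            if endpoint not in self.concepts:
                raise KeyError(f"unknown concept: {endpoint}")
        self.has_a.add((source, target))


# Hypothetical schema fragment: every emp references an org.
g = ConceptGraph()
g.add_concept("org", ["oid", "oname"])
g.add_concept("emp", ["eid", "ename"])
g.add_has_a("emp", "org")
```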

Page 9: Interactive Generation of Integrated Schemas

Concepts

Page 10: Interactive Generation of Integrated Schemas

Graph of Concepts

A concept graph is a pair (V, HasA).

Page 11: Interactive Generation of Integrated Schemas

Matching of Concepts

Use the correspondences to match the concepts.

FORMAL DEFINITION: Let S1 and S2 be two source schemas and let C be a set of correspondences between attributes of S1 and S2. Let A be a concept of S1 and B be a concept of S2. We say that A and B match if there is at least one attribute a in A and one attribute b in B such that there is a correspondence at the schema level between attribute a and attribute b.

The result is a matching graph.
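
The formal definition translates almost directly into code. This is a minimal sketch, assuming each schema is given as a dictionary from concept name to a set of qualified attribute names and that C is a set of attribute pairs; the example data is hypothetical.

```python
def matching_edges(s1_concepts, s2_concepts, correspondences):
    """Return the pairs (A, B) of matching concepts.

    s1_concepts, s2_concepts: dict of concept name -> set of attribute names.
    correspondences: set of (a, b) pairs, a from S1 and b from S2.
    """
    edges = []
    for a_name, a_attrs in s1_concepts.items():
        for b_name, b_attrs in s2_concepts.items():
            # A and B match if at least one correspondence links
            # an attribute of A to an attribute of B.
            if any(a in a_attrs and b in b_attrs for a, b in correspondences):
                edges.append((a_name, b_name))
    return edges


# Hypothetical schemas and correspondences.
s1 = {"org": {"org.oname"}, "emp": {"emp.ename"}}
s2 = {"employee": {"employee.name"}, "project": {"project.title"}}
c = {("emp.ename", "employee.name")}

edges = matching_edges(s1, s2, c)
```

Each returned pair is one matching edge, so the list is the edge set that the matching graph adds to the two concept graphs.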

Page 12: Interactive Generation of Integrated Schemas

Matching Concepts

Page 13: Interactive Generation of Integrated Schemas

Matching Graph

The matching graph is represented by G = (V, HasA, E).

x0 to x7 are the matching edges.

Matching edges are the candidates for merging.

Page 14: Interactive Generation of Integrated Schemas

Merging of Concepts

Input: an assignment X over the matching edges.

Merges all the concepts according to the input assignment X.

A merged concept takes the union of the attributes of the concepts it combines.

HasA relationships are kept as they exist in the source concept graphs.
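
The merging step can be sketched with a union-find structure. This is an illustration under assumptions, not the paper's algorithm: the assignment X is taken to be a list of booleans aligned with the matching edges, a merged concept is named after an arbitrary representative, and corresponding attributes are kept side by side rather than unified.

```python
def merge_concepts(concepts, has_a, matching_edges, assignment):
    """Merge concepts along the matching edges that the assignment turns on.

    concepts: dict of concept name -> set of attributes (all source graphs).
    has_a: set of (from_name, to_name) Has-A edges from the source graphs.
    matching_edges: list of (a_name, b_name) pairs.
    assignment: list of booleans, one per matching edge.
    """
    parent = {name: name for name in concepts}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), apply_merge in zip(matching_edges, assignment):
        if apply_merge:
            parent[find(a)] = find(b)

    # Each merged concept takes the union of its members' attributes.
    merged = {}
    for name, attrs in concepts.items():
        merged.setdefault(find(name), set()).update(attrs)

    # Has-A edges are carried over between the merged concepts.
    merged_has_a = {(find(src), find(dst)) for src, dst in has_a}
    return merged, merged_has_a


# Hypothetical input: one matching edge, which X chooses to apply.
concepts = {"emp": {"ename"}, "employee": {"name"}, "org": {"oname"}}
has_a = {("emp", "org")}
merged, merged_has_a = merge_concepts(
    concepts, has_a, [("emp", "employee")], [True]
)
```

With the assignment `[False]` the same call leaves all three concepts separate, which is the "separate" alternative for that matching edge.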

Page 15: Interactive Generation of Integrated Schemas

Integrated Concept Graph

Page 16: Interactive Generation of Integrated Schemas

Redundancy constraints

Similarly, loops can be eliminated by using redundancy constraints.

Page 17: Interactive Generation of Integrated Schemas

Removal of Redundancy

Page 18: Interactive Generation of Integrated Schemas

Merging of Concepts

Maintains an integration function fx for the given assignment X.

Specifies how each individual concept, attribute and HasA edge in a source concept graph relates to the integrated concept graph.

Page 19: Interactive Generation of Integrated Schemas

Mapping

Input: integrated concept graph G1, matching graph G and the integration function fx.

Output: mapping M between source and integrated concepts.

For every source concept C a mapping MC is created in M, which specifies how an instance of C, together with all its relationships, is to be transformed into an instance of an integrated concept C1.
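
A much simplified picture of such a mapping is sketched below. It assumes the integration function is available as two hypothetical lookup tables, one for concepts and one for attributes, and it transforms attribute values only; the handling of relationships described above is left out.

```python
def make_mapping(concept_map, attribute_map):
    """Build a transform from source instances to integrated instances.

    concept_map: dict of source concept name -> integrated concept name.
    attribute_map: dict of (source concept, source attribute)
                   -> integrated attribute name.
    Both tables are hypothetical encodings of the integration function.
    """
    def transform(source_concept, instance):
        target_concept = concept_map[source_concept]
        target_instance = {
            attribute_map[(source_concept, attr)]: value
            for attr, value in instance.items()
        }
        return target_concept, target_instance

    return transform


# Hypothetical tables: emp and employee were merged into "person".
transform = make_mapping(
    {"emp": "person", "employee": "person"},
    {("emp", "ename"): "name", ("employee", "name"): "name"},
)

result = transform("emp", {"ename": "Ada"})
```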

Page 20: Interactive Generation of Integrated Schemas

Adaptive Enumeration

Does not enumerate all integrated schemas.

Target schemas are output one by one and the user is allowed to browse through the schemas.

Enumeration constraints: The user can express additional constraints on how concepts should be merged.

Apply(x), (NOT)Apply(x), Merge(A1, …, An), (NOT)Merge(A1, …, An).

e.g. (NOT)Merge(org, location, emp, phone, fund).
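
The effect of Apply and (NOT)Apply constraints on the enumeration can be illustrated with a lazy generator. This is a brute-force sketch and not the adaptive algorithm of the paper: it fixes the constrained matching edges and yields the remaining assignments one at a time. Merge constraints over groups of concepts are not modelled, and the parameter names are assumptions.

```python
from itertools import product


def enumerate_assignments(edges, must_apply=(), must_not_apply=()):
    """Lazily yield assignments (dict of edge -> bool) that satisfy
    the given Apply(x) and (NOT)Apply(x) constraints."""
    must_apply = set(must_apply)
    must_not_apply = set(must_not_apply)
    # Only unconstrained edges still need to be enumerated.
    free = [e for e in edges if e not in must_apply and e not in must_not_apply]
    for bits in product([False, True], repeat=len(free)):
        assignment = {e: True for e in must_apply}
        assignment.update({e: False for e in must_not_apply})
        assignment.update(zip(free, bits))
        yield assignment


# Hypothetical constraints on three matching edges:
# Apply(x0) and (NOT)Apply(x2) leave only x1 free.
results = list(
    enumerate_assignments(["x0", "x1", "x2"], must_apply=["x0"], must_not_apply=["x2"])
)
```

Because the generator is lazy, a user can inspect one candidate, add a constraint, and restart the enumeration over a smaller space without ever materializing every assignment.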

Page 21: Interactive Generation of Integrated Schemas

Experimental Results

Page 22: Interactive Generation of Integrated Schemas

Experimental Results

Page 23: Interactive Generation of Integrated Schemas

Experimental Results

Page 24: Interactive Generation of Integrated Schemas

Strengths and Weaknesses

Strengths:

Simple model: includes HasA edges as the basic form of relationships, and a simpler form of Contains.

Graphs of concepts can express most of the essential features that appear in schemas or in conceptual models.

The input is just a set of atomic correspondences.

Weaknesses:

Does not resolve type and representation conflicts.

No weights or probabilities are involved in the matching between concepts.

Page 25: Interactive Generation of Integrated Schemas

Page 26: Interactive Generation of Integrated Schemas

Conclusion

Provides a systematic and effective way of enumerating multiple integrated schemas.

Allows user interaction to refine the schemas and generate a final integrated schema.

Page 27: Interactive Generation of Integrated Schemas

Thank you.