Interactive Generation of Integrated Schemas

Post on 27-Jan-2016

Laura Chiticariu et al.

Presented by: Meher Talat Shaikh

Objectives:

Creation of unified schema based on a set of existing source schemas.

Provide a standard representation of data that comes from heterogeneous sources.

Applications:

Unified schema provides a single access point against which queries are posed.

Consolidation of databases of merged organization.

Overview

Convert each schema into a graph of concepts with Has-A relationships.

Identify matching concepts between different graphs.

For every pair of matching concepts, either merge the two concepts or keep them separate.

Allow the user to specify constraints on the merging process.

The result is an adaptive and interactive enumeration method.

Correspondences

Signify “semantically equivalent” elements in two schemas.

Bidirectional.

Can be specified by the user or discovered by schema matching techniques.

The approach considers correspondences between attributes.

Schema Integration

The integrated target schema T is capable of representing all the attributes in the source schemas.

Every attribute in T must represent some attribute of the input source schemas.

The data of the source schemas is transformed into data of T via a mapping M.

In short, all basic relationships in the source schemas are preserved in T.

Graph of Concepts

Each schema is converted into a logical view (graph of concepts).

Each concept is a relation name with an associated set of attributes.

Concepts in a schema may have references to other concepts and these references are captured by Has-A edges.

A concept graph is a pair (V; HasA).
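A minimal sketch of how such a pair (V; HasA) might be held in memory. The class names, the method names and the attribute names (`ename`, `salary`, `dname`) are illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A relation name with its associated set of attributes."""
    name: str
    attributes: frozenset


@dataclass
class ConceptGraph:
    """The pair (V, HasA): concepts plus directed Has-A references."""
    concepts: dict = field(default_factory=dict)  # V: name -> Concept
    has_a: set = field(default_factory=set)       # HasA: (from name, to name)

    def add_concept(self, name, attributes):
        self.concepts[name] = Concept(name, frozenset(attributes))

    def add_has_a(self, source, target):
        # A Has-A edge may only connect concepts that are already in V.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both endpoints must be concepts in V")
        self.has_a.add((source, target))


# Hypothetical example: an employee concept that references a department.
g = ConceptGraph()
g.add_concept("emp", ["ename", "salary"])
g.add_concept("dept", ["dname"])
g.add_has_a("emp", "dept")
```

Storing HasA as a set of name pairs keeps the edge list independent of the concept objects, which makes it easy to redirect edges later when concepts are merged.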

Matching of Concepts

Use the correspondences to match the concepts.

FORMAL DEFINITION: Let S1 and S2 be two source schemas and let C be a set of correspondences between attributes of S1 and S2. Let A be a concept of S1 and B be a concept of S2. We say that A and B match if there is at least one attribute a in A and one attribute b in B such that there is a correspondence at the schema level between attribute a and attribute b.

Result is a matching graph.
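The definition above can be sketched as a small function. This is only an illustration: the function name, the dictionary layout and the example schemas are assumptions, and attribute names are assumed to be unique within each schema so that a correspondence can be written as a plain pair (a, b).

```python
def matching_edges(concepts1, concepts2, correspondences):
    """Return the pairs (A, B) of concepts that match.

    concepts1, concepts2: dict mapping a concept name to its set of attributes.
    correspondences: set of (a, b) pairs, a from schema S1 and b from schema S2.
    """
    edges = set()
    for name_a, attrs_a in concepts1.items():
        for name_b, attrs_b in concepts2.items():
            # A and B match if at least one correspondence links
            # an attribute of A to an attribute of B.
            if any(a in attrs_a and b in attrs_b for a, b in correspondences):
                edges.add((name_a, name_b))
    return edges


# Hypothetical schemas and correspondences.
s1 = {"emp": {"ename", "salary"}, "dept": {"dname"}}
s2 = {"employee": {"name", "phone"}, "org": {"oname"}}
corr = {("ename", "name"), ("dname", "oname")}
edges = matching_edges(s1, s2, corr)
```

Here `emp` matches `employee` and `dept` matches `org`, because each pair shares one correspondence; no other pair of concepts is linked.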

Matching Concepts

The matching graph is represented by G = (V, HasA, E).

x0 to x7 are the matching edges. Matching edges are the candidates for merging.

Matching Graph

Merging of Concepts

Input: an assignment X.

Merges the concepts according to the input assignment X.

Takes the union of the attributes of the merged concepts.

HasA relationships are kept as they exist in the source concept graphs.
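The merging step can be sketched as follows, under some loud assumptions: the assignment X is modelled as a dictionary that gives each matching edge a Boolean (True means "merge the two endpoints"), concept names are unique across both schemas, and attributes are simply unioned by name without collapsing corresponding attributes. The function name and the example data are hypothetical, and this is not claimed to be the authors' algorithm.

```python
def merge_concepts(concepts, has_a, matching, assignment):
    """Merge concepts along the matching edges that the assignment switches on.

    concepts: dict name -> set of attributes (both source graphs together).
    has_a: set of (from, to) Has-A edges of the source graphs.
    matching: iterable of (a, b) matching edges.
    assignment: dict edge -> bool; missing edges count as False.
    """
    parent = {c: c for c in concepts}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in matching:
        if assignment.get((a, b), False):
            parent[find(a)] = find(b)

    # Each merged concept takes the union of its members' attributes.
    merged = {}
    for c, attrs in concepts.items():
        merged.setdefault(find(c), set()).update(attrs)

    # Has-A edges are carried over, redirected to the merged concepts.
    merged_has_a = {(find(a), find(b)) for a, b in has_a}
    placement = {c: find(c) for c in concepts}
    return merged, merged_has_a, placement


# Hypothetical input: merge emp with employee, keep dept and org apart.
concepts = {
    "emp": {"ename", "salary"}, "dept": {"dname"},
    "employee": {"name", "phone"}, "org": {"oname"},
}
has_a = {("emp", "dept"), ("employee", "org")}
matching = [("emp", "employee"), ("dept", "org")]
assignment = {("emp", "employee"): True, ("dept", "org"): False}
merged, merged_has_a, placement = merge_concepts(
    concepts, has_a, matching, assignment)
```

Union-find is used so that chains of merges (A with B, B with C) end up in one integrated concept regardless of the order in which the edges are processed.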

Integrated Concept Graph

Redundancy constraints

Similarly, loops can be eliminated by using redundancy constraints.

Removal of Redundancy

Merging of Concepts

Maintains an integration function fx for the given assignment X.

fx specifies how each individual concept, attribute and HasA edge in a source concept graph relates to the integrated concept graph.

Mapping

Input: integrated concept graph G1, matching graph G and the integration function fx.

Output: mapping M between source and integrated concepts.

For every source concept C a mapping MC is created in M, which specifies how an instance of C, together with all its relationships, is to be transformed into an instance of an integrated concept C1.

Adaptive Enumeration

Does not enumerate all integrated schemas.

Target schemas are output one by one and the user is allowed to browse through the schemas.

Enumeration constraints: The user can express additional constraints on how concepts should be merged.

Apply(x), (NOT) Apply(x), Merge(A1..An), (NOT) Merge(A1..An).

e.g. (NOT) Merge(org, location, emp, phone, fund).
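A rough sketch of how Apply and (NOT) Apply constraints could prune the candidates while schemas are still produced one by one. It is a brute-force filter over Boolean assignments written as a lazy generator, not the adaptive enumeration of the paper; the function name and parameters are assumptions, and Merge constraints are left out for brevity.

```python
from itertools import product


def enumerate_assignments(edges, apply=(), not_apply=()):
    """Lazily yield assignments (dict edge -> bool) that satisfy the constraints.

    apply: edges the user requires to be merged, as in Apply(x).
    not_apply: edges the user forbids merging, as in (NOT) Apply(x).
    """
    edges = sorted(edges)
    apply, not_apply = set(apply), set(not_apply)
    for bits in product([True, False], repeat=len(edges)):
        assignment = dict(zip(edges, bits))
        if all(assignment[e] for e in apply) and \
                not any(assignment[e] for e in not_apply):
            yield assignment


# Hypothetical use with three matching edges: force x0, forbid x2.
edges = ["x0", "x1", "x2"]
candidates = enumerate_assignments(edges, apply=["x0"], not_apply=["x2"])
first = next(candidates)  # only the first candidate has been computed so far
```

Because the function is a generator, the user can inspect one candidate, add a constraint, and restart the enumeration without ever materialising the full list of assignments.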

Experimental Results

Strengths and Weaknesses

Strengths:

Simple model: includes HasA edges as the basic form of relationships, and a simpler form of Contains.

Graphs of concepts can express most of the essential features that appear in schemas or in conceptual models.

The input is just a set of atomic correspondences.

Weaknesses:

Does not resolve type and representation conflicts.

No weights or probabilities are involved in the matching between concepts.

Conclusion

Provides a systematic and effective way of enumerating multiple integrated schemas.

Allows user interaction to refine the schemas and generate a final integrated schema.

Thank you.