Canonical Formats and Business Data


Applies to: SAP NetWeaver, SAP Business By Design, Business Process Expert

Summary

Business data is typically created within an organization's information system independent of data definition standards, common methodologies, or even a canonical format for understanding the semantics inherent in the purpose and use of the data. Emerging common methodologies using the UN/CEFACT Core Components Technical Specification (CCTS) open standards stack are providing a significant step forward in achieving business data interoperability. However, more is required to achieve true business network transformation. Specifically, two things are needed: a canonical format for expressing that data in every aspect of its storage and use, and a tool-based collaborative environment that uses the canonical format to develop and manage specific contextualizations of the data.

Authors: Gunther Stuhec and Mark Crawford

Company: SAP

Created on: 30 December 2008

Author Bio

Gunther Stuhec - Since completing his master's degree (MSc, 1993), Gunther Stuhec has worked with communications and EDI technologies. As a consultant in a software house for middleware and EDI systems, he developed strategic concepts for customers and was responsible for various EDI projects. He joined SAP SI as a consultant in 1999, where he was responsible for implementing XML/EDI projects in conjunction with SAP systems. Since 2001 Mr. Stuhec has worked for SAP AG as a Standards Architect and has been involved in standardizing business standards on both the semantic and syntax levels. He holds a number of patents related to semantic data technologies.

Gunther is the past chair of the UN/CEFACT Techniques and Methodologies Group (TMG), which is responsible for the development and maintenance of overall methodologies for developing collaborative business processes and business data on a semantically oriented but technically syntax-neutral level. He was also the chair of the UN/CEFACT project team that develops the CCTS standard. Furthermore, he is a member of various international and national standardization bodies, such as UN/CEFACT, ISO TC 154, and DIN. He is actively involved in developing standards and serves as an interface between these bodies and SAP, introducing SAP's requirements into their work and incorporating their latest findings into SAP's development activities.

Mark Crawford joined SAP in October 2005. He is a standards architect in the ECO Standards Management and Strategy Group focusing on industry standards and methodologies. Prior to joining SAP, Mark was a Senior Research Fellow for a Washington D.C. government think tank where he specialized in XML, eBusiness standards, and Semantic Data Modeling. Before that he spent 23 years as a U.S. Naval Officer with extensive experience in Logistics, IT, Supply Chain, Procurement and Finance.

Mark has been involved in both cross-industry and vertical industry business standards, and the underlying methodology standards that support them. He is actively involved in UN/CEFACT standards activities as Chair of the Applied Technologies Group, Project Lead for the UN/CEFACT XML Naming and Design Rules specification, Chair of the Core Components Working Group, Editor for UN/CEFACT CCTS, and Lead for the UN/CEFACT Core Components Harmonization Project, and he represents SAP on the Board of the Open Applications Group (OAGi). He was previously involved in the X12 Communications and Controls Subcommittee and served as Vice Chair of the OASIS Universal Business Language Technical Committee, Vice Chair of the X12 XML Working Group, and Chair of the joint X12/CEFACT Core Components initiative.

SAP COMMUNITY NETWORK SDN - sdn.sap.com | BPX - bpx.sap.com | BOC - boc.sap.com © 2008 SAP AG 1

Table of Contents

• Introduction

• Canonical Concepts

• Current Situation

• Canonical Format Purpose

• Development of Canonical Format

• Summary

• Related Content

• Copyright

Introduction

Canonical – the simplest and most significant form possible without loss of generality

In information science, the concept of canonical goes by many different names, all of which denote the same core concept:

• Canonical Format

• Canonical Model

• Common Information Model (CIM)

• Common Object Model (COM)

• Intermediary Format

• Reference Format or Model

• Business Reference Ontology

• Enterprise Canonical Data Model

• Canonical Message Schema

To provide a common term and concept for this article, canonical format is described by the following statements:

"A Canonical Data Model defines message formats that are independent from any specific application so that all applications can communicate with each other in this common format. If the internal format of an application changes, only the message translator between the affected application and the common message channel has to change, while all other applications and message translators remain unaffected." - Enterprise Integration Patterns, Gregor Hohpe, Bobby Woolf

"A data model that represents the inherent structure of data without regard to either individual use or hardware or software implementation." - Vertaasis Inc.

Canonical Concepts

A canonical format should be the reference and contain all of the information relevant to a specific business process. It should be devoid of context, and should be independent of syntax.
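The translator idea in the Hohpe and Woolf quotation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CanonicalPerson` record, the two application formats, and their field names are invented and do not come from CCTS or from any SAP interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalPerson:
    """Hypothetical canonical record with application-neutral field names."""
    surname: str
    given_name: str


def from_app_a(message: dict) -> CanonicalPerson:
    """Translate the invented format of application A into the canonical record."""
    return CanonicalPerson(surname=message["LastName"],
                           given_name=message["FirstName"])


def to_app_b(person: CanonicalPerson) -> dict:
    """Translate the canonical record into the invented format of application B."""
    return {"Nachname": person.surname, "Vorname": person.given_name}


# Application A never needs to know application B's field names.
app_a_message = {"LastName": "Meier", "FirstName": "Anna"}
app_b_message = to_app_b(from_app_a(app_a_message))
print(app_b_message)
```

If application A renamed its fields, only `from_app_a` would have to change; `to_app_b` and any other translator would be unaffected, which is the property the quotation describes.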

Current Situation

Currently, the diversity and heterogeneity of business data interfaces are among the key drivers of integration costs. The main reasons for this are:

• Manual negotiation, integration and mapping efforts required by integration experts

• Monolithic, fixed interfaces and direction flows

• Each interface (message type) has thousands, if not millions, of potential combinations for expressing business semantics

• Interfaces that are project driven – defined with limited focus on synergy and reusability solutions

In other words, today’s interface structures are:

• based on the user's business system

• based on a selected data expression language, which could be

o A proprietary format, or

o Any one of the thousands of available B2B standards

• devoid of consideration for reuse

The perception of current integration experts is that, due to high costs, differing representations, little reuse, inflexibility and questionable profitability, B2B connections are best limited to a narrow range of integration projects between a few trading partners. Figure 1 illustrates the integration issues of different formats in detail.

Figure 1: Representation of interfaces in peer-to-peer connections

The obstacles to achieving interoperability increase with the complexity of the interfaces themselves. Message developers typically try to include every possible aspect or use of a message, which creates bloat and confusion. For example, we fully analyzed a typical standards-based Purchase Order message designed with a kitchen-sink mentality. Our analysis revealed:

• 10 hierarchy levels,

• 11,840 different elements and attributes, and

• 306,227 combinations for expressing the business semantics of an order

These various combinations and permutations are possible through both the order hierarchy and the additional hidden expression of semantics through ambiguous data elements such as "TypeCodes" and "ClassificationCodes".
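Structural figures of this kind (hierarchy levels, distinct elements and attributes) can be gathered mechanically. The following Python sketch shows one possible way to do it for an XML message, using only the standard library. The sample message is a tiny invented fragment, not the Purchase Order that was analyzed, and the counting rules (distinct tag and attribute names) are an assumption; the analysis above may have counted differently.

```python
import xml.etree.ElementTree as ET

# Invented sample message, for illustration only.
SAMPLE = """
<PurchaseOrder typeCode="standard">
  <Buyer>
    <Party><Name>Example Buyer</Name></Party>
  </Buyer>
  <Line number="1">
    <Item><Name>Widget</Name></Item>
  </Line>
</PurchaseOrder>
""".strip()


def depth(element):
    """Number of hierarchy levels at and below this element."""
    return 1 + max((depth(child) for child in element), default=0)


def distinct_names(root):
    """Distinct element tags and attribute names used anywhere in the tree."""
    elements, attributes = set(), set()
    for node in root.iter():
        elements.add(node.tag)
        attributes.update(node.attrib)  # adds the attribute names (dict keys)
    return elements, attributes


root = ET.fromstring(SAMPLE)
levels = depth(root)
elements, attributes = distinct_names(root)
print(levels, len(elements), len(attributes))
```

Attributes such as the hypothetical `typeCode` are exactly where hidden semantics tend to accumulate, because the attribute name says nothing about which meanings its values may carry.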

There are two principal reasons for the incompatible message interfaces that are typically found when comparing B2B standards:

• The interfaces do not address exactly the same (semantically based) information

• The compared interfaces use different names, structures and complexity to represent identical semantic information

Figure 2: Compliance of Message Interfaces

Figure 2 shows the ratio of compliance that typically exists between two separately developed message interfaces. Approximately 20% of the information is incompatible, which means this part of the message information addresses totally different context specific information requirements between two systems. For example, a purchase order for a car manufacturer does not consider the information that is specifically used for a similar purchasing event in the health care industry.

Although the 20% requires reaching consensus between the two trading partners on how to handle the data that is missing from one or the other, the good news is that a full 80% of the interfaces are easily aligned – the information requirements are identical (aligned to an implicit canonical data model) although they may have different semantics. The interesting point is how to determine that the information is identical despite the differing semantics. This requires the engagement of experts and analysts who are familiar with these message interfaces, and especially the methodologies that are used for defining the elements, types and structure.

Many factors lead to the semantic differences. For example, synonyms, abbreviations, acronyms, numbers, and language differences can result in different expressions for what is in reality a canonical data element. An example is in order. Let's assume that "Surname" is the canonical data element. The concept of surname is typically expressed in many different ways:

• Synonyms - "FamilyName" or "LastName"

• Abbreviation - Fam, Famnam, fmlynme, lstnm, srnm

• Acronyms - FN, SN, LN

• Identifier - 1_2 for FamilyName

• Different languages - "FamilyName" or "Nachname"
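The variants listed above can be tied to one canonical element with a simple alias table. The Python sketch below is only an illustration of that idea: the table contains just the names from the list, the case-insensitive matching rule is an assumption, and the function names are invented.

```python
# Alias table built from the variants listed above; "Surname" is the canonical element.
CANONICAL_ALIASES = {
    "Surname": {
        "Surname", "FamilyName", "LastName",          # synonyms
        "Fam", "Famnam", "fmlynme", "lstnm", "srnm",  # abbreviations
        "FN", "SN", "LN",                             # acronyms
        "1_2",                                        # identifier
        "Nachname",                                   # other language
    },
}


def build_index(aliases):
    """Invert the alias table into a case-insensitive variant -> canonical lookup."""
    index = {}
    for canonical, variants in aliases.items():
        for variant in variants:
            index[variant.casefold()] = canonical
    return index


ALIAS_INDEX = build_index(CANONICAL_ALIASES)


def to_canonical(name):
    """Return the canonical element name, or None if the variant is unknown."""
    return ALIAS_INDEX.get(name.casefold())


print(to_canonical("Nachname"), to_canonical("lstnm"), to_canonical("Street"))
```

A flat lookup like this only works once someone has already decided that, say, "FN" means family name and not first name. That decision is the hard, expert-driven part; the table merely records it.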

Each of these expressions has the same meaning and in reality is the canonical data element of Surname. However, this is not easily recognized, nor readily machine processable. Example 1 compares semantically equivalent element names and message structures between a typical standards organization MessageType "PurchaseOrder" and the formats of several other standards development organizations.

Example 1 – Comparison of Three Purchase Orders Against a Common Purchase Order

• RosettaNet Purchase Order Request (RosettaNet methodology)

o 10.21% of element names are identical

o 8.36% of structure is similar

• OASIS UBL Purchase Order (CCTS based)

o 15.36% of element names are identical

o 9.54% of structure is similar

• OAGi CreatePurchase Order (CCTS based)

o 14.21% of element names are identical

o 9.21% of structure is similar
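The article does not state how the percentages in Example 1 were calculated. As a rough illustration of what "identical element names" could mean, the Python sketch below computes the share of a candidate interface's element names that also occur verbatim in a reference interface. The metric, the choice of denominator, and all element names are assumptions made for this sketch; the numbers it prints are unrelated to the figures above.

```python
def identical_name_ratio(reference_names, candidate_names):
    """Share of candidate element names that also occur verbatim in the reference."""
    reference = set(reference_names)
    candidate = set(candidate_names)
    if not candidate:
        return 0.0
    return len(reference & candidate) / len(candidate)


# Invented element names for two hypothetical purchase order interfaces.
reference = {"BuyerParty", "SellerParty", "OrderLine", "Surname", "IssueDate"}
candidate = {"BuyerParty", "Supplier", "LineItem", "LastName", "IssueDate"}

ratio = identical_name_ratio(reference, candidate)
print(f"{ratio:.2%} of element names are identical")
```

In this invented case "Surname" and "LastName" carry the same meaning but count as different, which is why a purely name-based comparison understates how much two interfaces actually have in common.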

Two of the message interfaces - UBL and OAGi - are mostly based on the CCTS common methodology, which is also used for the development of the analyzed message type. The CCTS common methodology is a kind of grammar, like the grammar of a natural language. It helps to increase the understanding of semantics in message interfaces and provides robust metadata about, and contextualization of, conceptual and resultant logical data models. But despite these advantages, the semantics themselves can still be expressed in many different ways by different users of the common methodologies, since there is no underlying canonical (conceptual) data model from which to derive the different standards.

The conclusion is that a common methodology is quite helpful, but does not increase the compliance significantly unless a canonical data element is used to create the context-specific interfaces. In other words, despite the growing adoption of the CCTS common methodology, without a proper, unambiguous and common canonical format, there is still an exponentially growing number of integration points at the semantic level. Consider: "Only 5% of the interface integration (Web Services) is a function of the middleware choice. The remaining 95% is a function of integration of application semantics." (Gartner Group)

Figure 3: From Peer-to-Peer to Canonical Format

Canonical Format Purpose

As shown in Figure 3, adopting a canonical format reduces the number of connections from an exponentially increasing number of combinations, expressed by 2 x 2^(N-1), to a linearly increasing number of possible combinations, expressed by 2 x N (where N represents the number of systems). A commonly understandable and semantically unambiguous canonical format could be the single view of data for multiple enterprises, an enterprise, a division, or a process. It could be used independently by any system or partner. This could be realized with a data transformation bridge and data abstraction layer between systems and partners, or with integrations that map one service request to many service providers (or vice versa). Because the number of connections is reduced, the overall number of transformation maps that must be generated is reduced as well.
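The effect on the number of transformation maps can be made concrete by enumerating them. The Python sketch below counts the translators for the canonical approach, which matches the 2 x N expression in the text. For comparison it also counts direct translators under one explicit assumption of this sketch: every system exchanges messages with every other system in both directions, which gives one translator per ordered pair. That pairwise count is a simplification for illustration and not the peer-to-peer expression quoted in the text.

```python
from itertools import permutations


def point_to_point_translators(systems):
    """One direct translator per ordered pair of distinct systems (assumed full mesh)."""
    return list(permutations(systems, 2))


def canonical_translators(systems):
    """One translator into and one out of the canonical format per system (2 x N)."""
    inbound = [(system, "canonical") for system in systems]
    outbound = [("canonical", system) for system in systems]
    return inbound + outbound


systems = ["A", "B", "C", "D", "E"]  # hypothetical systems
p2p = point_to_point_translators(systems)
hub = canonical_translators(systems)
print(len(p2p), len(hub))
```

Adding a sixth system would add two translators in the canonical case, whereas in the assumed full mesh every existing system would need a new pair of translators to reach it.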

Development of Canonical Format

Many initiatives have seen the advantage of a canonical format and have already started with their own version:

• Software providers: SAP, Oracle - primarily using the open standard CCTS common methodology

• B2B Hubs: Liaison, Crossgate - developed their own proprietary CCTS methodology

• Standardization initiatives: OAGi, UBL, UN/CEFACT, APACS, AIAG, SWIFT, UDEF, GS1 - focused on the CCTS methodology

• Governments: USA, Australia, Germany, Denmark - using a combination of CCTS and other methodologies

• Companies: Siemens, Gasunie, US DoD - using their own methodology or a CCTS-based methodology

Figure 4: Isolated Development of Canonical Format

Each approach is a manually driven, isolated development effort (Figure 4) conducted by closed groups of experts practicing kitchen-sink standards solutions. The result is that, even with the benefits of common methodologies which standardize the structures, expressions, metadata and use, we are still faced with unavoidably different representations of the core semantics (Figure 5).

Figure 5: Different results of CCTS based message interface "Purchase Order"

As stated above, the integration experts involved consider their own interests and compromises and contribute to the development of standards based on their own knowledge and semantic preferences, regardless of the methodology being employed. Additionally, due to the lack of tooling around the common methodologies, knowledge transfer between the experts within the groups and user communities remains inefficient, paper-based, and complicated. Because of the tremendous diversity of requirements in the business domain, it is quite hard, if not impossible, to achieve convergence between the canonical formats being developed by the different communities. Manual harmonization and governance simply will not scale to support a canonical model meeting all business information requirements. The end result will continue to be just another common point-to-point approach, which has the 2 x 2^(N-1) integration connection requirements with limited improvements in flexibility. What is needed is a widely adopted canonical format based on common methodologies, and a community-based tool, such as the SAP Warp 10 community-based data modeling, mapping and integration tool, that leverages them in a real-time integration environment and automates the harmonization and governance processes.

Summary

In summary, the common methodology standards, although a critically important step forward on the path to true interoperability, are still insufficient to achieve true business network transformation. As shown in Figure 5, the outcomes of three different message interfaces clearly show that the differing complexity and diversity of structures remain, since the main issue, the autonomous development of canonical (conceptual) data models and libraries in silos, has not been solved. What is needed is to leverage the common methodology standards with robust real-time collaborative tooling and a community network for developing a single-sourced canonical data model and the context-specific logical models that are implemented from it.

Related Content

[1] Bianchin, Daniel – Introduction to Global Data Types in SAP NetWeaver Process Integration 7.1 – 1 September 2007

[2] Boeder, Jochen – ESA Architecture Series 2006: The Big Picture (Draft) – 1 February 2006

[3] Crawford, Mark; Stuhec, Gunther – Accelerate your Business Data Modeling and Integration Issues by CCTS Modeler Warp 10 – 17 November 2007

[4] Crawford, Mark; Stuhec, Gunther – Getting Started with ISO 11179 – 26 May 2006

[5] Crawford, Mark; Stuhec, Gunther – How to Solve the Business Standards Dilemma – The CCTS Standards Stack – 7 November 2006

[6] Crawford, Mark; Stuhec, Gunther – SAP Network Blog: CCTS Modeler Warp 10 - The Speed of Data Integration and B2B – 20 November 2007

[7] Fiedler, Thomas; Meinert, Holger; Wiechers, Volker – ESA Architecture Series 2006: Services (Draft) – 1 February 2006

[8] Stuhec, Gunther – Enabling of Next Generation B2B by Web 3.0 – 30 September 2007

[9] Stuhec, Gunther – How to Solve the Business Standards Dilemma - CCTS Key Model Concepts – 3 March 2006

[10] Stuhec, Gunther – How to Solve the Business Standards Dilemma - The Context Driven Business Exchange – 1 December 2005

[11] Stuhec, Gunther – How to Solve the Business Standards Dilemma - The CCTS based Core Data Types – 20 September 2006

[12] Stuhec, Gunther – SAP Network Blog: UN Trade Data Element Dictionary based on CCTS – 3 March 2006

[13] Stuhec, Gunther – Using CCTS Modeler Warp 10 To Customize Business Information Interfaces – 20 November 2007

Copyright © 2008 SAP AG. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.

IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, System i, System i5, System p, System p5, System x, System z, System z9, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, Informix, i5/OS, POWER, POWER5, POWER5+, OpenPower and PowerPC are trademarks or registered trademarks of IBM Corporation.

Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.

Java is a registered trademark of Sun Microsystems, Inc.

JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.

MaxDB is a trademark of MySQL AB, Sweden.

SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

These materials are provided “as is” without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.

SAP shall not be liable for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials.

SAP does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within these materials. SAP has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third party web pages nor provide any warranty whatsoever relating to third party web pages.

Any software coding and/or code lines/strings (“Code”) included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended to better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, except if such damages were caused by SAP intentionally or grossly negligent.