Accentuate the Positive: Modeling Enterprise Ontologies

Accentuate the Positive: Modeling Enterprise Ontologies Prepared by: Christine Connors, Principal June 2010 TriviumRLG LLC T +1.910.874.8486 [email protected] www.triviumrlg.com Twitter/Skype TriviumRLG TriviumRLG LLC


The prospect of building an enterprise ontology is daunting for many of the people who are the most vocal supporters of knowledge sharing in an organization. It certainly is when the project's goals are to model all institutional knowledge, even if done a department at a time. Agreeing on meanings and labels, defining detailed relationships and restrictions, and then properly encoding them is, frankly, hard in large groups. Nearly every project has a stakeholder or two who want the model to represent their view of the world and demand a superior position for it, forgetting that what they are building is a multi-dimensional graph, not a hierarchy. Keeping it simple and scoped (the KISS principle) and keeping it positive (just the facts) in early phases will make all the difference in the success of downstream applications.

Getting started

As with any project, an understanding of the key team members, end-users and problems to be solved will be critical for success.

The team

Who generates excitement around the project? The subject matter experts? The search or content management or R&D teams? Executive leadership? The end-users? Each type of person will influence the project in unique ways, and the strength of that bias must be accounted for. Who does the work? What skills are helpful?

Subject matter experts, for the purposes of this discussion, are either experts in classification (taxonomists, ontologists, librarians) or subject area experts (engineering, bio-med, human resources, supply chain, et al.). The resulting ontology is significantly better when both types are engaged. Classification experts cannot be expected to have a deep understanding of all the subject areas they will be asked to help model; subject area experts will not necessarily have enough understanding of the various applications for which the model is being built, and how the model could be optimized for a project. Classification experts may be more formal than is needed, but will most certainly help make sound modeling decisions that will provide a more solid and stable foundation for future growth. Subject area experts can provide precise definitions and insight into modeling the sometimes nuanced relationships among concepts. They can, however, become caught in a narrow world-view that does not fully acknowledge the integration of other disciplines and schools of thought. Frequently in such cases it is useful to provide these team members with a model to modify, rather than a blank sheet, as they can focus their energies on adapting the model to their needs rather than fight among themselves (as much).

A major hiccup in teams building concept hierarchies is getting stuck in the notion that the model is a simple hierarchy - a tree. That is no longer the case. If you find the team fighting about “I want my concept at the top, it’s more important than yours!” find a tactful way of reminding them that the model is not 2-dimensional. Use a child’s building blocks, or show them images of a crystal lattice or the steel frame of a building under construction. Do not waste time on the fight; it is moot.

Project teams have passion around solving their particular problems - be it management of intellectual property, improving findability, or detailed analysis of research data. Prioritizing the granularity of detail needed to truly solve these problems, rather than glossing over them, will make for great solutions. The catch comes in making sure that these individual solutions can be integrated in the future with other solutions or an enterprise scale model. Whenever possible it is best to create even a basic enterprise model before digging deep into a particular function. This will ensure that no concept is modeled in so fine a manner that it cannot be used in pan-enterprise applications.

Executive leadership is a great place to find support for any project, but it is more difficult to find sponsorship at this level for projects that have trouble showing hard dollar savings. Keep your source of funding and authority happy by providing solutions that balance their functional responsibilities with benefits to the entire organization. Examples of “soft dollar savings” include efficiency and productivity gains, business and competitive intelligence, employee satisfaction, risk mitigation, customer delight and loyalty. “Hard dollar savings” include reduced content creation costs (e.g. don’t pay an agency a second time to create the same content because you can’t find it) and reduced server costs (e.g. minimize redundancy and thereby the server space required). Revenue generation opportunities include aggregated data and statistics based on linked data usage.

End-users are an interesting bunch. They don’t always know how to ask for what they want. Enabling them, in a secure information environment, to semantically tag to their hearts’ content may be enough. RDFa, microformats and semantically-enabled tools, such as wikis, are available - provide the tools and train interested staff in the techniques available. Or perhaps they know a little more - give them space to install some open-source tools and try their hands at building a graph store or a personal/group ontology. More often than not in the enterprise, folks simply want to know the definition of a term, abbreviation, acronym or equation. How does it impact their work, communications, analysis? What are the “official” details? They do not want to waste time asking around - they want a stable place to just get the answer and include it in their presentation, cost-benefit analysis or meeting. This is a large benefit of an enterprise ontology.

What kind of user interface is required? This will impact modeling decisions at the enterprise level. Will the ontology be used to generate field labels in a print or screen-based form? Are preferred and alternate labels required for concepts for use by different groups? Should the concepts be made publicly available on a website? Do the URIs need to be intelligible to humans (humans who don’t waste memory space on memorizing delimiters and position indications, that is)? User requirements for ontologies are important.

If you don’t have a base of excited supporters, but can work at an individual or group level, keep these best practices in mind to minimize re-work at a later date.

The scope

What problems do you believe an enterprise ontology will solve? Consistent definitions and use of terms? More nuanced analytical capabilities? Strategic forecasting? More granular use and tracking of intellectual property? All of the above, and more?


Understanding the problem can help you understand when to STOP building, or how best to PHASE the build. If you simply need a data dictionary, you can stop after creating a simple hierarchy and annotating the concepts with their labels, definitions, synonyms and perhaps notes on style or examples of usage. If you need to ask questions of your data, then you will want to model more complex relationships among the classes and instances in your ontology.

Far too few projects begin with asking “what questions are we trying to answer.” Many enterprise data projects consider a class of objects - customer data, for example - throw some interesting factoids about customers on the wall - their age, location, income, buying habits - and add those that stick to the data warehouse. “We’ll figure out reports later.” NO! Figure out reports first. Then you’ll know what data to collect.

Start where you have the most pain, solving the need for the project at hand. Don’t feel like you have to build it ALL at once - build what you need. One of the wonderful aspects of this technique is that you can modify it later with less disruption than current popular methods. Trying to tackle all of the RDF or OWL standards in one go is more often than not an exercise in frustration. Learn and apply the nuances of each capability as you discover a use case. If you have the bandwidth (human resources, consultants or time), start with the basics at an enterprise level.

First Steps

Decide if you will start with a schema or a vocabulary. If you are working on a specific system, with a scoped purpose and known questions, start with the schema - the form that collects the data which, when combined, give you answers. If you want to harmonize subject tags across several systems, start with a vocabulary.

Recommendations

If you are not well-versed in software purpose-built for metadata/taxonomies/ontologies you may find it less frustrating to work on paper at the start. Index cards can be useful - place each concept alone on a card with its definition and source information. You may want to use different colored cards for different kinds of data. Then you can move things around, group “like” concepts and visualize the model more easily than in a spreadsheet or text document.

Detailed books and articles have been written around the development of taxonomies, thesauri, ontologies and the like. Consult them! Principles of critical import to linked data projects in the enterprise will be called out here in the interest of encouraging these kinds of projects but are by no means the only methods and constructs available.

Starting with a schema

Don’t overcomplicate the schema and wonder where to start - you use these conventions every day. Do you recall when you started your job? All of those forms HR had you fill out? Those forms are a simple kind of metadata schema: Name, Address, Phone, Contacts, Bank etc. A type of data is asked for, frequently a prescribed format is indicated, in some fields a selection is made from a list and blanks are left for values to be entered.

Let’s say you want to be able to identify subject matter expertise in your company so that project teams can more quickly identify the right people to solve problems that arise. What do you need to know about these folks? Their areas of expertise, how to get in contact with them, where they are, their name. A simple form could look like this:

Name:
Location:
Department:
Email:
Phone:
Degree:
Expertise:
Special Interests:

These elements are easy enough to fill out and simple for a fielded search, with known datatypes: name, location, department, degree, expertise and special interests are text fields; phone is numerical with the pattern xxx-xxx-xxxx; email requires an ‘@’ sign. The expected values are understood, and clean data can be promoted by limiting the entries with controlled vocabularies - vocabularies that are sourced in other systems. Linked data.
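The datatypes just described can be illustrated with a short Python sketch. It is only one possible reading of the form: the field names and the phone and email rules come from the text, while `validate_expert_record`, the three-entry department list and the sample record are hypothetical and chosen for illustration.

```python
import re

# Phone pattern named in the text: xxx-xxx-xxxx, digits only.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")

# A controlled vocabulary for one field (assumed subset, for illustration).
DEPARTMENTS = {"Engineering", "Finance", "Legal"}

# Fields the text describes as plain text.
TEXT_FIELDS = ["Name", "Location", "Department", "Degree",
               "Expertise", "Special Interests"]


def validate_expert_record(record):
    """Return a list of problems found in one form submission."""
    problems = []
    for field in TEXT_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: expected non-empty text")
    if not PHONE_PATTERN.fullmatch(str(record.get("Phone") or "")):
        problems.append("Phone: expected pattern xxx-xxx-xxxx")
    if "@" not in str(record.get("Email") or ""):
        problems.append("Email: expected an '@' sign")
    if record.get("Department") not in DEPARTMENTS:
        problems.append("Department: not in the controlled vocabulary")
    return problems


# Hypothetical sample submission.
sample = {
    "Name": "Jane Lee", "Location": "Building 2", "Department": "Engineering",
    "Email": "jane.lee@example.com", "Phone": "555-010-0199",
    "Degree": "Doctor of Philosophy", "Expertise": "Chemical Engineering",
    "Special Interests": "Polymers",
}
print(validate_expert_record(sample))  # an empty list means no problems found
```

Limiting Department to a known list is the simplest case of the controlled vocabularies discussed here; in practice that list would be sourced from another system rather than hard-coded.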

The Name field can be populated by a lookup in the LDAP for example. Once the correct record is found, the location, department, email and phone fields can be populated as well. Make better use of the data in the LDAP rather than re-creating it. Such directory systems are an excellent hub for enterprise data around people.

Degree could be found in an LDAP perhaps, or an HR system. The same could be said of Expertise and Special Interests. But what if they are not?

Creating a vocabulary for an element

Creating a vocabulary for department should be a task completed simply by enumerating the organization’s divisions:

Business Development
Engineering
Finance
Human Resources
Information Technology
Legal
Manufacturing
Marketing
Supply Chain


How could we create a vocabulary to serve as the values allowed in the Degree field? Look to job descriptions created by HR, or ask for an anonymized extract of just that information from an applicant database, for inputs as to the concepts that need to be represented. That could be a very long list to select from! We can make it more manageable by breaking it into two parts. In this case, “Degree” is actually asking for two different types of information - the achievement or mastery level, and the subject area. We start by enumerating the terms in the vocabulary.

Degree Earned
    High School Diploma
    Certificate
    Associate’s Degree
    Bachelor’s Degree
        Bachelor of Science
        Bachelor of Arts
        Bachelor of Fine Arts
    Master’s Degree
        Master of Science
        Master of Arts
        Master of Fine Arts
    Doctoral Degree
        Doctor of Philosophy
        Doctor of Medicine

Subject Area
    Human Resources
        Talent Management
        Benefits
    Information Systems
        Information Assurance
        Database Administration
        Application Development
    Engineering
        Chemical Engineering
        Civil Engineering
        Electrical Engineering
        Mechanical Engineering
        Structural Engineering

The field entry for Degree can be created by combining a selection from each vocabulary together. We can do this in an ontology which allows for the creation of a new class from the intersection of other classes.

The benefit of this approach for our expertise system is in search, where only individuals that belong to the class represented by the intersection of the desired degree level and subject area are retrieved. The benefit of this approach for the enterprise is that each discrete class of information (degree earned and subject area) is available for use in other applications without being tied to the other. An example might be for employee awards - an award may be given for achievement in a subject area without regard for the degree, or an award may be given to those who have earned a doctoral degree in any subject area.
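The intersection idea can be sketched with ordinary Python sets. This is an illustration under stated assumptions, not an ontology implementation: the employee records and the `members` helper are hypothetical, and a real ontology would declare the combined class in its modeling language (OWL, for instance, provides `owl:intersectionOf`) rather than compute it in application code.

```python
# Hypothetical employees, each tagged with one term from each vocabulary.
employees = {
    "John Doe": {"degree": "Master of Science", "subject": "Civil Engineering"},
    "Jane Lee": {"degree": "Doctor of Philosophy", "subject": "Chemical Engineering"},
    "Ann Roe": {"degree": "Doctor of Philosophy", "subject": "Benefits"},
}


def members(attribute, value):
    """Return the set of people whose record has the given value."""
    return {name for name, rec in employees.items() if rec[attribute] == value}


# Each vocabulary can be queried on its own (the employee awards case).
doctorates = members("degree", "Doctor of Philosophy")
chemical_engineers = members("subject", "Chemical Engineering")

# The expertise search retrieves only the intersection of the two classes.
phd_in_chemical_engineering = doctorates & chemical_engineers
print(sorted(phd_in_chemical_engineering))
```

Because the two vocabularies stay separate, `doctorates` alone answers the "doctoral degree in any subject area" question, and the intersection is built only when a search needs both.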

Starting with a vocabulary

Enterprise ontologies offer few opportunities for “build vs. buy” decisions; vocabulary development is one of them. Some decisions are easier than others; for example, no other organization should be defining your company’s products, nor should any other organization be maintaining your authority files of people. Too many project teams, however, hold fast to the belief that their organization is unique in regard to core functions, and try to reinvent the wheel for human resources, legal, and engineering disciplines. Even if you do not need all of the concepts in a pre-built vocabulary, and end up cutting out entire classes or levels of specificity, it can give you a significant boost in terms of concept definition, labeling and relationship mappings. Acquiring and customizing a vocabulary from an authoritative source is a best practice, not just from a resource management perspective, but also from a linked data perspective - it allows you to connect to external sources of data in the future if desired or needed. Consider the MeSH (Medical Subject Headings) vocabulary: it is used by the National Library of Medicine (U.S.), the MEDLINE/PubMed database and a number of academic and corporate projects. These projects can share data among them if desired because they use the same concepts to represent information.

For example, identify the most commonly used terms in your enterprise vernacular and define them. Why would you do this? To standardize information about people, products, organizations and processes. The benefit is consistent use across enterprise applications and clear sharing of data within and among workflows that are standard operating procedure. Where can you find these terms? You might be surprised at how much you already have; here are some examples of places to look:

• Search engine logs - examine the phrases used to search for data indexed on your intranet. More importantly, look at the resources that were retrieved for a search. Does your definition match the expectations of the users?

• Rules-based classification, auto-categorization or entity extraction tools - some enterprise search systems include these capabilities. While they cannot fully realize the subtle definitions in your data, they are excellent for pattern recognition and teasing out concepts the humans may not realize are significant.

• Social bookmarking tools - has your organization organized an internal bookmarking tool which allows tagging? Folk-tags are an excellent, user-friendly source of concept labels.

• Blogs, wikis, forums - has your organization made these tools available, and are user-generated tags allowed on posts?

• Database schema - are controlled term lists used? What classes have been created? If you have a data dictionary available - use it!

• Document repositories - how have the folders been set up?

• Digital asset management / content management systems - what type of information is required by the check-in form? Are there controlled term lists? This type of metadata schema is easily understood by most knowledge workers and can minimize confusion.

• Corporate library or information center - if you have one, it is very likely that a taxonomy or thesaurus aligned to industry needs is available. Check into the rules for its licensing and use, and build off of it if possible.


• Publicly available classification systems and ontologies - public and academic libraries, government organizations, universities and professional associations are among the reputable publishers of data models, controlled vocabularies and metadata schema.

Extracting, sorting and discarding duplicate and redundant near-duplicate terms from these sources will provide the team with the first rough draft. This first draft should be a fraction of the size of the inputs. Each term should represent the broadest possible definition.
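The extract, sort and discard step lends itself to a small script. The following is a minimal sketch, assuming the terms have already been exported from each source as plain lists of strings; `normalize`, `first_draft` and the sample lists are hypothetical names and data, not part of any particular tool.

```python
def normalize(term):
    """Collapse case and whitespace so near-identical labels compare equal."""
    return " ".join(term.lower().split())


def first_draft(*sources):
    """Merge term lists, keeping the first spelling seen for each term."""
    seen = {}
    for source in sources:
        for term in source:
            key = normalize(term)
            if key:
                seen.setdefault(key, term.strip())
    return sorted(seen.values(), key=str.lower)


# Hypothetical inputs, e.g. from a search log and a bookmarking tool.
search_log = ["human resources", "Supply Chain ", "HR"]
folk_tags = ["Human  Resources", "supply chain", "Legal"]

draft = first_draft(search_log, folk_tags)
print(draft)
```

A script like this only removes mechanical duplicates. Deciding that "HR" and "human resources" name the same concept still takes human judgment.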

Making a taxonomy more useful

The term ‘taxonomy’ has come to encompass a wider variety of linked data constructs than its original use defined. To many librarians and long-time vocabulary managers a taxonomy is a hierarchical representation of concepts. The relationships between terms are limited to Broader Term (BT) and Narrower Term (NT), and visualizations look much like the way a computer’s file folders are displayed. The rule of thumb for determining if one thing is higher or lower in the hierarchy than another is to consider if Object B is a “kind of,” “part of” or “one of” Object A. If it is, then:

Object A
    Object B

If it is not, then:

Object A
Object B

Linked data applications expose the power of the predicate - the relationships between and among concepts provide much greater value than the simple awareness of their existence. Defining those relationships more accurately than BT/NT gives applications built on linked data sets more precision and analytical power. It is not enough that a human could easily interpret the meaning behind a BT/NT relationship; the machine must be able to do so as well. This construct follows the hierarchical rule of thumb, but is it useful?

Company
    Consumer Software
    Commercial Software
    Consulting
    Support
    John Doe
    Jane Lee
    Human Resources
    Legal
    Information Technology

Not very useful. Each of those concepts is a part of Company, but what kind of thing are they? More useful is this construct which indicates the kind of relationship between ‘Company’ and its components:

Company
    Product {
        Consumer Software
        Commercial Software
        Consulting
        Support
    }
    Employee {
        John Doe
        Jane Lee
    }
    Department {
        Human Resources
        Legal
        Information Technology
    }
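One way to make the kind of relationship readable by a machine is to store each fact as a subject-predicate-object triple. The sketch below uses plain Python tuples; the predicate names `hasProduct`, `hasEmployee` and `hasDepartment` and the `objects` helper are hypothetical, chosen only to mirror the groupings above.

```python
# Each fact is a (subject, predicate, object) triple.
# Predicate names are illustrative, not taken from a published vocabulary.
triples = [
    ("Company", "hasProduct", "Consumer Software"),
    ("Company", "hasProduct", "Commercial Software"),
    ("Company", "hasEmployee", "John Doe"),
    ("Company", "hasEmployee", "Jane Lee"),
    ("Company", "hasDepartment", "Human Resources"),
    ("Company", "hasDepartment", "Legal"),
    ("Company", "hasDepartment", "Information Technology"),
]


def objects(subject, predicate):
    """Return everything related to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# With a flat BT/NT list this question cannot be asked of the data.
print(objects("Company", "hasEmployee"))
```

The same data in a plain BT/NT hierarchy would return products, people and departments together, with no way to tell them apart.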

Another useful exercise in making taxonomies more powerful is to consider synonyms. A single concept may be referred to in several ways. In the above example we could indicate that the concept labeled “Human Resources” may also be labeled “HR,” the concept behind “Information Technology” could also be labeled “IT.” Indicating all of the variants, and distinguishing a preferred label, will aid in accurate and comprehensive retrieval of information, as well as guide users towards conformance to an organizational standard. Concepts from one silo of information can be mapped to others via their synonyms, allowing for legacy systems to indicate that “Clock #” contains the same unique identifier as “PeopleSoft ID” and the same as “Employee ID” for example.

Models

Our last example indicates one good use case for the “Hub-and-Spoke” model. Concepts can be mapped to their local lexical variants to aid in data interoperability. In this particular example, the concept for a unique employee identification string can be represented in three ways. If the three are mapped together in a core repository, each contributing application can continue to use the term its users are accustomed to while still providing analytical and application capabilities across repositories.

[Diagram: hub-and-spoke mapping of “Employee ID,” “Clock #” and “PeopleSoft ID”]
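The employee identifier mapping can be sketched as a small lookup table. In this hedged example the three field labels come from the text, while the hub identifier `employee-identifier`, the system names and the `to_hub` and `from_hub` helpers are assumptions made for illustration.

```python
# Hub: one shared concept. The identifier is hypothetical.
HUB_CONCEPT = "employee-identifier"

# Spokes: the label each contributing system uses for that concept.
# System names are assumed; the labels are the ones named in the text.
LOCAL_LABELS = {
    "timekeeping": "Clock #",
    "peoplesoft": "PeopleSoft ID",
    "hr_portal": "Employee ID",
}


def to_hub(system, record):
    """Rename a system's local field to the shared hub concept."""
    local = LOCAL_LABELS[system]
    return {(HUB_CONCEPT if field == local else field): value
            for field, value in record.items()}


def from_hub(system, record):
    """Rename the hub concept back to the label a system's users expect."""
    local = LOCAL_LABELS[system]
    return {(local if field == HUB_CONCEPT else field): value
            for field, value in record.items()}


clock_record = to_hub("timekeeping", {"Clock #": "00417", "Shift": "Day"})
hr_record = to_hub("peoplesoft", {"PeopleSoft ID": "00417", "Grade": "E4"})

# Records from different systems can now be joined on the hub concept.
print(clock_record[HUB_CONCEPT] == hr_record[HUB_CONCEPT])
```

Each application keeps its familiar label, and only the core repository needs to know that the three labels name one concept.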


Another example of the Hub-and-Spoke methodology is for translating concept labels in different languages.

[Diagram: hub-and-spoke model of concept labels in several languages, including “Concept in English”]

Each diagram is a slice of an overall data model. Each circle represents a separate system or data model that can be linked back to a central model.

Another model uses layers, or stacking, to achieve precision and cohesion. The federal model of government as generally practiced in the United States is an example - the federal government mandates regulations which individual states can make more restrictive, but not less restrictive. An enterprise ontology can follow the same idea: the enterprise defines a model consisting of the lowest common denominators of terms; each consuming local ontology can annotate it to make its usage more precise for their application.

This is a good practice when various user groups cannot agree on specific definitions for terms. Find a common ground that everyone can align to and that meets enterprise regulations. Allow each group to add relationships to other concepts or literal data entries as needed for their project in separate ontologies that do not need to be consumed by all other groups. Each layer inherits properties from above, but not necessarily across the faces of this inverted pyramid.

“Audience” is an example of a concept that would likely have different meaning to different groups within an organization. Let’s say you are checking a piece of content in to a digital asset management system, and one of the form fields asks you to indicate the intended audience. “Audience” is a well known concept: the consumers of a piece of content or performance. There are many kinds of audiences: employees, customers, news organizations, regulatory bodies, partners, competitors; these can be made even more granular. At an enterprise level, “audience” should be given a definition and a persistent location. All of the “kinds of” audience that can be agreed upon at an enterprise level should also be given a definition and persistent location. Those that cannot be defined across business divisions should be defined in a local ontology, but mapped back to the parent “audience” in the enterprise ontology.
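The layering of a local ontology over an enterprise one can be sketched with Python's `collections.ChainMap`, which looks a key up in the first mapping and falls back to the next. This is a simplified illustration: the "Audience" definition is the one given above, while the "Customers" and "Trial Users" entries, the `marketing` layer and the `root_of` helper are hypothetical.

```python
from collections import ChainMap

# Enterprise layer: broad definitions everyone has agreed to.
enterprise = {
    "Audience": {
        "broader": None,
        "definition": "The consumers of a piece of content or performance.",
    },
    "Customers": {
        "broader": "Audience",
        "definition": "Hypothetical: people or organizations that buy from us.",
    },
}

# Local layer: a concept only one group needs (hypothetical example).
marketing = {
    "Trial Users": {
        "broader": "Customers",
        "definition": "Hypothetical: customers evaluating a product.",
    },
}

# The marketing view sees its own concepts plus everything inherited
# from the enterprise layer; the enterprise layer itself is unchanged.
marketing_view = ChainMap(marketing, enterprise)


def root_of(concept, view):
    """Follow 'broader' links up to the top-level concept."""
    while view[concept]["broader"] is not None:
        concept = view[concept]["broader"]
    return concept


print(root_of("Trial Users", marketing_view))
```

Other groups that never load the `marketing` layer are unaffected by it, yet anything defined there still maps back to the shared parent concept.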

Why might it be useful to separate core terms from project-specific annotations, properties, relationships and business rules? The impacts of various models can be judged individually rather than all together. An example might be import/export regulations. While they may be critical to legal and regulatory compliance for your company, embedding them in an enterprise-wide ontology will prevent a business development organization from analyzing potential changes to the regulations without changing the model used by everyone. Impacts from proposed new rules need to be planned for; changes that an organization may like to lobby for can be analyzed thoroughly before committing resources to the effort. Predictive analysis and strategic forecasting become more detailed.

Using linked data principles and layered ontologies, strategic analysis can remain focused on core organization data assets by linking only datasets that are “behind the firewall,” or can link to sources outside the firewall for a more open view, either choosing to link to specific partner datasets for greater but known data, or accepting that there are unknowns in larger worldwide linked datasets.

[Diagram: inverted pyramid of ontology layers, in order: Broadest Term Definitions, Localized Usage Annotation, Application Restriction, Analytical Rules]

Governance

One of the most accurate predictors of a model’s long-term success is whether resources have been allocated for its ongoing governance and maintenance. Many enterprise taxonomies have been created, embedded in content management and search systems and checked “Done.” Technologies change, business goals change, common phrases and acronyms change and are added. Keeping up with these changes is imperative. Ontologies can suffer the same fate: millions of wasted dollars and more frustration with “knowledge” projects. Avoid waste by establishing a governance council or editorial board.

The role of a governance council

The primary job of this team is to make decisions regarding additions, changes and deletions to core data models. Balancing group needs with enterprise impacts is their primary concern. Another benefit of a layered model is that this council does not need to say “no” to every request; they can say “no, this can’t go in the core model, but you may create it in a local ontology.” This mitigates some measure of frustration.

Examining each change for its impact can be supported by software that the council will be responsible for selecting and maintaining. Functional requirements and user requirements (interaction and experience) should be part of the mandate for the council.

Another task is to keep up to date with modeling standards, and ideally to contribute to the development of those standards by actively participating on development bodies or contributing use case scenarios to the committees.

The members of the team should include an executive level champion who understands the value of enterprise knowledge organization systems. An expert in knowledge organization schemes should be appointed as the “Editor-in-Chief” of the enterprise ontology; this should be a full-time role in medium-to-large organizations. Working members of the team should include a cross-section of the organization to ensure full and fair coverage, and should not be full-time on the council; in fact it is best if they spend the majority of their time using the solutions at a local level so they can bring real-world problems to the enterprise team. Information technology staff who will implement software and publish data sets should be present to ensure the solutions can be realized in a reasonable timeframe with reasonable resources. Persons responsible for group-level metadata and vocabulary management should be involved. Subject matter experts should be identified in key areas so that they may be called upon to explain detailed concepts and help create definitions and relationships for these concepts.

The value

The benefits we have discussed include minimized ambiguity, clearer communications, more detailed analytics and more nuanced forecasting capabilities. Improved management of intellectual property, more successful searches, and enlightening browsing of content will result.


Savings will be seen in a reduction of storage costs - both servers and boxes - as redundant content is removed from the system and high value content is reused. Less money will be spent re-creating content over and over as it will be findable. Nickels and dimes inside the organization add up across all employees - do not ignore them. See less outlay to creative agencies who no longer have to create similar pieces over and over. Minimize the risk of fines from copyright violations, and pay royalties accurately.

Generate income by ensuring your content is accurately invoiced. Provide greater satisfaction to customers who can find what they are looking for, earn their loyalty and trust, and see higher sales and repeat business. Examine the ontology’s usage data to inform advertising and promotional campaigns.

Conclusion

There is no one “right way” of building an ontology. No “right tool” to build with for all applications. No “right way” to encode it. No “right way” to use it to link your data. Decisions must be made with respect to the application of the ontology and the stakeholders. There are lessons to be shared; best practices to consider. There is never a “right time” to start: start now.


Contact Information

Primary contact information for TriviumRLG LLC is below.

Christine Connors, Principal

T +1.910.874.8486

[email protected]

www.triviumrlg.com

Skype/Twitter TriviumRLG
