T-76.651 SEMINAR ON DISTRIBUTED PRODUCT DEVELOPMENT
Helsinki University of Technology
Tuomas Korpilahti Distributed Development of Ontologies - Keeping the Architecture Consistent November 4, 2003
Tuomas Korpilahti
47972U
Abstract
Authors: Tuomas Korpilahti
Name of Report: Distributed Development of Ontologies – Keeping the
Architecture Consistent
Date: November 4, 2003 Pages: 24
This paper presents a study on methods to prevent ontology developers from
inserting conflicts into ontologies in collaborative, distributed development. A set of
problems encountered in distributed ontology development is presented. Current
methods to answer these problems are described, and the methods are evaluated
against the problems.
The study suggests a set of development environment integratable methods that
contribute most in preventing errors in collaborative, distributed ontology
development. These methods cover the issues of ontology design criteria, knowledge
transfer, ontology architecture, architectural support for collaborative development,
synchronization of concurrent development, development tool user interface and
ontology storage.
Keywords: Distributed development, ontology development, collaborative
development, computer-aided development
Table of Contents
Abstract
1. Introduction
1.1. Background
1.2. Research Problem and Objectives
1.3. Study Scope and Study Methods
2. Definition of Terms
3. Background on Ontologies
4. Problems with Collaborative, Distributed Ontology Development
4.1. General Problems
4.2. Problems with Axioms
5. Current Approaches and Techniques
5.1. Design Criteria for Ontologies
5.2. Knowledge Transfer
5.3. Development Environment
5.4. Architectural Strategy
5.5. Synchronization Strategy
5.6. Ontology Storage Strategy
6. Suitability of Approaches to Problems
6.1. Design Criteria for Ontologies
6.2. Knowledge Transfer
6.3. Development Environment
6.4. Architectural Strategy
6.5. Synchronization Strategy
6.6. Ontology Storage Strategy
7. Discussion and Conclusion
8. References
1. Introduction
1.1. Background
Today more and more software is being developed in distributed teams. The emerging
semantic web will introduce ontologies and architectures that have estimated lifetimes
from 10 to 50 years. Ontologies are to be extended and used together in non-anticipated
ways. Also their development is likely to be a highly distributed task.
The architectural model of any system should remain clear and consistent until software is
decommissioned. Ontologies and architectures are very similar in a sense that they both
define a framework, within which different actors and software components interact with
each other and reason about the world. But as in any logical model, any conflict in
ontology will render it completely useless. Is it possible to provide tool support for
combining the requirements of high correctness with distributed, collaborative, evolutionary
development?
1.2. Research Problem and Objectives
We seek to find out how it is possible to maintain architecture and model consistency in a
changing environment where different people are building software in distributed,
collaborative teams using incremental or evolutionary software development models.
Our primary study objectives are to
• Find methods to prevent conflicts from being inserted into the system.
• Find methods to minimize the number of inserted conflicts.
Our secondary objective is to
• Seek ways to integrate the methods found into an integrated ontology development
environment (IDE).
1.3. Study Scope and Study Methods
We have conducted a literature study on collaborative, distributed ontology development.
As it is a relatively novel area with few publications, we will also bring in ideas
from collaborative development in general.
The rest of this paper is structured as follows. We will first define some common terms and
give background information on ontologies. Then we present some major problems in
distributed, collaborative ontology development. In section 5 we will give an overview of
currently used approaches and practices to answer these challenges. We will analyze the
suitability of current practices and methods to the problems in section 6. Based on our
analyses, we will introduce a set of propositions to help the development of distributed
ontology development environments, and end the paper with a summary of our findings.
2. Definition of Terms
Ontology is a closed world model of some part of the world around us. It specifies the
world’s relevant concepts and relations between those concepts. It also may include some
type of logic rules to reason about the concepts and relations. Ontologies are typically
written in RDF (Resource Description Framework) and RDF Schema languages.
Ontology axiom is a logic rule in the ontology. It serves as a given description of what is
fundamentally true within that ontology. As an example, we could have an ontology of
transportation vehicles such as cars, ships and airplanes. We might then include an axiom
saying that a car cannot fly.
Collaborative development in this paper means the type of software development where
several developers actively collaborate while building a common product together. One
example would be pair programming of extreme programming (XP). A less computer
science oriented example would be two construction workers building a house. As one of
them holds a piece of wood in place, the other nails it to its place. In collaborative
development the developers need not necessarily be co-located – the idea is that people
are actively influencing each other’s work and interacting together as they are building
their shared goal.
Distributed development is used to describe software development where development
work is distributed across different geographically distributed locations. The locations may
be in different continents or just a few floors from each other. The main point is that the
developers are not co-located, in other words they do not sit together while working.
3. Background on Ontologies
As defined previously, an ontology is essentially a domain model that explicitly specifies
domain’s concepts and the relations between them. Ontologies are used in knowledge
representation. Their advantage is that they make the knowledge both human and
machine-readable, thus creating new possibilities for applications in artificial intelligence
and context-sensitive information portals, to name a few. Recent research has investigated
their use in ameliorating information indexing in and information retrieval from the Internet;
these efforts are done under the semantic web research field.
Ontology is a logical model; in other words, it describes a closed world and a set of rules
that are in effect in that world. Should the ontology contain a contradiction, the entire
model is useless in automated reasoning because as we know from mathematical logic,
one can prove any statement to be true in a contradictory logical model. This is why in
the case of a complex ontology, often a considerable amount of work must be invested in
its development to ensure consistency.
Ontology development requires domain knowledge on the field that the ontology will
model. If the use of ontologies becomes widespread, it is likely that there will be a serious
lack of experts on knowledge modeling. Therefore an ontology development team is likely
to be geographically distributed, and team members cannot meet in person to discuss
modeling decisions. During development the experts must be able to use development
tools to find consensus on how they see the world and on the differing modeling ideas they
have.
Ontologies are to be used together with other ontologies to reason about the world in
cross-domain issues. An example of this would be the classic producer – seller
relationship, where one company makes for example cars, and another sells them. They
both talk about the same concept – a car – but have very different points of view, and thus
use different ontologies. The manufacturer might connect a car with properties like model
number, electrical system provider, engine provider and factories capable of producing
that model. The retailer might prefer associating it with properties such as color, power,
maximum speed, supplier and unit price. They both need a common concept of a car, but
think about it differently. This example is quite obvious, but ontologies may be used in a
variety of ways that cannot be anticipated at the time of their creation.
The interaction of ontologies and the high level of expertise needed to develop them imply
that they must be developed in a distributed, collaborative fashion. The challenge comes
from the fact that ontologies are likely to have relatively long lifetimes. In our projects at the
University of Helsinki, we have estimated realistic lifetimes of 10 to 50 years.
4. Problems with Collaborative, Distributed
Ontology Development
4.1. General Problems
As we presented in the previous chapter, ontologies are likely to be developed in a
distributed fashion for a considerably long time. The development is likely to follow an
evolutionary and incremental model, as new ontologies are created from existing ones to
adapt to different use cases. Ontology development must answer the traditional challenges
of distributed software development: concurrent modifications, division of responsibilities
and integration.
As is normally the case in any work related to modeling the real world, there are often several
ways to represent the same concepts and relations in an ontology. Usually the choice
between these does matter, and should be done considering the possible use cases of the
ontology. Therefore, knowledge transfer becomes an issue in ontology development. It is
especially crucial in re-using existing ontologies to create new ones. As Aitken has shown
(1998), the quality of the resulting ontology correlates closely with how deeply the developers
understand the ontologies they are re-using. In order to achieve good results,
one must know well how the source ontologies work; otherwise one easily inserts semantic
errors. We believe the same also applies when referencing concepts from other
ontologies.
To connect ontologies together, it is possible to reference a concept in one ontology from
another ontology. This brings in an additional challenge: if the referenced ontology is
modified, the referencing ontology might break. This is an issue because ontologies
can be too large or too cross-referenced to be integrated into a new ontology. (Farquhar
et al., 1996) presents an example of this by introducing two large ontologies, a medical
ontology and a sports ontology, that cross-reference each other. In the medical ontology we
might state, “Roller-blading (sports ontology concept) is a major cause of wrist fractures.”
The sports ontology might claim, “Some weight-lifters use anabolic steroids (medical ontology
concept).” We do not want to include all of the concepts of the other domain in our
ontology, just the useful ones. Any ontology might have the same kinds of cross-references with
several ontologies, for example sports – business, business – medical, medical – law, law
– business etc. Creating one combined ontology for each case would make maintenance
impossible.
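The risk of a referencing ontology breaking when the referenced one changes can be illustrated with a minimal Python sketch. The in-memory representation, the "ontology:concept" naming scheme and the function name are assumptions made for illustration only; they are not taken from Farquhar et al. or from any existing tool.

```python
# Hypothetical in-memory model: each ontology maps a concept name to the set of
# qualified names ("ontology:concept") that the concept references.
def find_broken_references(ontologies):
    """Return sorted (source, target) pairs whose target concept does not exist."""
    broken = []
    for onto_name, concepts in ontologies.items():
        for concept, refs in concepts.items():
            for ref in refs:
                target_onto, target_concept = ref.split(":", 1)
                if target_concept not in ontologies.get(target_onto, {}):
                    broken.append((f"{onto_name}:{concept}", ref))
    return sorted(broken)


ontologies = {
    "medical": {
        "wrist_fracture": {"sports:roller_blading"},
        "anabolic_steroid": set(),
    },
    "sports": {
        "roller_blading": set(),
        "weight_lifting": {"medical:anabolic_steroid"},
    },
}

# Initially every cross-reference resolves.
assert find_broken_references(ontologies) == []

# Simulate a change in the sports ontology: the referenced concept is renamed.
ontologies["sports"]["inline_skating"] = ontologies["sports"].pop("roller_blading")
broken_after_rename = find_broken_references(ontologies)
```

After the rename, the medical ontology still points at the old concept name, so the check reports that single reference as broken. Such a check only finds dangling names; it says nothing about references whose target still exists but has changed its meaning.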
(Herbsleb & Grinter, 1999, page 69) suggests that in distributed development one should
“only split the development of well-understood products where architectures, plans and
processes are likely to be stable”. The problem with ontologies is that there is no general
authority to divide the work; at the start of the modeling process the result is fuzzy at best;
there are many users, and all of them have an opinion on how the ontology
should support their use case. Instability greatly increases the need for communication.
4.2. Problems with Axioms
Logical knowledge in ontologies is often encoded in the form of logical axioms. They are a
very powerful tool for reasoning, but at the same time they introduce axiom-specific
problems. (Sure et al., 2002) and (Sure & Studer, 2002) have classified the errors
related to axioms into three categories:
• Axioms contain typing errors like variables not specified by a quantifier, typos in
concept names or relationship names etc.
• Axioms contain semantic errors, i.e. the rules do not express the intended meaning.
• Performance issues, like axioms defined such that evaluation needs a lot of time,
which is not always easily recognizable by the users.
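The first category, typing errors, is the one most amenable to mechanical checking. The following Python sketch shows one possible check under simplifying assumptions: an axiom is represented as a dictionary with a list of quantified variables and a list of (name, arguments) atoms, and variables start with "?". This representation is hypothetical and is not the axiom language of OntoEdit or any other tool.

```python
def check_axiom(axiom, known_names):
    """Return a list of human-readable typing problems found in one axiom."""
    problems = []
    bound = set(axiom["quantified"])
    for name, args in axiom["atoms"]:
        # A concept or relation name that is not in the vocabulary is likely a typo.
        if name not in known_names:
            problems.append(f"unknown name: {name}")
        # Every variable must be introduced by a quantifier.
        for arg in args:
            if arg.startswith("?") and arg not in bound:
                problems.append(f"unbound variable: {arg}")
    return problems


known_names = {"Car", "canFly"}

# Well-formed: both names are known and ?x is quantified.
good = {"quantified": ["?x"], "atoms": [("Car", ["?x"]), ("canFly", ["?x"])]}
# Faulty: "Carr" is a typo and ?y is never quantified.
bad = {"quantified": ["?x"], "atoms": [("Carr", ["?x"]), ("canFly", ["?y"])]}

good_problems = check_axiom(good, known_names)
bad_problems = check_axiom(bad, known_names)
```

Semantic errors and performance issues, the second and third categories, cannot be found by a syntactic check of this kind.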
5. Current Approaches and Techniques
In the previous chapter we presented some problems that occur in distributed ontology
development. Next we will present some approaches and techniques currently used to
solve these problems. We have divided them into six categories. Section 5.1 discusses
ontology design in general. In 5.2 we discuss issues related to communication,
documentation and collaboration between developers. We then present existing
enhancements to ontology development environments in 5.3. In section 5.4, we
concentrate on the possibilities to divide and guide work by ontology system architecture.
5.5 addresses the issues related to synchronizing concurrent, collaborative development.
Lastly, 5.6 presents possible solutions for combining the development results into a long-
term storage.
5.1. Design Criteria for Ontologies
Gruber has defined design criteria for ontologies (1993). These criteria aim to act as a
basis for delivering clear and expressive ontologies that are easy to use and re-use. They
were developed to support ontology development process in general. We present the
design criteria here because they provide a strong guideline in pursuit of better tool
support for distributed ontology development.
The design criteria for ontologies Gruber defines are:
• Clarity: one should use objective, formal and complete definitions.
• Coherency: the ontology and use of terms should be internally coherent. Also the
free text explanations should be coherent with the model.
• Extendibility: defining new terms should not require revising the original ontology.
• Minimal encoding bias: minimize the dependency on data type specifications and
the like.
• Minimal ontological commitment: make as few claims about the world as possible.
Use vocabulary consistently.
Clarity and extendibility are quite obvious criteria. Coherency is notably stricter than one
might expect, as it poses a requirement even on the free text explanation fields that
normally are left out of the scope of most development methods and tools.
What minimal encoding bias means is actually much more than one might expect at first
glance. Gruber proposes that ontologies should be independent not only of data types but
also of concepts such as units of measure or precision. For example, an ontology handling
speeds cannot expect them to be measured in kilometers per hour. The same ontology
must be capable of handling speeds also in any unit one could possibly invent – miles per
hour, millimeters per nanosecond, Roman legion march speed and so on. This way, the
ontology is more usable as other users can just define new units of speed and use them
with the original ontology.
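As a rough illustration of unit independence, the Python sketch below keeps speed units in an extensible registry instead of hard-coding one unit. The class and method names are hypothetical. Using metres per second as the internal reference is itself an encoding choice made only to keep the sketch short, and the factor given for the marching speed is an invented illustrative value, not a historical figure.

```python
class SpeedUnits:
    """Registry of speed units, each defined by its factor to metres per second."""

    def __init__(self):
        self._to_mps = {"m/s": 1.0}

    def define(self, name, metres_per_second):
        """Add a new unit; existing definitions are left untouched."""
        self._to_mps[name] = metres_per_second

    def convert(self, value, from_unit, to_unit):
        """Convert a speed by going through the internal reference unit."""
        return value * self._to_mps[from_unit] / self._to_mps[to_unit]


units = SpeedUnits()
units.define("km/h", 1000 / 3600)
units.define("mph", 1609.344 / 3600)
# A later user adds a unit the original author never anticipated.
# The factor 1.3 is purely illustrative.
units.define("legion_march", 1.3)

ten_mps = units.convert(36, "km/h", "m/s")
two_marches = units.convert(2.6, "m/s", "legion_march")
```

The point of the design is that adding "legion_march" required no change to the registry class or to the previously defined units.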
Minimal ontological commitment is a very intuitive criterion, for one may not always
foresee how her ontology is used and re-used. Thus, one should leave as many
possibilities for further development as possible. This can be done by stating as little as
possible about the world so that the statements do not prevent further
development. Minimal ontological commitment is this principle of creating an ontology with
a minimal set of concepts and relations needed to describe the world in that use case.
5.2. Knowledge Transfer
Knowledge transfer is an issue in all distributed development. Currently most ontology
development environments are used in the academic community. They offer very little
communication support, and seem to rely mostly on methods similar to those of open
source projects; the main media of collaboration are email and chat.
The need for organized knowledge transfer processes and means has been recognized.
Aitken suggests in (1998) that extensive documentation on design principles, intended use
and planned re-use methods of an ontology should be included with it. As he has shown,
understanding the functionality of source ontologies is a key to successful re-use.
(Herbsleb & Grinter, 1999) further underlines the importance of proper documentation. In
distributed development there is no informal communication. Important design decisions
are not spread through the parties working on one ontology if not explicitly recorded and
promoted to the developer community. However, documentation alone does not
solve the issue of daily collaboration; it helps in the longer term. Other solutions are
needed for day-to-day knowledge transfer.
OntoEdit provides one solution for knowledge transfer. OntoEdit is a tool for ontology
development. It facilitates collaborative development by incorporating integrated support
for mind maps, as (Sure & Studer, 2002) explains. It has a plug-in to connect to
commercial software for building mind maps. Developers can sketch their ideas in mind
maps and share them with other developers. Currently, the relations in the mind map are
not connected to the ontology engine. They serve as external documentation on
developer’s goals and purposes.
5.3. Development Environment
As we noted in the previous section, most ontology development environments on the market
today are results of academic research and meant for the academic community. This often
means that they aim at providing freedom of expressiveness at relatively low development
cost. In this context, users are experienced and thus the development of and investment in
graphical user interfaces have been limited, with a few notable exceptions.
Traditionally, development environments have tried to provide support for conflict detection
and resolution. An example of this is presented in (Farquhar et al., 1996). In the Ontolingua
system, when two ontologies are merged, conflicts are detected and the user is presented with a list
of possible solutions to the problem. This approach tries to ease the resolution of possible
semantic errors. The problem is that not all semantic errors can be caught or resolved
automatically (Klein et al., 2002).
Although these tools have historically been editors for experienced people, there is common agreement that
an ontology development environment should include a graphical user interface (GUI)
(Grosso et al., 1999; Sure et al., 2002; Sure & Studer, 2002). An ontology can be seen as a
directed graph and it often has some type of hierarchy, which can be used to visualize it.
Intuitive visualization together with drag and drop editing addresses two problems
related to ontology axioms in particular: typing errors and performance issues.
As the mappings between concepts are made by the computer while the developer clicks on a
graphical representation of the ontology, all typing errors can be prevented. Optimization
algorithms can be used to optimize complex axioms for performance; different algorithms
could be used depending on how much knowledge the developer has of the system where
the ontology will be used.
In some cases, ontology axioms can be too complex for drag and drop editing. Also, GUI
tools often limit developers’ freedom to define relations and axioms. Thus, direct access to
the ontology encoding is needed. (Sure et al., 2002; Sure & Studer, 2002) suggest that
whenever this is the case, the development environment should provide at least syntax
highlighting to help the developer notice possible typing errors.
5.4. Architectural Strategy
A widely used method to control complex and large entities in the software industry is to divide
them into independent components that communicate via well-specified interfaces.
Ontologies may in some cases also be divided into independent modules. Modules can interact
in two different ways. One way is to connect a module to other modules by referencing
concepts defined in other modules. Another possibility is to combine the modules at query
level, querying each module separately and then combining the results according to some
logic.
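The second option, combining modules at query level, can be sketched in Python as follows. Each module is modeled as a plain function from a concept name to a set of known subconcepts, and set union stands in for the combining logic. Both choices, as well as all names in the sketch, are assumptions made for illustration.

```python
def query_modules(modules, query):
    """Ask every module independently and merge the answers by set union."""
    combined = set()
    for module in modules:
        combined |= module(query)
    return combined


# Two hypothetical, independent modules. Neither one calls the other.
def vehicle_module(concept):
    facts = {"vehicle": {"car", "ship"}}
    return set(facts.get(concept, ()))


def aviation_module(concept):
    facts = {"vehicle": {"airplane"}, "airplane": {"glider"}}
    return set(facts.get(concept, ()))


result = query_modules([vehicle_module, aviation_module], "vehicle")
```

Because each module answers on its own, modules can be developed and replaced separately. The price is that any inference requiring facts from two modules at once has to be handled by the combining step.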
According to (Ding & Fensel), most systems that aim to facilitate ontology re-use,
integration, or connecting ontologies have adopted modular organization of ontologies in
their ontology library. But as (Stuckenschmidt & Klein, 2001) notes, ontology
modularization requires that one is able to divide an ontology into independent modules
that can be queried individually. This is because modules should be independent of other
modules, and more specifically, one module should be able to answer a query without
querying other modules.
Another paradigm that can be used to organize ontology libraries is a standard upper level
ontology. It serves as a common framework that domain-specific ontologies extend.
Different ontologies are therefore connected via upper level ontology relations. This can be
used to divide work to different groups of specialists when creating the domain-specific
ontologies, and to relate cross-domain knowledge when querying ontologies. (Ding &
Fensel) states that standard upper level ontologies provide very important contribution in
these aspects.
Independent from ontology modularization and standard upper level ontologies, current
collaborative ontology development systems come in two architectural models. All systems
implement either peer-to-peer or client / server architecture to store and allow modification
of ontologies. Each model has its strengths and weaknesses. (Ding & Fensel) suggests
that the client / server model seems to be critical for collaborative editing. It allows one to
specify a fixed connection point, where the ontology can be accessed, and version and
concurrency control can be enforced.
Peer-to-peer architectures offer considerable scalability advantages. According to our
understanding, collaboration support seems to be somewhat more challenging to
implement in them. Existing peer-to-peer systems appear to focus more on the problem of
distributing large ontologies in a performance-efficient manner. Systems with the client / server
model attempt to tackle collaborative development.
5.5. Synchronization Strategy
In this section we will discuss different strategies to support collaborative editing in everyday
tasks, and in merging the edits of different developers. One strategy is of course not to
support it at all. This is not necessarily a bad idea. As (Sure & Studer, 2002) explains,
ontology developers can re-use parts of existing ontologies to create a new one, and then
add the new ontology into the ontology library system. This method does not provide
support for collaborative editing but is very effective for re-using existing ontologies. It is
also easier to implement than other methods.
Collaborative editing can be understood to mean that several developers develop the
same part of an ontology together. To support this approach, the development
environment can implement a laissez-faire collaboration strategy where the system does
not protect a developer from others’ modifications but all modifications are broadcast to
all developers in real time, and their editing environments display the changes
immediately. The newest version of a development tool called Protégé from Stanford
University implements beta-level collaboration support this way. Protégé uses a client / server
architecture where collaboratively edited ontologies reside on the server, and clients
access them via a network connection. More detailed information on Protégé can be found
in (Grosso et al., 1999), but so far there exist no papers on its collaboration abilities.
Quite the opposite strategy to laissez-faire is locking. The system can lock parts of an ontology
as they are to be modified. Different approaches can be taken here. One approach very
similar to open source development is explained in (Farquhar et al., 1996) and in (Karp et
al., 1999). Developers edit local copies of the ontology, and as they attempt to commit the
changes, the modifications are tested for conflicts. Should any conflicts be found,
developers are forced to remove them before the commit can be done. The system locks
the ontology only while integrating the non-conflicting changes to the original ontology.
This method aims to minimize the time any part of an ontology is locked, because locking
an ontology effectively prevents other developers from working on it.
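A minimal Python sketch of this commit-time approach is given below. It is not the implementation described by Farquhar et al. or Karp et al.; the class, its methods and the per-concept revision counters are assumptions. The sketch also holds the lock during the conflict check, not only during integration, so that the check and the update cannot be interleaved with another commit. It detects only overlapping edits to the same concept, not semantic conflicts.

```python
import threading


class OntologyStore:
    """Shared ontology: concept name -> definition, with a revision counter per concept."""

    def __init__(self):
        self._lock = threading.Lock()
        self.definitions = {}
        self.revisions = {}

    def checkout(self):
        """Give a developer a local copy plus the revisions it is based on."""
        with self._lock:
            return dict(self.definitions), dict(self.revisions)

    def commit(self, base_revisions, changes):
        """Apply changes unless the same concepts were changed in the meantime.

        Returns the sorted list of conflicting concept names; empty means success.
        """
        with self._lock:  # held only for the check and the integration step
            conflicts = [
                name for name in changes
                if self.revisions.get(name, 0) != base_revisions.get(name, 0)
            ]
            if conflicts:
                return sorted(conflicts)
            for name, definition in changes.items():
                self.definitions[name] = definition
                self.revisions[name] = self.revisions.get(name, 0) + 1
            return []


store = OntologyStore()
_, base_a = store.checkout()  # developer A starts working
_, base_b = store.checkout()  # developer B starts from the same state

first = store.commit(base_a, {"car": "a road vehicle"})
second = store.commit(base_b, {"car": "a motor vehicle"})   # overlaps with A's commit
third = store.commit(base_b, {"ship": "a water vehicle"})   # touches another concept
```

Developer B's edit to "car" is rejected because A committed a change to the same concept first, while B's unrelated edit to "ship" goes through. Editing itself happens on local copies, so no developer is blocked while others work.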
Another approach would be to lock part of an ontology for the entire duration of
development. This would require either short commit intervals or well defined responsibility
areas in order not to block other developers and to minimize the number of conflicts at
commit time.
Locking requires the implementation of user and group access control. Access control can be used
to manage the changes developers can make by assigning read, overwrite and new concept
creation rights to users. One implementation of access control is found in Ontolingua
(Farquhar et al., 1996).
Another way to divide development responsibility is modularization. If an ontology can be
modularized, the modules can be used as a basis for assigning work to different
developers (Herbsleb & Grinter, 1999). This helps in preventing overlapping modifications,
but it may not always be possible.
During development the developers need to be informed of changes to other parts of
the ontology, when these changes affect the parts they are developing. A notification
mechanism can be implemented to alert developers about others’ changes (Farquhar et
al., 1996).
5.6. Ontology Storage Strategy
Collaborative development can be supported with different ontology storage strategies.
The main issue in storing ontologies is version control. As (Ding & Fensel) remarks,
version control is very important, and most existing ontology development environments
could do it better. Existing version control systems use mainly methods similar to those
explained in (Karp et al., 1999). We discussed these methods in the previous section.
In versioning it is important to separate the identity of ontologies from the identity of files in
which they are stored, as (Klein et al., 2002) explains. This way an ontology can have a
unique version identifier that can be used to refer to that particular ontology. It allows us to
change the location and name of the ontology, adapting to changing organizations and correcting
possible typing errors.
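One way to picture this separation is a registry that maps a stable version identifier to the current storage location. The Python sketch below is an assumption-laden illustration, not the scheme proposed by Klein et al.: the class, the use of random UUIDs as identifiers and the file paths are all hypothetical.

```python
import uuid


class OntologyRegistry:
    """Maps stable version identifiers to the current storage location."""

    def __init__(self):
        self._locations = {}

    def register(self, location):
        """Create a new identifier that is independent of the file name."""
        version_id = uuid.uuid4().hex
        self._locations[version_id] = location
        return version_id

    def relocate(self, version_id, new_location):
        """Move or rename the file without changing the ontology's identity."""
        if version_id not in self._locations:
            raise KeyError(version_id)
        self._locations[version_id] = new_location

    def resolve(self, version_id):
        return self._locations[version_id]


registry = OntologyRegistry()
vehicles_v1 = registry.register("ontologies/vehicels.rdf")  # misspelled file name
registry.relocate(vehicles_v1, "ontologies/vehicles.rdf")   # corrected later
current_location = registry.resolve(vehicles_v1)
```

Other ontologies that refer to `vehicles_v1` keep working after the file is renamed, because they never depended on the file name in the first place.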
One way to maintain consistency as the ontology evolves is to separate instances from
ontologies (Ding & Fensel). This allows the system to use different ontologies to provide
different perspectives on the same data. To refer to our previous automotive industry
example, instances would be real, physical cars that are in stock. The manufacturer could use
one ontology to keep track of the different parts in a specific car and their providers,
whereas the retailer might have another ontology for storing ownership history, condition and
price to provide buyers with a search application to find the best possible match for their
needs.
6. Suitability of Approaches to Problems
6.1. Design Criteria for Ontologies
In section 5.1 we presented design criteria for ontologies. As we mentioned, the criteria
are very general, and aim to help the developer organize her thoughts. Therefore they may not
integrate completely into an integrated development environment. If the developer is able to
follow them, the resulting ontology is likely to contain fewer semantic errors than if the
criteria are not followed.
The clarity criterion could be partly integrated by using graphical tools and windows to add
concepts and relations. At creation time the system could prompt the developer to fill in
the most important details. The development environment should not force her to do so, though.
As ontology modeling is a creative task, we do not believe that all information could be
entered at once. The developer might still be drafting her model and might not yet know
the exact details.
The extendibility criterion should be fully integrable by using standard ontology
description languages. The system could take care of all required namespace and
referencing issues, thus preventing many possible errors.
The other criteria depend greatly on the developer’s willingness to commit herself to them.
This is why we remain rather skeptical that a development tool could enforce them. However,
we do believe that careful user interface design can help by guiding the user to follow the
criteria.
6.2. Knowledge Transfer
Current systems do not seem to support knowledge transfer very well. We believe that a
collaborative ontology development environment should strongly support communication
channels where developers could present different opinions, discuss design
proposals and reach a consensus. One possible option would be integrating the IDE with
various communication tools via plug-ins, as the mind map case shows. Presence
information and real-time communication might also be very useful.
Combining documentation with the ontology is especially important when ontologies have
long lifetimes. The development environment should encourage developers to keep the
documentation up to date to help maintenance and re-use at any later point in time. We
believe that implementing these strategies would greatly reduce the number of errors
inserted into an ontology during development.
6.3. Development Environment
Current publications all agree that graphical user interface tools are needed to support
ontology development. With such tools, typing errors in concept names and similar mistakes
no longer introduce conflicts, as the tool creates all relations between concepts and updates
them automatically. We believe that good visualization combined with drag and drop editing
could also reduce semantic errors, as developers can clearly see the results of their
modifications. If developers want to directly modify the ontology description language
code, syntax highlighting could help them avoid typing errors.
So far, there has been a shortage of graphical development environments that allow
collaborative and concurrent development. We are happy to see that a lot of effort has
lately been invested in this area. Some results are already available, for example
Protégé, and efforts continue to produce even better development tools.
Meanwhile, as we are dealing with a relatively complex application area with a lot of
expressive power, we suggest that a tool offering syntax highlighting be kept available for
dealing with issues that cannot be solved with graphical tools.
Semi-automated conflict identification and resolution seem very useful indeed, and we
suggest incorporating them into any development environment. They can provide valuable
help in recognizing problems and offering a set of possible causes and solutions. Having
said that, we would like to point out that, while highly applicable, conflict resolution
tools cannot identify all possible problems (Klein et al., 2002). A combination of knowledge
transfer and design criteria support might therefore prove more valuable, as they actually
help prevent the conflicts from being inserted in the first place.
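As an illustration of what semi-automated conflict identification could look like, the sketch below checks for one simple kind of conflict: relations that refer to a concept that no longer exists. It is a minimal example under our own assumptions, not the approach of any tool discussed here; the function name, data layout and suggestion texts are hypothetical. The tool only reports the problem and possible solutions; the decision is left to the developer.

```python
def find_dangling_references(concepts, relations):
    """Report relations whose source or target is missing from `concepts`.

    `concepts` is a set of concept names and `relations` a list of
    (source, relation_name, target) triples.
    """
    conflicts = []
    for source, relation, target in relations:
        missing = [c for c in (source, target) if c not in concepts]
        if missing:
            conflicts.append({
                "relation": (source, relation, target),
                "missing": missing,
                # Candidate resolutions offered to the developer.
                "suggestions": [f"restore concept '{c}'" for c in missing]
                               + ["remove the relation"],
            })
    return conflicts


# One developer deleted "Gearbox" while another still refers to it.
concepts = {"Car", "Engine"}
relations = [("Car", "hasPart", "Engine"), ("Car", "hasPart", "Gearbox")]

conflicts = find_dangling_references(concepts, relations)
for conflict in conflicts:
    print(conflict["relation"], "->", conflict["suggestions"])
```

A check of this kind catches only structural problems. It cannot tell whether "Gearbox" was deleted by mistake or on purpose, which is why knowledge transfer between developers remains necessary.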
6.4. Architectural Strategy
We are strongly inclined to believe that standard upper level ontologies can be used
successfully to organize ontology library systems and ontology developers’ work. This should
result in fewer semantic errors. In addition, they help relate different domain-specific
ontologies in ways that allow non-trivial, cross-domain associations between
concepts.
Modularity has been proven to work well in distributed development. It helps to divide work
into smaller fragments that are easier to understand. Fewer semantic errors are therefore
expected. Still, we would like to point out that dividing an ontology into completely
independent modules might limit the ability to relate concepts together via upper level
concepts. This in turn could limit the potential of finding related concepts using non-trivial
relations in the concept hierarchy.
The client/server model has convinced us of its ability to support collaborative editing.
We suggest that it be considered an option for any development environment attempting to
implement collaboration protocols. We would like to see it combined with (semi-)automated
conflict resolution and version control.
6.5. Synchronization Strategy
We believe that a real synchronization strategy is required to enable developers to
collaborate effectively when developing an ontology. Just allowing one to re-use existing
ontologies to create new ones might decrease the number of modularity and referencing
issues, but would rapidly multiply the number of ontologies. This would lead to a
versioning and maintenance catastrophe before an ontology’s lifetime of 50 years has been
reached.
Locking effectively prevents others from working on a particular area of an ontology.
Still, it does not prevent mistaken assumptions about locked concepts. We believe locking
together with user access control and notification mechanisms could be used to divide
development work between different organizations. In collaborative activities, however, we
believe it would impose limits that are too rigid.
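The combination of locking, access control and notification can be sketched as below. This is a minimal illustration under our own assumptions; the class, its methods and the organization names are hypothetical and do not correspond to any existing tool.

```python
class LockManager:
    """Locks areas of an ontology, checks access rights and records notifications."""

    def __init__(self, permissions):
        self._permissions = permissions  # user -> set of areas the user may edit
        self._locks = {}                 # area -> user currently holding the lock
        self.notifications = []          # messages that would be sent to other users

    def acquire(self, user, area):
        if area not in self._permissions.get(user, set()):
            raise PermissionError(f"{user} may not edit {area}")
        holder = self._locks.get(area)
        if holder == user:
            return True   # already held by this user
        if holder is not None:
            return False  # locked by someone else
        self._locks[area] = user
        self.notifications.append(f"{user} locked {area}")
        return True

    def release(self, user, area):
        if self._locks.get(area) != user:
            raise ValueError(f"{user} does not hold the lock on {area}")
        del self._locks[area]
        self.notifications.append(f"{user} released {area}")


manager = LockManager({
    "manufacturer": {"parts"},
    "supplier": {"parts"},
    "retailer": {"sales"},
})

first = manager.acquire("manufacturer", "parts")  # lock granted
second = manager.acquire("supplier", "parts")     # refused while the lock is held
print(first, second, manager.notifications)
```

The sketch also shows the limitation noted above: the supplier is kept out of the locked area, but nothing tells the supplier what the manufacturer intends to do with it.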
For collaboration support at the microscopic level we suggest a laissez-faire style. It will
certainly cause problems when one user spends a lot of effort modeling a concept and another
user deletes it right away, but its strengths in communication outweigh the problems.
The laissez-faire strategy does exactly what communication tools aim to do: it allows
developers to quickly demonstrate their ideas to others. Another advantage, given that
conflict resolution cannot be fully automated, is that laissez-faire forces all developers
to have identical models all the time. Fewer conflicts thus emerge at commit time. For
coordinating the work, users should use more traditional communication and versioning
techniques.
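A laissez-faire style can be sketched as a single shared model in which every edit is applied immediately and propagated to all connected developers. The sketch is a simplification under our own assumptions: the class and method names are hypothetical, and network communication is replaced by plain callbacks.

```python
class SharedModel:
    """A single shared ontology model; every edit is applied and broadcast at once."""

    def __init__(self):
        self.concepts = set()
        self._listeners = []

    def subscribe(self, listener):
        """Register a callback that is called with (user, action, concept)."""
        self._listeners.append(listener)

    def apply(self, user, action, concept):
        # No locks and no permission checks: any user may change anything.
        if action == "add":
            self.concepts.add(concept)
        elif action == "delete":
            self.concepts.discard(concept)
        else:
            raise ValueError(f"unknown action: {action}")
        for listener in self._listeners:
            listener(user, action, concept)


model = SharedModel()
alice_log, bob_log = [], []
model.subscribe(lambda *event: alice_log.append(event))
model.subscribe(lambda *event: bob_log.append(event))

model.apply("alice", "add", "Car")
model.apply("bob", "delete", "Car")  # allowed, and immediately visible to alice
print(model.concepts, alice_log == bob_log)
```

Because there is only one model, both developers see the same state after every edit. The second call also shows the drawback mentioned above: one user's work can be removed by another without any safeguard.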
6.6. Ontology Storage Strategy
As we already mentioned, proper versioning is very important in collaborative, distributed
development. Versioning should handle only “finished work”; it should omit the details of
collaborative editing done to turn one version into another. The modification details
themselves could be used as hints to facilitate automated conflict resolution.
Separating instances from ontologies and the identity of ontologies from the identity of files
in which they are stored are good decisions. They let the system work with concepts
such as ontology, instance and version in an independent manner, which allows one to
freely distribute the physical storage around the Internet. The same instance data can be
viewed through different ontologies to create different views on the same concepts.
Separating instances, ontologies and files can, and in our opinion should, be supported
by the development environment. The developer can then concentrate her energy on ontology
development itself, not on the possibly confusing details of storing ontologies, which could
introduce further conflicts.
7. Discussion and Conclusion
The most important steps to prevent errors in collaborative ontology design seem to be:
• Sound ontology design principles
• Good knowledge of the domain and of the ontologies being re-used
• Ability to share design ideas with other developers
• Separating modeling from the dirty details of language syntax
• Version control to keep track of the evolution of the ontology.
It is difficult to provide tool support for ontology design criteria. Still, we believe that any
investment in such support will be highly appreciated by developers, as it is likely to
decrease the number of errors inserted during ontology development.
Collaborative development tools should support various communication methods, both
asynchronous and synchronous. Exchange of ideas and design sketches is also important,
as is the ability to derive initial ontology definitions from the sketches.
Development tools should offer collaboration support to allow multiple authors to edit the
same model concurrently, without local copies. The client/server model seems to suit this
purpose naturally. Laissez-faire style synchronization can be used to quickly draft ideas
and propagate changes to all developers in real time. More rigid conflict identification and
resolution schemes should be used when finished parts are being committed back to
the repository. Conflict checking can also occur on user request. Real-time checking might
prove too heavy and could hinder rapid drafting.
The ontology repository should implement strong version control to help developers
incrementally build the ontology and derive evolutionary branches. If possible, ontologies
should be divided in a modular fashion. Careful consideration of pros and cons is required
if modularization would limit the reasoning one is able to do on the ontology.
Locking mechanisms and access right control might help to divide work between different
organizations and thus prevent errors arising from modeling conceptually overlapping items
independently. However, they require short commit intervals so that the development
efforts do not drift apart.
Use of a development tool with a graphical user interface is a very powerful way to
eliminate problems arising from typing errors and syntax mismatches. The tool may also
apply optimization algorithms to ontology axiom descriptions to improve performance.
When it is not possible to implement a graphical user interface, the development tool
should provide at least syntax highlighting to help the user locate typing errors and
syntactical errors as she modifies the ontology representation language.
Most of the steps described above help in preventing problems that arise from syntactic
errors or concurrent editing. Semantic errors are harder to prevent, as they depend on the
domain and the intended purpose of an ontology. We believe that more research in this
field would be beneficial in developing better distributed development tools.
We have presented common problems in collaborative, distributed ontology development.
We then gave an overview of different methods currently used to overcome these
problems. We analyzed the suitability of these methods to solve the problems presented
and found that most methods can be integrated into a distributed ontology development
environment. We noted that some methods concentrate on removing existing conflicts
from ontologies instead of trying to prevent them from being inserted. We introduced a set
of propositions for distributed development environments. We believe that these are the
key points in helping users achieve high-quality results when collaboratively
developing ontologies. The study suggests that more research on how to prevent semantic
errors during collaborative modeling could benefit the development of distributed
development tools.
8. References
S. Aitken, Extending the HPKB-Upper-Level Ontology experiences and observations. In
Proceedings of the Workshop on Applications of Ontologies and Problem Solving
Methods (ECAI'98), Brighton, England, August 1998.
Ying Ding and Dieter Fensel, Ontology Library Systems: The key to successful Ontology
Re-use.
A. Farquhar, R. Fikes, and J. Rice. The Ontolingua server: A tool for collaborative ontology
construction. Technical report, Stanford KSL 96-26, 1996.
W. Grosso and H. Eriksson and R. Fergerson and J. Gennari and S. Tu and M. Musen,
Knowledge Modeling at the Millennium -- The Design and Evolution of Protege-2000.
Proceedings of the 12th International Workshop on Knowledge Acquisition, Modeling and
Management (KAW'99), Banff, Canada, October 1999.
T. R. Gruber, Towards Principles for the Design of Ontologies Used for Knowledge
Sharing, Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer
Academic Publishers, 1993.
J. D. Herbsleb and R. E. Grinter, Architectures, coordination, and distance:
Conway’s law and beyond. IEEE Software, Volume: 16, Issue: 5, Pages 63-70,
Sep-Oct. 1999.
Peter D. Karp and Vinay K. Chaudhri and Suzanne M. Paley, A Collaborative Environment
for Authoring Large Knowledge Bases, Journal of Intelligent Information Systems, Volume:
13, Number: 3, Pages 155-194, 1999.
Michel Klein and Dieter Fensel and Atanas Kiryakov and Damyan Ognyanov, Ontology
versioning and change detection on the Web, 2002.
Heiner Stuckenschmidt and Michel Klein, Modularization of Ontologies - WonderWeb:
Ontology Infrastructure for the Semantic Web, 2001.
Y. Sure and S. Staab and J. Angele, OntoEdit: Guiding Ontology Development by
Methodology and Inferencing, Proceedings of the International Conference on Ontologies,
Databases and Applications of SEmantics ODBASE 2002.
Y. Sure and R. Studer, On-To-Knowledge Methodology - Final Version. Institute AIFB,
University of Karlsruhe, On-To-Knowledge Deliverable 18, 2002.