
Workshop Proceedings

FGWM-2012

Workshop on

Knowledge and Experience Management

Editors:

Kerstin Bach

Competence Center CBR, German Research Center for Artificial Intelligence DFKI GmbH, and Institute of Computer Science, University of Hildesheim, Germany

Michael Meder

Competence Center Information Retrieval & Machine Learning, DAI-Labor, Technische Universität Berlin

Table of Contents

Preface
Kerstin Bach and Michael Meder

Long Papers

InKnowE: Enhanced Product Innovation by Application of Knowledge Management, Knowledge Networks, Business Intelligence, Web and Social Analytics
Mareike Dornhöfer and Madjid Fathi

A Document-centered Authoring Approach for Ontology Engineering
Jochen Reutelshoefer, Joachim Baumeister, Georg Fette and Frank Puppe

Short Papers

Modeling the Structure of Spreadsheets
Christian Liguda

Resubmissions

iTree: Skill-building User-centered Clarification Consultation Interfaces
Martina Freiberg and Frank Puppe

Confidence in Workflow Adaptation
Mirjam Minor, Mohd. Siblee Islam and Pol Schumacher

Semantic Alliance: A Framework for Semantic Allies
Catalin David, Constantin Jucovschi, Andrea Kohlhase and Michael Kohlhase

Preface

FGWM-2012: Workshop on Knowledge and Experience Management

Kerstin Bach
Competence Center CBR, German Research Center for Artificial Intelligence DFKI GmbH and University of Hildesheim, Germany

[email protected]

Michael Meder
DAI-Labor, Technische Universität Berlin, Germany

[email protected]

Knowledge and Experience Management

The use of existing experience based upon shared knowledge is one of the main human approaches to problem solving. Knowledge and Experience Management aims to enable and support the conversion of this basic human approach into intelligent systems. The approaches and techniques used to provide and manage knowledge and to gather and reuse experience are the subjects of this workshop. To be able to use experience, a first set of open questions lies within the field of knowledge acquisition. With the development of the Web 2.0 and especially social networks, a wide field of sources for experience is available. Additionally, there exist rich web-based sources of knowledge in the form of Linked (Open) Data. The acquisition of knowledge and experience can also draw on sources such as best practices used in business or text documents. However, the acquisition from all of these sources bears some challenges regarding a variety of open questions. These questions arise from the problems of proper and efficient knowledge formalization, especially the design of efficient forms of knowledge representation for a given task, such as the design of ontologies. Furthermore, the extraction of knowledge and experience from complex data poses a complex problem, too. To ease these problems, the workshop examines new approaches to using semantic web technologies to aid the exchange and acquisition of knowledge and experience. Another set of problems occurring in the process of reusing knowledge and experience is given by the tasks of fast and accurate retrieval of knowledge and experience.

The use of knowledge and experience in collaborative environments like e-business or e-government involves questions about the authoring, maintenance and exchange of knowledge and experience. Those questions deal with the secure storage of knowledge, for example in cloud storage environments, the proper integration of existing business processes into knowledge management systems, and the possibilities to capture knowledge just in time within such e-collaboration systems.

The mentioned increase in the variety and number of sources for knowledge and experience, together with the rapid development of mobile technologies, induces a set of advanced questions regarding knowledge management and the gathering and use of experience. Among the advanced topics the workshop deals with are the enhancement of knowledge formalization approaches to enable explanation-capable and context-sensitive systems, as well as the challenges that the foreseeable development of ambient intelligent environments poses to just-in-time knowledge capture and retrieval in such highly mobile environments. These challenges can be met by new approaches such as agile knowledge and experience acquisition and agent-based systems for knowledge management enriched with distributed on-demand knowledge retrieval techniques. These new approaches are to be examined and discussed within the workshop.

FGWM 2012

The workshop is part of the workshops organized at the LWA 2012 conference in Dortmund, Germany (namely the "German Workshop on Knowledge and Experience Management"). The objective of the workshop is to provide an opportunity for exchanging ideas related to the application of various approaches and techniques within the field of knowledge and experience management. The workshop aims at providing a forum for the discussion of recent advances in this research field and at offering an opportunity for researchers and practitioners to identify new promising research directions. The workshop also aims to provide a platform for young researchers to present their work and receive feedback from the knowledge and experience management community.

This year we received 11 submissions and after a peer review we were able to accept 6 contributions for presentation. Among the accepted papers we have one long paper by Mareike Dornhöfer and Madjid Fathi discussing knowledge-management-driven product innovation, one short paper by Christian Liguda discussing an approach for representing underlying spreadsheet models, and four resubmissions of papers that have been presented at different international conferences during this year. Further, we are very happy that Benno Stein will give an invited talk on "The Web as a Corpus".

As in previous years, the FGWM will have two joint sessions with the Information Retrieval (FGIR) and the Knowledge Discovery and Similarity (KDML) workshops, which are both co-located at the LWA.

2012 Program Committee

• Andreas Abecker, disy Informationssysteme GmbH
• Klaus-Dieter Althoff, University of Hildesheim
• Joachim Baumeister, denkbares GmbH Würzburg
• Ralph Bergmann, University of Trier
• Simone Braun, Forschungszentrum Informatik (FZI)
• Jörg Cassens, University of Lübeck
• Andrea Kohlhase, Jacobs University
• Christoph Lange, Jacobs University
• Andreas Lommatzsch, Technische Universität Berlin
• Heiko Maus, German Research Center for Artificial Intelligence (DFKI)
• Mirjam Minor, University of Trier
• Ulrich Reimer, FHS St. Gallen
• Jochen Reutelshoefer, denkbares GmbH Würzburg
• Bodo Rieger, University of Osnabrück
• Thomas Roth-Berghofer, University of West London
• Ralph Traphöner, Empolis

In closing this preface, we would like to recognize all the colleagues who helped to review the submissions and to guarantee the quality of the papers that have been included. And last but not least, we would like to thank Prof. Dr. Katharina Morik, Nico Piatkowski, Hendrik Blom and Tobias Beckers for hosting this year's LWA at the University of Dortmund, Germany.

September 2012
Kerstin Bach & Michael Meder

InKnowE: Enhanced Product Innovation by Application of Knowledge Management, Knowledge Networks, Business Intelligence, Web and Social Analytics

Mareike Dornhöfer, Madjid Fathi

Universität Siegen
D-57068 Siegen, Germany

[email protected], [email protected]

Abstract

The given paper focuses on the enhancement of product innovation by means of technologies and disciplines like knowledge management, especially knowledge networks, business intelligence, web and social analytics. Based on the connectivity of product consumers via the internet, it is no longer possible for companies to develop their products from scratch without considering public customer feedback. The innovation process needs input not only from the employees of the company who are tasked with creating a new product or product generation, but also from different knowledge sources, indicators or analyses. Therefore the paper proposes a concept idea for leading the innovation process even closer to the customers' needs by applying technologies from the disciplines and methods mentioned above in addition to innovation management methods.

1 Introduction

The given paper proposes a concept called InKnowE for the enhancement of product innovation through applying methods from the fields of knowledge management, especially knowledge networks, business intelligence (BI), web and social analytics. In this context the aspects of open innovation, communities for innovation and innovation labs will be touched upon as well. The objective of InKnowE is to improve the classical innovation process detailed e.g. in [Vahs and Burmester, 2005] or [Müller-Prothmann and Dörr, 2006] and to make product innovation even more market and consequently customer oriented. Today, push (technology/company initiated) product innovation and pull (market/customer initiated) innovation have to be brought together by companies to be successful on the customer market [Bergmann and Daub, 2008], [Nerdinger et al., 2010], [Völker et al., 2007]. Additionally, methods like open innovation, communities for innovation and innovation labs have to be integrated into the innovation process.

Enterprises which are selling their products via the internet involuntarily have to face the challenges, as well as the advantages, of positive or negative customer feedback, e.g. regarding the characteristics, application or durability of their products. Sales platforms (e.g. Amazon, www.amazon.com), social networks (e.g. Facebook, www.facebook.com), product comparison pages or blogs offer a huge amount of data, consisting of sales numbers, consumer profiles, target groups, product evaluations, customer feedback, hints for improvement and even price comparisons with competitors. Evaluating and using this information may lead to more consumer oriented products, more positive feedback and fewer consumer complaints, which consequently improves the company's image concerning its product quality.

In this context, the proposed InKnowE concept faces the challenge of bringing the results from feedback analyses together with e.g. the strategic goals, the product portfolio or the innovation budget of the company. InKnowE focuses on integrating the technologies mentioned in the first paragraph into one framework to support the decision making process for a new product innovation idea and its consequent realisation. Figure 1 depicts a mind map of the product innovation environment in a company and the possible influencing factors.

In the following section the disciplines and technologies applied in InKnowE are introduced and summarized, before section three gives a literature study of current interactions between the different disciplines. Afterwards section four analyses the product innovation context based on the aforementioned mind map and introduces the concept idea for integrating the disciplines into one framework. Finally, section five concludes the paper and gives an outlook on the next steps.

2 InKnowE disciplines

This section introduces the disciplines relevant for the InKnowE framework proposed in this paper for the enhancement of product innovation by means of knowledge management, especially knowledge networks, as well as features from business intelligence and results from web and social analytics.

2.1 Innovation management

In his work about how the world's best companies manage their innovation machine, Wentz [Wentz, 2008] defines four important factors for innovation: 1) innovation has to be consistent with the company's strategy, 2) there has to be a synergy with the core competencies of the company, 3) there has to be a success potential for the approached field of innovation and 4) if there are different fields of innovation, they have to have synergies with each other. The focus of this work is product innovation, although the proposed concept may also be transferred to a decision component for service innovation.

Product innovation is essential for the success of a (manufacturing) company on the market. In this context Vahs and Burmester [Vahs and Burmester, 2005] define product innovation as a process of creating new material and immaterial products which satisfy customer needs and have a positive impact on the company's profits. There are different strategies for companies for how to approach the release of a new innovation on the market. Bergmann and Daub [Bergmann and Daub, 2008] distinguish between different forms of innovation strategies, e.g. pioneer, second-to-the-market or me-too strategy.

Figure 1: Analysis of product innovation context

To be even closer to customer requirements, companies have realised that the classical form of purely in-house innovation teams, without the input of external sources such as experts or customers, is in most branches no longer sufficient to be successful on the market. Therefore the methods of open innovation, community based innovation [Füller et al., 2004], [Bretschneider et al., 2012] and innovation labs emerged, aided by the further development of the internet. Open innovation allows the customer to give input towards the design, development and production of new products, or the enhancement of existing ones. Reichwald and Piller [Reichwald and Piller, 2007] define the process of open innovation as a value-added cooperation between companies, external experts and consumers for the development of mass products. In the same context the aspects of product individualization and mass customization of products (e.g. individually customized shirts) are addressed. Howaldt, Kopp and Beerheide [Howaldt et al., 2011] go even one step further. They declare Innovation Management 2.0 in reference to Web 2.0 and Enterprise 2.0, thus hinting at the importance of web technologies for today's innovation management. The concept of community based innovation [Füller et al., 2004] or virtual communities for innovation [Bretschneider et al., 2012] is another approach to be considered. "Virtual Communities for Innovation are becoming increasingly popular as platforms for firms to engage customers in generating new ideas" [Bretschneider et al., 2012]. Therefore the input customers provide for improving or developing new ideas via web communities should be part of the innovation management processes of a company, just like the classical in-house innovation or research department. Another type of community of innovation are living labs. "Living labs offer a collaborative partnership framework in which user-centred, innovation activities can take place" [Mulvenna et al., 2010]. They may be established for different innovation purposes and branches by research institutions, civic institutions or companies. From these short definitions and examples, it is apparent that communities for innovation as well as innovation labs are part of the open innovation movement.

2.2 Knowledge management including knowledge networks

There is a lot of literature regarding the field of knowledge management, knowledge work, knowledge workers as well as knowledge networks available today. Based on the definition of Maier, Hädrich and Peinl [Maier et al., 2005], "Knowledge Management is defined as the management function responsible for regular (1) selection, implementation and evaluation of knowledge strategies (2) that aim at creating an environment to support work with knowledge (3) internal and external to the organization (4) in order to improve organizational performance. The implementation of knowledge strategies comprises all (5) person-oriented, product-oriented, organizational and technological instruments (6) suitable to improve the organization-wide level of competencies, education and ability to learn."

Knowledge work has different characteristics which distinguish it from traditional work and which are e.g. summarized by Maier, Hädrich and Peinl [Maier et al., 2005]. According to the authors, knowledge work is, among other characteristics, communication-, cooperation- and network-oriented, it uses semi-structured data, and it uses different tools like document/content management systems, experience databases, newsgroups, mail folders or other groupware. In the context of enhancing innovation, the collaboration and communication aspect of knowledge work is an essential fact. In a former work, Dornhöfer, Holland and Fathi [Dornhöfer et al., 2012] propose a knowledge-based innovation detection and control framework. One aspect of their concept model is the set-up of a collaboration community on different levels. This collaboration community is one form of a knowledge network or community for innovation (→ 2.1).

Back et al. [Back et al., 2005] define knowledge networks "as social networks between knowledge players, which allow the creation and transfer of knowledge among individuals, groups, organizations, and between hierarchical levels." The authors furthermore stress the importance of the interconnection of business and knowledge networks. Knowledge networks have to be integrated into the knowledge management and business processes, as otherwise they lack up-to-date input and do not contribute proper results to the knowledge management process. In a world of web technologies, the working tools of knowledge networks lead directly to the field of social networks and communities.

2.3 Business intelligence

Since the mid 1990s the topic of business intelligence (BI) and its components (e.g. the data warehouse) has come into focus and is still under constant development and expansion. Business intelligence is a term which does not seem to have an exact definition in a mathematical sense. This opinion is concluded from different definitions as in [Kemper et al., 2010], [Gluchowski et al., 2008], [Chamoni and Gluchowski, 2006], [Bauer and Güzel, 2009] or [Gabriel et al., 2009]. A definition which summarizes the different approaches is the one of Gluchowski, Gabriel and Dittmar [Gluchowski et al., 2008]. They define BI as an approach, concept and application environment for the purpose of generating, representing and analyzing data to support decision making on a management level. Furthermore they define how closely certain methods like OLAP, ad-hoc reporting, the balanced scorecard or the data warehouse belong to a BI system. Based on their approach, OLAP and MIS/EIS (Management Information System/Enterprise Information System) are the core components of a BI system [Gluchowski et al., 2008].

Kemper, Mehanna and Baars [Kemper et al., 2010] apply this approach as well. Additionally, the authors define a BI framework consisting of a data layer, a data processing layer (e.g. in the form of a data warehouse), an information generation and distribution layer and a presentation layer in the form of a BI portal. The information generation and distribution layer allows the integration of knowledge and information management methods.

BI tools, as well as standard text, calculation and presentation tools, are widely applied (commercial) tools and core components for the decision making of managers in today's companies. BI applies techniques for merging, calculating and visualizing product sales numbers. With regard to the InKnowE concept proposed in section 4, these characteristics of BI integrate the commercial or business point of view into the knowledge flow for evaluating new innovation ideas. Based on existing sales numbers it is possible to deduce top sellers, trends or selling failures and consequently change business strategies for new innovations.

2.4 Web and social analytics

Due to the strong development towards e-commerce it is essential for managers to consider not only commercial numbers, but web analytics results in their strategies and decisions as well. Web analytics itself is a technology for gathering data and establishing optimization recommendations or target group analyses with the help of this data [Reese, 2008]. The core components of web analytics are client-based page tagging mechanisms, server-based logging, multi-variant testing, online surveys, interviews or an observation of how a user interacts with a concrete website [Hassler, 2012]. The gathered data allows the analysis of web traffic, to some extent user origin and behaviour, as well as the effectiveness of marketing on the web or even in social networks. In this context the analysis of a company's own web shop or specific product sites is especially interesting for a company. There are different commercial or open web analytics solutions available on the market. Additionally, commercial sales platforms like ebay [ebay, 2012] or Amazon [amazon, 2012] offer sales reports and evaluations for their sellers.

Social analytics is a special form of web analytics focussed on the usage behaviour and content in social media platforms. Today social media is not only used as a tool for communication between private persons, but also by companies as a marketing tool, a form of newsletter or a direct customer contact channel. Hassler [Hassler, 2012] defines two types of social analytics: 1) social web analytics for measuring the social web activities of the company, 2) social media analytics for measuring social media content related to the company or the company's products, but generated by persons outside the company. The first type of analysis e.g. considers the activities on the applied social networks, while the second one evaluates the company's or brand's impact on the web and in social media [Hassler, 2012].

Before Web 2.0, the term social network analysis was related to the analysis of the social interaction between different stakeholders or social circles. "Social Network Analysis (SNA) is the study of social relations among a set of actors. The key difference between network analysis and other approaches to social science is the focus on relationships between actors rather than the attributes of individual actors" [Mika, 2007]. Coulon, for example, conducted a study about social network analysis in innovation research [Coulon, 2005]. In the context of online social networks or social media not only the personal interactions, but also the content is part of the analysis process, thus the term social network data analytics (e.g. [Aggarwal, 2011]) is also common. The large amount of content in social networks allows the application of social analytics in combination with linguistic techniques like text mining or sentiment analysis. The combination of these techniques visualizes opinions about a certain product in a condensed form and gives indicators about the acceptance of the product on the market. On a larger scale the image of the company can be made visible as well.

Aggarwal [Aggarwal, 2011] defines two "primary kinds of data which are often analyzed in the context of social networks": 1) "Linkage-based and Structural Analysis" and 2) "Adding Content-based Analysis". The first form of analysis detects and analyzes the structure of a social network, including its important nodes and links. The second method focuses on the analysis of the user content and the associated media files. According to the author, research areas like statistical analysis of social networks, data and text mining, community detection, link prediction or social influence analysis are part of social network data analytics [Aggarwal, 2011].

Scott [Scott, 2011] describes the potential of social analytics and especially data mining techniques as advantageous for "large-scale data sets of the kind that have not generally been possible to investigate using conventional social network analytic techniques" before. Based on these definitions the line between social web, data and media analytics and social network analysis (in online contexts) blurs; therefore all of the aforementioned variants will be called social analytics in the following sections. The results from web and social analytics will be integrated into the analytical flow of the InKnowE concept proposed in section 4 to support well-founded decision making about new innovation ideas. Especially interesting in this context is the customer feedback about existing products and their shortcomings or negative points.

3 Literature study of current interactions between the different disciplines

This section gives an overview of literature and project studies in which two or more of the above mentioned disciplines are interconnected, thus creating the context for the concept proposed in the next section. Stegebauer and Häußling [Stegebauer and Häußling, 2009] define a knowledge based innovation management, which supports the innovation process as well as the knowledge exchange on the different levels between individuals, groups or organisations.

The research of Gentsch [Gentsch, 2001] moves in a similar direction. He analyses the interaction of innovation and knowledge management, with a special focus on knowledge discovery from text for enhancing innovation processes.

Völker, Sauer and Simon [Völker et al., 2007] also focus their work on the interaction of innovation and knowledge management. They propose a matrix organization, where innovation management is the vertical dimension, while knowledge management is the horizontal one. In addition to different knowledge oriented models, they also consider the organizational requirements of a company in the context of innovation.

Zhou and Uhlaner [Zhou and Uhlaner, 2009] conducted a study in which they evaluated the impact of acquiring external knowledge on the innovation behaviour of a firm. They conclude that "knowledge management can develop absorptive capacity of a firm, which consequently contributes to innovation orientation and in turn, innovation behaviour of a firm." [Zhou and Uhlaner, 2009].

Another case study was conducted by Costa et al. [Costa et al., 2009] in a company in Brazil, where they analyzed the benefit of web based social networks for furthering knowledge management.

Müller-Prothmann produced a work in which he analysed the advantages of knowledge management, especially knowledge communication and social networks, for innovation. His core focus is on "social network analysis in research and development" [Müller-Prothmann, 2006].

A project at MIT bridging the gap between innovation and collaborative networks is called COIN (Collaborative Innovation Network). The core idea of COIN is setting up different collaborative networks which contribute towards each other and further innovation and the interaction of knowledge workers in different organizations. The structure of COIN contains a core network, which is surrounded by a wider group, which is again surrounded by an extended group in a concentric way [COIN, 2012].

Another project, funded by the German Ministry for Research and Development, is called Smarte Innovation [SInn, 2012]. Smarte Innovation focuses on bringing the relevant stakeholders, such as experts from research institutions, the economy and social life, together to further product innovation [SInn, 2010]. In this context the project aims to improve all steps of the product lifecycle. The project brings together the aspects of systems, people, anticipation, resources and technology and constantly feeds innovation impulses and knowledge in a counterflow towards the singular steps of the product lifecycle and especially the product development [SInn, 2010].

An Italian project called TasLab (Trentino as a Lab) is part of the European Network of Living Labs [Liv. Lab, 2012], [TasLab, 2012]. The core idea of TasLab is to bring industry, research and users together to further collaborative innovation in a living lab environment. The TasLab portal "provides knowledge management facilities (e.g., competence matching) for collaborative innovation" [Shvaiko et al., 2010], thus integrating knowledge and competence management features into the innovation management process. A current field of application is, for example, eHealth.

MUSING (MUlti-industry, Semantic-based next Generation business INtelliGence) was an EU project running from 2006 to 2010 that bridged the gap between BI and the Semantic Web. The focus of MUSING was "to integrate Semantic Web and Human Language technologies and combine declarative rule-based methods and statistical approaches for enhancing the knowledge acquisition and reasoning in BI applications towards industries with a deep socio-economic impact" [MUSING, 2010]. From a company's point of view, the innovative improvement of the three areas finance, risk management and internationalization was the main goal of MUSING [MUSING, 2010].

The examples from literature and practical projects above allow for two conclusions: 1) there are many different approaches towards integrating one or more of the disciplines of innovation management, innovation communities, knowledge management, knowledge networks and BI with each other, and 2) while there is literature and there are commercial methods for web analytics available, running scientific projects regarding this aspect, as well as social analytics in online contexts, are not yet as prominent.

4 Concept approach

4.1 Analysis of the product innovation context in today's companies

Figure 1, introduced in section 1, depicts a mind map of the product innovation context, environment and dependencies in today's companies. Product innovation is influenced by different strategic and operational factors. The innovation process for a new product or product generation is based on the classical product lifecycle consisting of design, development, production, usage and recycling. Depending on the usage of modern technologies, the innovation process may either be purely internal and apply classical innovation methods, apply concepts of open innovation and the usage of social networks to gather customer input, or combine both approaches. What kind of approach is chosen by a company often depends on the enterprise strategy. Criteria like "Make or buy?", "Do we want to be first to market?" or "What is the budget for research, development and innovation?" then play a central role and pave the way for the innovation process [Müller-Prothmann and Dörr, 2006], [Völker et al., 2007]. The strategic decisions in turn depend on the sales and analytical numbers gathered from the previous product generation. Customer feedback is a deciding factor. Besides the strategic stakeholders, the knowledge experts, like product designers and engineers, have to be involved in the process. If the company has an already established innovation community (→ 2.1) or knowledge network (→ 2.2), for example between internal and external experts, research centers, or customers, an exchange of or feedback about new product ideas gives additional security for being on the right path.

As surmised, the creation of a new product innovation is no longer an isolated process, but depends on different factors, knowledge and stakeholders. The following InKnowE concept introduces a knowledge based way to integrate these factors into one framework and to support decision making for the most promising innovation idea.

4.2 InKnowE concept

This section illustrates the InKnowE concept approach for improving innovation processes in the Enterprise 2.0 with the support of innovation communities (→ 2.1), knowledge management and knowledge networks (→ 2.2), BI features (→ 2.3) as well as web and social analytics results (→ 2.4). The target group for the framework are the heads of research departments during the innovation process as well as financial managers deciding about new innovation ideas. The concept is depicted in figure 2 as a modular interaction between the different disciplines. There are four flows, of innovation, knowledge, business and analytics, leading towards well-founded decision making based on knowledge and semantic technologies.

Business flow: The starting point of the innovation process is the problem analysis of given products or the objective to create an entirely new product. This most often originates from unsatisfactory sales numbers, the wish to change or extend the company's product portfolio, or a change in the innovation strategy or budget. The relevant numbers and key indicators are stored in a BI solution or a general database and are the starting point for the innovation process. During the acquisition of new knowledge, these indicators or reports are stored as facts inside the knowledge base of the knowledge flow.

Knowledge flow: In addition to the business flow, a knowledge flow will be established. A knowledge base is the core element of this flow. New or additional knowledge for the knowledge base originates from the BI flow as detailed above, from knowledge acquisition sources, innovation communities or knowledge networks, and especially from knowledge experts like product designers and developers inside the company. If the products which are sold by the company have to meet certain minimal standards, these are stored inside the knowledge base as well. The collaboration between internal knowledge networks and customers not only influences the knowledge base, but also the creation of new ideas during the innovation process. Depending on the type of company, they may move more or less in the direction of open innovation. The type of technology for the knowledge base has not yet been defined.

Analytics flow: Next to the knowledge flow, a parallel flow focusses on evidence from web and social analytics. The results are regularly stored in what we declare an analytics database. The choice of technology (BI or knowledge management techniques) for this database depends on the type of acquired data and is not specified yet. If the majority of the data is numerical, the solution will tend towards a BI tool or an SQL database. Should there be requirements for data or text mining features, the solution will most probably be a knowledge or semantic representation method. Another alternative could be the build-up of several different databases and the merging of the data flows inside the decision engine.

Independent of the underlying database, the results will be provided to the decision module. For the data flows from the knowledge base and the analytics database an exchange format like XML or even a domain ontology will be applied.

Innovation flow and decision module: Inside the innovation flow layer, the innovation process starts with the aforementioned problem analysis, which receives initial input from the business, knowledge and analytics flows, to summarize the product aspects which have to be improved or newly invented. Based on this initial problem survey the relevant stakeholders start the creation of ideas, possibly in cooperation with an innovation community. After this phase is finished, there is a survey and rough filtering of ideas which are not realizable or which do not appear to be profitable. The ideas with a chance for success on the market are formalized and handed over to the decision engine. The decision engine of the solution is still somewhat of a black box, as there is no final decision yet on which knowledge representation and inference method will be applied. A possible approach would be the application of semantic technologies. Maier, Hädrich and Peinl [Maier et al., 2005] define a bridge between knowledge work and semantic web technologies: "Knowledge Work requires content- and communication-oriented modelling techniques that define meta-data and provide taxonomies, ontologies, user models, communication diagrams, knowledge maps and diagrams that show what objects, persons, instruments, roles, communities, rules and outcomes are involved in the main knowledge-related activities." This kind of interconnection allows for semantic technologies as part of the decision engine, as they also provide inference possibilities. Overall, the following alternatives are currently being evaluated for application:

1. the creation of a domain ontology for the decision frame with a rule-based inference engine on top of it (a minimal illustrative sketch of this alternative follows the list),

2. the application of business rules or classical rules in addition to an inference engine,

3. the application of case-based reasoning (CBR); CBR would have the advantage of formalizing the inventions as cases and adding the relevant facts from the knowledge base and the analytics database. This way a use case would already be available for the market launch of the possible new project.

4. a combination of semantic methods and a numerical or statistical analysis; this approach would require a form of indicator which would give a combined result and a possible recommendation or refusal for the given invention.
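To make the first alternative slightly more concrete, the following is a minimal, purely illustrative sketch written in OWL Manchester Syntax as one possible notation. All class and property names (InnovationIdea, PromisingIdea, hasCustomerSentiment, fitsStrategy, PositiveSentiment, CompanyStrategy) are hypothetical and not part of the published InKnowE concept, which deliberately leaves the choice of representation open.

    Class: PromisingIdea
        EquivalentTo: InnovationIdea
            and (hasCustomerSentiment some PositiveSentiment)
            and (fitsStrategy some CompanyStrategy)

A rule-based inference engine or reasoner on top of such a domain ontology could then classify formalized ideas as instances of PromisingIdea and present only those to the decision makers.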

The mentioned possibilities document a problem caused by the different types of data brought together for the decision process. A final decision about which singular or combined methods to apply for the aimed-for decision engine is only possible after deciding which tools and database technologies are applied for the other modules. Overall, the aspiration is to create an innovation decision framework containing different modules supporting the decision process towards one or more innovation idea(s), leading to the realization and market launch of the product.

Figure 2: InKnowE Concept

In addition to the inference and reasoning methods for the decision engine, a visualization, e.g. in the form of dashboards or condensed data views, is envisaged. The key aspect of the visualisation is to create a solution where the user can see whether the given innovation idea has potential for success on the market or whether there are factors and analysis results contradicting the new idea.

5 Conclusion & Outlook

The given paper proposes a concept called InKnowE for integrating different technologies like innovation management, innovation communities, knowledge management including knowledge networks, business intelligence, web and social analytics into one solution for enhancing the innovation process, especially the decision making for an invention which should be realised and brought to market.

At the beginning of the paper the different disciplines are introduced, before a literature and project survey of similar approaches is given. After introducing the InKnowE concept, the question is: how to proceed from concept to realisation? There are additional open questions regarding the necessary technologies for databases, analytics tools as well as the decision engine and its visualization. Therefore the next step will be an intensive market research for tools and methods available for the different disciplines and interfaces to answer how to combine them. An essential decision to make is which knowledge representation and inference methods are to be used for the decision engine and how to visualize the results not only for product developers, but also for management levels.

References

[amazon, 2012] amazon. Sales reports tools. http://g-ecx.images-amazon.com/images/G/03/Webinar/20110224FBAWebinar_Alle_Bestellungen_Bericht.pdf, last checked 26.08.2012.

[Aggarwal, 2011] Charu C. Aggarwal. Social Network Data Analytics. Springer Science+Business Media, LLC, Boston, 2011.

[Back et al., 2007] Andrea Back, Ellen Enkel and Georg von Krogh (eds.). Knowledge Networks for Business Growth. Springer, Berlin, 2007.

[Back et al., 2005] Andrea Back, Georg von Krogh, Andreas Seufert, Ellen Enkel (eds.). Putting Knowledge Networks into Action. Methodology, Development, Maintenance. Springer, Berlin, 2005.

[Bauer and Güzel, 2009] Andreas Bauer and Holger Günzel. Data-Warehouse-Systeme. dpunkt, Heidelberg, 3rd ed., 2009.

[Bergmann and Daub, 2008] Gustav Bergmann and Jürgen Daub. Systemisches Innovations- und Kompetenzmanagement. Gabler Verlag, Wiesbaden, 2nd ed., 2008.

[Bretschneider et al., 2012] Ulrich Bretschneider, Balaji Rajagopalan, Jan Marco Leimeister. Idea Generation in Virtual Communities for Innovation: The Influence of Participants' Motivation on Idea Quality. In: 45th Hawaii International Conference on System Sciences (HICSS) 2012, pp. 3467-3479, 2012.

[Chamoni and Gluchowski, 2006] Peter Chamoni, Peter Gluchowski. Analytische Informationssysteme. Business-Intelligence-Technologien und -Anwendungen. Springer, Berlin, 2006.

[COIN, 2012] COIN. Collaborative Innovation Networks. http://www.ickn.org/innovation.html, last checked 26.08.2012, 2012.

[Costa et al., 2009] Ricardo A. Costa, Edeilson M. Silva, Mario G. Neto, Diego B. Delgado, Rafael A. Ribeiro and Silvio R.L. Meira. Social Knowledge Management in Practice: A Case Study. Springer LNCS - Collaboration Researchers' International Workshop on Groupware 2009, Vol. 5784, pp. 94-109, Berlin, 2009.

[Coulon, 2005] Fabrice Coulon. The Use of Social Network Analysis in Innovation Research: A Literature Review. Lund University, Division of Innovation - LTH, Sweden, 2005.

[Dornhöfer et al., 2012] Mareike Dornhöfer, Alexander Holland, Madjid Fathi. Knowledge Based Innovation Detection and Control Framework to Foster Scientific Research Projects in Material Science. Submitted to: IEEE International Conference on Systems, Man, and Cybernetics, Korea, status: in review, 2012.

[ebay, 2012] ebay. Sales reports tools. http://pages.ebay.de/sell/tools/analyse/index.html, last checked 26.08.2012.

[Liv. Lab, 2012] European Network of Living Labs. http://www.openlivinglabs.eu/, last checked 26.08.2012.

[Füller et al., 2004] Johann Füller, Michael Bartl, Holger Ernst, Hans Mühlbacher. Community Based Innovation: A Method to Utilize the Innovative Potential of Online Communities. In: 37th Hawaii International Conference on System Sciences (HICSS) 2004, 10 pp., 2004.

[Gabriel et al., 2009] Roland Gabriel, Peter Gluchowski and Alexander Pastwa. Data Warehouse und Data Mining. W3L Verlag, Witten, 2009.

[Gentsch, 2001] Peter Gentsch. Wissenserwerb in Innovationsprozessen. Methoden und Fallbeispiele für die informationstechnologische Unterstützung. Deutscher Universitäts-Verlag, Wiesbaden, 2001.

[Gluchowski et al., 2008] Peter Gluchowski, Roland Gabriel and Carsten Dittmar. Management-Support-Systeme und Business Intelligence. Springer, Berlin, 2nd ed., 2008.

[Hassler, 2012] Marco Hassler. Web Analytics. Metriken auswerten, Besucherverhalten verstehen, Website optimieren. mitp, Heidelberg, 3rd ed., 2012.

[Howaldt et al., 2011] Jürgen Howaldt, Ralf Kopp and Emanuel Beerheide (eds.). Innovationsmanagement 2.0: Handlungsorientierte Einführung und praxisbasierte Impulse. Gabler, Wiesbaden, 2011.

[Kemper et al., 2010] Hans-Georg Kemper, Walid Mehanna and Henning Baars. Business Intelligence - Grundlagen und praktische Anwendungen. Vieweg + Teubner, Wiesbaden, 3rd ed., 2010.

[Mika, 2007] Peter Mika. Semantic Web and Beyond - Social Networks and the Semantic Web. Springer US, Boston, MA, 2007.

[Maier et al., 2005] Ronald Maier, Thomas Hädrich and René Peinl. Enterprise Knowledge Infrastructures. Springer, Berlin, 2005.

[Müller-Prothmann, 2006] Tobias Müller-Prothmann. Leveraging Knowledge Communication for Innovation. Framework, Methods, and Applications of Social Network Analysis in Research and Development. Peter Lang, Frankfurt a.M., 2006.

[Müller-Prothmann and Dörr, 2006] Tobias Müller-Prothmann and Nora Dörr. Innovationsmanagement. Hanser Verlag, München, 2nd edition, 2011.

[Mulvenna et al., 2010] Maurice Mulvenna, Brigitta Bergvall-Kareborn, Jonathan Wallace, Brendan Galbraith, Suzanne Martin. Living Labs as Engagement Models for Innovation. In: Paul Cunningham and Miriam Cunningham (eds.), eChallenges 2010 Conference Proceedings, IIMC International Information Management Corporation, pp. 1-10, 2010.

[MUSING, 2010] MUSING. MUlti-industry, Semantic-based next Generation business INtelliGence. 2006-2010, ftp://ftp.cordis.europa.eu/pub/ist/docs/kct/musing-presentation_en.pdf, last checked 26.08.2012.

[Nerdinger et al., 2010] Friedemann W. Nerdinger, Peter Wilke, Stefan Stracke and Reinhard Röhrig. Innovation und Beteiligung in der betrieblichen Praxis. Gabler Verlag, Wiesbaden, 2011.

[Reese, 2008] Frank Reese. Web Analytics - damit aus Traffic Umsatz wird. Die besten Tools und Strategien. BusinessVillage GmbH, Göttingen, 2008.

[Reichwald and Piller, 2007] Ralf Reichwald, Frank Piller. Interaktive Wertschöpfung. Open Innovation, Individualisierung und neue Formen der Arbeitsteilung. Gabler Verlag, Wiesbaden, 2009.

[Shvaiko et al., 2010] Pavel Shvaiko, Luca Mion, Fabiano Dalpiaz and Giuseppe Angelini. The TasLab Portal for Collaborative Innovation. ICE 2010, 2010.

[SInn, 2012] SInn. Smarte Innovation. http://www.smarte-innovation.de/, last checked 26.08.2012, 2012.

[SInn, 2010] SInn Team (Carola Feller and Simone Hofer). Smarte Innovation News - Produktlebenszyklus- und wertschöpfungsnetz-übergreifende Innovationsstrategien. http://www.smarte-innovation.de/downloads/SInn-Nachrichten-06.pdf, last checked 26.08.2012, 2010.

[Scott, 2011] John Scott. Social network analysis: developments, advances, and prospects. In: Social Network Analysis and Mining, Vol. 1, pp. 21-26, 2011.

[Stegebauer and Häußling, 2009] Christian Stegebauer, Roger Häußling (eds.). Handbuch Netzwerkforschung. VS Verlag für Sozialwissenschaften, Wiesbaden, 2009.

[TasLab, 2012] TasLab. Trentino as a Lab. http://www.taslab.eu, last checked 26.08.2012, 2012.

[Vahs and Burmester, 2005] Dietmar Vahs and Ralf Burmester. Innovationsmanagement. Von der Produktidee zur erfolgreichen Vermarktung. Schäffer-Poeschel, Stuttgart, 3rd ed., 2005.

[Völker et al., 2007] Rainer Völker, Sigrid Sauer, Monika Simon. Wissensmanagement im Innovationsprozess. Physica Verlag, Heidelberg, 2007.

[Wentz, 2008] Rolf-Christian Wentz. Die Innovationsmaschine. Wie die weltbesten Unternehmen Innovationen managen. Springer, Berlin, 2008.

[Zhou and Uhlaner, 2009] Haibo Zhou and Lorraine M. Uhlaner. Knowledge Management as a Strategic Tool to Foster Innovativeness of SMEs. ERIM Report, Erasmus University, Rotterdam, 2009.

A Document-centered Authoring Approach for Ontology Engineering

Jochen Reutelshoefer, Joachim Baumeister
denkbares GmbH

Friedrich-Bergius-Ring 15
Würzburg, Germany

<firstname>.<lastname>@denkbares.com

Georg Fette, Frank Puppe
Institute of Computer Science

University of Würzburg
Würzburg, Germany

<lastname>@informatik.uni-wuerzburg.de

Abstract

Most ontology development tools employ graphical user interfaces as interaction paradigm between system and developer. While being efficient for expert users, this is not always the best form of interaction when novice users are supposed to contribute to the development process. In this paper, we propose the use of the document-centered authoring paradigm for this human-computer interaction task, where the ontology is modified by editing source documents using suitable markup languages. This alternative interaction paradigm shows several advantages for collaborative development involving participants of diverse expertise. We discuss the advantages and challenges of this approach and derive the requirements for a corresponding authoring environment. Further, we present a prototype implementation of such a tool and report about its use in case studies.

1 Introduction

The development of ontologies is a major challenge within the implementation of the semantic web. Today, expressive ontology representation languages including powerful reasoning mechanisms are available. While these foundational topics can be addressed in a strictly formal way, the actual process of the manual creation of an ontology, capturing a computer interpretable representation of a specific subject domain, strongly involves human factors. This knowledge acquisition effort implies a process of human-computer interaction (HCI) that has been found to be nontrivial. Nevertheless, it appears to be indispensable for the wide-scale employment of semantic technologies in various applications. Especially the active involvement of domain specialists from the respective subject domain, who usually are not ontology engineering experts, within the entire ontology development life-cycle is strongly desired [1]. However, enabling non-expert ontology engineers to contribute directly to an ontology using some ontology design tool proves to be challenging in practice. Currently, most ontology engineering tools carry out this human-computer interaction task by providing graphical user interfaces [2; 3; 4]. However, foundational literature on HCI provides various kinds of different user interface paradigms [5] available for different HCI tasks. The use of graphical user interfaces, while being very prominent, is not necessarily the best choice for every combination of task and user profile. It is known to have some disadvantages, as it often forces complex interactions to enter simple information and constrains the kind of information that can be entered in many ways [6]. This makes contributions difficult, especially for users with little experience with the tool. In this paper, we propose to employ an alternative authoring paradigm for this HCI task which provides several advantages for the development of ontologies. In the so-called document-centered authoring approach, the user interacts with the ontology indirectly by editing documents, basically by using some common text editing interface. Segments complying with some predefined syntax are automatically processed and added to the formal model of the ontology, while the user is provided with visual feedback accordingly. We claim that this editing paradigm shows some considerable benefits and can be a beneficial alternative to using graphical user interfaces in many cases.

The contribution of this paper is an introduction to document-centered development of ontologies including a discussion of the benefits but also the challenges of this approach. We derive a requirements specification for such an authoring environment and present an implementation of a corresponding tool. Further, we report about our experiences in using that system in projects to develop ontologies in collaboration with domain specialists. The rest of the paper is organized as follows:

In Section 2, we explain the document-centered ontology authoring approach in more detail. An implementation of a corresponding tool is presented in Section 3 by the use of examples. To demonstrate its applicability in practice, we report about real world case studies in Section 4. In Section 5, we compare our approach to related work. We conclude with a summary and outlook in Section 6.

2 Document-centered Ontology Authoring

Document-centered authoring is an alternative authoring paradigm to graphical user interfaces, providing different benefits and challenges for the development of complex digital artifacts. There, the authoring environment provides access to a set of documents that can be modified and extended by the users in an unconstrained manner, employing some basic text-editing interface. To actually create components of the ontology, the user has to employ a formal syntax provided by the authoring environment. After each document modification, statements complying with this syntax are translated to the ontology repository as shown in Figure 1, and visual feedback is given to the user. Figure 2 shows an example document from the pizza domain, where beside informal content the Manchester Syntax [7] is used to define an OWL class expression. The process of compiling the documents to the ontology repository decouples the user from the machine-readable representation of the ontology.

Figure 1: Structure of the document-centered authoring paradigm.
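As an illustration of such a document (the concrete axioms and class names below are our own, chosen in the spirit of the well-known pizza ontology, and are not claimed to reproduce Figure 2), informal text and a Manchester Syntax class frame could be combined as follows:

    Pizza Margherita is a classic pizza topped with tomato and mozzarella.

    Class: Margherita
        SubClassOf: NamedPizza,
            hasTopping some TomatoTopping,
            hasTopping some MozzarellaTopping

Only the class frame would be compiled into the ontology repository; the surrounding informal sentence remains part of the human-oriented representation layer described below.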

That way, the documents can be structured according to the users' needs and form a kind of human-oriented representation layer of the knowledge. This freedom of structuring provides a number of benefits: it allows for the simple inclusion of informal support knowledge, e.g. documentation, comments, and figures, which can be interwoven with the formalizable syntax statements at any place and in any style. Due to the declarative nature of the most common ontology representation languages, where the atomic parts (e.g., distinct axioms in OWL) are order-independent of each other, the ordering of the statements can be left up to the user. Also the partitioning of the documents (including their names) can be chosen freely. Documents can freely be interlinked with others, making interrelations of content parts explicit. The possibility to structure the domain knowledge allows for a natural modularization into sub-domains. All these means of content organization can be exploited to give the document base a structure that is readable and memorable for the user, considering his mental model of the domain. Studies in software engineering have shown that while working on programming code, the amount of time spent on reading the source compared to the amount of time spent on actually editing is higher than ten to one [8]. We believe that in ontology engineering, where similarly complex digital artifacts are developed, a similar ratio applies. Consequently, the aspect of readability plays a very decisive role with respect to development productivity. Considering the freedom of structuring discussed above, the document-centered authoring approach provides excellent possibilities to improve the readability of the gathered knowledge.

Editing ontology statements using a formal syntax is in many cases a challenging task for the user, implying considerable complexity. However, this complexity is to a large extent rooted in the expressiveness of the target representation language (cf. general purpose programming languages). It is intrinsic and independent of the authoring paradigm, i.e., it is also contained in the GUI-based approach. For (subsets of) target representations of lower expressiveness, a simple syntax can be introduced, also called a domain specific language [9] (DSL), which is then also simple to read and edit. DSLs should always be designed to have the lowest expressiveness possible to reduce error rates and maximize productivity. There are also studies indicating that textual input can lead to higher productivity compared to GUI-based approaches if rich visual feedback is provided [10]. In the following, we call a syntax (DSL), together with a mapping instruction which determines its translation to the target ontology representation, a markup language (or in short, markup). Hence, the specification of a set of appropriate markup languages is a foundational task when designing a document-centered ontology authoring environment. There should be expressive markup languages allowing advanced users to create components of high complexity, but also markups of low complexity allowing novice users to create very simple components without being too error-prone. We claim this approach has high potential to provide an authoring environment of high usability if a suitable structuring of the content can be found, considering the participating users and the domain. While we regard the freedom of structuring to be the strongest advantage of document-centered authoring, there are several other notable benefits when compared to the strict GUI-based approach:

Low Barriers for Basic Contributions: The level of technical skill of the participating users is often rather diverse in ontology engineering projects, and especially the support of users with low expertise in this task needs consideration. Editing text documents is a rather simple editing paradigm when compared with complex menu- and form-based user interfaces, and it is already part of the daily work for a wide range of professionals in various domains. Therefore, it allows for basic contributions (e.g., adding informal descriptions, proof reading) without any training on a new tool. Having made these simple contributions without difficulty, contributors often feel encouraged to explore more demanding activities.

Example-based Authoring: The actual ontology can only be modified by using the provided markup language in a proper way. The idea of example-based learning proposes that initially markup statements are inserted into the document base (either modeling initial parts of the ontology or serving as a toy example). If a user can comprehend the meaning of these statements, he can easily adapt them to express other parts of the ontology using simple copy/paste&modify. Often only the entity names need to be exchanged to create new valid ontology relations.
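For instance (the class and property names below are illustrative and not taken from an actual project document), a contributor who understands the first statement can produce the second one simply by copying it and exchanging the entity names:

    Class: Margherita
        SubClassOf: hasTopping some TomatoTopping

    Class: Salami
        SubClassOf: hasTopping some SalamiTopping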

Incremental Formalization: In the GUI-based approach, the formalization of a new component usually needs to be performed by the user completely in one action. The process of incremental formalization, on the other hand, aims to break up the formalization task into multiple steps. It starts with the insertion of informal content describing the domain, such as text and figures, in a completely unconstrained way. This content is either created by domain specialists or adopted from documents often already existing in the domain context. At first, it serves as a starting point for the formalization process; later it forms the documentation and context of the ontology components. In the next step, those content parts which need to be formalized to form the intended ontology are identified. After that, a tentative formalization is made, that is, the selected content is transformed towards the markup language. This initial, potentially erroneous or incomplete, formalization can then be refined gradually.

Figure 2: An example document about Pizza Margherita in view (a) and edit mode (b).

These distinct steps require different degrees of expertise in the domain, in ontology engineering, and in the usage of the employed acquisition tool. These different kinds of competencies are often distributed heterogeneously. Therefore, a decomposition of the formalization task into distinct steps, possibly involving different persons at different stages, simplifies the accomplishment of the formalization task. Hence, the incremental formalization workflow helps different participants contribute according to their respective capabilities and expertise.

Version Control and Quality Management: The documents managed by the authoring environment can easily be put under version control. This not only provides backups and undo functionality, but also allows the straightforward application of the quality management practice Continuous Integration1 (CI), which has been established in the context of agile software engineering [8]. Hence, by employing a CI-based development workflow it is possible to continuously guarantee quality and transparency. Different kinds of automated tests can be executed regularly after modifications, such as competency tests [11], consistency checks, or profile checks2.

Challenges:
Besides its advantages, the document-centered authoring approach also bears some additional challenges when compared to the GUI-based one:

1http://martinfowler.com/articles/continuousIntegration.html
2http://www.w3.org/TR/owl2-test/

• Authoring Assistance: Considering this most obvious issue, one can take advantage of experiences and techniques from software engineering, which can be adopted to a large extent. It is very important to give the user feedback explaining whether and which ontology component has been created from some markup statement.

• Navigation and Search: While the freedom of structuring is beneficial for creating and maintaining a comprehensive structure of the content, it also imposes the problem of finding the position of some content piece or the appropriate location for some new content element. A system should provide full-text search as well as semantic search (based on the ontology entities and their interrelations) and efficient navigation mechanisms.

• Refactoring: At some point the structure of the current document base might be considered not to be the most beneficial one. Then a restructuring of the content, which leaves the compiled version of the ontology unaffected, is desired. Support for this restructuring task (beyond manual cut&paste) would be desirable.

• Redundancy Detection: In a document-centered ontology authoring environment that provides maximum flexibility of structuring, any ontology entity can be inserted at any location of any document. Therefore, it might occur more easily than in GUI-based tools that the same relations are defined/asserted multiple times (possibly at different locations by different users). These types of redundancies should be detected by the system and pointed out to the users.

Requirements:
Considering this discussion of advantages and challenges, we can derive the requirements for a document-centered authoring environment for ontology engineering: Most important is simple access to the documents and a common way to create informal content (e.g., text, tables, figures) by creating and modifying documents. Further, a set of convenient and well readable markup languages, geared to the target ontology representation language, needs to be supported, including simple ones usable by non-expert users as well as expressive ones. For each markup, explained examples should be given at startup. The challenges of document-centered ontology authoring mentioned above should be addressed, and a testing framework for continuous integration, connected to the document versioning system, should be included. Further common features necessary for effective ontology development, for instance considering visualization and debugging, need to be considered. These features can be implemented in a similar fashion as done by known existing GUI-based tools. However, the visualization and debugging views have to provide links to the corresponding document locations to allow for quick modifications during browsing or debugging sessions. Therefore, the environment requires a bidirectional mapping in order to be capable of identifying the text source which is the origin of a component in the ontology repository.

3 A Document-centered Ontology Authoring Tool

We have implemented an extension for the wiki KnowWE3 that forms a document-based ontology development environment as proposed in Section 2. KnowWE supports two different modes for ontology engineering, either in RDFS or OWL. In either mode, different markups are available to form ontology statements, including for example the Turtle Syntax4 for RDF and the Manchester Syntax for OWL [7]. The design rationale of KnowWE follows the idea of lowering the barriers for participation of less experienced users via document-centered authoring. Considering the user interface, the system is designed to be fully backwards compatible to a normal wiki engine if no advanced ontology engineering features (markups) are used. Hence, novice participants use a standard wiki interface to access a set of documents, possibly only browsing the content or adding informal comments. They can step-wise commit to more advanced contribution activities when they feel capable, without changing to another tool or authoring paradigm. Also the design and user assistance of markups is intended to primarily support non-expert users.

Entity Declaration and Compilation: The OWL 2 specification also addresses the declaration of entities within an ontology document in order to detect spelling errors in identifiers. This issue is even more relevant in the context of an ontology authoring environment where the ontology expressions are intended to be frequently edited by human users employing a text editor.

3http://www.knowwe.de
4http://www.w3.org/TR/turtle/

Therefore, KnowWE also requires new entities to be explicitly declared. All referenced identifiers of a statement are automatically validated against the set of declared entities, and error messages are generated in case of undeclared use of identifiers (including misspellings). Error messages are visualized by red underlining, and the corresponding statement is not translated to the ontology repository. In KnowWE, all declarations are valid all over the wiki. Any statement can be defined at any position within the wiki, providing full flexibility to the user. After each modification, a sophisticated compilation algorithm [12] calculates which statements have become valid or invalid considering the set of declared entities. Accordingly, those statements are inserted into or removed from the ontology repository, respectively, and corresponding error messages are generated. The incremental nature of this algorithm, which always considers only the current document modification, guarantees instant response independently of the overall size of the wiki/ontology.

Basic Markups: The example page shown in Figure 3 shows different basic markups provided by the system KnowWE to create ontologies in RDFS. Declarations of new entities are highlighted in purple, while references to existing ones are rendered in green. Predefined vocabulary (RDFS/OWL) is rendered in bold black font. Individuals can be introduced by using the 'def' keyword (1), while classes are defined using the keyword 'Class' (2). Object properties can be defined in a similar way, and optionally class references as domain and range can be given in brackets (3). Further, entities can also be defined as triples (4). Simple triple relations (5) can also be asserted by '>'. Basic RDFS/OWL vocabulary is available from the beginning. For more advanced users, a version of the Turtle markup is available to create more complex RDF expressions, introduced by 'ttl'. For the quick and simple definition of explicit class hierarchies, KnowWE provides a dash-tree markup (7). There, each class of a tree node is considered as rdfs:subClassOf its dash-tree father class, i.e., its predecessor having one dash less in its prefix.
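As an illustration of the dash-tree idea (the class names are invented for this example and the enclosing markup header is not shown), a tree such as

    Pizza
    - NamedPizza
    -- Margherita
    -- Salami

would assert NamedPizza rdfs:subClassOf Pizza as well as Margherita and Salami rdfs:subClassOf NamedPizza.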

User Assistance: A click on any term name opens a context menu providing additional information and options (8). A link to an overview page, describing the entity's use within the overall document base, is always given. For references, a direct link to the declaration of the entity is provided. If a reference cannot be resolved, an error message is rendered, indicated by a red underline of the reference term. In this case, the context menu will present quick-fix propositions based on edit distances to declared entities as assistance in case of typing errors. To simplify editing, the context menu also provides an edit button that allows editing the corresponding statement inline, in the context of the page view.

Full RDFS does not impose limitations on meta-modeling, i.e., intermixing property, class and individual names. However, KnowWE aims to support less experienced developers in creating simple ontologies. In this context, this kind of intermixture is often not an intended act of meta-modeling but basically a modeling error, sometimes leading to undesired undecidability. Therefore, the system shows warnings in different cases of 'suspicious' entity intermixtures. For example, KnowWE checks that entities used as range and domain, as shown in Figure 3, markup example (3), are actually declared as classes and shows a warning otherwise.

Figure 3: An example wiki page showing several markups to define ontology components in view (a) and edit mode (b).

Additionally, the employment of classes or individuals in the predicate position of triples will be marked. Another assistance mechanism concerns the use of rdfs:domain and rdfs:range. The semantics of these terms often appears to be unintuitive to novices in RDFS modeling, who tend to consider them as constraints on the use of the respective property. KnowWE provides a mode that renders warnings if assertions lead to a class membership that is only derived by the RDFS inference rules for domain and range (without being asserted or derived otherwise). Hence, the use of domain/range as constraints for the use of properties is supported by the authoring system (without affecting the resulting ontology/inference).
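To illustrate why the RDFS semantics of rdfs:domain can be surprising (the entity names are invented for this example), consider the following Turtle statements:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/pizza#> .

    ex:hasTopping rdfs:domain ex:Pizza .
    ex:IceCream   ex:hasTopping ex:TomatoTopping .

Under RDFS semantics, the second statement is not rejected as a constraint violation; instead, the inference rule for rdfs:domain derives ex:IceCream rdf:type ex:Pizza. The warning mode described above would point out such a derived-only class membership.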

Manchester Syntax: To make the full expressiveness of OWL available, we have implemented the Manchester Syntax5. The Manchester Syntax has been designed for improved readability of OWL statements [7] but also provides a convenient way of editing. The syntax is frame-based, that is, each ontology entity is defined as a frame describing the various characteristics of the entity. The keywords for the entity definitions (e.g., 'Class:', 'Individual:') indicate the use of a Manchester Syntax statement to the KnowWE system.

5http://www.w3.org/TR/owl2-manchester-syntax/

In KnowWE, the slots of an entity frame can be distributed to different locations if convenient. The main frame declares the entity, and in other parts of the wiki additional characteristics can be added to the entity using the 'EXTEND' keyword followed by the term name of the entity. In Figure 2, an example wiki page from the pizza domain6 is shown describing Pizza Margherita in the system KnowWE. In the lower part, the Manchester Syntax is used to add the definition of a corresponding class to the ontology (a). The right-hand part (b) shows the corresponding wiki source text of the page.
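A Manchester Syntax frame along the following lines (class and property names are borrowed from the well-known pizza ontology for illustration; the exact frame on the wiki page may differ, and entity declarations are omitted) describes such a class:

    Class: Margherita
        SubClassOf: NamedPizza,
            hasTopping some MozzarellaTopping,
            hasTopping some TomatoTopping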

Inspecting Ontology Competency: Keeping track of the competency of an ontology provided with an expressive reasoning mechanism, e.g., for RDFS or OWL semantics, is a challenge — for novice and experienced users alike. Therefore, KnowWE provides different methods for inspecting the current competency of the ontology by the use of the underlying reasoning engine. For classes and properties, an overview of the members/relations can be visualized using the context menu (RDFS mode only). In addition to common query interfaces, inline queries (in SPARQL for RDFS, in Manchester Syntax for OWL) can also be embedded into the documents, always rendering the current query result on page load.

6http://www.co-ode.org/ontologies/pizza/

Further, for any knowledge statement to be inserted into the ontology, an inference diff (RDFS only) can be generated and visualized on demand. The diff for a statement is the set of triples obtained by subtracting the inference closure of the entire ontology excluding the statement from the closure including it. That way, the "impact" of a particular statement on the current version of the ontology is made explicit for the user, showing an empty set for redundant assertions.
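As an illustration, an embedded inline query could be a plain SPARQL query such as the following (prefix and class name are invented for this example; the KnowWE markup that wraps the query is not shown):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/pizza#>
    SELECT ?pizza WHERE { ?pizza rdf:type ex:NamedPizza . }

Its result table would be re-rendered on every page load, reflecting the current state of the ontology repository.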

Quality Management with Continuous Integration: We have integrated the following test methods into the KnowWE Continuous Integration framework to support ontology engineering:

• Competency Test: The user can define competency tests for OWL ontologies by specifying a set of individuals that are expected to be derived as members of some class. In case of RDFS, a SPARQL query is defined together with a set of resources as the expected result (see the sketch after this list).

• Consistency Test: Tests whether the connected OWL reasoner discovers any inconsistencies in the current version of the ontology.

• Profile Check: Tests whether the current ontology is within the OWL DL profile.
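A minimal sketch of such an RDFS competency test (query and expected resources are invented for this example; the concrete test definition markup is not shown):

    Query:    PREFIX ex: <http://example.org/pizza#>
              SELECT ?t WHERE { ex:Margherita ex:hasTopping ?t . }
    Expected: ex:TomatoTopping, ex:MozzarellaTopping

The test passes if the query result matches the expected set of resources.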

Of course, the capabilities of these tests strongly rely on the underlying reasoner KnowWE is connected with. The tests can be attached to a CI dashboard that manages the versions and test results. One dashboard is shown in Figure 4, containing a consistency test and one competency test. On the left, the version history of builds is shown. The consistency test has failed within the most recent version, and therefore the current build is marked as failed by the red bubble. A dashboard can be configured to one of three trigger modes: onChange, onSchedule, and onDemand.

Import, Export and Namespace Handling: In the basic configuration, namespace handling is entirely managed by the system to make it easy for the user. However, this implies the unique name assumption for entity names within the scope of the system. To develop an ontology using terms of other existing ontologies, those ontologies can easily be imported into the system (and also connected to the compilation mechanism, i.e., registered as existing declarations). To allow the created ontologies to be used in other contexts, an export feature is given, delivering the entire ontology in RDF/XML format. Currently, however, imported ontologies cannot be modified within the system.

Implementation Overview: For the management of and reasoning over the semantic data, we utilized the following FOSS software components: OWL API [13], HermiT7, Sesame8, JSPWiki9, swiftOWLIM10.

The KnowWE software described in this paper is LGPL-licensed, and the sources can be downloaded from https://isci.informatik.uni-wuerzburg.de.

4 Case Studies
HermesWiki: In the eLearning platform HermesWiki [14], in addition to informal knowledge about the domain of ancient history, an ontology is built.

7http://hermit-reasoner.com
8http://www.openrdf.org/
9http://jspwiki.org/

10http://www.ontotext.com/owlim

This domain ontology can be used for different advanced features, such as semantic search and navigation, augmented content presentation, or the generation of automated quiz sessions for the students. The ontology is defined using different markups, e.g., the Turtle markup, the dash-tree markup (for subclass hierarchies) and the Manchester Syntax. The HermesWiki currently contains about 900 wiki pages forming an ontology containing about 600 individuals such as persons or cities. During the last three years it has been developed in a cooperation between the Department of Ancient History and the Department of Intelligent Systems at the University of Würzburg through more than 16,000 edits. About a quarter of the edit operations originate from automated refactoring scripts converting the markup statements according to the introduction of improved versions of the markup languages. The members of the Department of Ancient History work autonomously on informal contents and A-Box assertions. More complex extensions of the ontology vocabulary are usually evolved in close cooperation with the computer scientists. However, we also observed that significant parts of the ontology have been developed by the domain specialists independently and without explicit training sessions. These components, in parts showing modeling flaws, indicate that the authors are limited only by their conceptual ontology modeling expertise, not by the usage of the tool. More information about the Hermes ontology and its creation using special markups can be found in [14].

Other Projects: The WISSKONT project considers the creation of an intelligent information system in the medical domain of cataract surgery. The system is currently under development, and it will support the ophthalmologist during the treatment process before, in between, and after the cataract surgery. Therefore, the system needs to present relevant knowledge of the domain, which is integrated at varying degrees of formality. In order to make this content effectively accessible for a semantic search engine, an ontology of the domain is created. It currently consists of three major parts: (1) a concept hierarchy of currently 340 concepts using the dash-tree markup; (2) further semantic links between the concepts implemented as triple markup; (3) annotations establishing relations between domain concepts and content elements (e.g., textbook content with images describing particular aspects of a treatment process), implemented by a custom markup. The WISSKONT project is part of the WISSASS project, a cooperation of the Karlsruhe Institute of Technology, Germany (KIT) and the denkbares GmbH. It is funded as a ZIM-KOOP11 project by the German Federal Ministry of Economics and Technology (BMWI).

In the KnowSEC12 project, which is a cooperation with the Umweltbundesamt13 and the denkbares GmbH, the main aim is the management of knowledge about substances of high concern. New properties of substances can be defined when new knowledge arrives. Aggregations and overviews of substances are flexibly created by specifying SPARQL queries. About 50 properties are used to form relations between substances, substance groups and environmental risks to create an ontology, which is then used in the decision process for the examination of substances of potentially high ecological concern.

11http://www.zim-bmwi.de/
12Knowledge on Substances of Ecological Concern
13Federal Environment Agency Germany

Figure 4: CI-Dashboard with a competency and a consistency test.

In both projects, the initial experiences are promising for an effective collaborative ontology development with the domain specialists.

5 Related Work
In this section, we compare the presented approach to different classes of tools:

GUI-based Ontology Editors: Protege with its OWL Plugin [2] is probably the most widely used and most mature free ontology development tool available. As a rich client application, it provides a sophisticated graphical user interface. The ontology entities can be browsed using different tab-based views. Clicking on an entity starts a form-based editor allowing to extend or modify the entity. Additionally, graph-based visualizations can be generated as alternative views. Other tools employing a similar editing paradigm are TopBraid-Composer14, the NeOn toolkit [15], and SWOOP [16]. The editing paradigm applied by these tools is fundamentally different from the document-centered authoring paradigm proposed in this paper. Although the tools are widely used, we argue that there are scenarios where the document-centered approach can be more effective due to the advantages discussed in Section 2, especially when non-expert users are involved.

Semantic Wiki: A class of tools providing simple access to documents are (semantic) wikis. Semantic MediaWiki, for example, being the most common semantic wiki implementation, has been developed as a wiki extension providing a semantic model of the content. It has not been intended as a general purpose ontology development environment, and most annotations correspond to simple A-Box statements [17]. Semantic MediaWiki imposes several restrictions on structuring the formal knowledge, considerably reducing the freedom of structuring. For example, a property can only be defined on a separate property page and a class/category definition only on a category page. In recent years, numerous extensions for SMW have been created to allow for the definition of various kinds of knowledge (cf. SMW+15, [18]).

14http://www.topbraidcomposer.com

However, to the best of our knowledge, there are until now no extensions providing carefully designed markup languages posing low barriers, including compilation and user assistance, to support the development of ontologies in RDFS or OWL as proposed in this paper. These features, implemented for SMW, would result in a well-suited document-centered ontology authoring environment in the sense of this paper. There is another extension to SMW called Semantic MediaWiki Ontology Editor16. This toolkit, however, integrates form-based editing components, departing from the document-centered authoring approach.

Software Development Environments: Software has been developed in a way quite similar to the document-centered authoring approach described in this paper for decades. Today's sophisticated integrated development environments (e.g., Eclipse, Netbeans) support programmers throughout the entire development workflow. Many methods of user assistance found in these IDEs can be transferred to document-centered ontology engineering. However, compared to software engineering, there are also important differences. Ontology engineering projects usually bring together people with different backgrounds and strongly differing expertise with respect to the subject domain on the one hand and ontology modeling on the other. While in software engineering the source files are usually only edited by professional software engineers, in this context also domain specialists, often with low technical backgrounds, are supposed to step into the author role. Further, informal content describing the domain is less important in software engineering, as the code typically cannot be derived from domain knowledge by a formalization process. While IDE interfaces are designed for expert users, we propose to design the interface in a rather simplistic way, with advanced functionality hidden from non-expert users, to provide a low entry barrier for domain specialists. Novice users should not be distracted by the complexity of possibilities but be able to consider the tool as a basic document editor.

15http://www.ontoprise.de/de/loesungen/smw-plus/
16http://www.phpkode.com/projects/item/semantic-media-wiki-ontology-editor/

In contrast to common software development tools, where comments need to be marked as such and otherwise cause compile errors, content not complying with some markup language should be considered as regular informal content (e.g., comments, documentation). Further, the technical barrier to access the tool should be as low as possible, for instance by providing a web-based application accessible using a standard web browser.

In sum, the document-centered ontology development approach aims to combine the document-centered, textual language-based authoring paradigm proven in software engineering with a simple user interface (e.g., as provided by Semantic MediaWiki), incorporating support for the development and maintenance of expressive ontologies as provided by existing GUI-based tools such as Protege.

6 Conclusion
Current ontology development tools address the human-computer interaction task by the use of graphical user interfaces. Novice users are therefore forced to deal with a new tool of high complexity, demanding rather extensive introductory training even for very simple activities. In this paper, we propose the use of the document-centered authoring paradigm as an alternative to graphical user interfaces, which allows less experienced users to contribute according to their capabilities more easily. We provide fundamental concepts of how a corresponding authoring environment can be built. A prototype implementation of such a tool is presented. Further, we report on initial experiences with this ontology authoring method employed in two case studies. According to our experience, this approach makes it easier for untrained domain specialists to express their knowledge and insert it into a system, initially as a starting point for collaborative incremental formalization. More experienced users are able to formalize their knowledge using the markup languages. However, even if they are not restricted in the usage of the tool, missing ontology modeling skills can lead to unfavorable ontology design. In this case, counteraction by ontology engineers is required and possible by driving an agile, evolutionary development process including design refactorings if necessary. The system KnowWE is under ongoing development, and a demo of the described version is accessible17.

In the future, we aim to gain more experience on how documents should be structured and organized and how markup languages can be designed and used to provide optimal usability for the authors. Establishing a catalog of best practices is one important task for further research.

17http://www.is.informatik.uni-wuerzburg.de/forschung/anwendungen/knowwe/

References
[1] Kotis, K., Vouros, G.: Human-centered ontology engineering: The HCOME methodology. Knowledge and Information Systems 10 (2005) 109–131
[2] Knublauch, H., Fergerson, R.W., Noy, N.F., Musen, M.A.: The Protege OWL Plugin: An Open Development Environment for Semantic Web Applications. In: Third International Semantic Web Conference - ISWC 2004, Hiroshima, Japan (2004)
[3] Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., Wenke, D.: OntoEdit: Collaborative ontology development for the semantic web. In: International Semantic Web Conference. Volume 2342 of LNCS, Springer (2002) 221–235
[4] Kotis, K., Vouros, G.: Human centered ontology management with HCONE. In: ODS 2003 Workshop on Ontologies and Distributed Systems, Int. Joint Conf. on Artificial Intelligence (IJCAI-03), CEUR Workshop Proc. (CEUR-WS.org) (2003)
[5] Shneiderman, B., Plaisant, C.: Designing the User Interface: Strategies for Effective Human-Computer Interaction. 5th edn. Pearson Addison-Wesley (2009)
[6] Kleek, M.V., Bernstein, M.S., Karger, D.R., schraefel, m. c.: GUI — phooey!: the case for text input. In: UIST 2007, 20th Symposium on User Interface Software and Technology, ACM (2007) 193–202
[7] Horridge, M., Drummond, N., Goodwin, J., Rector, A.L., Stevens, R., Wang, H.: The Manchester OWL Syntax. In: OWLED. Volume 216 of CEUR Workshop Proc., CEUR-WS.org (2006)
[8] Martin, R.: Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall (2009)
[9] Fowler, M.: Domain-Specific Languages. Addison-Wesley Professional (2010)
[10] Miller, R.C., Chou, V.H., Bernstein, M., Little, G., Kleek, M.V., Karger, D.R., schraefel, m. c.: Inky: a sloppy command line for the web with rich visual feedback. In Cousins, S.B., Beaudouin-Lafon, M., eds.: UIST, ACM (2008) 131–140
[11] Vrandecic, D., Gangemi, A.: Unit tests for ontologies. In Jarrar, M., Ostyn, C., Ceusters, W., Persidis, A., eds.: Proceedings of the 1st International Workshop on Ontology Content and Evaluation in Enterprise. LNCS, Springer (October 2006)
[12] Reutelshoefer, J., Striffler, A., Lemmerich, F., Puppe, F.: Incremental compilation of knowledge documents for markup-based closed-world authoring. In: K-CAP '11: Proc. of the Sixth International Conference on Knowledge Capture, ACM (2011) 81–88
[13] Horridge, M., Bechhofer, S.: The OWL API: A Java API for OWL Ontologies. Semantic Web 2(1) (2011) 11–21
[14] Reutelshoefer, J., Lemmerich, F., Baumeister, J., Wintjes, J., Haas, L.: Taking OWL to Athens – Semantic Web technology takes ancient Greek history to students. In: ESWC'10: Proc. of the 7th Extended Semantic Web Conference, Springer (2010)
[15] Haase, P., Lewen, H., Studer, R., Tran, D.T., Erdmann, M., d'Aquin, M., Motta, E.: The NeOn ontology engineering toolkit. In: WWW 2008 Developers Track (April 2008)
[16] Kalyanpur, A., Parsia, B., Sirin, E., Grau, B.C., Hendler, J.: Swoop: A web ontology editing browser. Journal of Web Semantics 4(2) (June 2006) 144–153
[17] Krotzsch, M., Vrandecic, D., Volkel, M.: Semantic MediaWiki. In: ISWC'06: Proc. of the 5th International Semantic Web Conference, LNAI 4273, Springer (2006) 935–942
[18] Bao, J., Ding, L., Smart, P.R., Braines, D., Jones, G.: Rule modeling using Semantic MediaWiki. In: 3rd Annual Conference of the International Technology Alliance (ACITA'09) (September 2009)

Modeling the Structure of Spreadsheets

Christian Liguda
German Research Center for Artificial Intelligence (DFKI GmbH),

Cyber-Physical Systems, [email protected]

Abstract
Spreadsheets are widely used in many different domains like business planning or science, e.g., for calculation, planning, statistical analysis or test evaluation. For these and many other domains it is important that spreadsheets are error-free and easy to interpret, maintain and change. However, in the last years it has become more and more evident that spreadsheets are highly erroneous and hard to maintain. The main reason for this is the lack of high-level structures in spreadsheets that would allow a more structured rather than cell-based view. In this paper we present an abstract model for spreadsheets, which is able to represent the underlying structure explicitly and is independent of the concrete layout. This model is part of a project in which the abstract model is used to assist a user in creating and understanding large and complex spreadsheets.

1 Introduction
Spreadsheets are a widespread tool for financial administration, science, business planning and much more. The main reason for the great acceptance of spreadsheets is their very simple and flexible structure. A user can place names, values or formulae in cells and can thereby arrange the data in any way he likes. On the one hand, this easy and flexible structure leads to the application of spreadsheets in many different domains. On the other hand, this cell-based structure does not provide any support for abstract modeling and semantic interpretation. While spreadsheets have become more and more complex, professional spreadsheets are highly erroneous, which often results in expensive, misleading decisions. For example, the West Baraboo Village Board announced on 09.12.2011 that they have to pay $400,000 more for their most recent 10-year borrowing plan than originally projected. The reason for this erroneous assumption was a wrong formula in their spreadsheet which ignored a relevant cell (see [1] for this and many more examples).

The aim of the SiSsI (Software Engineering for Spreadsheet Interaction) project [2], which is a cooperation between the DFKI and the Jacobs University Bremen, is to develop tools and methods for augmenting spreadsheet applications with semantic techniques. These tools and methods should help users to create, change, maintain and understand large and complex spreadsheets by bringing ideas from software development and verification to the development of spreadsheets.

As part of the SiSsI project, we have developed an abstract spreadsheet model, which abstracts spreadsheets from cell and layout information and contains additional information about the underlying structure of spreadsheets.

Our intention in developing an abstract spreadsheet model is to use it as an intermediate layer between a concrete spreadsheet and an ontology which describes the semantics of the concrete spreadsheet. Thereby it can be used for verifying that the concrete spreadsheet represents the semantics correctly, or it can help a user to understand the structure of a spreadsheet. Furthermore, an abstract spreadsheet model can be used to assure that a rearrangement of the content in a spreadsheet does not destroy the underlying structure.

In the next section we will analyse the problem of the cell-based structure of spreadsheets in more detail. In Section 3 we will discuss some other approaches for formal spreadsheet models as well as the overall semantic framework which was developed within the SiSsI project. In Section 4 we will introduce the main aspects of our abstract spreadsheet model. The mapping between a concrete spreadsheet and the abstract structures is then defined in Section 5. In the last section we will discuss our approach and give some outlook on future work.

2 Problem Analysis
The high error rate of spreadsheets often results from the WYSIWYG principle of common spreadsheet programs. The problem with that principle is the lack of mechanisms for representing the underlying structure of spreadsheets. For a better understanding, we will analyse the example spreadsheet from Figure 1 here in more detail and describe some underlying structures.

In Figure 1 we see that the years 1984 to 1988 appear in the spreadsheet and that 1984 to 1986 are labeled as Actual while 1987 and 1988 are labeled as Projected. Furthermore, we see that each value of a cell in the area B4:F15 is related to a year and to a label of the first column, e.g., that the cell C7 is related to the year 1985 and to the string Salaries. Furthermore, we infer from the layout that Salaries are Expenses and that 1986 is labeled as Actual, whatever that means. Let's assume that the values for Revenues and Expenses of the actual years 1984 to 1986 are directly entered and the values for the projected years are calculated from the years before.

Figure 1: Spreadsheet from [8]

This structural difference between actual and projected years is illustrated by using different colors for the expenses. But for the revenues this structural difference can only be recognized by inspecting the cells. The same problem occurs for the cells that are related to Total Expenses and Profit (Loss) if their values are calculated from some other cells. Furthermore, without inspecting all cells we cannot see whether the values of all cells that are related to Total Expenses (B12:E12) are calculated by the same formula or not.

As we have seen, the semantic structure is not part of the spreadsheet, but is partly given by the layout of a spreadsheet. Some semantic structures like the formulae are only visible by inspecting all cells. Therefore, a user must be very careful if he wants to change the spreadsheet, because he must be aware of the underlying semantics of a spreadsheet and assure that the layout of the changed spreadsheet still represents the underlying semantics.

To prevent such errors we will introduce a model which separates the structure from the layout of a spreadsheet. Therefore, we define two maxims that must be satisfied by our model. Our first maxim is that the model should be completely independent of the layout. More precisely, the model should be invariant under all rearrangement operations, like exchanging two rows or columns or shifting tables to other places. Therefore, the position, order or alignment of elements should not be part of the abstract model. The second maxim is that the model should be able to represent underlying structures of a spreadsheet, or at least be easily augmentable to represent such structures. By defining a model which fulfills these maxims, we are able to represent the data from spreadsheets in a clearly structured way as well as to represent the structural relations between the data.

3 Related Work
David et al. [3] developed a semantic framework and implemented it for spreadsheets within the SiSsI project. Their framework is able to "complement existing software applications with semantic services and interactions based on a background ontology". We integrated our model into their framework so that the abstract model can be used for spreadsheet interaction.

Our work is based on the previous work of [5], from which we adopt some terms. They introduce the term legend for those non-empty cells that do not contain input or computed values but contain text strings that give auxiliary information on the cells that do.

Figure 2: Spreadsheet from [4]

Figure 3: ClassSheet for Figure 2 from [4]

Furthermore, they define a grid region as a functional block if that region can be interpreted as a function which maps elements from a legend to values. Thereby it is not critical whether the values are calculated or inputted, because the function is meant to be an intended function of the spreadsheet creator. For example, the region B13:F13 of Figure 1 could be interpreted as a function which maps years to the total expenses in that year, and the region B4:F4 as a function that maps a year to the revenues of that year.

There have been some other approaches for structural models of spreadsheets from other authors. Engels and Erwig [4] developed an object-oriented model which they call ClassSheets. They state that "ClassSheets consist of a list of attribute definitions grouped by classes and are arranged on a two dimensional grid". Thereby the attributes of a ClassSheet can be distributed over the grid. Figure 3 shows a ClassSheet for the spreadsheet from Figure 2, in which different colors are used to represent different ClassSheets, which illustrates the partitioning into ClassSheets. As we see in this example, the structure and layout information are somewhat interdependent, and ClassSheets are similar to real spreadsheets as they contain the structural layout without all the data. The abstract syntax of a ClassSheet, which can also be found in [4], contains the layout information as well, and therefore the definitions for ClassSheets are broken up into multiple parts like the visual representation.

Another model was developed by Paine [6]. The main idea is that spreadsheets can be expressed as a set of equations, e.g., Year[2000]=2000 or Profit[2000]=Sales[2000]-Expenses[2000]. By replacing variables by cell addresses (e.g., Year[2000] by A3) a spreadsheet can be created from these equations. In addition to those equations, Paine uses in [7] information about tables to represent regions that are similar to the regions of the above-mentioned functional blocks. These tables just contain a name and the size, like Builds[2000:2010,1:20]. Furthermore, Paine shows how these tables, equations and layout information are enough to represent spreadsheets and rearrange them within a spreadsheet.

Even if ClassSheets and Paine's model are able to represent the structure of a spreadsheet in a more abstract way, they do not satisfy our maxims from Section 2. Therefore, we develop a new abstract spreadsheet model and use a mathematical notation so that we are able to do proofs later, for example that an abstract model is isomorphic to a spreadsheet.

4 Abstract Spreadsheet Model
We start by defining an abstract data model in Subsection 4.1 which is able to represent all the data from a spreadsheet without referring to cells anymore. In the next two subsections we define some high-level structures which can be used to make the hidden structure of a spreadsheet explicit.

4.1 Abstract Data Model
As mentioned above, we want to define a model which abstracts from cell and layout information. Therefore, we define three sets whose elements represent cells, the content and the underlying formulae of cells in λ-notation.

Definition 4.1.

a) A is an infinite set1 which is used to represent all elements of a spreadsheet; more precisely, an element of A is a proxy for one or more cells of a spreadsheet.2

b) V is a set of names and values (e.g., R ∪ Strings ∪ Dates ∪ . . .).

c) Λ is a set of λ-expressions of the form λx1 . . . xn.φ where φ is a spreadsheet function. We write ord(λx1 . . . xn.φ) = n.

Now we have defined sets which can be used to represent cells (A), content (V) and formulae (Λ). For representing the content of a concrete spreadsheet we have to define what the content of each proxy for a cell (an element a ∈ A) is and whether the proxy is related to a formula. We give a definition first and explain it in more detail afterwards, together with an additional example.

Definition 4.2. We call D = (vA, λA, pA) an abstract data model, where

• λA : A →p Λ maps an element to a spreadsheet function.

• pA : A →p ⋃_{n=1}^∞ A × . . . × A (n times) maps an element to parameters for λA. Thereby pA must fulfill the condition: ∀ a ∈ A, if λA(a) is defined, pA(a) must be defined with ord(pA(a)) = ord(λA(a)).

• vA : A →p V maps an element to a value or name. ∀ a ∈ A, if λA(a) is defined, vA(a) must be defined with vA(a) = λA(a)(vA((pA(a))1), . . . , vA((pA(a))ord(pA(a)))).

The main idea behind this definition is to use elements from A to represent the cells of a spreadsheet. The content and formula of a cell are represented by mapping the corresponding element from A to a value in V and a formula in Λ.

1It would also suffice to choose a finite set A that is big enough to represent all elements of a spreadsheet. But then A must be changed whenever we add new elements to our spreadsheet. Therefore, an infinite set is easier to handle for our theoretical model.

2If an element from A is a proxy for more than one cell, it must be assured that all cells have the same values and contain the same formula, as discussed in Section 5.

While formulae in spreadsheets refer to concrete cells (like =A3+A4), a λ-expression depends on variables and not on concrete values. Therefore we need the function pA, which maps an element to an n-tuple to which the λ-expression is applied. For example, the λ-expression for =A3+A4 is λx1x2.x1 + x2. Obviously the information about the concrete values is now missing, and therefore we define pA(a) = (aA3, aA4), where aA3 and aA4 are the abstract elements for the cells A3 and A4 and a is the abstract element which contains the formula. The constraint in the definition of pA means that the size of the n-tuple must be equal to the number of variables in the λ-expression. The constraint for vA just says that the value of an element is the result of the λ-expression when it is applied to the values of the corresponding parameters, as described in the following example.

Example 4.3. Let a1, a2, a3 ∈ A be abstract representations for the cells C4, C13 and C15 in Figure 1 with vA(a1) = 4,992, vA(a2) = 3,291 and vA(a3) = 1,701. The formula in cell C15 is =C4-C13 and will be represented as λA(a3) = λxy.x − y with parameters pA(a3) = (a1, a2). These functions fulfill the constraints for the abstract data model, because vA(a3) = λA(a3)(vA(a1), vA(a2)) = λA(a3)(4,992, 3,291) = (λxy.x − y)(4,992, 3,291) = 4,992 − 3,291 = 1,701.

Because the parameters of a formula are given as elements of A, our model is completely independent of the concrete spreadsheet layout and is therefore very robust against layout changes. Furthermore, it is easy to determine whether some elements have the same underlying formula or not.

Another advantage is that elements are not differentiated by their position alone. In the spreadsheet of Table 1 the city name Hannover appears two times, but once it refers to the city Hannover in Germany and once to Hannover in North Dakota (USA). By using the abstract data model this can easily be modeled by two elements a1, a2 ∈ A which are mapped to the same value in V but to different positions in the concrete spreadsheet.

Until now we are able to represent all cells in an abstract way which does not depend on the cell position anymore. Therefore, an abstract data model satisfies our first maxim from Section 2. However, the abstract data model is not able to represent the structures that were discussed in Section 2, and therefore our second maxim is unfulfilled. Therefore, we now introduce some high-level concepts which make some underlying structures of a spreadsheet explicit, so that the second maxim is fulfilled as well.

4.2 Legends
We augment A with a simple structure for representing legends like Years or Costs.

Definition 4.4. Let L ⊂ A be finite and l ∈ A \ L. We call the tuple γ = (L, l) an L-Structure. We say that L is the underlying set of γ, and that two L-Structures γ1 = (L1, l1) and γ2 = (L2, l2) are disjoint if l1 ≠ l2 and L1, L2 and {l1}, {l2} are pairwise disjoint.

We use the set L to represent information about the different items of a legend and the element l to represent information about the complete legend. Whether the given name vA(l) of a legend appears in a spreadsheet or not just depends on the mapping between abstract and concrete spreadsheet.

Country   Germany               USA
City      Hannover    Berlin    NYC       Hannover
Gender    m    f      m    f    m    f    m    f

Table 1: Spreadsheet cutout of 3 legends. m stands for male and f for female.

In the spreadsheet of Figure 1, the names of the different legends (e.g., Years) do not appear in the spreadsheet, except for the name of the legend Expenses.

Example 4.5. To explain the above definition we use the cutout of a spreadsheet given in Table 1. In this spreadsheet we have three legends. We see that the city name Hannover appears two times, but we would assume that the two occurrences refer to two different cities. Furthermore, we see that m and f as abbreviations for male and female appear several times, and we assume that they all have the same meaning. The abstract representation for this cutout can be modeled as follows: We define γ1, γ2 and γ3 with L1 = {aGermany, aUSA}, L2 = {aHannover1, aBerlin, aNYC, aHannover2} and L3 = {am, af} to represent the items of each legend. To represent the headers we define l1 = aCountry, l2 = aCity and l3 = aGender. For the reconstruction of the spreadsheet we just need to map each element of L1, L2 and L3 to one or more positions in the spreadsheet and l1, l2 and l3 each to one position.

Now we are able to represent legends in a more structured way, but we cannot express the relations among legends, e.g., that NYC is related to USA and not to Germany. Therefore we define:

Definition 4.6. Let γ1, . . . , γn be L-Structures with underlying disjoint sets L1, . . . , Ln. A subset δ ⊆ L1 × . . . × Ln is called a legend relation of γ1, . . . , γn.

Thereby the order of the sets is not important for the interpretation of a legend relation δ ⊆ L1 × . . . × Ln, and we just say that L1, . . . , Ln are related to each other. For the legends in Table 1 we could say that the city depends on the country, but it would not make any sense to say that the gender depends on the city. Furthermore, the order of the legends in the spreadsheet is exchangeable, which is not compatible with a fixed "Li depends on Li−1" semantics, and therefore the order is not part of the semantics.

Example 4.7. We explain the definition by representing the relations between the three legends of Table 1. We can define one legend relation as a subset of L1 × L2 × L3 or split it into two legend relations, one as a subset of L1 × L2 and the other as a subset of L2 × L3. We choose the first option and define δ = {(aGermany, aHannover1, am), (aGermany, aHannover1, af), (aGermany, aBerlin, am), (aGermany, aBerlin, af), (aUSA, aNYC, am), (aUSA, aNYC, af), (aUSA, aHannover2, am), (aUSA, aHannover2, af)}. From this subset of the Cartesian product it is easy to infer that Hannover1 is related to Germany and Hannover2 is related to USA.

4.3 Functional Blocks
In spreadsheets, the dependencies between a cell in a functional block and the corresponding legend items are not explicitly modeled but can usually be inferred by the user from the structure of a spreadsheet (see Section 2). These implicit dependencies should be modeled explicitly in the abstract spreadsheet model. A functional block will be represented as a multi-dimensional function which depends on the surrounding legends. For example, the functional block of Figure 1 can be modeled as a 2- or 3-dimensional function with domains Costs × Years (× YearType).

Definition 4.8. An injective function ϕ : Σ ⊆ L1 × . . . × Ln → A is called a functional block.

Example 4.9. Suppose we model the legends of the Winograd spreadsheet by L1 = {aActual, aProjected}, L2 = {a1985, . . . , a1988} and L3 = {aSalaries, . . . , aOther}. We decided to model the first column of the spreadsheet by different L-Structures, because the revenues, expenses and the profit have different structures and semantics. Therefore L3 just represents the different expenses. Now we model the functional block in B7:F11 by a function ϕ : Σ ⊂ L1 × L2 × L3 → A. The function ϕ is defined on a subset of L1 × L2 × L3, because not every triple appears in the spreadsheet, e.g., (aActual, a1988, aSalaries) does not appear.

Obviously the function ϕ should be injective, because no two different elements from the domain should be mapped to the same element in A.

5 Mapping
In Section 4 we have discussed a model to represent the content and structure of a spreadsheet in a more abstract way. Our model ignores all cell information, and so the original spreadsheet cannot be reconstructed from an abstract spreadsheet model alone. In this section we discuss the main aspects of the mapping between a concrete and an abstract spreadsheet, but we will not discuss all details which are necessary to prove that the mapping is isomorphic.

We represent a concrete spreadsheet by using the definition of an abstract data model, but replace the set A by S = N × N × N. Thereby a tuple (n, x, y) ∈ S represents the cell position in the worksheet n, e.g., (1, 5, 3) stands for cell D6 in the first worksheet.

Definition 5.1. We call S = (vS, λS, pS) a formal spreadsheet model, where vS, λS and pS fulfill the same conditions as vA, λA and pA.

A very intuitive idea is to define a mapping between A and S. While this is enough to map an abstract spreadsheet to a formal spreadsheet model, this mapping does not provide enough information to build an abstract model from a formal one, because it does not say anything about L-Structures and functional blocks. We give a formal definition of mappings for legends and functional blocks first and then explain them by example afterwards.

Definition 5.2.
a) A legend mapping is a triple ψ = (ψA, a, ω) with a function ψA : Ω → A, Ω ⊂ S, elements a ∈ A \ ψA(Ω), ω ∈ S \ Ω, and the constraint ψA(ω1) = ψA(ω2) ⇒ vS, λS and pS are equal for ω1 and ω2.

b) A functional block mapping is a tuple ψ = (ψA, ψΣ) of an injective function ψA : Ω → A and a bijective function ψΣ : Ω → Σ with Σ ⊆ L1 × . . . × Ln and ψA(Ω) ∩ Li = ∅, 1 ≤ i ≤ n.

We start by taking a closer look at the legend mapping by explaining the constraint and how to create an L-Structure (L, l) from a formal spreadsheet and a mapping. The given constraint for a legend mapping assures that whenever two different cells are mapped to the same element in the abstract spreadsheet model, the cells contain the same values, formula and parameters (e.g., the four cells containing "m" in Table 1 are mapped to a single abstract element; therefore all of these cells must contain the same value, formula and parameters). Given a formal spreadsheet and a mapping, we can create an L-Structure by defining L = ψA(Ω) \ {a} and l = a. In doing so we have a partial mapping between A and S and furthermore have enough information for building an L-Structure. ω is used to map l back to a spreadsheet.

A functional block mapping should provide enough information to define a functional block ϕ : Σ → A from a cutout of a spreadsheet. As mentioned above, the spreadsheet does not contain the information about the intended function of a functional block. The relation between a cell and the legends (e.g., that the cell D7 is related to the year 1986 and to the string Salaries) is provided by the function ψΣ : Ω → Σ. The mapping from a cell in a grid to an element of A is provided by ψA : Ω → A.

Example 5.3. We explain the mapping for the cell D7 of Figure 1. This cell is related to aActual ∈ L1, a1986 ∈ L2 and aSalaries ∈ L3. We take an unused element a ∈ A and define ψA(1, 6, 3) = a and ψΣ(1, 6, 3) = (aActual, a1986, aSalaries). Given these mappings we can define a functional block for this element by ϕ(aActual, a1986, aSalaries) = a. By repeating this procedure for all cells in a given area, we can map an unstructured area in a spreadsheet to a functional block structure, which makes the underlying structure in the spreadsheet explicit.

6 Discussion
We have developed a new model which is able to represent the abstract structure of a spreadsheet and which satisfies our two maxims from Section 2. Even if we have not provided a formal proof that a concrete spreadsheet and an abstract spreadsheet model are in some sense isomorphic, the mathematical notation allows such formal proofs, whereby such proofs need assumptions about some properties that are not defined here.

In future work we want to develop methods for a semi-automated extraction of the abstract structure of a concrete spreadsheet. Although an implementation of the abstract spreadsheet model is already integrated in the SiSsI project, future work will analyse how this abstract model can be used in user interaction to prevent common errors or to understand the structure of a spreadsheet by connecting it to an ontology which contains the semantics of a spreadsheet. Instead of creating formulae in Excel which refer to different cells, a user should be able to create a formula which refers to semantic objects in the ontology, like Profit: Year -> Float, Profit(y)=Revenues(y)-Expenses(y). Afterwards, an area in a spreadsheet (like B15:F15 in Figure 1) can be linked to that ontology term. Thereby the corresponding formulas in the cells can be created as concrete instances of the given ontology term, or the existing formulae can be validated against the ontology term by using a theorem prover.

For this kind of validation we need the abstract spreadsheet model as an intermediate layer, because on the one hand it contains all explicit formulas from a spreadsheet and on the other hand it contains the information about the relation to the legends. For example, without the knowledge that the elements for the cells B4, B13 and B15 are related to the year 1984 and to revenues, expenses and profit, the formula in the cell B15 could not be validated against the ontology term Profit(y)=Revenues(y)-Expenses(y).
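As a sketch of the intended validation in the notation of Section 4 (the functional block and element names are chosen here for illustration only):

    ϕRev(a1984) = aB4,   ϕExp(a1984) = aB13,   ϕProfit(a1984) = aB15
    λA(aB15) = λxy.x − y,   pA(aB15) = (aB4, aB13)

Since the parameters of aB15 are exactly the revenue and expense elements for the year 1984, the formula in B15 can be checked as a concrete instance of Profit(y)=Revenues(y)-Expenses(y) at y = 1984.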

Acknowledgments
This work was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) under grant HU 737/6-1.

References[1] Eusprig horror stories. URL http://www.

eusprig.org/horror-stories.htm. last vis-ited: 29.06.2012.

[2] Sissi homepage. URL http://www.dfki.de/cps/projects/sissy/index.de.html. Lastvisited 29.06.2012.

[3] Catalin David, Constantin Jucovschi, AndreaKohlhase, and Michael Kohlhase. Semantic al-liance: A framework for semantic allies. In JohanJeuring, JohnA. Campbell, Jacques Carette, GabrielReis, Petr Sojka, Makarius Wenzel, and VolkerSorge, editors, Intelligent Computer Mathemat-ics, volume 7362 of Lecture Notes in ComputerScience, pages 49–64. Springer Berlin Heidel-berg, 2012. ISBN 978-3-642-31373-8. doi:10.1007/978-3-642-31374-5 4. URL http://dx.doi.org/10.1007/978-3-642-31374-5_4.

[4] Gregor Engels and Martin Erwig. Classsheets: Au-tomatic generation of spreadsheet applications fromobject-oriented specifications. In In 20th IEEE/ACMInt. Conf. on Automated Software Engineering, pages124–133, 2005.

[5] Andrea Kohlhase and Michael Kohlhase. Compen-sating the computational bias of spreadsheets withMKM techniques. In Proceedings of the 16th Sym-posium, 8th International Conference. Held as Partof CICM ’09 on Intelligent Computer Mathematics,Calculemus ’09/MKM ’09, pages 357–372, Berlin,Heidelberg, 2009. Springer-Verlag. ISBN 978-3-642-02613-3. doi: 10.1007/978-3-642-02614-029. URL http://dx.doi.org/10.1007/978-3-642-02614-0_29.

[6] Jocelyn Paine. Excelsior: Bringing the benefits ofmodularisation to Excel. CoRR, abs/0803.2027, 2008.URL http://arxiv.org/abs/0803.2027.

[7] Jocelyn Paine, Emre Tek, and Duncan Williamson. Rapid spreadsheet reshaping with Excelsior: multiple drastic changes to content and layout are easy when you represent enough structure. CoRR, abs/0803.0163, 2008.

[8] Terry Winograd. The spreadsheet. In Terry Winograd, John Bennett, Laura de Young, and Bradley Hartfield, editors, Bringing Design to Software, pages 228–231. Addison-Wesley, 1996 (2006).

RESUBMISSION¹
iTree: Skill-building User-centered Clarification Consultation Interfaces

Martina Freiberg and Frank Puppe
University of Würzburg
D-97074, Würzburg, Germany
freiberg|[email protected]

Abstract

Developing web-based, knowledge-based systems (wKBS) still challenges developers, mostly due to the inherent complexity of the overall task. The increased focus on knowledge base development/evaluation and the consequent neglect of UI/interaction design and usability evaluation raise the need for a tailored wKBS development tool, leveraging the overall task while specifically supporting the latter activities. As an example of such a tool, we introduce the wKBS development tool ProKEt. With the help of that tool, we developed the novel UI concept interactive clarification tree (iTree) with skill-building ability, which is specifically suitable for clarification consultation systems. Also, we report on a recent case study, where iTree was implemented for knowledge-based clarification consultation in the legal domain.

Keywords: Knowledge-based Systems, Clarification Consultation, UI Design, Skill-building UI, Usability Evaluation

1 Introduction
Despite increasing distribution in many domains, web-based knowledge-based systems (wKBS) still challenge developers: Development of appropriate knowledge bases alone is an effortful task in terms of time and money; thus, intentional UI/interaction design and usability evaluation activities remain rather neglected. Yet, wKBS are often applied in critical or specialized contexts—e.g., consultation in the medical or legal domain—where the chosen UI/interaction style can contribute strongly to either the success or the failure of the system. Thus, UI/interaction design and usability evaluation should rather be a key factor for wKBS development. This increases the need for a tailored software tool that fosters experimentation with and evaluation of novel wKBS styles. We propose the tailored software tool ProKEt, which supports efficient, affordable, UI/interaction-design-focussed wKBS development while at the same time seamlessly integrating usability evaluation functionality. To the best of our knowledge, there exists no previous work to date regarding similar tools.

¹ Originally submitted to KEOD 2012

Regarding consultation in contexts such as the legal or medical domain, it often can be valuable to not only have general consultation systems available that derive one or more diagnoses based on the user input, but additionally to have specialized clarification systems for investigating only one distinct diagnosis—potentially pre-selected by general consultation systems or by the users themselves. In this paper, we introduce the interactive clarification tree (iTree) as a novel, hierarchical clarification UI/interaction style that we developed with the help of the tool ProKEt; iTree thereby is particularly suitable for a mixed, diverse user population and additionally provides skill-building ability. A first study in the course of a current project in the domain of legal consultation suggests general benefits of the proposed iTree UI style.

Related Work
With regard to general KBS/wKBS development, there exist various tailored software tools—such as JavaDON (Tomic et al., 2006) or KnowWE (Baumeister et al., 2011)—and methodologies—e.g., MIKE (Angele et al., 1998) or CommonKADS (Schreiber et al., 2001). However, such approaches still mostly focus on the design and evaluation of the knowledge base; in contrast, we propose ProKEt as a tailored wKBS development tool that seamlessly couples efficient, agile wKBS development, creative experimentation regarding KBS UI/interaction design, and semi-automated usability evaluation activities. ProKEt can further be seen as a user-centered prototyping tool for wKBS—a concept defined by (Leichtenstern and André, 2010) as an all-in-one tool solution for enabling efficient, effective and satisfactory design, evaluation and analysis of developed artifacts.

Probably due to the numerous benefits of web-based systems—e.g., availability, acceptance, or maintainability—an increasing number of knowledge-based/expert systems seems to be developed for the web to date—i.e., integrated in websites or as separate, complex web applications; recent examples are (Patil et al., 2009) or (Rahimi et al., 2007). However, such wKBS apparently are being developed in a rather ad hoc manner, not following systematic methods or processes, and not (re)using (nor providing) any patterns or best practices, especially regarding the UI/interaction design—probably due to a general lack of scientific research in web-based expert systems, cf. (Duan et al., 2005). Similarly, wKBS developers seem to be individuals, performing all tasks required for developing and distributing a wKBS by themselves. This further increases the need for a tailored tool that not only renders overall wKBS development an efficient, pragmatic task, but, equally important, specifically supports design of and experimentation with web-based UI/interaction forms and their usability evaluation.

Paper Structure
The rest of the paper is organized as follows: In Section 2, we shortly introduce the tailored wKBS development tool ProKEt. Afterwards, in Section 3, we discuss iTree, a novel hierarchical UI concept for knowledge-based clarification consultation systems with skill-building ability. We report on a recent case study in Section 4, where the proposed UI style was practically implemented for a wKBS in the legal domain. We conclude with a short summary of the presented research and an outlook on prospective future work in Section 5.

2 ProKEt
ProKEt is a tailored Prototyping and Knowledge systems Engineering tool for web-based, knowledge-based systems (wKBS) that additionally provides integrated support for various usability-evaluation-related activities. Thereby, ProKEt specifically supports web-based consultation and documentation systems, which can be developed equally well as (pure) prototypical demo systems and as fully-fledged systems for productive use. Thereby, extensible prototyping is put into action, facilitating a nearly effortless transition from prototype to productive system; for a more extensive introduction of the agile, extensible prototyping and engineering process with ProKEt, see (Freiberg et al., 2012). The main application logic is implemented in Java. The resulting artifacts are Servlet-based web applications, using HTML, StringTemplate, and CSS for UI creation, and JavaScript for interactivity. Regarding the knowledge representation, an XML-based specification is used for the pure prototypes, which can be directly created/edited with ProKEt itself. For productive systems, d3web (URL d3web, 2012) knowledge bases are integrated and (mostly) replace the XML specification; d3web knowledge bases, however, cannot be edited directly with ProKEt, thus in that case an external d3web-supporting tool such as KnowWE (Baumeister et al., 2011) is required. Recently, we implemented a mechanism to couple KnowWE and ProKEt artifacts, thus drastically improving and easing the workflow of UI/front-end development, knowledge base development and their integration into a productive wKBS.

For supporting the straightforward evaluation of its artifacts, ProKEt further allows for seamlessly integrating both qualitative and quantitative data collection, for prototypes as well as for productive wKBS; this enables developers to assess the current development state at any time by conducting manifold, potentially iterative, evaluations. For qualitative data collection, ProKEt supports both the integration of form-based questionnaires/surveys—standards such as the SUS (Brooke, 1996) and the NasaTLX (Hart, 2006) are supported out of the box, but tailored questionnaires can be added with no effort—and of anytime feedback—mechanisms for collecting free user feedback at any time during a wKBS session. Regarding quantitative data, ProKEt provides a tailored mouse click and keyboard event logging mechanism that records all relevant actions during wKBS sessions. Based on that data, ProKEt can furthermore automatically calculate a set of known usability metrics—such as Success Rate or Average Task Time—proposed, e.g., by (Constantine and Lockwood, 1999); it is equally well possible to just export qualitative and quantitative data into a standard CSV format for further investigation with external tools, e.g., standard spreadsheet calculation or advanced statistical software. A more detailed introduction of that usability extension of ProKEt can be found in (Freiberg and Puppe, 2012).
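
To illustrate the kind of metric computation mentioned above, the following Python sketch derives a success rate and an average task time from a hypothetical CSV export of session logs; the column names are assumptions, not ProKEt's actual export format.

import csv
from statistics import mean

def usability_metrics(csv_path):
    # Assumed columns: user, task_time_s, solved ("yes"/"no").
    with open(csv_path, newline="") as f:
        sessions = list(csv.DictReader(f))
    success_rate = sum(s["solved"] == "yes" for s in sessions) / len(sessions)
    avg_task_time = mean(float(s["task_time_s"]) for s in sessions)
    return success_rate, avg_task_time

# Example usage (hypothetical file name):
# rate, seconds = usability_metrics("proket_sessions.csv")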

3 iTree for Clarification
ProKEt particularly supports the development of consultation and documentation systems. A consultation system thereby provides decision support in a particular domain based on given user input, whereas a documentation system contrastingly focusses on supporting uniform, efficient and high-quality data entry. In this paper, we propose the interactive clarification tree (iTree) UI style specifically for clarification systems as a sub-class of consultation systems.

3.1 Clarification Consultation
As a subarea of classification, clarification relates to hypothesize-and-test as follows: Separate, general multiplex consultation systems can be applied first for narrowing the complete set of potential diagnoses/hypotheses down to one or several most suitable elements (hypothesize step); each of those hypotheses can then be further investigated by a corresponding clarification module (test step). As a shortcut, users could alternatively start directly with a clarification system for a chosen hypothesis themselves.

3.2 iTree: Skill-building Clarification
We propose iTree as a novel UI with skill-building ability that fosters an efficient and usable user experience in the context of clarification systems.

Figure 1: Schematic drawing of the iTree UI style

Figure 1 presents a schematic drawing of iTree for clarification systems. The core issue to be rated is presented as the root element of the hierarchical tree structure (Figure 1, Core Issue). Its rating is derived from the ratings of any desired number of top-level questions, placed directly underneath the core issue (Figure 1, Quest.1, Quest.2, Quest.3). Questions are a tailored form of yes/no questions with an additional value Neutral/Uncertain; provided answers can further be withdrawn/adapted at any time, indicated by the X button. The current implementation allows three possible abstract ratings for the core issue as well as for all questions: confirmed, uncertain/neutral and rejected, which correspond to the answers Yes, ?, and No per default. Some domains may require swapping that mapping for particular questions in favor of a more understandable question wording. For example, see Figure 2, which depicts the current implementation of the iTree UI style: The core issue is confirmed (rating: yes) if the cancellation was NOT prohibited due to time limitations;

in that case, the swapped yes/no mapping allows for rewording the question as depicted, which is much clearer than its negated alternative. In case the user cannot answer a question directly, more detailed refinement questions—if available—can be retrieved for the current element, represented by the D button in the scheme and by the arrow in Figure 2; as an example, the second top-level question in Figure 2 contains two refining questions, which list in more detail the conditions that confirm/reject its parent question. Question ratings are always propagated from inner levels of the hierarchy up to the topmost question(s) by either AND or OR connections. Let pn be a parent node and cpn a child node of pn; for calculating the rating of pn, the following rules apply (see the sketch below):
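
The original rule table is not listed here; the following Python sketch therefore gives a rule set that is consistent with the back-references in the next paragraph (second rule: an AND-parent is confirmed only if all children are confirmed; fifth rule: an OR-parent is confirmed as soon as one child is confirmed). The remaining rules are plausible assumptions, not the authors' exact formulation.

YES, NO, NEUTRAL = "yes", "no", "?"

def rate_parent(connector, child_ratings):
    # Propagate child ratings to the parent via an AND or OR connection.
    if connector == "AND":
        if any(r == NO for r in child_ratings):      # rule 1 (assumed)
            return NO
        if all(r == YES for r in child_ratings):     # rule 2
            return YES
        return NEUTRAL                               # rule 3 (assumed)
    if connector == "OR":
        if any(r == YES for r in child_ratings):     # rule 5
            return YES
        if all(r == NO for r in child_ratings):      # rule 4 (assumed)
            return NO
        return NEUTRAL                               # rule 6 (assumed)
    raise ValueError(connector)

# Core issue in Figure 2: AND over its top-level questions.
print(rate_parent("AND", [YES, YES, NEUTRAL]))   # '?', not yet confirmed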

This means, e.g., that the core issue in Figure 2 is rated yes only if all of its children are rated yes, as those are connected to the core issue by AND (second rule above); likewise, cancellation prohibited due to time limitation is rated yes as soon as one of its children is rated yes (fifth rule above), due to the OR connection. One advantage of iTree is its suitability for a diverse user population—i.e., users with different background and expertise might be able to profit from the same system. This is achieved by the possibility to derive the solution rating either by answering the more abstract top-level questions (domain specialist level) or by stepping into more refined, elaborate questions (less expertise required). Moreover, the visual representation of the knowledge base structure creates a form of focus-and-context view: Not only the currently active/processed question(s) are visible, but also surrounding elements are indicated—limited only by the display size. As the user can visually trace the effect of an answer through the distinct presentation of the questions and their current state, which is propagated throughout the tree, the core issue rating becomes more transparent. The chosen visual representation of the knowledge further supports users in gaining a thorough understanding of the investigated core issue and the coherences between its clarifying questions and the core issue itself. Thus, users acquire additional knowledge by means of the system, yet are also enabled to bring in their existing knowledge for potentially shortening the clarification session or for focussing in detail on only those parts that are rather unclear. Together with optional, auxiliary information that can be integrated for each of the elements (not contained in the scheme—see, e.g., the auxiliary information panel in Figure 3, E), iTree specifically can serve as a skill-building UI type.

4 Case Study—JuriSearch
At the beginning of 2012, the JuriSearch project was initiated as a cooperation between the University of Würzburg and the RenoStar corporation (partly funded by the Free State of Bavaria). JuriSearch aims at building a wKBS for the legal domain: The target system is intended to integrate both a standard consultation (entrance) module—hypothesize—and various clarification modules for each potential core issue—test—so as to provide encompassing advice on various legal topics, such as the right of cancellation or the law of tenantry. Potential target users are diverse, ranging from legal laymen—searching for a basic understanding/estimation of their case—to (fresh) lawyers seeking guidance regarding legal (sub)domain(s) that are not exactly their special field of work. Those framing conditions provided a perfect opportunity to implement and evaluate the iTree UI style. Therefore, a comparative study with iTree and a more common, conversational UI style was conducted; the latter was implemented as a one-question UI style (oneQ). In contrast to the free, explorative interaction with iTree, oneQ is based on the metaphor of a conversation: The system always presents only the one suitable next question at a time, thus imitating a strict dialogue between the user and the system. Yet, both UI types are based on the exact same knowledge base, with their core difference being the presentation of the questions: hierarchical tree (iTree) vs. single question (oneQ). Refinement questions are available in oneQ as well; yet there, the former current question is folded and the first of the refinement-level questions is presented, thereby destroying much of the 'contextual knowledge' that iTree facilitates by always presenting all questions of the current hierarchy level in addition to the surrounding, further structure (limited by display size only). Figure 3 presents the iTree implementation in JuriSearch (A–F) as well as the alternative, conversational oneQ UI (G).

4.1 User Study—Framing Conditions
21 members of our department—all male, mostly between 25 and 35 years old—participated in the first study; as computer scientists, they all were versed in general computer and web system usage, yet in most cases had little to no experience regarding the specific wKBS types, and no experience regarding the target domain at all. Two exemplary problem descriptions from the domain of cancellation were created, and participants were asked to solve one problem with iTree and the other with oneQ; to avoid biased results due to the sequence of using the UIs, that sequence was alternated between participants. The study was conducted remotely: The test systems were deployed on a specifically configured server—enabling the integrated logging and feedback/questionnaire mechanisms—and the participants were given all required instruction material per email.

Figure 2: Exemplary iTree Implementation

4.2 User Study—Results & Discussion
The collected log data revealed a general applicability of the iTree concept for implementing a clarification wKBS UI in the legal domain, and specifically the following results: First, iTree exhibited an average task time of 13m 38s±6m 49s in contrast to 10m 39s±5m 49s for oneQ (by a narrow margin statistically not significant on a one-sided unpaired t-test, p=0.068). The higher task time of iTree could possibly be explained by its ability to intuitively provide for free, extensive exploration of the system. Yet, on the other hand, task time should not be overrated here; the extent of usage of the test systems depended in larger parts on a) the reading speed of the participants regarding the questions and explanations, b) the usage conditions (during daily job routine vs. after end of work), which, due to the setting of the remote study, could not be controlled strictly, and c) the potentially already existing knowledge regarding the problem at hand, in turn leading to highly subjective task time results between users. Regarding the success and error rate, a case was classified successful if the correct rating of the core issue was derived by the user with the respective system type, and not successful if either the wrong or no solution was found. For iTree, the success rate was 42.86%, and 38.1% for oneQ (both: no statistical significance on a one-sided binomial test with p=0.11 and p=0.16); along with subjective user feedback, this clearly indicates the need to rework the knowledge base contents/structure for yielding better results. Furthermore, both anytime- and questionnaire-based feedback were collected as qualitative user data. The first remarkable finding was the fact that iTree nearly concordantly was perceived as more intuitively usable, and that it thus further was reported to be the preferred UI type by 81% of the study participants, whereas oneQ was preferred by only 14% and no preference was stated by 5%; this is statistically significant on a χ2 test with p<0.05 and with an anticipated distribution of 50% (iTree), 30% (oneQ), and 20% (both equally); see the sketch at the end of this section. One possible explanation might be the specific characteristics of the participant population, which—as computer scientists—might simply be used to tree representations and thus perceived iTree as naturally more intuitive to use. Regarding further subjective (questionnaire) topics, iTree scored better all over; on a scale from 0 (worst) to 6 (best) the results were: comprehensibility of the system reactions 4.43±1.54 (iTree) vs. 2.76±1.45 (oneQ), comprehensibility of the derived results 4.53±1.54 (iTree) vs. 3.33±1.85 (oneQ), and mediation of domain knowledge to the user 4.05±1.32 (iTree) vs. 2.95±1.72 (oneQ); those differences are all statistically significant using an unpaired, one-sided t-test with p≤0.05. Especially the latter value affirmed our assumption that iTree particularly evinces skill-building abilities. Additional insights from anytime feedback included: The wording of the questions was perceived as incomprehensible/cumbersome in 11 cases (52%) due to frequently used double negations and legal specialist language, probably further aggravated by the fact that the chosen participants were legal laymen and thus not at all familiar with legal terms and language; also, the hierarchical structure and representation of the knowledge base—which followed the legal subsumption logic—was perceived as unfavorable. In such a hierarchy/sequence, the questions most interesting for legal laymen appear far down, while at the same time more abstract concepts are contained at the upper levels; this led to (laymen) users having difficulties making sense of the concepts at the top/beginning of the hierarchy/questioning sequence. A solution to this issue might be a complete restructuring of the knowledge base,

so that the most relevant questions and distinctions—from the users' point of view, e.g., typical reasons for dismissal, size of the company, etc.—also appear on the rather top levels; this surely poses a difficult trade-off between legal correctness/schematic thinking and understandability, yet it apparently could greatly contribute to tailoring the UI to the users by enabling them to bring in their own perspective and knowledge in the dialog. Thus, a further refinement of the knowledge base with regard to a clear, easily understandable language and structure turned out to be indispensable. Another interesting finding was the fact that in 4 (19%) cases, the real meaning of the -?- button as an answer alternative was not grasped; users rather expected the system to display more elaborate explanations on the issue at hand or to open up the next refinement level of the questions instead of receiving just a rating of the current question. Similarly, the X/empty button—designated to clearing a previously entered answer—was not intuitively understood in 3 (14%) cases.

Figure 3: JuriSearch clarification module as iTree (large) and oneQ UI (small). AND/OR rules for rating the (sub-)questions (A) are visually represented; reversed question example (underneath B); dummy node example, only serving for rule connection modeling (above B); four simple buttons (B) for rating the questions (Y: yes—N: no—?: neutral—X/empty: retract), rating highlighted by background color (C); core issue rating prominently displayed and updated continuously (D); additional information displayed in a separate panel when mouse-overing a question (E); anytime feedback/data collection features integrated in the UI header (F); the clarification core component in oneQ style (G) always displays the current active question with an additional-information panel, previously answered questions remain presented in a more condensed view.
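
The χ2 preference result reported above can be checked with a few lines of Python; note that the absolute counts (17, 3 and 1 of 21 participants) are only inferred from the reported percentages, and the use of scipy's goodness-of-fit test is our assumption about the test variant.

from scipy.stats import chisquare

observed = [17, 3, 1]                       # iTree / oneQ / no preference (inferred counts)
expected = [0.5 * 21, 0.3 * 21, 0.2 * 21]   # anticipated 50% / 30% / 20%

stat, p = chisquare(f_obs=observed, f_exp=expected)
print("chi2 = %.2f, p = %.3f" % (stat, p))  # p < 0.05, consistent with the reported result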

5 Conclusions
In this paper, we claimed the importance of a careful UI/interaction design for web-based, knowledge-based systems. Regarding clarification systems as a sub-class of consultation systems, we suggested iTree as a novel UI/interaction style for increased efficiency and usability. In a first comparative user study from the legal domain, an initial iTree prototype as well as an alternative, one-question-style prototype were implemented using the prototyping and knowledge systems engineering tool ProKEt. The results suggest that iTree generally is a favorable UI style for clarification systems that supports free, explorative system usage and thus provides skill-building potential on the side of the users. Yet, the study also showed the need to rework the knowledge base of the system, regarding both the question wording and the question structuring. One assumption requiring further studies is that the legal iTree in its current state is satisfactory for legal experts, whereas a restructured system could be more appropriate for non-expert users. Additionally, we plan on developing and evaluating similar iTree systems for the medical domain. This raises the requirement of more fine-granular rating options, e.g., by scoring rules. Finally, further experimentation with potential UI enhancements is intended to help improve the iTree concept; one such idea is the integration of an interactive system state preview that is overlaid when mouse-overing the respective answer option.

Acknowledgements
We thank the RenoStar corporation (Großwallstadt, GER) for valuable discussions and cooperation.

References
[Angele et al., 1998] Angele, J., Fensel, D., Landes, D., and Studer, R. (1998). Developing Knowledge-Based Systems with MIKE. Automated Software Engineering: An International Journal, 5(4):389–418.

[Baumeister et al., 2011] Baumeister, J., Reutelshoefer, J., and Puppe, F. (2011). KnowWE: A Semantic Wiki for Knowledge Engineering. Applied Intelligence, 35(3):323–344.

[Brooke, 1996] Brooke, J. (1996). SUS: A quick and dirty usability scale. In Jordan, P. W., Weerdmeester, B., Thomas, A., and McClelland, I. L., editors, Usability evaluation in industry. Taylor and Francis, London.

[Constantine and Lockwood, 1999] Constantine, L. L. and Lockwood, L. A. D. (1999). Software for Use: A Practical Guide to the Models and Methods of Usage-Centered Design. Addison-Wesley Professional.

[Duan et al., 2005] Duan, Y., Edwards, J. S., and Xu, M. X. (2005). Web-based expert systems: benefits and challenges. Information & Management, 42:799–811.

[Freiberg and Puppe, 2012] Freiberg, M. and Puppe, F. (2012). Prototyping-based Usability-oriented Engineering of Knowledge-based Systems. In Proceedings of Mensch und Computer 2012 (to appear).

[Freiberg et al., 2012] Freiberg, M., Striffler, A., and Puppe, F. (2012). Extensible prototyping for pragmatic engineering of knowledge-based systems. Expert Systems with Applications, 39(11):10177–10190.

[Hart, 2006] Hart, S. G. (2006). Nasa-Task Load Index (Nasa-TLX); 20 Years Later. In Human Factors and Ergonomics Society Annual Meeting, volume 50.

[Leichtenstern and André, 2010] Leichtenstern, K. and André, E. (2010). MoPeDT: features and evaluation of a user-centred prototyping tool. In Proceedings of the 2nd ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS '10, pages 93–102, New York, NY, USA. ACM.


[Patil et al., 2009] Patil, S. S., Dhandra, B. V., Angadi, U. B., Shankar, A. G., and Joshi, N. (2009). Web based expert system for diagnosis of micro nutrients' deficiencies in crops. In Proceedings of the World Congress on Engineering and Computer Science, volume 1, pages 20–22.

[Rahimi et al., 2007] Rahimi, S., Gandy, L., and Mogharreban, N. (2007). A web-based high-performance multicriteria decision support system for medical diagnosis: Research articles. International Journal of Intelligent Systems, 22:1083–1099.

[Schreiber et al., 2001] Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., de Velde, W. V., and Wielinga, B. (2001). Knowledge Engineering and Management—The CommonKADS Methodology. MIT Press, 2nd edition.

[Tomic et al., 2006] Tomic, B., Jovanovic, J., and Devedzic, V. (2006). JavaDON: an open-source expert system shell. Expert Systems with Applications, 31(3):595–606.

[URL d3web, 2012] http://d3web.sourceforge.net/ (last checked Apr. 15th, 2012).

RESUBMISSION
Confidence in Workflow Adaptation

Mirjam Minor, Mohd. Siblee Islam and Pol Schumacher
University of Trier
Department of Business Information Systems II
D-54286 Trier, Germany
[minor|islam|pol.schumacher]@uni-trier.de

Abstract
This paper is on assessing the quality of adaptation results by a novel confidence measure. The confidence is computed by finding evidence for partial solutions from introspection of a huge case base. We assume that an adaptation result can be decomposed into portions and that the provenance information for the portions is available. The adaptation result is reduced to those portions of the solution that have been affected by the change. Furthermore, we assume that a similarity measure for retrieving the portions from a case base can be specified and that a huge case base is available, providing a solution space. The occurrence of each portion of the reduced solution in the case base is investigated during an additional retrieval phase after having adapted the case. Based on this idea of retrieving portions, we introduce a general confidence measure for adaptation results. It is implemented in the area of workflow adaptation. A graph-based representation of cases is used. The adapted workflow is reduced to a set of sub-graphs affected by the change. Similarity measures are specified for a graph matching method that implements the introspection of the case base. Experimental results on workflow adaptations from the cooking domain show the feasibility of the approach. The values of the confidence measure have been evaluated for three case bases with a size of 200, 2,000, and 20,000 cases each by comparing them with an expert assessment.
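
The following Python fragment is a heavily simplified, purely illustrative sketch of the kind of confidence measure the abstract describes: each change-affected portion of the adapted workflow is retrieved from the case base and the best similarity values are aggregated. The function names, the Jaccard toy similarity and the aggregation by mean are our assumptions, not the authors' formalization.

def confidence(affected_portions, case_base, similarity):
    # affected_portions: sub-graphs touched by the adaptation;
    # case_base: portions extracted from existing cases;
    # similarity: graph-matching similarity in [0, 1].
    if not affected_portions:
        return 1.0                      # nothing changed, full confidence
    best = [max(similarity(p, c) for c in case_base) for p in affected_portions]
    return sum(best) / len(best)

# Toy usage with sets of workflow steps standing in for sub-graphs.
jaccard = lambda a, b: len(a & b) / len(a | b) if a | b else 1.0
print(confidence([{"chop", "onion"}], [{"chop", "onion"}, {"boil", "rice"}], jaccard))  # 1.0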

RESUBMISSION
Semantic Alliance: A Framework for Semantic Allies

Catalin David, Constantin Jucovschi, Andrea Kohlhase and Michael Kohlhase
Computer Science
Jacobs University Bremen
http://kwarc.info

Abstract
We present an architecture and software framework for semantic allies: semantic systems that complement existing software applications with semantic services and interactions based on a background ontology. On the one hand, our Semantic Alliance framework follows an invasive approach: Users can profit from semantic technology without having to leave their accustomed workflows and tools. On the other hand, Semantic Alliance offers a largely application-independent way of extending existing (open API) applications with MKM technologies. The Semantic Alliance framework presented in this paper consists of three components: i.) a universal semantic interaction manager for given abstract document types, ii.) a set of thin APIs realized as invasive extensions to particular applications, and iii.) a set of renderer components for existing semantic services. We validate the Semantic Alliance approach by instantiating it with a spreadsheet-specific interaction manager, thin APIs for LibreOffice Calc 3.4 and MS Excel '10.
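
Purely as an illustration of the three-component architecture sketched in the abstract (thin API, interaction manager, renderer), the following Python stub shows one possible message flow; all class and method names are assumptions and do not reproduce the actual Semantic Alliance code.

class ThinAPI:
    """Invasive extension inside the host application (e.g. a spreadsheet)."""
    def __init__(self, manager):
        self.manager = manager
    def on_cell_selected(self, cell):
        # Forward a document-level event to the application-independent manager.
        self.manager.handle_event({"type": "select", "object": cell})

class InteractionManager:
    """Semantic interaction manager working on an abstract document type."""
    def __init__(self, ontology, renderer):
        self.ontology, self.renderer = ontology, renderer
    def handle_event(self, event):
        concept = self.ontology.get(event["object"], "unknown concept")
        self.renderer.show(event["object"] + ": " + concept)

class Renderer:
    def show(self, text):
        print("[semantic ally]", text)

manager = InteractionManager({"B15": "Profit(1984)"}, Renderer())
ThinAPI(manager).on_cell_selected("B15")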