HealthFinland —Finnish Health Information on the Semantic...

14
HealthFinland —Finnish Health Information on the Semantic Web Eero Hyv ¨ onen, Kim Viljanen, and Osma Suominen Semantic Computing Research Group (SeCo), Helsinki University of Technology (TKK), Laboratory of Media Technology University of Helsinki, Department of Computer Science [email protected], http://www.seco.tkk.fi/ Abstract. This paper shows how semantic web techniques can be applied to solving problems of distributed content creation, discovery, linking, aggregation, and reuse in health information portals, both from end-users’ and content pub- lishers’ viewpoints. As a case study, the national semantic health portal HEALTH- FINLAND is presented. It provides citizens with intelligent searching and brows- ing services to reliable and up-to-date health information created by various health organizations in Finland. The system is based on a shared semantic metadata schema, ontologies, and ontology services. The content includes metadata about thousands of web documents such as web pages, articles, reports, campaign in- formation, news, services, and other information related to health. 1 Introduction Health information on the web is provided by different independent organizations of varying levels of trustworthiness, is targeted to both laymen and experts, is available in various forms, and is written in different languages. The difficulty of finding relevant and trustworthy information in this kind of heterogeneous environment creates an ob- stacle for citizens concerned about their health. Portals try to ease these problems by collecting content into a single site [1]. Portal types include service portals collecting a large set of services together into a localized miniature version of the web (e.g., Yahoo! and other “start pages”), community portals [2] acting as a virtual meeting place of a community, and information portals [3] acting as hubs of data. This paper discusses problems concerning information portals when publishing health information on the web for the citizens. We consider both the publishers’ and the end-users’ viewpoints. A distributed semantic web 1 content publishing model has been developed for health organizations, based on a shared metadata schema, ontologies, and ontology services, by which the content is created cost-effectively by independent content producers at different locations. Our system aggregates the content and makes it semantically inter- operable to be reused in different applications without modifying it. To test and demonstrate the approach, we have created an operational prototype of the national semantic health information portal “HEALTHFINLAND—Finnish Health 1 http://www.w3.org/2001/SW/

Transcript of HealthFinland —Finnish Health Information on the Semantic...

Page 1: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

HealthFinland—Finnish Health Information on the Semantic Web

Eero Hyvonen, Kim Viljanen, and Osma Suominen

Semantic Computing Research Group (SeCo),Helsinki University of Technology (TKK), Laboratory of Media Technology

University of Helsinki, Department of Computer [email protected], http://www.seco.tkk.fi/

Abstract. This paper shows how semantic web techniques can be applied tosolving problems of distributed content creation, discovery, linking, aggregation,and reuse in health information portals, both from end-users’ and content pub-lishers’ viewpoints. As a case study, the national semantic health portal HEALTH-FINLAND is presented. It provides citizens with intelligent searching and brows-ing services to reliable and up-to-date health information created by various healthorganizations in Finland. The system is based on a shared semantic metadataschema, ontologies, and ontology services. The content includes metadata aboutthousands of web documents such as web pages, articles, reports, campaign in-formation, news, services, and other information related to health.

1 Introduction

Health information on the web is provided by different independent organizations ofvarying levels of trustworthiness, is targeted to both laymen and experts, is available invarious forms, and is written in different languages. The difficulty of finding relevantand trustworthy information in this kind of heterogeneous environment creates an ob-stacle for citizens concerned about their health. Portals try to ease these problems bycollecting content into a single site [1]. Portal types include service portals collecting alarge set of services together into a localized miniature version of the web (e.g., Yahoo!and other “start pages”), community portals [2] acting as a virtual meeting place of acommunity, and information portals [3] acting as hubs of data. This paper discussesproblems concerning information portals when publishing health information on theweb for the citizens. We consider both the publishers’ and the end-users’ viewpoints.A distributed semantic web1 content publishing model has been developed for healthorganizations, based on a shared metadata schema, ontologies, and ontology services,by which the content is created cost-effectively by independent content producers atdifferent locations. Our system aggregates the content and makes it semantically inter-operable to be reused in different applications without modifying it.

To test and demonstrate the approach, we have created an operational prototype ofthe national semantic health information portal “HEALTHFINLAND—Finnish Health

1 http://www.w3.org/2001/SW/

Page 2: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

Information of the Semantic Web”2. The content for the prototype (ca. 6000 web docu-ments) was created by the National Public Health Institute (KTL)3, the UKK Institute4,the Finnish Institute of Occupational Health5, the national Suomi.fi citizen’s portal6,and the Ministry of Justice7, and new organizations are joining in.

In the following, problems of finding and producing health information on the webare first outlined. After this the content creation model of HEALTHFINLAND and theportal itself are presented.

2 Problems of Mediating Health Information

A citizen searching for health information on the web faces many challenges:

1. Content discovery. The discovery of relevant content is difficult because it oftenrequires prior knowledge of the administrative organization providing the contents.

2. Outdated and missing linkage. After finding a piece of interesting information, it isoften tedious and difficult to find related relevant web resources. Furthermore, whenuseful links are given on a web page, they outdate quickly. When new informationis entered in a site or old information changed or removed, the links in existingpages cannot be updated automatically but refer to older information, or even non-existing information.

3. Content aggregation. Satisfying an end-user’s information need often requires ag-gregation of content from several information providers, which is difficult if het-erogeneous content is provided by several independent web sites. For example, ifa baby is born in your family, relevant information related to the situation may beprovided by health care organizations, social organizations, the church, legal ad-ministration, and others.

4. Quality of content. The trustworthiness of the information on the web pages varies.In many cases it is difficult know whether a content is based on scientific results orlaymans’ opinions and rumors, or whether it is motivated by commercial interests.

5. Matching end-users’ expertise level. There are lots of medical information availablethat is targeted to experts rather than ordinary citizens. Providing and finding theinformation on the right level of user expertise is a challenge that is very evidentin the medical domain where, e.g., the terminology used by doctors and contentproviders is very different from the terminology used by citizens in expressing theirneeds and interests.

From the viewpoint of the health organizations, creating health information to citi-zens is problematic in many ways:

2 http://www.seco.tkk.fi/applications/tervesuomi/3 http://www.ktl.fi/4 http://www.ukkinstituutti.fi/5 http://www.ttl.fi/6 http://www.suomi.fi/7 http://www.finlex.fi/

Page 3: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

1. Duplicated work. Several organizations create overlapping content, which is inmany cases a waste of time and money and confusing to the end-user. For example,in Finland the governmental citizen portal Suomi.fi has a section for governmen-tal health information containing material partly overlapping with those availablethrough the sites of the Finnish Centre of Health Promotion, and the health pagesof the national broadcasting company YLE. These organizations share the goal ofproviding free health information to citizens and are not competing with other. Inour vision, similar content should in such situations be created only once and reusedrather than recreated by others.

2. Difficulty of reusing content. Content in portals is usually annotated for the pur-pose of presenting it in a particular portal and for the particular purpose of theorganization managing the portal. This makes it difficult and expensive for otherorganizations to reuse content across portals even if the portal owners were willingto do this. For example, in our case, a newspaper would be willing to publish linksto the governmental HEALTHFINLAND portal to enrich their health related newsarticles, and the portal would definitely like to promote its health information tothe readers of the online newspaper. However, a cost-effective way to do this withminimal changes in current content management systems (CMS) is needed.

3. Internal and external link maintenance. The problems of maintaining links up-to-date is costly and tedious from the site maintenance viewpoint, especially whendealing with links to external sites to which the maintainer has no control.

4. Indexing (annotation) problems. Finding the right keywords and other metadatadescriptions for web pages and documents is difficult and time consuming for in-formation producers. The vocabularies used, such as MeSH8, UMLS9 or SNOMEDCT10, are very large and require expertise to use.

5. Quality control. There are several quality issues involved when publishing healthinformation: 1) Quality of the content creation process (e.g. regular reviews andupdates of published material). 2) Quality of the content itself (e.g., errors in themedical subject matter, is the content readable and written for the correct audience).3) Quality of additional information on pages (e.g., it is advisable to show the dateof publication on each page). 4) Quality of the metadata. For example, one indexermay use only few general keywords while another prefers a longer detailed list,which leads to problems of unbalanced metadata.

Much of the semantic web11 content will be published using semantic portals [4]based on web standards such as RDF12 and OWL13. In MUSEUMFINLAND14 [5], asemantic web model and portal was created in the cultural domain for distributed se-mantic content creation [6], aggregation, and provision to end-users using semantic

8 http://www.nlm.nih.gov/mesh/9 http://umlsinfo.nlm.nih.gov

10 http://www.snomed.org/snomedct/11 http://www.w3.org/2001/sw/12 http://www.w3.org/RDF/13 http://www.w3.org/TR/owl-features/14 http://www.museosuomi.fi/

Page 4: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

search and browsing services. This approach has been shown to be applicable in differ-ent domains [1, 7], and it was also applied to HEALTHFINLAND. In the following, weshow how HEALTHFINLAND develops the idea of semantic portals further and appliesit in practice to create a national publication channel for health information targeted tocitizens.

3 Overview of the HEALTHFINLAND Approach

In traditional web publishing, content creators publish web pages and link them togetherindependently from each other. Content management systems (CMS) and portals areused to aggregate related material collections within one site, and to provide local searchand linking services. Linking between sites is usually done manually. Search enginesare used to provide content aggregation services on the global cross-site level.

In HEALTHFINLAND we wanted to create a new kind of collaborative distributedcontent creation model for publishing health information on the web in order to solvethe problems listed in section 2.

The first idea of the model is to minimize duplicate redundant work and costs in cre-ating health content on the national level by producing it only once by one organization,and by making it possible to reuse the content in different web applications by the otherorganizations, not only in the organization’s own portal. This possibility is facilitated byannotating the content locally with semantic metadata based on shared ontologies, andby making the global repository available by a semantic portal and by open APIs forcreating mash-up web services. This is a generalization of the idea of “multi-channelpublication” of XML, where a single syntactic structure can be rendered in differentways, but on the semantic metadata level and using RDF: semantic content is reusedthrough multi-application publication.

The second key idea behind HEALTHFINLAND is to try to minimize the mainte-nance costs of portals by letting the computer take care of semantic link maintenanceand aggregation of content from the different publishers. This possibility is also basedon shared semantic metadata and ontologies. New content relevant to a topic may bepublished at any moment by any of the content providers, and the system should be ableto put the new piece of information in the right context in the portal, and automaticallylink it with related information.

The third major idea of HEALTHFINLAND is to provide the end-user with intelligentservices for finding the right information based on her own conceptual view of health,and for browsing the contents based on their semantic relations. The views and vo-cabularies used in the end-user interface may be independent of the content providers’organizational perspective, and are based on layman’s vocabulary that is different fromthe medical expert vocabularies used by the content providers in indexing the content.

Figure 1 depicts an overview of the HEALTHFINLAND system. The content providerson the left produce web pages, documents, and other resources of interest along theirorganizational interests as before for their own purposes (“primary applications” in thefigure). However, the content is annotated by using a shared metadata schema andontologies for the others to use, too. Selected content is then harvested into a globalknowledge base (center of the figure) to be reused in “secondary applications”. In this

Page 5: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

Fig. 1. An overview of the HEALTHFINLAND content creation and reuse process.

paper, we focus on one application in particular, the semantic portal HEALTHFINLANDthat provides citizens with information services to the global health information repos-itory. We will also briefly show how external organizations can reuse the semanticcontent cost-effectively with semantic mash-up service components called “floatlets”in the sprit of Google Maps15 and AdSense16, but generalized on the semantic level.The figure depicts an enhanced portal “Portal 2” in which the content of the primaryapplication is enriched by, e.g., semantic recommendation links to content pages inHEALTHFINLAND.

In the following, the metadata schema and ontologies used in the system are firstoutlined.

4 Ontological Infrastructure

The ontological infrastructure of HEALTHFINLAND consists of two major components:1) A metadata schema, i.e. an annotation ontology that specifies what elements areused for describing the web documents to be included in the system, and what kindof values the elements (properties) can take. The metadata schema is shared by allorganizations creating the content and ensures syntactic interoperability of the content.2) A set of ontological vocabularies whose concepts are used to fill in values of themetadata schema. The ontologies are also shared by the organizations, and their usageensures semantic interoperability of the content.

15 http://maps.google.com16 http://www.google.com/adsense/

Page 6: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

4.1 Metadata Schema

The HEALTHFINLAND portal requires the web documents used in the system to bedescribed in a uniform and machine-understandable manner. A metadata schema spec-ifies a set of fields (properties) which are used for presenting information about eachdocument. The values of the metadata fields are either human-readable text (e.g., title),structured strings (e.g., publication date) or shared, explicitly identified ontological con-cepts (e.g., the subject classification). Some fields are obligatory and some fields mayexist more than once. In addition to being a formal specification of what is requiredfrom the content producers, the schema can be used for, e.g., automatically generat-ing a user interface for creating metadata conforming to the schema, and for automaticcontent validation and feedback generation before publishing the content in the portal[8].

The metadata schema (Table 1) is based on the Dublin Core Element Set17, alongwith refinements introduced in DCMI Terms18. In addition, to allow a more detailed de-scription of the required metadata, we have introduced three extensions to Dublin Core17 http://dublincore.org/documents/dces/18 http://dublincore.org/documents/dcmi-terms/

Table 1. HEALTHFINLAND Metadata Schema. Mandatory fields are marked in bold. Cardinali-ties are presented in the column C.

Name QName C Value type Value range

Gen

eral

met

adat

a

Identifier dc:identifier 1 URILocator ts:url 0..1 URLTitle dc:title 1a Free text Non-empty string.Abstract dcterms:abstract 1a Free text Non-empty string.Language dc:language 1..* String RFC 3066Publication time dcterms:issued 1 String W3CDTF (ISO 8601)Acceptance time dcterms:dateAccepted 0..* String W3CDTF (ISO 8601)Modification time dcterms:modified 0..* String W3CDTF (ISO 8601)Publisher dc:publisher 1..* Instance foaf:OrganizationCreator dc:creator 0..* Instance foaf:Organization, foaf:Person or foaf:Group

Con

tent

clas

sific

atio

n

Subject dc:subject 1..* Concept YSO, MeSH and HPMulti OntologiesAudience dcterms:audience 1..* Concept Audience OntologyGenre ts:genre 1..* Concept Genre OntologyPresentation type dc:type 1..* Concept DCMI Type vocabularyFormat dc:format 1 String IANA MIME typesMedium dcterms:medium 1 Concept Medium OntologySpatial coverage dcterms:spatial 0..* String or

conceptDCMI Point, DCMI Box or Location Ontology

Temporal coverage dcterms:temporal 0..* String orconcept

W3CDTF, DCMI Period or Time Ontology

Rel

atio

ns

Part of dcterms:isPartOf 0..* Document URIRights dc:rights 0..* Free text or

documentURI or textual description

Source dc:source 0..* Free text ordocument

URI (e.g., ISBN) or bibliographical reference

Reference dcterms:references 0..* Free text ordocument

URI (e.g., ISBN) or bibliographical reference

Translation of ts:isTranslationOf 0..* Document URIFormat of dcterms:isFormatOf 0..* Document URI

a Multilingual values are allowed, but only one value in each language.

Page 7: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

elements: 1) The dc:type field has been refined with a ts:genre field19 to distinguishbetween the technical type of the document (presented using DCMI Type vocabulary)and the content genre, such as News item, Organizational information and Research (de-scribed in our Genre ontology). 2) The dc:identifier is extended with an (optional) ts:urlfield to distinguish between non-accessible identifiers and document locators. 3) Thefinal extension is the ts:isTranslationOf field which extends the dcterms:isVersionOf,and is used for presenting the relation between language translations of documents. Themetadata schema is specified in detail in [9].

Table 2. Examples of how metadata is presented in RDF/XML and XHTML.

RDF/XML XHTMLFree text <dc:title>Rokotteiden

havittaminen</dc:title><meta name="DC.title"content="Rokotteiden havittaminen" />

String <dc:language><dcterms:RFC3066><rdf:value>fi</rdf:value></dcterms:RFC3066></dc:language>

<meta name="DC.language"scheme="DCTERMS.RFC3066"content="fi" />

Concept <dc:subject rdf:resource="http://www.yso.fi/onto/yso/p123"/>

<link rel="DC.subject"href="http://www.yso.fi/onto/yso/p123"/>

The metadata in HEALTHFINLAND is intended to be presented using RDF, con-forming to the recommendations for expressing Dublin Core in RDF [10]. A subsetof the metadata can also be embedded in (X)HTML pages using META and LINK ele-ments based on the Dublin Core recommendation [11]. The HTML embedded metadatasolution has some limitations, because not all relevant documents are in HTML formatand advanced RDF metadata structures, such as defining an instance with a certain URI,can not be done using the HTML META and LINK tags. Therefore, the RDF presen-tation is recommended. Examples of how metadata is expressed in RDF and HTML isshown in Table 2.

The RDF and/or HTML embedded metadata is published for the HEALTHFINLANDportal by making it available on a public WWW server where it can be accessed re-gurlarly by the HEALTHFINLAND metadata harvester which fetches the content fromthe content providers to a centralized metadata repository (cf. Figure 1). During the har-vesting, 1) the content is transformed into RDF (if originally presented in HTML), 2)missing values are replaced with default values when possible, and 3) the RDF is vali-dated against the metadata schema and other validation rules. Each metadata producergets a report of warnings, errors and other problems that were encountered during har-vesting and validating the content. If some parts or all of the metadata is unacceptabledue to serious errors, the metadata is discarded until necessary corrections are made.Otherwise, the metadata is added to and published in the HEALTHFINLAND portal.

19 Namespace ts refers to the Finnish name TerveSuomi of HEALTHFINLAND.

Page 8: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

4.2 Ontologies

Semantic interorerability in HEALTHFINLAND is obtained by using a set of sharedontologies for filling in the values of the metadata schema. The ontologies include aMedium Ontology containing resources for representing different media types (Webpage, CD, DVD, etc.), an Audience Ontology representing categories of people, suchas genre groups, professional groups, risk groups, and age groups, a Place Ontologycontaining geographical places (e.g., Finland, Helsinki, etc.) in a part-of hierarchy, aGenre Ontology for genre types (news, game, etc.), DCMI type ontology containingmedia types (text, sound, video, etc.), and a Time Ontology. In the future, custom madeorganizational vocabularies can also be used, provided that they are linked with theHEALTHFINLAND ontologies.

The most important ontologies in HEALTHFINLAND are the three core subject do-main ontologies that are used for describing the subject matter of web contents:

1. The Finnish General Upper Ontology (YSO)20 that includes approximately 20 000concepts. The YSO ontology was created by transforming the General FinnishThesaurus YSA21 into RDF/OWL format using the Protege editor22 and by manu-ally crafting the concepts into full-blown rdfs:subClassOf hierarchies [12]. YSA iswidely used in Finland for indexing various kinds of content, e.g. in libraries.

2. The international Medical Subject Headings (MeSH) which includes approximately23 000 concepts. The Finnish translation of MeSH, FinMeSH, was developed bythe Finnish Medical Society Duodecim23 and was acquired for HEALTHFINLANDas a database. The vocabulary was transformed into the SKOS Core format24 with-out changing the semantics of the vocabulary or its structure [13].

3. The European Multilingual Thesaurus on Health Promotion 25 (HPMULTI), whichincluded a Finnish translation. HPMULTI contains approximately 1200 conceptsrelated specifically to health promotion. HPMULTI was transformed into RDFSKOS in the same way as FinMeSH.

All three ontologies were needed to cover the subject matter of the portal prop-erly. YSO is broad but too general w.r.t. detailed medical content. On the other hand,MeSH contains lots of useful medical concepts and is widely used in the health sector,but is focused on clinical healthcare. HPMULTI complements the two vocabularies byfocusing on health promotion terminology.

5 Distributed Semantic Content Creation

A major challenge in the distributed content creation model of HEALTHFINLAND ishow to facilitate the cost-effective production of descriptive, semantically correct high-quality metadata. In HEALTHFINLAND three ways of creating metadata are considered20 http://www.seco.tkk.fi/ontologies/yso/21 http://www.vesa.lib.helsinki.fi22 http://protege.stanford.edu23 http://www.duodecim.fi24 http://www.w3.org/2004/02/skos/core/25 http://www.hpmulti.net/

Page 9: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

and supported: 1) Enhancing existing web content management systems (CMS) withontology mash-up service components for producing semantic metadata. 2) Using abrowser-based metadata editor for annotating web content. 3) Automatic conversion ofmetadata. These approaches are outlined below.

5.1 Enhancing an Existing CMS with Ontology Services

Most content providers in HEALTHFINLAND use a CMS for authoring, publishing andarchiving content on their website. A typical CMS supports creation of textual meta-data about documents, such as title and publication time, but not ontological annota-tions. This would require that the system has functionalities supporting ontology-basedannotation work, e.g., concept search for finding the relevant concepts (identified withURIs), concept visualization for showing the concept to the user, and storing conceptsalong other information about the documents. The CMS should also be able to exportthe metadata, preferably in RDF format, to be used by semantic web applications.

Currently, ontologies are typically shared by downloading them, and each applica-tion must separately implement the ontology support. To avoid duplicated work andcosts, and to ensure that the ontologies are always up-to-date, we argue that one shouldnot only share the ontologies, but also the funtionalities for using them by centralizedmash-up component services. Such services, e.g. Google Maps, have been found veryuseful and cost-effective in Web 2.0 applications for integrating new functionalities withexisting systems.

We have applied the idea of using mash-ups to provide ontology services for thecontent producers of HEALTHFINLAND by creating the ONKI Ontology Server frame-work26 [14]. ONKI provides ontological functionalities, such as concept searching,browsing, disambiguation, and fetching, as ready-to-use mash-up components that com-municate asynchronously by AJAX27 (or Web Service technologies) with the sharedontology server. The service integration can be easily done by changing only slightlythe user-interface component at the client side. For example, in the case of AJAX andHTML pages, only a short snippet of JavaScript code must be added to the web pagefor using the ONKI services.

The main functionality addressed by the ONKI UI components is concept findingand fetching. For finding a desired annotation concept, ONKI provides text search withsemantic autocompletion [15]. This means that when the annotator is typing in a string,say in an HTML input field of a CMS system, the system dynamically responds aftereach input character by showing the matching concepts on the ONKI server. By se-lecting a concept from the result list, the concept’s URI, label or other information isfetched to the client application.

Concept browsing can also be used for concept fetching. In this model, the userpushes a button on the client application that opens a separate ONKI Browser windowin which annotation concepts can be searched for and browsed. For each concept entry,the browser shows a Fetch concept button which, when pressed, transfers the currentconcept information to the client application. ONKI supports multilingual ontologies,

26 http://www.seco.tkk.fi/services/onki/27 http://en.wikipedia.org/wiki/Ajax (programming)

Page 10: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

has a multilingual user-interface, supports loading multiple ontologies, and can be con-figured extensively.

ONKI is implemented as a Java Servlet application running on Apache Tomcat28. Ituses the Jena semantic web framework29 for handling RDF content, the Direct Web Re-moting (DWR)30 library for implementing the AJAX functionalities, the Dojo JavaScripttoolkit31, and the Lucene32 text search engine.

5.2 Browser-based Metadata Editor

Some HEALTHFINLAND content providers can not add mash-up ontology support totheir CMS due to technical or economical reasons. Furthermore, some content providersdo not even have a CMS or they may not have access to the CMS that contains the con-tent, e.g., if the content originates from a third party. To support metadata productionsin these cases, we have created a centralized browser-based annotation editor SAHA [8]for annotating web pages. SAHA adapts automatically to different metadata schemas.In this case the HEALTHFINLAND schema is used. The schema element fields in SAHAcan be connected with ONKI ontology service components, providing concept findingand fetching services to the annotator, as discussed above.

5.3 Automatic Conversion

The third content producing method in HEALTHFINLAND is automatic conversion oforiginal data to HEALTHFINLAND metadata. This method is currently used in caseswhere metadata exists in a CMS, but it is in an incompatible format, does not containontological annotations (URIs) and/or some minor information is missing in the meta-data. Because the HEALTHFINLAND metadata schema is strongly based on Dublin Coreand because many content providers in Finland use thesauri (e.g., the Finnish GeneralThesaurus YSA and the Medical Subject Headings MeSH), the content can in manycases be transformed fairly accurately into ontological form. For example, some ju-ridic content produced by the Finnish Ministry of Justice is harvested for HEALTH-FINLAND. The metadata, targeted originally for the govermental Suomi.fi portal, usesa Dublin Core based metadata schema (JHS 143 recommendation [16]) and is automat-ically translated into the HEALTHFINLAND metadata format.

6 Intelligent Services to the End-User

The HEALTHFINLAND user interface is based on the faceted browsing (a.k.a. view-based search) paradigm [17, 18], which has been found useful in our earlier semanticportals, such as [5, 1, 7], and in other systems, such as SWED33 and MultimediaN34.28 http://tomcat.apache.org/29 http://jena.sourceforge.net/30 http://en.wikipedia.org/wiki/DWR (Java)31 http://dojotoolkit.org/32 http://lucene.apache.org/33 http://www.swed.org.uk/swed/index.html34 http://e-culture.multimedian.nl/demo/search

Page 11: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

keyword search

Recommendationlinks grouped

by genre

Topic facet

Selectedcategory:Exercise(Liikunta)

Life eventfacet

Group ofpeoplefacet

Selectedcategory:

youth(nuoret)

Body partfacet

Search resultsabout exercise

and youth

secondary facets

Figure 2. Portal user interface with semantic search, browsing and recommendations

A challenge in publishing health-related information in a citizens’ semantic portalis the gap between the citizens’ information needs and the professional conceptualiza-tions and terminology used in medical vocabularies. To bridge this gap and to enablean intuitive facet-based user interface for the portal, we constructed the search facetsby using a card sorting method [19] to elicitate how users tacitly group and organizeconcepts in the health domain. The new user-centric facets organize the material fromcitizens’ point of view, and they are mapped by the portal to concepts in the medicalontologies.

Page 12: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

The HEALTHFINLAND portal, like typical semantic portals, provides the end-userwith two basic services: 1) a search engine based on the semantics of the content and2) dynamic linking between pages based on the semantic relations in the underlyingknowledge base. The main facets of the portal are Topic, Life event, Group of people,and Body part. The facets can be seen in the left column in Figure 2. In addition, sec-ondary drop-down facets for constraining the search with a set of additional choices areprovided for Genre, Publisher, Publication year and Audience.

Keyword searches can be initiated at any point and can be combined with categorybrowsing. Traditional keyword search functionality has been semantically enhanced bytargeting not only content titles, descriptions and body text but also the facet categoriesand underlying ontology concepts, including non-preferred concept labels. Thus, syn-onyms and abbreviations can be used in keyword searches provided they are known inthe ontology.

The portal also provides recommendation links at several stages: 1) individual con-tent items (pages) are linked to related material, 2) search result listings provide “bestpicks”, and 3) concept pages link to related content. Recommendations are generatedusing ontological knowledge and are grouped according to genre (e.g. statistics, re-search activities, news items, laws) or language (e.g. similar content in English).

One problem with a portal approach and distributed content creation in generalis that when search results are provided as traditional hyperlinks, users are forcedto navigate between different web sites that each have their own navigation systemsand styling. Also, providing recommendation links across sites is challenging. TheHEALTHFINLAND portal will integrate selected content items that have been retrievedfrom affiliated websites directly into the portal interface, providing seamless naviga-tion and recommendation links in the proper context of the content page. Our solutionrequires that the content is marked up using a small amount of RDFa syntax35, whichhelps the metadata harvester extract the body content of suitable web pages, skippingnavigation elements and styling.

The HEALTHFINLAND portal also incorporates an alphabetical index of conceptsas well as a concept browser that can be used to browse the subject ontology and forconcept-based search of the content.

The portal is implemented as a Java Servlet application running on Apache Tomcat.It is built using the Tapestry framework36 and uses Jena for RDF functionality. Searchand recommendation functionality has been implemented using the Lucene search en-gine, which has been enhanced to handle category and concept queries.

7 Discussion

This paper addressed the problems of the citizen end-users (cf. Section 2) as follows:1) Content finding is supported by cross-portal semantic search, based on concepts andfacets rather than keywords. 2) The problem of outdated and missing links is easedby providing the end-user with semantic recommendations that change dynamically as

35 http://www.w3.org/TR/xhtml-rdfa-primer/36 http://tapestry.apache.org/

Page 13: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

content is modified. 3) Content aggregation is facilitated by end-user facets that collectdistributed but related information from different primary sources. 4) Quality of con-tent is maintained by including only trustworthy organizations as content producers. 5)End-users’ expertise level is taken into account by the metadata element “Audience”.Separation of end-user vocabularies from indexing vocabularies makes it possible forthe citizen to search and browse content using layman’s vocabulary although the contentis indexed by using professional medical terminology.

At the same time, the problems of content providers (cf. Section 2) are eased, too:1) Duplication of content creation can be minimized by the possibility of aggregatingcross-portal content. 2) Reusing the global content repository is feasible, as demon-strated by the semantic portal HEALTHFINLAND. By using mash-up techniques, ex-ternal applications, such as the primary applications of Figure 1, can reuse the contentprovided by secondary applications, such as HEALTHFINLAND. 3) Internal and exter-nal link managament problems are eased by the dynamic semantic recommendationsystem of the portal and the content aggregation mechanisms. 4) The tedious contentindexing task is supported cost-effectively by shared ontology service components formash-ups. 5) Metadata quality can be enhanced by providing indexers with ontologyservices by which appropriate indexing concepts can be found and correctly enteredinto the system.

The content creation model presented is based on a shared metadata schema andontologies as in [20]. However, the idea of sharing ontologies through ontology ser-vice components for mash-ups is new. The user interface is based on the faceted searchparadigm [17, 18], but integrated with semantic web ontologies and reasoning with se-mantic recommendations [21], as in [22]. A new feature of the system is the separationof end-user facets from indexing ontologies [23, 19], which is crucial in the medicaldomain. The card sorting approach [19] was found useful in accomplishing this.

This work is a part of the national semantic web ontology project FinnONTO37

2003–2007, funded mainly by the National Funding Agency for Technology Innovation(Tekes) and the Ministry of Social Affairs and Health. The HEALTHFINLAND project isco-ordinated by the National Health Institute in Finland. We thank E. Hukka, M. Holi,P. Lindgren, and J. Eerola for co-operation.

References

1. Sidoroff, T., Hyvonen, E.: Semantic e-goverment portals - a case study. In: Proceedingsof the ISWC-2005 Workshop Semantic Web Case Studies and Best Practices for eBusinessSWCASE05. (2005)

2. Staab, S., et al.: Semantic Community Web Portals. In: Proceedings of the 9th InternationalWorld Wide Web Conference, Amsterdam, The Netherlands, Elsevier (2000)

3. Reynolds, D., Shabajee, P., Cayzer, S.: Semantic Information Portals. In: Proceedings of the13th International World Wide Web Conference on Alternate track papers & posters, NewYork, NY, USA, ACM Press (2004)

4. Maedche, A., Staab, S., Stojanovic, N., Struder, R., Sure, Y.: SEmantic portaAL — the SEALapproach. Technical report, Institute AIFB, University of Karlsruhe, Germany (2001)

37 http://www.seco.tkk.fi/projects/finnonto/

Page 14: HealthFinland —Finnish Health Information on the Semantic Webiswc2007.semanticweb.org/papers/771.pdf · Information of the Semantic Web”2.The content for the prototype (ca. 6000

5. Hyvonen, E., Makela, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M.,Kettula, S.: MuseumFinland – Finnish museums on the semantic web. Journal of WebSemantics 3(2) (2005) 224–241

6. Hyvonen, E., Saarela, S., Viljanen, K., Makela, E., Valo, A., Salminen, M., Kettula, S., Jun-nila, M.: A semantic portal for publishing museum collections on the web. In: Proceedingsof ECAI/PAIS 2004, Valencia, Spain. (2004)

7. Kansala, T., Hyvonen, E.: A semantic view-based portal utilizing Learning Object Metadata(2006) Semantic Web Applications and Tools Workshop, ASWC 2006, Beijing.

8. Valkeapaa, O., Alm, O., Hyvonen, E.: Efficient content creation on the semantic web usingmetadata schemas with domain ontology services (system description). In: Proceedings ofthe ESWC 2007, Springer–Verlag, Berlin (2007)

9. Suominen, O., Viljanen, K., Hyvonen, E., Holi, M., Lindgren, P.: TerveSuomi.fi:nmetatietomaarittely (Metadata schema for TerveSuomi.fi), Ver. 1.0 (26.1.2007). (2007)http://www.seco.tkk.fi/publications/.

10. Dublin Core Workgroup: Expressing qualified Dublin Core in RDF/XML (2002)http://dublincore.org/documents/dcq-rdf-xml/.

11. Dublin Core Workgroup: Expressing Dublin Core in HTML/XHTML meta and link ele-ments (2003) http://dublincore.org/documents/dcq-html/.

12. Hyvonen, E., Valo, A., Komulainen, V., Seppala, K., Kauppinen, T., Ruotsalo, T., Salmi-nen, M., Ylisalmi, A.: Finnish national ontologies for the semantic web—towards a contentand service infrastructure. In: Proceedings of International Conference on Dublin Core andMetadata Applications (DC 2005). (2005)

13. van Assem, M., Malaise, V., Miles, A., Schreiber, G.: A method to convert thesauri to skos.In: Proceedings of the Third European Semantic Web Conference (ESWC’06), Springer–Verlag, Berlin (2006)

14. Hyvonen, E., Viljanen, K., Makela, E., et al.: Elements of a national semantic contentinfrastructure—case Finland on the semantic web. In: Proceedings of the ICSC 2007, IEEEPress (2007)

15. Hyvonen, E., Makela, E.: Semantic autocompletion. In: Proceedings of the first Asia Se-mantic Web Conference (ASWC 2006), Beijing, Springer-Verlag, New York (2006)

16. JHS workgroup: JHS 143: Asiakirjojen kuvailun ja hallinnan metatiedot (2004)http://www.jhs-suositukset.fi/suomi/jhs143.

17. Pollitt, A.S.: The key role of classification and indexing in view-based searching. Technicalreport, University of Huddersfield, UK (1998) http://www.ifla.org/IV/ifla63/63polst.pdf.

18. Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., Lee, K.P.: Finding the flow inweb site search. CACM 45(9) (2002) 42–49

19. Suominen, O., Viljanen, K., Hyvonen, E.: User-centric faceted search for semantic portals.In: Proceedings of the ESWC 2007, Springer–Verlag, Berlin (2007)

20. Hyvonen, E., Salminen, M., Kettula, S., Junnila, M.: A content creation process for theSemantic Web (2004) Proceeding of OntoLex 2004: Ontologies and Lexical Resources inDistributed Environments, May 29, Lisbon, Portugal.

21. Viljanen, K., Kansala, T., Hyvonen, E., Makela, E.: Ontodella—a projection and linkingservice for semantic web applications. In: Proceedings of the 17th International Conferenceon Database and Expert Systems Applications (DEXA 2006), Krakow, Poland, IEEE (2006)

22. Hyvonen, E., Saarela, S., Viljanen, K.: Application of ontology based techniques to view-based semantic search and browsing. In: Proceedings of the First European Semantic WebSymposium, Heraklion, Greece, Springer–Verlag, Berlin (2004)

23. Holi, M., Hyvonen, E.: Fuzzy view-based semantic search. In: Proceedings of the 1st AsianSemantic Web Conference (ASWC2006), Beijing, China, Springer-Verlag (2006)