Semantic Representation of Tourism on the Internet


Journal of Travel Research
http://jtr.sagepub.com/

The online version of this article can be found at: http://jtr.sagepub.com/content/47/4/440

DOI: 10.1177/0047287508326650
Journal of Travel Research 2009 47: 440, originally published online 13 November 2008

Zheng Xiang, Ulrike Gretzel and Daniel R. Fesenmaier

  

Published by:

http://www.sagepublications.com

On behalf of: 

  Travel and Tourism Research Association


- OnlineFirst Version of Record: Nov 13, 2008

- Version of Record: Apr 10, 2009

Downloaded from jtr.sagepub.com at UNIV OF VIRGINIA on June 24, 2014


Semantic Representation of Tourism on the Internet

Zheng Xiang, University of North Texas, Denton

Ulrike Gretzel, Texas A&M University, College Station

Daniel R. Fesenmaier, Temple University, Philadelphia

With the huge amount of information available on the Internet and the increasing importance of online search, understanding the tourism domain is essential for effective online marketing. This study focuses on the semantic representation of the tourism domain with respect to information provided on tourism-related Web sites and travelers’ information needs as expressed through search engine queries. The results show that huge discrepancies exist between the domain ontology derived from tourism Web sites and the one emerging from user queries. This study offers useful insights into the challenge of representing tourism products and services through Web sites and provides directions for developing Internet-based systems that can better support travel planning.

Keywords: search engines; semantic Web; ontology; semantic representation; online marketing

The Internet can be considered a virtual galaxy with entities representing various kinds of tourism information, among countless other domains. As such, there are many questions one may ask regarding the nature of online travel information. Arguably, one of the most important questions is whether the information currently available online matches the information needs of travelers. Identifying structures and gaps in online travel information is important as it can provide input for the design of Web sites and other Internet-based technologies to better support travel planning (Carroll and Thomas 1982; Crawford 2003; Hevner et al. 2004; Norman 1999; Werthner 1996). In addition, such knowledge is essential for developing effective strategies for online tourism marketing (Adomavicius and Tuzhilin 2005; Fesenmaier, Wöber, and Werthner 2006; Riedl and Konstan 2002).

This article builds on a series of studies that have focused on understanding the online tourism domain and the information needs and search strategies of its users (Fesenmaier, Wöber, and Werthner 2006; Gretzel and Wöber 2004; Pan and Fesenmaier 2006; Wöber 2006; Xiang et al. 2007; Xiang, Wöber, and Fesenmaier 2008). In particular, Pan and Fesenmaier (2006) and Xiang et al. (2007) demonstrated that the tourism domain is represented differently on the Internet for tourism providers and consumers. Their findings, however, were based on small experimental studies and provide very limited descriptions of tourism on the Internet. Therefore, the goal of this study is to obtain a more comprehensive understanding of the online tourism domain using larger, more representative data sets derived from actual search engine queries. It is posited that the results of this study provide useful implications for the design of tourism-specific search engines as well as the development of potentially more effective strategies for online destination marketing.

Online Tourism Domain Ontologies

Tourism is a field that manifests unique characteristics in the provision (supply) and consumption (demand) of products (Smith 1988; Woodside and Dubelaar 2002). Tourism businesses and tourism-related organizations such as destination-marketing organizations present and promote their offerings online using a language that is largely industry specific (Dann 1997; Pan and Fesenmaier 2006). The demand side, on the other hand, consists of travelers and their respective information needs as expressed through searches conducted online (Wöber 2006). Thus, the online tourism domain is extremely large and complex and includes objects related to both the tourism industry and the traveler (Pan and Fesenmaier 2006; Xiang, Wöber, and Fesenmaier 2008).

Journal of Travel Research, Volume 47, Number 4, May 2009, 440-453. © 2009 SAGE Publications. DOI: 10.1177/0047287508326650. http://jtr.sagepub.com, hosted at http://online.sagepub.com

Authors’ Note: We would like to thank Dr. Bertrand J. Jansen at Pennsylvania State University for his generosity in sharing the search engine transaction log files with us.

In computer and information science, an ontology is comprised of a representational vocabulary with precise definitions of the meanings of terms plus a set of formal axioms that constrain interpretation and well-formed use of these terms (Campbell and Shapiro 1995). Recent studies of Internet-based technologies have shown that an important limitation to the use of the Internet relates to the domain representations (i.e., ontology) used to support online search (Berners-Lee 1999; Kim 2002). These studies indicate that a common vocabulary is essential because it is used to bridge the various “languages” of computer systems and users so that meaning (i.e., semantics) can be established and shared between the system and the user. The notion of the “Semantic Web” has been developed to describe the Internet-based infrastructure that supports the machine-understandable representation of the world (Berners-Lee 1999). From an information search perspective, emerging technologies such as search engines that have developed the ability to “understand” the semantics of user queries are only now beginning to be able to answer questions such as, “Can you identify all destinations that have beautiful beaches in the US?” As such, understanding the semantic representation of tourism and its use in information search appears to be an important key to the further development of search-related technologies and online marketing strategies in tourism.

There has been limited research focusing on online representation of the tourism domain (Pan and Fesenmaier 2006; Wöber 2006; Xiang et al. 2007; Xiang, Wöber, and Fesenmaier 2008). Wöber (2006), for example, examined the visibility of destination-marketing organizations and individual hotel operations in Europe among six popular search engines. The findings of this study showed that many tourism Web sites suffer from very low rankings among the search results, making it extremely difficult for online travelers to directly access individual tourism Web sites through these search engines. Pan and Fesenmaier (2006) found that the “language of tourism” (Dann 1997) is extremely rich; furthermore, their study indicated that the vocabularies used on destination-marketing organization Web sites differed substantially from those of potential users. As such, they concluded that the richness in language and the differences in perspectives make it very difficult for Internet users to have a satisfying online search experience. Following from Pan and Fesenmaier, Xiang et al. (2007) examined language use among consumers and suppliers in describing a specific type of tourism product, that is, dining experiences. The results of this study are consistent with Pan and Fesenmaier’s in identifying the substantial differences in how users and producers perceived and represented travel-related products on the Internet. However, both studies were conducted in an experimental setting by using a very limited number of student participants. Therefore, the findings of these studies cannot be generalized to provide a rich description of the nature of the online tourism domain to a meaningful extent.

It is argued that Pan and Fesenmaier’s (2006) conceptual framework, however, provides a useful foundation for understanding the ways travelers utilize the tourism domain in constructing a trip plan in that it describes the tourism domain as exhibiting a hierarchical, interconnected structure that can be represented as a semantic network with nodes and interrelationships. Importantly, Pan and Fesenmaier showed that quantitative textual analysis, particularly semantic network analysis, provides a strong theoretical and methodological foundation with which to describe the semantic nature of the online tourism domain. Quantitative textual analysis, according to Krippendorff (2004), is underpinned by theories in communication and linguistics with a fundamental assumption that relative frequency signifies the importance of a specific word, and the meaning of the word is a function of the proximity to other words. In semantic network analysis, the semantic structure, that is, “meanings,” of text can be constructed and represented in a variety of ways by establishing the associations between different concepts (Woelfel 1993). Different measures such as proximity and centrality and the strengths of identical relationships can be derived to measure the structure of a semantic network and to compare the differences between two semantic structures (Carley 1997; Popping 2000). Proximity is one of the most important measures in that it can be used to measure the interrelationships between words within a semantic network. Another important measure is “centrality,” which describes the “prominence” of a node within the overall network (Wasserman and Faust 1994), whereby words with higher centrality values represent dominant “themes” in the text. The similarities (or distance) matrix generated in text analysis can be used as input into multivariate analyses such as cluster analysis and multidimensional scaling to assess both the content and structure of a semantic network. For example, icicle maps in clustering analysis and coordination maps derived from multidimensional scaling can be used to assess the meanings of textual data and are, thus, deemed useful for describing the semantic representation of online tourism (Gretzel et al. 2008). Last, Quadratic Assignment Procedure is an analytical procedure that can be used to compare the structural properties of two networks and establish similarity or dissimilarity (Krackhardt 1987).

Research Questions

This study posits that travelers’ use of search engines establishes the basis with which to understand the nature of tourism domain ontologies. Search engines, such as Google, Yahoo!, and MSN, index a huge number of Web pages on the Internet and thus serve as the “Hubble” telescope with which people learn about the entire virtual “galaxy” (Castells 2001; Spink et al. 2002). Typically, the interaction between a traveler and a search engine interface begins when a traveler with an initial search task and a particular mental model enters a query into a search engine textbox. At this stage, the traveler’s mental model consists of a set of factors including the traveler’s understanding of how the system (search engine) works and knowledge of the domain as well as the search task itself (Gretzel, Hwang, and Fesenmaier 2006); query formulation involves the mapping of one’s mental model with the system selected (Marchionini 1997). Thus, a query can be seen as the expression of the user’s information needs in the context of a search task.

Based on the query, the search engine retrieves and returns a number of search results that “match” the keyword string and displays them in a predefined format (Yu and Meng 2003). If the search results match the traveler’s expectations, he or she will most likely examine the links displayed on that page; if not, he or she may enter new keyword(s) to start another round of the query–Web site selection process. Thus, this sequence of interaction with a search engine interface involves the traveler’s reading and understanding of the results of the search with respect to a mental model and then navigating back and forth between the search engine interface and the travel information space (Jansen, Brown, and Resnick 2007). Based on this understanding, it can be expected that a mapping process occurs between the query terms used by a traveler and the semantics on Web pages that are considered relevant. This implies, then, that the traveler makes a series of decisions based on the data (i.e., the descriptions included in the search results as well as the Web pages chosen by the traveler) to which he or she has been exposed. As such, the use of search engines for travel purposes can be seen as the interaction between the demand and the supply of tourism through one of the most important channels on the Internet.

Of particular interest to this study is the representation of the tourism domain through search engine results, that is, the supply, and the queries people use to search for tourism-related information, that is, the demand. The following research questions were formulated to guide this study:

Research Question 1: What constitutes the online tourism domain as represented through a search engine?

Research Question 2: What are the commonalities and differences between the language used in search queries and the language on tourism Web sites?

Research Method

This study was conducted in three phases to address each of these research questions. As shown in Figure 1, phase 1 focused on understanding the online tourism domain from the supply perspective based on its representation through a general purpose search engine; phase 2, on the other hand, focused on gaining domain knowledge from the demand perspective through tourism-related user queries using similar search engines. Finally, phase 3 built on the results of phase 1 and phase 2 and compared the derived tourism domain ontologies from tourism Web sites and user queries to understand the commonalities and differences between the representation of tourism from the demand and supply perspectives. This section details the overall research design as well as data collection and analysis corresponding to each of these three phases.

Research Design

A travel-planning scenario was used to mimic travelers’ use of keywords to query a search engine to plan a trip to a specific destination. Specifically, the city of Chicago was chosen as a convenient case due to its status as one of the largest urban tourist destinations in the United States as well as its diversity in cultural and historical resources for tourists. Then, a set of nine predefined keywords (i.e., “accommodation,” “activities,” “area,” “attractions,” “events,” “information,” “places,” “restaurants,” and “shopping”) that are most likely to be used by travelers was identified. In phase 1, these keywords, in combination with the destination name (i.e., “Chicago”), were used to query Google. Based on the URLs provided by the search results, the text-based contents from tourism Web sites were extracted to represent the tourism domain from the supply perspective. In phase 2, the same keywords, together with the destination name (i.e., “Chicago”), were used as “seeds” with which to extract user queries from a number of search engine transaction logs.

Figure 1. An Illustrative View of the Research Methods

Goal
- Phase 1: To understand language representation of the tourism domain in search engines from the supply perspective.
- Phase 2: To understand language representation of the tourism domain in search engines from the demand perspective.
- Phase 3: To understand the similarities and differences in the language representation of the tourism domain between the demand and supply perspectives.

Research Design
- Create a travel information search scenario, that is, by mimicking travelers’ use of keywords to query a search engine to plan a trip to a destination, to collect textual corpuses that represent the domain from the supply and demand sides.
- Critical considerations included:
  • Selection of keywords that are most likely to be used by travelers
  • Selection of search engines for data collection

Data Collection
- Phase 1: Use a set of keywords, which are most likely used by travelers for planning a trip to a specific destination, to query Google. Extract a sample of 50 links from search results for each query and then textual content from Web pages by following these links to obtain the supply-side corpus.
- Phase 2: Extract actual user queries from a set of search engine log files using the same destination name and keywords set to obtain the demand-side corpus.
- Phase 3: Compare the demand side and supply side ontologies based on the previous two phases.

Data Analysis
- Phase 1: Identify words in Web page content (that is, the supply-side corpus) that represent the domain ontology. Semantic network analysis based on centrality measures established through word association.
- Phase 2: Follow the same data analysis procedure as Phase 1 to identify the domain ontology in user queries (that is, the demand-side corpus).
- Phase 3: Ratios of common and different words between the two ontologies. Quadratic Assignment Procedure to measure semantic similarity between the two ontologies based on a set of common words.

Two critical considerations guided the development of this approach. First, the predefined keywords must reflect the tourism domain in a comprehensive way. The selection of the keywords was guided by both the classification schemes used by the tourism industry and actual queries used by travelers. Specifically, Web sites of several destination-marketing organizations located throughout the United States were used as sources to identify these keywords. That is, textual labels for the navigational menus on these Web sites were extracted to obtain the categories by which destination-marketing organizations organize their information (e.g., “accommodation,” “attraction,” “events,” etc.). Also, an analysis of the publicly available query logs of a European-based search engine (visiteuropeancities.info) was used to provide a triangulation with the types of keywords likely to be used by travelers (Wöber 2006). After discounting cultural and geographic specifics (e.g., the destination-marketing organization of Chicago had “theatre” as one of the top-level categories while others did not; churches and historic architectures were among the most searched items among European travelers), these nine keywords consistently occurred in both the supply and demand sources and thus were considered generic categories that represent online tourism.

Second, a number of search engines were selected to obtain data describing travel-related search queries from the demand perspective. Google was used as the sampling frame for the data collection for phase 1 as it represents a state-of-the-art search technology on the Internet. Also, it is ranked the most popular search engine on the Internet with reportedly an index of approximately 25 billion Web pages and 250 million queries a day; this represents approximately half of all the queries occurring on the Internet (Bertolucci 2007; Brooks 2004; Burns 2007). Furthermore, it is considered one of the most comprehensive text-based search engines on the Internet (Bertolucci 2007). Phase 2 utilized a convenience sample of transaction log files from three search engines that are available publicly, namely, Excite, AltaVista, and AlltheWeb, to understand the semantic nature of tourism-related queries (Jansen and Spink 2005b). While these search engines might be different from Google in certain aspects (e.g., ranking algorithms), all of them have an interface similar to Google’s. That is, the interaction between a user and the system is supported by both a typical textbox in which users type in queries and the search results’ being represented in a list format. As such, it is argued that these search engines can be considered essentially equivalent to Google, at least from a user-system interaction viewpoint.

Data Collection

Data collection in phase 1 followed a two-step procedure. First, the nine keywords were used to form queries with the destination name (i.e., “Chicago”) and then the search results were extracted from Google. According to the information-retrieval literature (e.g., Spink et al. 2002), most search engine users (greater than 85%) do not view search results beyond the first three pages (assuming each page contains 10 search results, as is the standard practice of most text-based search engines). As such, the first 30 URLs, which constitute the first three pages of search results, are most likely to be viewed by travelers. In addition, another two search result pages (i.e., with 20 search results) were extracted, with one page from the middle of the search results and another page at the bottom of the search result set, to provide a sample with more depth. Then, a Web crawler program written in the Perl programming language was used to retrieve Web page content by following these 450 (50 search results × 9 search terms) URLs. The textual content in the body of the Web pages was then parsed and saved as the corpus to represent the supply-side domain.

In phase 2, user queries from three major search engines were used for the analyses (i.e., three sets of log files from Excite, one from AltaVista, and two from AlltheWeb, dated from September 1997 to May 2002). The detailed descriptions of these transaction logs are provided in a number of publications by Spink and her colleagues (Jansen and Molina 2006; Jansen and Spink 2005a, 2005b; Jansen, Spink, and Pedersen 2005; Spink et al. 2004). Altogether, these transaction logs included about 11 million distinct queries. However, only the nine predefined keywords in combination with the destination name “Chicago” were extracted, resulting in a total of 3,020 observations (i.e., search queries).
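The seed-based extraction described above can be sketched as a simple filter. This is only an illustration: the article does not state the exact matching rule, so whole-token, case-insensitive matching is an assumption, and the function names `is_seed_match` and `extract_queries` are hypothetical.

```python
import re

# The nine predefined keywords listed in the Research Design section.
KEYWORDS = {
    "accommodation", "activities", "area", "attractions", "events",
    "information", "places", "restaurants", "shopping",
}


def is_seed_match(query, destination="chicago", keywords=KEYWORDS):
    """True if the query names the destination and at least one keyword.

    Assumption: matching is on whole lowercase tokens; the original
    study may have used a different rule (e.g., substring matching).
    """
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return destination in tokens and bool(tokens & keywords)


def extract_queries(log_queries):
    """Keep only the queries that match the seed terms, in log order."""
    return [q for q in log_queries if is_seed_match(q)]


sample_log = [
    "Chicago restaurants downtown",
    "chicago bulls tickets",
    "restaurants in boston",
    "shopping CHICAGO",
]
print(extract_queries(sample_log))
```

Under this rule a singular form such as "restaurant" would not match; whether such variants were counted in the original extraction is not stated in the text.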

Data Analysis

Quantitative text analysis, employed as the primary analytic approach, followed a three-step design. Phase 1 focused on identifying the semantics that represent the tourism domain based on the textual data extracted from tourism Web pages. Phase 2 focused on evaluating the semantics that were used by search engine users looking for travel information for Chicago. Last, phase 3 compared the results from the previous two phases with a focus on assessing the commonalities and differences in the words contained in the two data files.

Phase 1: Assessing the Online Tourism Domain from a Supply-Side Perspective

The goal of this analysis was to understand the semantic nature of the tourism domain from the supply perspective. It included two steps: step 1 aimed to identify the words that truly represent the domain, that is, the domain ontology, and step 2 aimed to understand the semantic structure of this ontology, focusing on the central words that link other words together in the text. In step 1, a preprocessing procedure was carried out with the goal of identifying the “stop” words in the data including articles, prepositions, conjunctions, and transitive verbs that do not contribute to the meaning of the text (e.g., “a,” “an,” “the,” “and,” “but,” and “also”). Then, the aggregated text file was imported into the statistical software SPSS to calculate the frequencies of each unique word. An examination of the distribution of the frequencies of all the unique words indicated that there are a huge number of unique words in the data and a relatively small number of words that are used often. A cutoff value of the first 787 words with the highest frequencies was used to include those that best represent the tourism domain for two reasons: (1) the cumulative frequencies of these words represent approximately 60% of the total frequencies of all unique words, and (2) the lowest frequency among these words is 45. Considering that this text represents 450 Web pages (9 × 50), words that have frequencies of less than 45 occur, on average, less than once in 10 Web pages. As such, it was assumed that they were rarely used on tourism Web sites. These 787 words were then manually examined with the goal of identifying words that are representative of the tourism domain. Words that were informative about the document itself such as “file,” “total,” and “copyright” were identified and dropped from the pool. This resulted in a final pool of 364 words, which were then used to represent the online tourism domain for Chicago.
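The frequency step above can be illustrated with a short sketch. The study used SPSS; the Python below is a stand-in under stated assumptions: the tokenizer, the tiny stop-word list (taken from the examples in the text), and the function names `word_frequencies` and `top_words_by_coverage` are all hypothetical.

```python
from collections import Counter
import re

# Example stop words quoted in the text; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "and", "but", "also"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count each unique word after lowercasing and dropping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def top_words_by_coverage(freqs, coverage=0.60):
    """Most frequent words whose cumulative share first reaches `coverage`.

    Mirrors the idea of a cutoff at roughly 60% of total frequencies.
    """
    total = sum(freqs.values())
    if total == 0:
        return []
    selected, running = [], 0
    for word, count in freqs.most_common():
        if running / total >= coverage:
            break
        selected.append((word, count))
        running += count
    return selected


corpus = "the hotel and the tour hotel tour hotel map shop"
freqs = word_frequencies(corpus)
print(top_words_by_coverage(freqs, coverage=0.60))
```

The second cutoff criterion is plain arithmetic: a word with a frequency of 45 across 450 pages appears 45 / 450 = 0.1 times per page on average, that is, once in 10 pages.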

The next step of the analysis identified the semantic structure of the aggregated text using the identified 364 words. Neural network software CATPAC (Woelfel 1993) was used to assess the semantic associations between the respective words. In CATPAC, the “neuron,” which represents a prespecified word, is initiated by passing a “scanning window” of n consecutive words through the text while ignoring the pre-identified stop words (Woelfel 1993). The proximity between the neuron and other words is measured and recorded. The program then reads the next group of k words, depending on the slide size (i.e., the number of words by which the scanning window moves). If the slide size is 1, for example, CATPAC moves one word further in the text and then reads the next word, and so on, until the full text has been examined. The software will continuously iterate through the full text with a new neuron until all words in the prespecified word set have been exhausted. The structure of the semantic association can be represented by a square matrix of numbers where each row and column represents a neuron (word), while the value of each cell (an updateable weight) represents the strength of connections between the two neurons.
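The scanning-window idea can be illustrated with a plain co-occurrence count. This is not CATPAC's algorithm, which updates neural network weights; it is a simplified stand-in that only shows how window size and slide size shape a square word-by-word matrix. The function name `cooccurrence_matrix` and its parameters are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(tokens, vocabulary, window=5, slide=1,
                        stop_words=frozenset()):
    """Count how often two vocabulary words share a scanning window.

    Stop words are removed before windowing, so the window spans
    `window` consecutive non-stop words. Returns a symmetric list of
    lists indexed in the order of `vocabulary`.
    """
    words = [t for t in tokens if t not in stop_words]
    index = {w: i for i, w in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    last_start = max(len(words) - window + 1, 1)
    for start in range(0, last_start, slide):
        # Each distinct vocabulary word counts once per window.
        present = {w for w in words[start:start + window] if w in index}
        for a, b in combinations(sorted(present), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


vocab = ["chicago", "blues", "music", "theater"]
text = "chicago blues music the chicago theater".split()
m = cooccurrence_matrix(text, vocab, window=3, slide=1, stop_words={"the"})
for word, row in zip(vocab, m):
    print(word, row)
```

A larger window links words that sit farther apart, which is why the choice of window size can change the resulting cluster structure.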

The scanning window size is a critical facet of text analysis in that it has the potential to affect the resulting semantic structure. Thus, an exploratory experiment was conducted to determine the appropriate window size to use in the analysis. Specifically, proximity matrices generated by CATPAC using various window sizes (from 1 to 7) were used as input into hierarchical cluster analysis based on the criterion that the window size should not lead to too many clusters (i.e., small clusters are disassociated with each other) or only one cluster (i.e., all words are lumped together). A visual analysis of the results of the cluster analyses indicated that a window size of 5 words was appropriate.

Since CATPAC does not output statistical measures for cluster adequacy and does not provide information about the clustering process (i.e., how and when words are linked together), the proximity matrix was then imported into the social network analysis program UCINET (Borgatti, Everett, and Freeman 1992) to further examine the clustering solutions. As part of the analysis, centrality measures were calculated to identify the most prominent words as well as the semantic structure of the text.

Phase 2: Assessing the Online Tourism Domain from the Demand Perspective

The primary goal of this analysis was to describe the semantic nature of user queries related to Chicago tourism through search words extracted from six search engine transaction logs. Analysis followed procedures similar to those used in phase 1, with the following distinctions after taking into consideration the unique characteristics of search engine user queries (Spink 2002): (1) the descriptive analysis not only identified the unique words in user queries but also examined the length of queries; (2) the preprocessing of the data included identifying the stop words, deleting sex-related queries, and manually fixing user typos and misspellings; (3) the semantic association was measured using CATPAC with a window size of 2 as the majority of queries were fairly short (one to four words); (4) each query was treated as an individual case; and (5) all unique words in the data set were included in the analysis.

Phase 3: Assessing the Commonalities between the Online Tourism Domain and User Queries

The goal of phase 3 analysis was to compare the semantic structures identified in the first two phases of the study. Following Pan and Fesenmaier (2006), it is argued that if there is a high degree of commonality between these two structures, there is a “match” between the supply and demand sides, that is, the information provided by the industry should be useful in helping users find what they need. Comparing the words and the semantic relationships can, therefore, reveal discrepancies between what is offered and promoted by the industry and what is searched for by travelers. This analysis consisted of two steps. First, common words shared by the online tourism domain and user queries were identified. A commonality ratio was then calculated based on the number of common words and the number of different words to show the degree to which the online tourism domain and user queries have things in common. Second, an N-by-N proximity matrix (where N stands for the number of common words) was constructed using CATPAC in the same way as in the first two phases for both the online tourism domain and user queries based on the common words. Quadratic Assignment Procedure was then used to assess the correlation between these two proximity matrices (Borgatti, Everett, and Freeman 1992).
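The matrix comparison in the second step can be sketched as a generic permutation test in the spirit of the Quadratic Assignment Procedure. This is an illustration only, not UCINET's implementation: the use of Pearson correlation over off-diagonal cells, the number of permutations, the one-sided p-value, and all function names are assumptions.

```python
import random
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal cells."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def permute(matrix, order):
    """Relabel rows and columns together using the same node order."""
    return [[matrix[i][j] for j in order] for i in order]


def qap_correlation(a, b, permutations=1000, seed=0):
    """Observed correlation plus a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson(off_diagonal(a), off_diagonal(b))
    order = list(range(len(a)))
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(order)
        r = pearson(off_diagonal(a), off_diagonal(permute(b, order)))
        if r >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


supply = [[0, 3, 1, 0], [3, 0, 2, 0], [1, 2, 0, 5], [0, 0, 5, 0]]
demand = [[0, 2, 1, 0], [2, 0, 1, 1], [1, 1, 0, 4], [0, 1, 4, 0]]
r, p = qap_correlation(supply, demand, permutations=500)
print(round(r, 3), round(p, 3))
```

Rows and columns are permuted together because the cells of a proximity matrix are not independent observations; shuffling node labels preserves each network's internal structure while breaking the correspondence between the two.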

Results

The following sections present the results of the study corresponding to the three phases of analysis: (1) the semantic nature of the supply-side tourism domain based on Web site texts derived from search engine results (phase 1), (2) the semantic nature of the demand-side tourism domain based on user queries in search engines (phase 2), and (3) the comparison between the domain ontologies identified based on tourism-related Web pages and user queries (phase 3).

The Semantic Nature of the Online Tourism Domain from a Supply Perspective

Figure 2 shows the distribution of all the unique words with their frequencies after a natural logarithmic transformation. As one can see, the top 100 words represent approximately 30% of the total frequencies of all unique words; the top 500 words represent more than 50% of the total frequencies of all unique words while the top 1,000 words represent more than 60% of the total frequencies of all unique words. Also, it is important to note that a large number of words are rarely used: more than one-third of all unique words were used only once, and about two-thirds were used three or fewer times. This indicates that the language describing the tourism domain from the supply perspective is dominated by a small number of words, but the overall domain is extremely rich and largely idiosyncratic. A further analysis of the less frequently used words showed that while there were a large number of words that did not exclusively belong to the tourism domain (e.g., words that are part of the “natural language,” including adjectives such as “familiar” and “immediate,” adverbs such as “specifically” and “commonly,” and nouns such as “transition” and “threat”), many of these words were proper nouns such as “Cabrini,” “Blackwell,” “Bloomingdale,” “Zenith,” and “Quiznos” and reflected place-specific concepts, thus representing the place-based foundation of the tourism experience.

Based on the distribution of frequencies, 364 unique words, which accounted for approximately 45% of the total frequencies of all unique words, were used to represent the tourism domain ontology (not listed due to limitation of space). Their frequencies ranged from 12,565 to 38, and the words represented a huge variety of businesses and services in the tourism industry. The word “Chicago” had the highest frequency among all words as it was used to define the geographic boundary of the domain. Some of the other popular words were generic words as they related to tourism; these included “place,” “area,” “information,” “map,” “travel,” and “tour.” Many other words were specifically related to attractions and activities, including “attraction,” “events,” “center,” and “world,” while others related to location such as “west” and “downtown.” Last, some words suggested an interest in promotions such as “best,” “deal,” and “service.”

The Freeman Betweenness Centrality measure (Borgatti, Everett, and Freeman 1992) was calculated to identify those words that were prominent in the semantic network and played a central role by linking different groups of words. Intuitively, this approach can be understood as finding those words that were commonly shared among different Web pages. Table 1 lists the top 25 words in the semantic network that have the highest betweenness centrality values. As can be seen, a number of the words exhibited a relatively high centrality value (mean = 164.3 with a standard deviation of 1,238.4), and the overall network centralization was high (29.2%). This indicates that certain concepts were highlighted in the representation of tourism services and products by the industry, and they were commonly “shared” across different Web sites and sections of Web sites. Interestingly, it seems that these words can be placed into “layers” based on their centrality values. For example, while the word “shop” has the highest betweenness centrality, words such as “music,” “experience,” “famous,” “blues,” “theater,” “European,” “distance,” and “boutique” have centrality values much lower than “shop” but much higher than the rest of the words.
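For readers unfamiliar with the measure, the sketch below computes unnormalized betweenness centrality on a small undirected, unweighted graph using Brandes' algorithm. It illustrates how the measure works in general and is not a reproduction of the UCINET computation used in the study; the toy network, in which “shop” bridges two word clusters, and the function name `betweenness` are hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `graph` maps each node to the set of its neighbours.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths to every node.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}   # -1 marks "not reached yet"
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Toy semantic network (hypothetical): "shop" links a music cluster and a retail cluster.
graph = {
    "shop": {"music", "blues", "boutique", "premium"},
    "music": {"blues", "shop"},
    "blues": {"music", "shop"},
    "boutique": {"premium", "shop"},
    "premium": {"boutique", "shop"},
}
centrality = betweenness(graph)
print(centrality["shop"])   # 4.0: all four cross-cluster pairs pass through "shop"
print(centrality["music"])  # 0.0
```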

The Semantic Nature of the Online Tourism Domain from a Demand Perspective

Most user queries were short, ranging from 1 to 4 words, and were either very general (e.g., “Chicago hotel”) or very specific (e.g., “Chicago Wyndham Hotel”). Figure 3 shows the distribution of frequencies of unique words in user queries, indicating that the top 20 words with the highest frequencies represented more than half (52%) of the total frequencies of all words and that the top 100 words represented 70% of the total frequency. As can be seen, approximately two-thirds of all the unique words had a frequency lower than or equal to 2, and 45% of all unique words were used once by search engine users. Overall, the distribution was highly skewed toward the small set of high-frequency words, reflecting travelers’ general interests in tourism-related information (e.g., “hotel,” “direction,” “map,” “downtown,” “reservation,” “spa,” “discount,” “accommodation,” “lakeview,” etc.).

Figure 2
Distribution of Unique Words on Tourism Web Sites

Labeled points (word: frequency): Chicago: 12,565; City: 2,148; Feature: 388; Tourism: 297; Luxury: 138; Burger King: 54; WGN: 21; Yorkville: 8; Zenith: 3; Quiznos: 2.

Note: X axis = number of unique words; Y axis = frequency of words after logarithmic transformation; WGN = a radio station in Chicago.

Table 1
Top 25 Words with the Highest Betweenness Centrality in the Supply-Side Ontology

Word            Centrality    Word            Centrality
Shop            19316.9       Museum          57.7
Music           5401.4        Cost            1.6
Experience      5298.4        Activity        1.6
Famous          5135.0        Jazz            1.6
Blues           5033.9        Craft           1.6
Theater         4745.3        Antique         1.6
European        4745.3        Animal          1.6
Distance        4518.0        Kids            1.6
Boutique        4518.0        Amphitheater    1.6
Premium         340.4         Airlines        1.6
Contemporary    152.0         Diner           1.6
Building        57.7          Apparel         1.6
Art             57.7

N                           364
Mean                        164.3
Standard deviation          1,238.4
Minimum                     0
Maximum                     19,316.0
Network centrality index    29.2%

The Freeman Betweenness Centrality measure (Borgatti, Everett, and Freeman 1992) was calculated to identify the words that are prominent in the semantic network. Table 2 lists the top 25 words with the highest betweenness centrality values and provides the descriptive summary of the betweenness centrality measure among the semantic network of the 178 unique words. It is not surprising to see that words such as “Chicago,” “area,” “Chicagoland,” “suburb,” “art,” and “downtown” have the highest betweenness centrality values because they were used most often in combination with other words. For example, the word “downtown” was often used together with words such as “Chicago,” “hotel,” and “shop” to form more specific queries. The overall network centrality was 2.2%, which indicates that the network has a limited degree of centrality, in turn indicating that the network was somewhat connected and dominated by the words with high centrality values.

Comparing the Supply and Demand Ontologies

Table 3 lists the common words shared between the domain ontology and user queries. As can be seen, there were 208 words in user queries that were also found in the domain ontology derived from the supply side. While most of the common words apparently represented business facets in the industry, a number of adjectives (highlighted in bold) including “fine,” “fun,” “unique,” “free,” “official,” “friendly,” “perfect,” “romantic,” “old,” “cheap,” and “special” were also commonly used by both search engine users and the industry. It seems that while these words were used by the industry with the intention to promote their products and persuade potential visitors, search engine users had also used these words to locate specific information about the products and services they wanted.

Figure 3
Distribution of Unique Words in User Queries (Logarithmic Transformation)

Labeled points (word: frequency): Chicago: 2,609; Hotel: 277; Airport: 61; HardRock: 21; Show: 8; Bulls: 7; McCormick: 6; Wine: 3; Dinner: 2; Imax: 1; Kimpton: 1.

Note: X axis = number of unique words; Y axis = frequency of words after logarithmic transformation.

Table 4 provides the ratios of the frequencies of these common words to the total frequency of all words in both user queries and Web site results. As can be seen, only a relatively small proportion (17.3%) of query words were represented in the Web site results, while a higher proportion (57.1%) of the supply-side words were reflected in the queries people used to search. Taking into account that many of these overlapping words had high frequencies in both ontologies, the actual ratios that more accurately reflected their prominence were higher, that is, 33.1% for words in user queries and 79.2% in the Web site text. However, a large proportion of words in user queries (approximately two-thirds) still were not reflected in the supply ontology. An examination of these words revealed that (1) the words usually had low frequencies, suggesting that they were not popular “items” people searched for in Chicago, and (2) many of the words were proper names of an industry entity such as the name of a restaurant, bar, or store. Last, Quadratic Assignment Procedure analysis (Krackhardt 1987) was conducted using the proximity matrices constructed for the top 50 common words to assess the correlation between the two ontologies (see Table 5). These words with relatively high frequencies in both data sets were chosen to represent the core of the shared semantic space between these two ontologies. The results indicate that no significant relationship existed between the two semantic structures based on the 50 common words in the supply-side ontology and the demand-side ontology based on user queries. Thus, this finding indicates that these two semantic ontologies were structurally different, especially given that one would expect some degree of commonality among the most frequently used words.
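The logic of the Quadratic Assignment Procedure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the UCINET routine used in the study: it correlates the off-diagonal cells of two square proximity matrices, then repeatedly relabels the rows and columns of one matrix with the same random permutation and records how often the permuted correlation is at least as large as the observed one. The default of 500 permutations follows the note to Table 5; the function names and the 4-by-4 toy matrices are hypothetical.

```python
import random
from math import sqrt


def offdiag(matrix, perm=None):
    """Flatten the off-diagonal cells, optionally relabelling rows and columns by perm."""
    n = len(matrix)
    if perm is None:
        perm = list(range(n))
    return [matrix[perm[i]][perm[j]] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def qap(a, b, permutations=500, seed=0):
    """Return the observed correlation and the share of permutations at least as large."""
    rng = random.Random(seed)
    observed = pearson(offdiag(a), offdiag(b))
    labels = list(range(len(b)))
    as_large = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # same relabelling applied to rows and columns of b
        if pearson(offdiag(a), offdiag(b, labels)) >= observed:
            as_large += 1
    return observed, as_large / permutations


# Toy symmetric proximity matrices for four common words (hypothetical values).
supply = [[0, 5, 1, 0], [5, 0, 2, 1], [1, 2, 0, 4], [0, 1, 4, 0]]
demand = [[0, 3, 0, 1], [3, 0, 1, 0], [0, 1, 0, 2], [1, 0, 2, 0]]

observed, p_large = qap(supply, demand)
print(round(observed, 3), p_large)
```

Permuting row and column labels together keeps each matrix's internal structure intact, which is why the procedure is suited to dyadic data whose cells are not independent observations.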

Conclusions and Implications

The results of the study indicate that the online tourism domain represented through Web sites and travelers’ search engine queries includes an incredibly rich amount of information about the tourism industry within a destination. While the identified domain ontology is relatively small because it focused on one destination, it is clear that the entire tourism domain is rich and idiosyncratic with numerous destination specifics. That is, while there are a relatively small number of words that dominate the tourism domain (e.g., travel, information, hotel, and attractions), there is also a “long tail” with a huge number of words that reflects a wide range of unique experiences that are offered at the destination. The findings also indicate that online tourism information exhibits certain structural properties in that words representing the domain are semantically associated. This semantic space not only contains the various dominant facets of the tourism industry but also connotes the meanings embedded in the semantic relationships. Also, it seems that the tourism ontology contains core and peripheral spaces, with certain words being semantically closer to each other than others. For example, a number of words (e.g., “shop,” “music,” “experience,” “famous,” “blues,” “theater,” “European,” “distance,” “boutique,” “premium,” and “contemporary”) were found to be highly central to the semantic structure of the Chicago tourism ontology, suggesting their prominence in the semantic space of the domain and their roles of bridging between clusters of words.

Analyses of tourism-related user queries from search engine transaction files showed that the majority of queries are short expressions of travelers’ information needs, intended to retrieve relevant information from search engines effectively and efficiently. Overall, there are relatively few words in user queries that represent the majority of tourism-related “things” (e.g., “Chicago hotel”). However, there is also a long tail of words that represents users’ heterogeneous information needs and their own mental maps of the tourism experience. Importantly, these results appear to be consistent with previous studies of tourism information search (Vogt and Fesenmaier 1998) whereby most information sought when planning travel is functional rather than hedonic. That is, travelers are much more likely to focus on product attributes such as location, price, and availability instead of more experiential ones that are based on sensory and emotional aspects of the product (e.g., smell, atmosphere, sensation, etc.). Also, user queries exhibit a strong semantic structure. That is, the types of information people search for appear to reflect a spectrum of information needs ranging from very general to highly specific. This finding is consistent with Pan and Fesenmaier (2006), who indicated that the majority of users search for information that is very general (e.g., “Chicago hotel”), while a relatively small number of them directly search for specific information by including the name of the business (e.g., “Chicago Wyndham Hotel”).

Table 2
Top 25 Words with the Highest Betweenness Centrality in the Demand-Side Ontology

Word           Centrality    Word        Centrality
Chicago        351.2         Il          38.4
Area           351.2         Search      32.0
Chicagoland    177.7         School      29.4
Suburb         173.3         Metro       24.4
Band           143.9         Fabulous    18.9
Art            140.2         List        9.7
Downtown       116.7         Program     8.6
Store          107.8         County      6.3
Grade          98.2          Service     4.7
Janes          98.0          Tribune     4.5
Mirage         91.2          Hotel       3.9
Fine           41.8
Land           55.4

Mean                        12.5
Standard deviation          46.9
Minimum                     0
Maximum                     351.2
Network centrality index    2.2%

The comparison between the ontologies of user queries and the information derived from tourism Web sites indicates that there are a small number of words (n = 208) common to the two ontologies. While the majority of the words in the supply ontology were represented in user queries, these common words represent only a small portion of all the words in user queries. The differences in the words with high centrality values in the two semantic structures point to different orientations in how the industry represents tourism and how users express their information needs. That is, the results suggest that while the industry aims to promote businesses (such as “shop,” “boutique”) by using persuasive words (“famous,” “premium”), travelers are more focused on information about specific businesses or facts. In general, comparisons between the ontologies indicate that although the supply side does reflect certain aspects of users’ information needs, there is a substantial number of query terms that are not captured by this ontology.

Table 3
Common Words Shared between the Domain Ontologies

Chicago      Travel       Phone        Expo         Card         Directions
University   Concert      Review       Facility     Catering     Feature
Hotel        Inn          Show         Family       Contact      Fresh
Area         Convention   Association  Food         Designer     Friendly
Restaurant   Boat         Bike         French       Dining       Garden
Art          Reservation  Blues        Luxury       Dinner       Indian
Mall         Clothing     Bus          Parking      Fashion      Japanese
Park         Tour         Exhibit      Play         Flight       Jazz
Map          Jewelry      Movie        Report       Free         Kitchen
Theater      Public       Performance  Steak        Game         Landmark
Store        Shopping     Activity     Tourism      Grill        Local
Museum       Shore        Address      Zoo          Group        Mexican
Info         Attraction   Basketball   Auditorium   Health       Musical
Airport      Club         Buy          Business     Hours        Observatory
Weather      Festivals    Company      Cheap        Kids         Official
Events       Package      Gift         College      Lodging      Perfect
Downtown     Ticket       Golf         Cost         Medicine     Planetarium
Train        Outlet       Indoor       Forecast     Nature       Plaza
Guide        Schedule     Location     Fun          Pool         Population
School       Spa          Nightlife    Gallery      Resort       Quarter
Shop         Book         Seafood      Loop         Sushi        Resource
Church       Discount     Visit        Natural      Unique       Retail
Football     Merchandise  Apparel      Network      Urban        Romantic
District     Water        Baseball     Party        Vegetarian   Room
Pizza        History      Bed          Rate         Weekend      Shoes
Bar          Holiday      Breakfast    Style        Ads          Special
Sports       Old          Children     Traffic      Amusement    Temperature
Building     Rental       Chinese      Wine         Bakery       Thingstodo
Library      Sites        Cook         Admission    Books        Trip
City         Aquarium     Live         Adventure    Brunch       Tv
Field        Conference   Plan         Africa       Calendar     Vacation
Cafe         Fine         Price        Antique      Casino       Video
Place        Market       Sightseeing  Arena        Deal         Village
Program      Motel        Tourist      Broadway     Deli
Service      Music        Cultural     Bureau       Diner

Note: The words are displayed in vertical order ranging from high to low frequency in user queries; bold words are common adjectives used by the industry and users.

The results of this study offer important implications for developing search technologies and strategies for online marketing in travel and tourism. These technologies can be seen as decision aids in travel planning, providing the means by which online travelers can simplify the decision-making process by identifying the destinations and tourism businesses that meet the traveler’s specific needs or desires (Fesenmaier, Wöber, and Werthner 2006). For a culturally rich domain like tourism, the key in developing such technologies lies in a better understanding of the nature of the domain and, consequently, meaningful ways to organize and represent the domain. Specifically, the knowledge gained through this study suggests that new design approaches should be identified with the aim to bridge the gap between travelers’ information needs and the rich domain of online tourism. As shown in the empirical analysis in this study, the majority of search engine users use very short and general queries to locate more specific and relevant information. As such, a search system for tourism should establish a dynamic, more flexible modality of interaction to allow the online traveler to articulate his or her needs. Particularly, system feedback not only needs to include search results that are highly relevant to a specific query but also should provide suggestions to “inspire” the online traveler by expanding his or her consideration set. Typical examples of such techniques include the recommendation mechanisms in Amazon’s search functions based on collaborative filtering and Google’s contextual advertising based on mining user queries and search history (Gretzel and Wöber 2004). Because of the hierarchical structure within the supply-side domain ontology, vocabularies in the domain can thus be used in the form of keyword association, for example, to elicit travelers’ information needs or preferences by providing transitions from the general (e.g., “Chicago hotel”) to the specific (e.g., “Chicago downtown hotel with lakeview”). In addition, consideration should be given to the interface that incorporates useful design factors such as the visualization of the semantic structure of the domain, narrative logic, and metaphors to facilitate and enhance the user-system interaction to support travel search (Gretzel and Fesenmaier 2002; Xiang and Fesenmaier 2006).
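One way to picture keyword association of this kind is the minimal Python sketch below. It is an assumption about how such a feature might work, not a system described in the study: given a collection of phrases (for example, Web site snippets or past queries), it ranks the words that co-occur with every word of the traveler's current query and offers them as possible refinements. The function name `suggest_refinements` and the sample phrases are hypothetical.

```python
from collections import Counter


def suggest_refinements(query, phrases, limit=3):
    """Rank words that co-occur with every word of `query` in the given phrases."""
    base = set(query.lower().split())
    counts = Counter()
    for phrase in phrases:
        words = set(phrase.lower().split())
        if base <= words:                # the phrase contains the whole query
            counts.update(words - base)  # count only the additional words
    # Most frequent first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked][:limit]


# Hypothetical phrases standing in for domain vocabulary.
phrases = [
    "chicago hotel",
    "chicago downtown hotel",
    "chicago downtown hotel lakeview",
    "chicago hotel discount",
    "chicago map",
]
suggestions = suggest_refinements("Chicago hotel", phrases)
print(suggestions)  # ['downtown', 'discount', 'lakeview']
```

Starting from the general query “Chicago hotel,” the suggested terms move the traveler toward more specific formulations such as “Chicago downtown hotel.”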

Also, the online tourism domain is understood as the symbolic transformation of tourism products and experiences in the online environment. That is, the meanings of the domain represent the purposeful communications between the industry and their prospective customers to engender a positive image of the destination. In particular, the supply-side domain ontology not only comprises vocabularies that represent various industry facets but also contains the words that the industry uses to describe its products and services. However, as shown in this study, this language is not necessarily used by travelers searching for trip-related information. As such, the development of innovative search technologies should focus on establishing functions that “understand” the meanings connoted in this representation and building mapping mechanisms between the supply- and demand-side perspectives. For example, an intelligent system should be able to differentiate a user query that asks for a “spotless” hotel room from those for a “reasonably clean” room (Markoff 2006). Thus, understanding the language used by travelers as well as the one used by the industry and building an appropriate bridge between these domain ontologies is necessary for successfully mapping user queries with industry Web site contents.

Table 4
Ratios of Common Words Shared between the Domain Ontologies

                   Number of Words                          Total Frequency
                   Common Words   All Words   Ratio (%)    Common Words   All Words   Ratio (%)
Supply ontology    208            364         57.1         65,678         82,928      79.2
Demand ontology    208            1,204       17.3         3,242          9,788       33.1

Table 5
Quadratic Assignment Procedure Analysis of the Semantic Structures of Top 50 Common Words in the Supply and Demand Domain Ontologies

                         Value   Significance   Average   Standard Deviation   P(Large)   P(Small)
Pearson correlation      0.184   0.239          0.003     0.302                0.084      0.675
Simple matching          0.000   1.000          0.000     0.000                1.000      1.000
Jaccard coefficient      1.000   1.000          1.000     0.020                1.000      1.000
Goodman-Kruskal Gamma    0.000   0.000          0.000     0.000                0.000

Note: The program computes 500 (by default) correlations between the data matrix and the randomly permuted structure matrix. The two P values, i.e., P(Large) and P(Small), stand for the largest and smallest indicators for significance among all possible permutations. As shown in the table, the Pearson correlation significance value (0.239) represents the final value the program settled on, which indicates not significant in this case.

While search engines can play an important role in linking the supply and demand ontologies of the tourism domain, tourism marketers can contribute to the success of travelers’ queries by better understanding the language used by travelers and adjusting their Web site contents accordingly. That is, the best example of persuasive communication takes place on those Web sites that address the specific information needs of the traveler using the “language” of the traveler. It is argued that this language provides the foundation with which the prospective traveler interprets the informational products offered by the industry. To achieve this goal, further research and development is needed in a number of areas. Gretzel (2006), for example, suggested that tourism marketers can turn to many of the readily available consumer-generated contents, for example, blogs and reviews, to learn about the language travelers use to describe travel products and their experiences. In addition, tourism Web sites can be designed to incorporate tools (e.g., reviews, tagging, and digging) that enable travelers to directly interact with them so that more knowledge about the way travelers communicate their perceptions and experiences can be collected and learned. It seems that these consumer-driven communication channels provide new and promising avenues for tourism marketers to understand and therefore better interact with prospective visitors. Thus, it is argued that “the language of tourism” as reflected by both the industry and the traveler provides an essential foundation necessary to guide the development of technologies needed to support travel planning on the Internet. However, much research is needed to examine the potential impact of emerging technologies, changing demographics, and the roles of consumer knowledge and perception on this language.

References

Adomavicius, G., and A. Tuzhilin (2005). “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions.” IEEE Transactions on Knowledge and Data Engineering, 17 (6): 734-49.

Berners-Lee, T. (1999). Weaving the Web. San Francisco: Harper.

Bertolucci, J. (2007). “Search Engine Shoot-out.” PC World, 25 (6): 86-96.

Borgatti, S., M. Everett, and L. Freeman (1992). UCINET X: Network Analysis Software. Columbia, SC: Analytic Technologies.

Brooks, T. A. (2004). “The Nature of Meaning in the Age of Google.” http://informationr.net/ir/9-3/paper180.html.

Burns, E. (2007). “U.S. Search Engine Rankings and Top 50 Web Rankings.” http://searchenginewatch.com/showPage.html?page=3625081.

Campbell, A. E., and S. C. Shapiro (1995). “Ontological Mediation: An Overview.” Paper presented at the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, Menlo Park, CA, February 1.

Carley, K. M. (1997). “Network Text Analysis: The Network Position of Concepts.” In Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, edited by C. W. Roberts. Mahwah, NJ: LEA, pp. 79-100.

Carroll, J., and J. Thomas (1982). “Metaphors and the Cognitive Representation of Computing Systems.” IEEE Transactions on Systems Man and Cybernetics, 12 (2): 107-16.

Castells, M. (2001). The Internet Galaxy: Reflections on the Internet, Business and Society. Oxford, UK: Oxford University Press.

Crawford, C. (2003). The Art of Interactive Design: A Euphonious and Illuminating Guide to Building Successful Software. San Francisco: No Starch Press.

Dann, G. M. S. (1997). The Language of Tourism: A Sociolinguistic Perspective. Wallingford, UK: CAB International.

Fesenmaier, D. R., K. Wöber, and H. Werthner (2006). “Introduction: Recommendation Systems in Tourism.” In Destination Recommendation Systems: Behavioral Foundations and Applications, edited by D. R. Fesenmaier, K. Wöber, and H. Werthner. Wallingford, UK: CABI, pp. XVII-XXII.

Gretzel, U. (2006). “Consumer Generated Content - Trends and Implications for Branding.” e-Review of Tourism Research, 4 (3): 9-11.

Gretzel, U., and D. R. Fesenmaier (2002). “Building Narrative Logic into Tourism Information Systems.” IEEE Intelligent Systems, 17 (6): 59-61.

Gretzel, U., Y. H. Hwang, and D. R. Fesenmaier (2006). “A Behavioural Framework for Destination Recommendation Systems Design.” In Destination Recommendation Systems: Behavioural Foundations and Applications, edited by D. R. Fesenmaier, K. Wöber, and H. Werthner. Wallingford, UK: CABI, pp. 53-64.

Gretzel, U., and K. Wöber (2004). “Intelligent Search Support: Building Search Term Associations for a Tourism-specific Search Domain.” Paper presented at the Eleventh International Conference on Information and Communication Technology in Tourism (ENTER 2004), Cairo, Egypt, January 26-28.

Gretzel, U., Z. Xiang, K. Wöber, and D. R. Fesenmaier (2008). “Deconstructing Destination Perceptions, Experiences, Stories and Internet Search: Text Analysis in Tourism Research.” In Tourism Management: Analysis, Behaviour and Strategy, edited by S. Wood and D. Martin. Wallingford, UK: CABI, pp. 339-57.

Hevner, A. R., S. T. March, J. Park, and S. Ram (2004). “Design Science in Information Systems Research.” MIS Quarterly, 28 (1): 75-105.

Jansen, B. J., A. Brown, and M. Resnick (2007). “Factors Relating to the Decision to Click on a Sponsored Link.” Decision Support Systems, 44 (1): 46-59.

Jansen, B. J., and P. R. Molina (2006). “The Effectiveness of Web Search Engines for Retrieving Relevant Ecommerce Links.” Information Processing and Management, 42 (4): 1075-98.

Jansen, B. J., and A. Spink (2005). “An Analysis of Web Searching by European AlltheWeb.com Users.” Information Processing and Management, 41 (2): 361-81.

Jansen, B. J., and A. Spink (2005). “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing and Management, 42 (1): 248-63.

Jansen, B. J., A. Spink, and J. Pedersen (2005). “A Temporal Comparison of AltaVista Web Searching.” Journal of the American Society for Information Science and Technology, 56 (6): 559-70.

Kim, H. (2002). “Predicting How Ontologies for the Semantic Web Will Evolve.” Communications of the ACM, 45 (2): 48-54.

Krackhardt, D. (1987). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9: 171-86.

Krippendorff, K. (2004). Content Analysis: An Introduction to Its Methodology. 2d ed. Thousand Oaks, CA: Sage.

Marchionini, G. (1997). Information Seeking in Electronic Environments. Cambridge, UK: Cambridge University Press.

Markoff, J. (2006). “Entrepreneurs See a Web Guided by Common Sense.” New York Times, November 12, A1.


Norman, D. A. (1999). “Affordances, Conventions and Design.” Interactions, 6 (3): 38-43.

Pan, B., and D. R. Fesenmaier (2006). “Online Information Search: Vacation Planning Process.” Annals of Tourism Research, 33 (3): 809-32.

Popping, R. (2000). Computer-assisted Text Analysis. London: Sage.

Riedl, J., and J. Konstan (2002). Word of Mouse. New York: Warner.

Smith, S. L. J. (1988). “Defining Tourism: A Supply-side View.” Annals of Tourism Research, 15 (2): 179-90.

Spink, A. (2002). “A User-centered Approach to Evaluating Human Interaction with Web Search Engines: An Exploratory Study.” Information Processing and Management, 38: 401-26.

Spink, A., B. J. Jansen, D. Wolfram, and T. Saracevic (2002). “From e-Sex to e-Commerce: Web Search Changes.” IEEE Computer, 35 (3): 107-109.

Spink, A., M. Park, B. J. Jansen, and J. Pedersen (2004). “Multitasking During Web Search Sessions.” Information Processing and Management, 42 (1): 264-75.

Vogt, C. A., and D. R. Fesenmaier (1998). “Expanding the Functional Information Search Model.” Annals of Tourism Research, 25 (3): 551-78.

Wasserman, S., and K. Faust (1994). Social Network Analysis. Cambridge, UK: Cambridge University Press.

Werthner, H. (1996). “Design Principles of Tourism Information Systems.” In Information and Communication Technologies in Tourism, edited by S. Klein, B. Schmid, A. M. Tjoa, and H. Werthner. Vienna: Springer, pp. 70-78.

Wöber, K. (2006). “Domain Specific Search Engines.” In Destination Recommendation Systems: Behavioral Foundations and Applications, edited by D. R. Fesenmaier, K. Wöber, and H. Werthner. Wallingford, UK: CABI, pp. 205-226.

Woelfel, J. K. (1993). “Artificial Neural Networks in Policy Research.” Journal of Communication, 43: 62-80.

Woodside, A. G., and C. Dubelaar (2002). “A General Theory of Tourism Consumption Systems: A Conceptual Framework and an Empirical Exploration.” Journal of Travel Research, 41 (2): 120-32.

Xiang, Z., and D. R. Fesenmaier (2006). “Interface Metaphors on Travel-related Websites.” In Destination Recommendation Systems: Behavioral Foundations and Applications, edited by D. R. Fesenmaier, K. Wöber, and H. Werthner. London: CAB International, pp. 180-89.

Xiang, Z., S.-E. Kim, C. Hu, and D. R. Fesenmaier (2007). “Language Representation of Restaurants: Implications for Developing Online Recommendation Systems.” International Journal of Hospitality Management, 26 (4): 1005-18.

Xiang, Z., K. Wöber, and D. R. Fesenmaier (2008). “The Representation of the Tourism Domain in Search Engines.” Journal of Travel Research, 47 (2): 137-150.

Yu, C., and W. Meng (2003). “Web Search Technology.” In The Internet Encyclopedia, edited by H. Bidgoli. Hoboken, NJ: John Wiley, pp. 738-51.

Zheng Xiang, PhD, is an assistant professor in the School of Merchandising and Hospitality Management at the University of North Texas, Denton. His research interests lie in travel information search on the Internet, destination marketing, and development of benchmarking systems for tourist destinations.

Ulrike Gretzel, PhD, is an assistant professor at the Department of Recreation, Parks & Tourism Sciences at Texas A&M University in College Station and the director of the Laboratory for Intelligent Systems in Tourism. Her research focuses on persuasion in human-technology interaction, the representation of sensory and emotional aspects of tourism experiences, and issues related to the development and use of intelligent systems in tourism.

Daniel R. Fesenmaier is the director of the National Laboratory for Tourism & eCommerce and the program director of the Tourism and Hospitality Management Program at Temple University in Philadelphia. His research interests include travel behavior, information search and decision making, destination marketing, and the development of information systems for destination management organizations.
