
Hindawi Publishing Corporation
The Scientific World Journal
Volume 2013, Article ID 589610, 11 pages
http://dx.doi.org/10.1155/2013/589610

Research Article

Unsupervised User Similarity Mining in GSM Sensor Networks

Shafqat Ali Shad and Enhong Chen

Department of Computer Science and Technology, University of Science and Technology of China, Huangshan Road, Hefei, Anhui 230027, China

Correspondence should be addressed to Shafqat Ali Shad; shafqat@mail.ustc.edu.cn

Received 29 December 2012; Accepted 26 January 2013

Academic Editors: Y.-P. Huang and M.-A. Sicilia

Copyright © 2013 S. A. Shad and E. Chen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mobility data has attracted researchers in recent years because of its rich context and spatiotemporal nature; this information can be used for potential applications such as early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All of these applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant places extraction, prediction of the user's actual movement, and context awareness. However, significant places extraction and prediction of the user's actual movement for mobility profile building are nontrivial tasks. In this paper, we present a user similarity mining methodology based on user mobility profile building, which uses the semantic tagging information provided by users and basic GSM network architecture properties in an unsupervised clustering approach. Starting from low-level raw mobility information, our proposed methodology successfully converts it into high-level meaningful information by using cell-ID location information, rather than previously used location capturing methods such as GPS, infrared, and Wi-Fi, for profile mining and user similarity mining.

1. Introduction

Successful mobility profile building is the basis of a wide range of applications, including viral advertisement systems [1, 2], potential warning systems [3], city-wide mapping and sensing [4], pollution detection and exposure [5], social networking, and community finding [6]. All of these applications rely on mobility profile building, in which low-level raw mobility information is interpreted as high-level meaningful information that can be put to practical use. Because mobility profile building rests on two key parameters, namely, dwell time extraction and significant location finding, spatial data-based applications use discrete location and continuous time information in a mixed model.

Location extraction is a nontrivial task in mobility profile building. Location extraction methods fall into two broad classes, Active badge [7] and Active bat [8], where Active badge mainly represents indoor technologies such as Bluetooth, RFID, and infrared, while Active bat represents outdoor technologies such as GPS, assisted GPS, and GSM. Because Active badge is limited in terms of usage and implementation, Active bat is the popular choice for location extraction in mobility applications. Within Active bat, GPS and assisted GPS are not encouraging because of their high power consumption and the extra equipment that must be installed in the network. The only remaining suitable method is therefore GSM [9], where the cell global identity (CGI) can be used to extract location readily. The cell global identity is a four-part header: the mobile country code (MCC) identifies the country of the operator; the mobile network code (MNC) identifies the network operator; the location area code (LAC) is assigned and arranged by the network operator to group cells; and the cell ID identifies the cell to which the user is connected. Together, MCC, MNC, LAC, and cell ID uniquely identify the serving cell, and hence the user's approximate location in the network, at any time.
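To make the four-part CGI header concrete, the following is a minimal Python sketch, assuming a hypothetical in-memory dictionary as a stand-in for a public cell ID database. The names `CGI`, `CellDB`, and `locate` are illustrative only, and the MCC/MNC values in the usage line are placeholders. Note that in the MIT Reality Mining data only LAC and cell ID are available, so MCC and MNC would have to be fixed or inferred.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CGI:
    """Cell global identity: the four fields together identify one serving cell."""
    mcc: int  # mobile country code (country of the operator)
    mnc: int  # mobile network code (network operator)
    lac: int  # location area code (operator's grouping of cells)
    cid: int  # cell ID (the individual cell)

    def key(self) -> str:
        """Flat string key, e.g. for joining with an external database dump."""
        return f"{self.mcc}-{self.mnc}-{self.lac}-{self.cid}"


# Hypothetical stand-in for a public cell ID database: CGI -> (lat, lon).
CellDB = Dict[CGI, Tuple[float, float]]


def locate(cgi: CGI, db: CellDB) -> Optional[Tuple[float, float]]:
    """Approximate coordinates of the serving cell, or None when the cell is
    unknown to the database (a missing value to be handled later)."""
    return db.get(cgi)
```

For example, `locate(CGI(310, 260, 1234, 5678), {CGI(310, 260, 1234, 5678): (42.36, -71.09)})` returns `(42.36, -71.09)`, while any unlisted cell yields `None`. Returning `None` rather than raising keeps missing cells visible for the later missing-value handling.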

CGI represents the approximate location of the user through its four-part header, which can be converted into latitude and longitude coordinates using public cell ID databases. This location information can be used to determine the significant places for building the user's mobility profile. However, extracting significant locations is a nontrivial task for several reasons, such as missing values, cell oscillation, and exact coordinate mapping of locations. Additionally, the semantic information about the locations visited by the user can also be used for mobility extraction by mapping it to physical location coordinates.

Figure 1: The proposed framework for user's similarity measure (stages: retrieval of location information; outliers and missing values handling; cell oscillation resolution and stay points identification; user pattern discovery; user profile building; user's similarity measure).

Because low-level mobility data cannot be used directly by high-level mobility applications, in this paper we introduce a complete framework describing how this information can be used to build a mobility profile with an unsupervised clustering approach. The paper thus presents the extraction of spatiotemporal mobility trends and a mobility profile building approach using low-level cellphone log data. The contributions of the paper are (1) retrieval of missing values and removal of outliers using public cell ID databases and semantic information, (2) cell oscillation resolution and extraction of significant locations, (3) extraction of significant locations using the overlapped area over a time span, and (4) use of semantic information for final mobility profile building and user similarity finding.

2. Related Work

In recent years, mobility data has become a rich source of information on human life trends, and much work has been done on spatial information extraction. This work underlies many applications, such as city-wide sensing [10–14], where privately held sensors were used through a developed model, and traffic monitoring systems [15], where personal sensors such as mobile phones and cameras capture location information. Social behavior is studied in [16–18], where information is exploited by identifying the significant places where users are active and then analyzing the similarity between them. Route prediction and recommendation are studied by Hull et al. [19] using GPS sensors installed in taxis and the technique of opportunistic message forwarding. On the other hand, [20–22] use cell-based location awareness for user mobility analysis.

Zonoozi and Dassanayake [23] proposed a time optimization technique over cell residence for human mobility analysis, while Markoulidakis et al. [24] proposed a prediction model of cell handover and residence based on a Markov model, introducing a Kalman filter for future visit prediction. Akyildiz et al. [25, 26] proposed a prediction model based on motion speed, position, and history. Musolesi and Mascolo [27] categorized mobility models into trace-based and synthetic models, suggesting that trace-based mobility models are easier to implement than synthetic ones because of public data gathering. Gonzalez et al. [28] studied the spatiotemporal nature of user mobility through pattern analysis, extracting the top K locations from the mobility data of 100K users. Nurmi and Koolwaaij [29] proposed a clustering technique for the extraction of significant places using a graph-based transitional model over cell tower location data.

All of the above-mentioned work performs mobility analysis over complete location information, without considering missing values and changes in network structure, which fall under data preprocessing; moreover, it focuses on location extraction that either depends entirely on semantic awareness or ignores semantic information altogether. In our work, we tackle data preprocessing: outliers are eliminated, the data is consolidated for analysis, and the cell oscillation issue is resolved for complete mobility profile building. All of this clustered information is then used for mobility profile building through a proposed clustering technique that combines semantic information with GSM network properties. Our work mainly focuses on successful mobility profile building based on a naive approach that mixes semantic information and raw location information; prior to profile building, outlier removal, location extraction from the GSM cell global identity (CGI), and the cell oscillation phenomenon are dealt with by experimenting on the MIT Reality Mining mobility dataset.

3. Methodology

Figure 1 shows the overall process of the proposed methodology.

Figure 2: Architecture of a functional GSM network (Internet, MSC, BSC, BTSs, and MSs).

Figure 3: (a) Network topology, hexagonal view. (b) Cells, overlapped view.

3.1. Location Information Retrieval and Outlier Removal

The basic GSM structure is shown in Figure 2. The base transceiver station (BTS) is the basic unit representing the location area into which multiple cells fall. This distribution depends on the mobile operator and is hidden from users and the public.

A mobile station (MS) moves in the network and gets its connection through a base station controller (BSC). The BSC is connected to a mobile services switching center (MSC), which connects different BSCs and MSCs over the network. One important identifier is the location area, which is shared by all BSCs connected to a common MSC.

Each cell conceptually has a polygonal shape (Figure 3(a)), but in practice cells have overlapping bubble shapes, as shown in Figure 3(b).
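The overlapping "bubble" view in Figure 3(b) can be approximated by modeling each cell as a circle around its tower coordinates. The sketch below assumes such a circular model with a per-cell coverage radius. The radii are hypothetical, since real coverage is not published by operators. Two cells are treated as overlapping when the great-circle (haversine) distance between their centers is less than the sum of their radii.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass(frozen=True)
class Cell:
    """Hypothetical circular model of one cell's coverage."""
    cell_id: int
    lat: float       # degrees
    lon: float       # degrees
    radius_m: float  # assumed coverage radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cells_overlap(a: Cell, b: Cell) -> bool:
    """Circular cells overlap when their centres are closer than the sum of radii."""
    return haversine_m(a.lat, a.lon, b.lat, b.lon) < a.radius_m + b.radius_m
```

With 500 m radii, two towers about 470 m apart overlap, while towers several kilometres apart do not. Pairs found this way can feed the oscillation handling described next.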

There are two main concerns regarding the extraction of information from the dataset. First, as mentioned earlier, the dataset is taken from MIT Reality Mining, which has only partial cell global identity (CGI) information about user location, namely, LAC and cell ID, so it is hard to determine whether this partial information is enough for location extraction. Second, the GSM network changes over time: LACs are reorganized or discarded by the operator, so missing values and outliers in the dataset are inevitable. In addition, the shift from GSM to 3G technologies prevents us from determining most of the location information in the MIT dataset collected in 2008.

We proposed a methodology to deal with all of these problems in our earlier work [30], where we used basic network information to resolve missing values and outliers, along with precise clustering of cells for subsequent location extraction and mobility profiling. We proposed and used a clustering methodology in which LAC and cell ID provide all of the information needed for basic mobility profile building, by reverse engineering the open source Google location API. For missing values, we used the semantic information provided by the user in the dataset for precise location extraction and mobility profile building.

3.2. Cell Oscillation Resolution and Role of Semantic Information

As stated earlier in the problem statement, cell oscillation is a common phenomenon in GSM networks: a user can be assigned multiple cell IDs while static, which produces fake mobility during mobility profile building because the cell IDs change over time. In our work [31], we presented a methodology for cell oscillation resolution that uses the semantic tagging information and introduces the time stay phenomenon, where overlapping cells represent a location of interest rather than mobility. In that work, we not only resolved the oscillation phenomenon but also clustered the cells on the basis of the semantic information provided by the user (e.g., home, lab, airport, and club) and the overlapping time stay area, so that this clustered information can later be used during mobility profiling for stay location identification. We used the overlapped location information to identify significant places, which can later be used for mobility profiling.
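The idea that overlapping cells seen while static indicate a stay rather than movement can be illustrated with a simplified heuristic. The following is not the algorithm of [31]; it is a hypothetical sketch. It assumes the overlapping cell pairs are already known, for example from a circle-overlap test, and it merges consecutive records whose cells all mutually overlap into one segment. Segments lasting at least a hypothetical `min_stay_s` are reported as stays. Duration is measured from the first to the last merged record.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass
class Segment:
    start_s: float
    end_s: float
    cells: Set[str]

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def resolve_oscillation(
    records: Iterable[Tuple[float, str]],
    overlaps: Set[FrozenSet[str]],
    min_stay_s: float,
) -> Tuple[List[Segment], List[Segment]]:
    """Merge time-ordered (time_s, cell_id) records whose cells mutually overlap.

    Returns (all segments, segments lasting at least min_stay_s).
    """
    segments: List[Segment] = []
    for t, cell in records:
        if segments:
            seg = segments[-1]
            # Same cell again, or a new cell overlapping every cell in the segment:
            # treat it as oscillation, not as movement.
            if cell in seg.cells or all(frozenset((cell, c)) in overlaps for c in seg.cells):
                seg.cells.add(cell)
                seg.end_s = t
                continue
        segments.append(Segment(t, t, {cell}))
    stays = [s for s in segments if s.duration_s >= min_stay_s]
    return segments, stays
```

A static user ping-ponging between overlapping cells A and B, for example A, B, A, B, A over four minutes, collapses into a single segment {A, B}. A subsequent move to non-overlapping cells C and D starts new segments.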

3.3. Mobility Profile Building

Mobility profile building is one of the nontrivial tasks in any location-based service (LBS), where mobility profiling is done by extracting significant places. Significant places can be defined as geographic places that are important to a user, where the user spends most of the time. The user usually tags these locations semantically, spends a significant amount of time at them, or visits them frequently over the observation period. A significant place can thus be a place where the user spends most of the time (home, work, etc.), a place the user visits frequently over a period of time (supermarket, club), or a place where the user spends significant time without frequent visits (conference, seminar, travel). This makes the discovery of significant places valuable for LBS, where user behavior is the main source of stimulation. As described in previous sections, cell ID is the only viable source for user profile building: its coverage is wide, its energy consumption is low, no data plan is required, and it is available on all kinds of mobile phones. User movement is therefore available in the data as a set of different cell IDs distributed over the network. By observing the precise transitions between these cells with an effective technique, this information can be used for mobility profile building. However, constructing a mobility profile is a complex process, as it must deal with the following issues: when the user moves over thousands of cell IDs during a period of time, many of them cannot be resolved to locations through cell ID databases, for example, Google and Open Cell ID, so these cells will lead to misinterpreted profiling; there are dark places where the user lost the connection or switched off the phone, which seem to be significant places because of the time spent by the user; and there are many cell oscillations during user movement, which appear to be mobility even when the user is static. We have divided the process into four parts: (1) clustering of the cells for path finding and their fingerprinting, (2) grouping of similar patterns for projection of trends, (3) profile building, and (4) similarity measurement between different users through the sharing property adopted from [32]. The whole process of mobility profiling can be elaborated as follows.
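The three kinds of significant place described above (long total stay, frequent visits, and a single long stay) can be expressed as simple rules over visit records. This is a minimal sketch under hypothetical thresholds; the paper does not state these values. It assumes visits are already resolved to places and given as `(place, arrival_s, departure_s)`. It does not handle the "dark places" problem, where connection gaps inflate dwell time.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def significant_places(
    visits: Iterable[Tuple[str, float, float]],
    observation_s: float,
    min_share: float = 0.10,              # hypothetical: >= 10% of observed time
    min_visits: int = 5,                  # hypothetical: >= 5 separate visits
    min_single_stay_s: float = 4 * 3600,  # hypothetical: one stay >= 4 hours
) -> Dict[str, str]:
    """Label each place with the reason it counts as significant, if any."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    longest: Dict[str, float] = defaultdict(float)
    for place, arrive_s, leave_s in visits:
        dwell = leave_s - arrive_s
        total[place] += dwell
        count[place] += 1
        longest[place] = max(longest[place], dwell)
    labels: Dict[str, str] = {}
    for place in total:
        if total[place] / observation_s >= min_share:
            labels[place] = "long total stay"   # e.g. home, work
        elif count[place] >= min_visits:
            labels[place] = "frequent visits"   # e.g. supermarket, club
        elif longest[place] >= min_single_stay_s:
            labels[place] = "single long stay"  # e.g. conference, travel
    return labels
```

Over one week, for example, ten-hour nightly stays at home, five short supermarket trips, and one six-hour conference would each be labeled by a different rule. A single ten-minute cafe visit would not be labeled at all.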

Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tower locations visited by the user during mobility. We are interested in identifying the pattern group $PG = \{pg_1, pg_2, \ldots, pg_n\}$ that satisfies $TH_{\text{group count}}$, the group threshold, where PG is extracted from the frequent user mobility history (FUH), defined over the transition threshold $TH_{\text{transition}}$, the location area threshold $TH_{\text{location area}}$, and the semantic tag information SEM. FUH is retrieved from the visiting history VH, which is obtained through the oscillation removal method [31] and time stamping.

Algorithm 1 (frequent pattern discovery from user's mobility history).

(1) Select the complete list of user mobility in terms of cell towers visited, $T$.

(2) Apply the cell oscillation resolution technique to it and retrieve the clustered cells $C$.

(3) Apply proper time stamping to the clustered cells retrieved after oscillation removal, giving the user's visit history VH with complete spatiotemporal information.

(4) Identify the frequent user mobility patterns FUH, defined over $TH_{\text{location area}}$ and $TH_{\text{transition}}$. Group the identified patterns $G$ on the basis of their spatiotemporal nature and semantic tag information using the prefix-span algorithm.

(5) Repeat steps (i) and (ii) until only the supported group of patterns PG is identified:

(i) if the size of group $g$ satisfies the group support threshold $TH_{\text{group count}}$,

(ii) assign the group to the supported group pattern set.

Let $U = \{u_1, u_2, \ldots, u_n\}$ be the set of users with complete information on their pattern groups PGs. We are interested in building a matrix $M[i][j]$ on the basis of each pair of patterns $p_i$ and $q_j$, where the two patterns belong to two potentially similar users. The similarity between two patterns is calculated using LnCSS (longest common subsequence) and CoL (colocation probability measure).

Algorithm 2 (user's similarity measurement).

(1) Select the users.

(2) Select the pattern groups of the two users.

(3) Repeat steps (i) and (ii) for each pair $p$ and $q$ belonging to the two different users:

(i) calculate the LnCSS and CoL properties of the two given patterns;

(ii) if the two patterns satisfy the minimum support of similarity, set $M(p, q) = 1$; otherwise, set it to 0.

(4) Calculate the user's similarity based on the similarity matrix.
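Steps (3) and (4) of Algorithm 2 can be sketched as a binary matrix over all cross-user pattern pairs, followed by a normalized sum. In this sketch the pairwise test is a pluggable callable. The `shares_three_tags` predicate used below is a hypothetical stand-in for the LnCSS/CoL minimum-support test, so its values do not reproduce Table 1. Dividing by the number of pattern pairs is also an assumption about what "total number of patterns" means. Itemsets such as ⟨Bank, Office⟩ are flattened into plain tag lists.

```python
from typing import Callable, List, Sequence

Pattern = Sequence[str]


def similarity_matrix(
    fuh1: List[Pattern],
    fuh2: List[Pattern],
    is_similar: Callable[[Pattern, Pattern], bool],
) -> List[List[int]]:
    """Step (3): M[i][j] = 1 if patterns p_i and q_j pass the similarity test, else 0."""
    return [[1 if is_similar(p, q) else 0 for q in fuh2] for p in fuh1]


def user_similarity(matrix: List[List[int]]) -> float:
    """Step (4): sum of matrix entries divided by the number of pattern pairs
    (assumed reading of 'total number of patterns')."""
    pairs = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / pairs if pairs else 0.0


def shares_three_tags(p: Pattern, q: Pattern) -> bool:
    """Hypothetical stand-in for the LnCSS/CoL test: at least three common tags."""
    return len(set(p) & set(q)) >= 3


fuh1 = [["Home", "Bank", "Office", "Stay point"], ["Home", "School", "Office"]]
fuh2 = [["Home", "Bank", "Office", "Stay point"], ["Home", "Office"]]
m = similarity_matrix(fuh1, fuh2, shares_three_tags)
```

Here only the first pattern of each user passes the stand-in test, so `m` is `[[1, 0], [0, 0]]` and `user_similarity(m)` is 0.25.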

3.3.1. Clustering of Subsequences in Mobility History Data

As mentioned earlier, we retrieved the clustered cells by implementing the cell oscillation algorithm, which guarantees that the data has no oscillation problem. We adopted a naïve clustering approach that clusters the cells on the basis of circular subsequences. The baseline algorithm relies on the fact that common cell IDs can be merged to form a circular subsequence that represents user behavior [31]. We implemented two proposed algorithms [31] for the resolution of cell oscillation and the retrieval of common clustered cells that represent the user's significant places using the overlapped area; details of the algorithms are given in our previous work. It is also worth noting that, as mentioned earlier, the proposed algorithm also resolves the problem of missing values, which can occur because of network changes or the unavailability of cell ID information in open cell ID databases. For example, in the mobility sequence $\{C_1, C_2, C_3, C_4, C_5\}$, $C_3$ and $C_4$ are not retrievable through the cell ID database, which would obviously lead to misleading user mobility; our algorithm replaces these two cells with the most likely cell in the cluster using a majority voting mechanism, so that the mobility remains traceable with the most likely authentication. The retrieved clusters carry information about significant places as a semantic tag, such as home, lab, and club. Because the derived algorithm can infer a significant place from the time spent over an overlapped area between different cells, we tag such a location with the convention "stay point." After retrieving the clustered cells, we fingerprint them to represent the percentage of time spent by the user in particular cells and in each cluster as a whole, that is, in the stay points. Each cluster is represented as a sequence of cell IDs, that is,

$$
\begin{aligned}
\text{Cluster}_1 &= [C_1, C_2, C_{15}],\\
\text{Cluster}_2 &= [C_3, C_5, C_7, C_{19}],\\
&\ \ \vdots\\
\text{Cluster}_n &= [C_7, C_{18}, C_{24}].
\end{aligned} \tag{1}
$$

Each cluster represents a stay point, which may or may not have a user-defined semantic tag associated with it.
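The majority voting replacement of unresolvable cells, such as $C_3$ and $C_4$ in the example above, can be sketched as follows. This is a hypothetical illustration, not the algorithm of [31]. It assumes each cell's cluster membership is already known from the oscillation resolution step. Votes are counted over the user's full observed history, and a cell with no known cluster is left as `None`.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set


def fill_missing_cells(
    sequence: Sequence[str],
    resolvable: Set[str],
    cluster_of: Dict[str, str],
    history: Sequence[str],
) -> List[Optional[str]]:
    """Replace cells missing from the cell-ID database by majority vote in their cluster.

    sequence:   cell IDs in visit order.
    resolvable: cell IDs that have coordinates in the cell-ID database.
    cluster_of: cell ID -> cluster label (from oscillation resolution).
    history:    all cell IDs observed for this user; votes are counted here.
    """
    votes: Dict[str, Counter] = {}
    for cell in history:
        if cell in resolvable and cell in cluster_of:
            votes.setdefault(cluster_of[cell], Counter())[cell] += 1
    filled: List[Optional[str]] = []
    for cell in sequence:
        if cell in resolvable:
            filled.append(cell)
            continue
        counter = votes.get(cluster_of.get(cell, ""))
        # Most frequently observed resolvable cell in the same cluster, if any.
        filled.append(counter.most_common(1)[0][0] if counter else None)
    return filled
```

Suppose $C_3$ shares a cluster with $C_1$ and $C_2$, and $C_4$ shares one with $C_5$. Then $C_3$ is replaced by the more frequently observed of $C_1$ and $C_2$, and $C_4$ by $C_5$.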

3.3.2. Time Stamping of Clustered Cells

We bind the time percentage to each cell in the cluster, along with the total time spent in that particular cluster and a time stamp. Time stamping is the most important part of fingerprinting, since user behavior can easily be determined by slicing over time and date. The fingerprint has the structure

$$[C_1 T_1, C_2 T_2, \ldots, C_n T_n, \text{Total time}, \text{Time stamp}, \text{Semantic tag}], \tag{2}$$

where each cluster represents a stay point of the user, together with the percentage of time $T$ spent on each cell, the date, and the semantic tag information if available. After this process, we have the set of all stay points, or significant places, that the user visited over time, so we convert the user's history, distributed over cell IDs, into stay points in the form of clusters:

$$\text{VH} = \{\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n\}, \tag{3}$$

where VH represents the complete mobility history of the user in terms of stay points in the form of spatiotemporal clusters $\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n$. This information can be used to identify the trajectory patterns of a particular user and to build the mobility profile by grouping the patterns distributed over time. After extracting the patterns, we need to group them and discard infrequent patterns using a minimum support count; otherwise, the resulting patterns would number in the millions for such a large amount of data, which would be a burden on memory. We are therefore interested only in frequent patterns, identified using spatiotemporal information and the support count. We define a group $G$ as a subset of VH that satisfies the following two rules:

(i) $P_1, P_2, \ldots, P_n \in g$ if $g \cdot \text{Distance} \le TH_{\text{location area}}$ and $g \cdot \text{Time} \le g \cdot TH_{\text{stay time}}$;

(ii) $|g| \ge TH_{\text{group support}}$.
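The fingerprint structure in (2) can be sketched as a small record type. This is a minimal illustration under the following assumptions: each stay point stores the seconds spent in each member cell, and the time stamp is a hypothetical ISO date string. The class and field names are illustrative only. The per-cell values are returned as percentages of the cluster's total time, and VH in (3) would simply be a time-ordered list of such records.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StayPoint:
    """One clustered stay point: seconds spent in each member cell."""
    seconds_per_cell: Dict[str, float]
    timestamp: str                      # hypothetical: ISO date/time of arrival
    semantic_tag: Optional[str] = None  # user tag such as "Lab", if any

    def fingerprint(self) -> Tuple[Dict[str, float], float, str, Optional[str]]:
        """Return ({cell: % of cluster time}, total time, time stamp, tag), as in (2)."""
        total = sum(self.seconds_per_cell.values())
        if total <= 0:
            raise ValueError("stay point has no recorded time")
        shares = {cell: 100.0 * secs / total for cell, secs in self.seconds_per_cell.items()}
        return shares, total, self.timestamp, self.semantic_tag
```

For a stay point with 1800 s in $C_1$ and 1200 s in $C_2$, the fingerprint gives 60% and 40%, respectively, with a total of 3000 s.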

3.3.3. Extraction of Mobility Patterns from Clustered Stay Points

After retrieving all the stay points in the spatiotemporal history of the mobile user, we can transform the history into the trajectory patterns that the user follows over time. We define a pattern $P$ as a trip over two or more consecutive stay points (SPs) in the ordered set VH, or as a trip between a stay point SP and the first/last point of the user's mobility history VH:

$$
\begin{aligned}
P &= \{SP_m, \ldots, SP_n\}, \quad 0 < m \le n, \quad P_i(\mathrm{VH}_i\, SP_m \cap \mathrm{VH}_{i+1}\, SP_n), \quad \text{or}\\
P &= \{SP_1, \ldots, SP_m\}, \quad 0 < m \le o, \quad P_i(\mathrm{VH}_i\, SP_m), \quad \text{or}\\
P &= \{SP_n, \ldots, SP_o\}, \quad 0 < n \le o, \quad P_i(\mathrm{VH}_i\, SP_n).
\end{aligned} \tag{4}
$$

Further, to extract the true patterns from user mobility, we introduce a transition time threshold that ensures the continuity of the trip: this threshold ensures a smooth transition of the user from one stay point to another and separates one visiting pattern from another. To extract mobility patterns from the semantically arranged stay points, we applied the prefix-span algorithm [33] to the user's mobility history. The algorithm yields semantic patterns of the user over time, for example, ⟨Home, Lab, Google, Grand parents⟩, along with all of their subsequences. Because the algorithm returns redundant patterns, we adopted the maximal trajectory pattern technique [34] to represent user mobility patterns. The resulting extracted patterns are thus true representatives of the frequent mobility of the user (FUM) over time:

$$
\begin{aligned}
\text{FUM} = \{\ &\text{Ptn}_1\ \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point}\rangle,\\
&\text{Ptn}_2\ \langle \text{Home}, \text{Super market}, \text{Topo hub}, \text{Grand parents}, \text{Club}\rangle,\\
&\ \ \vdots\\
&\text{Ptn}_n\ \langle \text{Office}, \text{Bank}, \text{Stay point}, \text{Park}\rangle\ \}.
\end{aligned} \tag{5}
$$
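The pipeline in this subsection can be sketched in three steps: split the stay-point history into trips using the transition threshold, mine frequent sequences with PrefixSpan, and keep only maximal patterns. This is a simplified sketch with several assumptions. Each trip element is a single semantic tag, with no itemsets. The PrefixSpan here is the basic textbook projection scheme rather than any particular library. The "maximal" filter keeps patterns that are not a subsequence of another frequent pattern, which is a stand-in for the technique of [34] rather than a reproduction of it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Visit = Tuple[str, float, float]  # (stay-point tag, arrival_s, departure_s)


def split_trips(visits: Iterable[Visit], th_transition_s: float) -> List[List[str]]:
    """Start a new trip when the gap between leaving one stay point and
    arriving at the next exceeds the transition threshold."""
    trips: List[List[str]] = []
    current: List[str] = []
    last_leave = 0.0
    for tag, arrive_s, leave_s in visits:
        if current and arrive_s - last_leave > th_transition_s:
            trips.append(current)
            current = []
        current.append(tag)
        last_leave = leave_s
    if current:
        trips.append(current)
    return trips


def prefixspan(db: Sequence[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Basic PrefixSpan for sequences of single items: {pattern: support}."""
    results: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        support: Dict[str, int] = defaultdict(int)
        for seq in projected:
            for item in set(seq):  # count each sequence at most once
                support[item] += 1
        for item, sup in support.items():
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            # Project each sequence on the suffix after the first occurrence of item.
            mine(pattern, [seq[seq.index(item) + 1:] for seq in projected if item in seq])

    mine((), [list(seq) for seq in db])
    return results


def is_subsequence(a: Sequence[str], b: Sequence[str]) -> bool:
    it = iter(b)
    return all(x in it for x in a)  # consuming the iterator enforces order


def maximal(patterns: Dict[Tuple[str, ...], int]) -> Dict[Tuple[str, ...], int]:
    """Drop patterns that are a subsequence of another frequent pattern."""
    return {p: s for p, s in patterns.items()
            if not any(p != q and is_subsequence(p, q) for q in patterns)}
```

With trips ⟨Home, Lab, Club⟩, ⟨Home, Lab⟩, and ⟨Home, Bank, Lab⟩ and a minimum support of 2, PrefixSpan returns ⟨Home⟩, ⟨Lab⟩, and ⟨Home, Lab⟩, each with support 3. The maximal filter keeps only ⟨Home, Lab⟩, which is the redundancy reduction described above.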

3.3.4. Projection of Similarity between Different Users through Pattern Matching

After successfully extracting the user patterns, we determine whether two patterns are similar through the longest common subsequence (LnCSS) measure. For example, for the patterns $P = \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point}\rangle$ and $Q = \langle \text{Home}, \text{Office}, \text{Bank}, \text{Stay point}\rangle$, the LnCSS is $\langle \text{Home}, \text{Office}, \text{Stay point}\rangle$. We define the ratio as

$$
\operatorname{Ratio}(\operatorname{LnCSS}(p, q), p) = \frac{\sum_{i=1}^{|p|} \sum_{j=1}^{|\operatorname{LnCSS}(p,q)|} M(p_i, \operatorname{LnCSS}_j)}{|p|},
\qquad
M(p_i, \operatorname{LnCSS}_j) =
\begin{cases}
1 & \text{if } \operatorname{LnCSS}_j \text{ matches } p_i,\\
0 & \text{otherwise}.
\end{cases} \tag{6}
$$
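The LnCSS computation and the ratio in (6), as reconstructed above, can be sketched with the standard dynamic-programming longest common subsequence. Tags are compared by exact string equality. This is an assumption: as noted next, different users may tag the same place differently. The double sum in (6) counts every pair $(p_i, \operatorname{LnCSS}_j)$ that matches.

```python
from typing import List, Sequence


def lcs(p: Sequence[str], q: Sequence[str]) -> List[str]:
    """Longest common subsequence via suffix dynamic programming."""
    n, m = len(p), len(q)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j] = LCS length of p[i:], q[j:]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if p[i] == q[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out: List[str] = []
    i = j = 0
    while i < n and j < m:  # walk forward, reconstructing one LCS
        if p[i] == q[j]:
            out.append(p[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def lncss_ratio(p: Sequence[str], q: Sequence[str]) -> float:
    """Ratio(LnCSS(p, q), p) from (6): matching (p_i, LnCSS_j) pairs over |p|."""
    common = lcs(p, q)
    matches = sum(1 for pi in p for cj in common if pi == cj)
    return matches / len(p) if p else 0.0


P = ["Home", "Stay point", "Office", "Google", "Stay point"]
Q = ["Home", "Office", "Bank", "Stay point"]
```

For the example patterns, `lcs(P, Q)` is `["Home", "Office", "Stay point"]`. `lncss_ratio(P, Q)` evaluates to 4/5 = 0.8, because the two occurrences of Stay point in $P$ both match. A plain length ratio $|\operatorname{LnCSS}|/|p|$ would give 3/5 instead.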

However, the longest common subsequence (LnCSS) measurement cannot rely on semantic tags alone, because the same place is tagged with different names by different users; for example, Media Lab is tagged as Lab, Media Lab, Work lab, and so forth. We therefore introduce a time threshold $TH_{\text{trans}}$ for the transition between two stay points (as every semantically tagged location consists of cell towers), and a spatial threshold $TH_{\text{location area}}$ on the location area, which together capture the spatiotemporal property of user mobility.

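We can thus define the similarity between two users, given their frequent user mobility histories $\text{FUH}_1 = \{\text{Pattern}_{11}, \text{Pattern}_{12}, \ldots, \text{Pattern}_{1n}\}$ and $\text{FUH}_2 = \{\text{Pattern}_{21}, \text{Pattern}_{22}, \ldots, \text{Pattern}_{2n}\}$, a time threshold $TH_{\text{time cover}}$, and a location area threshold $TH_{\text{location area}}$, as

$$
\begin{aligned}
\operatorname{Similarity}(\text{FUH}_1, \text{FUH}_2, TH_{\text{time cover}}, TH_{\text{location area}})
&= \text{Location area}(\text{Ptn}_{11}, \text{Ptn}_{2i}) + \operatorname{Distance}(\text{Ptn}_{1n}, \text{Ptn}_{2j}) \le TH_{\text{location area}},\\
&\qquad \operatorname{CoL}(\text{Ptn}_{11}, \text{Ptn}_{2i}) \le 1, \quad \text{where } i, j \in \mathbb{N} \mid 0 < i \le j \le m,
\end{aligned} \tag{7}
$$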

where CoL is the colocation property, which determines whether users $x$ and $y$ visit location $l$ at the same time, exploiting their spatiotemporal property in addition to semantic tags; as mentioned, semantic tags alone cannot assure an accurate similarity measure because different users use different conventions for the same place. The colocation rate idea is adopted from [35], where the most likely location of user $x$ is defined as

$$L(x) = \arg\max_{l \in \text{Loc}} P(x, l), \tag{8}$$

where $L$ is the most likely location of user $x$, Loc is the set of cell towers or locations the user traversed over time, and $P$ is the probability of user $x$ visiting location $l$, defined as

$$P(x, l) = \frac{\sum_{i=1}^{n(x)} \delta(l, L_i(x))}{n(x)}, \tag{9}$$

with $L_i(x)$ the location in the $i$th of the $n(x)$ location records of user $x$,

where $\delta(x, y) = 1$ if $x = y$ and 0 otherwise. Further, the distance between users $x$ and $y$ can be defined as $d(x, y) = \operatorname{dist}(L(x), L(y))$, the physical distance between their most frequent locations. On this basis, the colocation rate is defined as

$$
\text{CoL} = \frac{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\big(\Delta T - |T_i(x) - T_j(y)|\big)\, \delta\big(L_i(x), L_j(y)\big)}{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\big(\Delta T - |T_i(x) - T_j(y)|\big)}. \tag{10}
$$

As (10) shows, the colocation rate accounts for time and location simultaneously for two users $x$ and $y$, binding both spatiotemporal trends together. It is normalized by the number of record pairs of the two users that fall within the time window $\Delta T$, where $\Theta$ is the Heaviside function and $\Delta T$ is equal to $T_{\text{stay time}}$.
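Equations (8)–(10) translate directly into code. The sketch below assumes that each user's records are `(time_s, location)` pairs, with locations compared by equality. It also assumes the Heaviside convention $\Theta(0) = 1$, so record pairs exactly $\Delta T$ apart still count as simultaneous; the text does not fix this convention.

```python
from collections import Counter
from typing import Sequence, Tuple

Record = Tuple[float, str]  # (time_s, location)


def most_likely_location(locations: Sequence[str]) -> str:
    """Eqs. (8)-(9): argmax over l of P(x, l), the fraction of records at l."""
    n = len(locations)
    probs = {l: c / n for l, c in Counter(locations).items()}
    return max(probs, key=probs.get)


def heaviside(v: float) -> int:
    return 1 if v >= 0 else 0  # assumption: Theta(0) = 1


def colocation_rate(rx: Sequence[Record], ry: Sequence[Record], delta_t: float) -> float:
    """Eq. (10): among record pairs within delta_t of each other,
    the fraction whose locations coincide."""
    num = den = 0
    for tx, lx in rx:
        for ty, ly in ry:
            w = heaviside(delta_t - abs(tx - ty))  # 1 if the records are near-simultaneous
            den += w
            num += w * (1 if lx == ly else 0)      # delta(L_i(x), L_j(y))
    return num / den if den else 0.0


x = [(0, "A"), (3600, "B"), (7200, "A")]
y = [(60, "A"), (3700, "C"), (20000, "A")]
```

With $\Delta T = 300$ s, only two record pairs fall inside the window: both users at A at 0 s and 60 s, and the users at B and C at 3600 s and 3700 s. One of these pairs is co-located, so `colocation_rate(x, y, 300)` is 0.5. The later visits to A by user $y$ do not count because they are not simultaneous.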

On the basis of the above similarity measure, we can calculate the similarity between given user pairs by comparing their patterns over three basic units: semantic tag, spatial value, and temporal value. For this, we construct a similarity matrix that gives a clear picture of the similarity measure for every frequent pattern pair of two different users. After calculating the similarity between the patterns of two users, these values are used to calculate the overall similarity between the two users, to infer whether they are related. For example, consider two users with $\text{FUH}_1 = \{P_{11}, P_{12}\}$ and $\text{FUH}_2 = \{P_{21}, P_{22}\}$, whose similarity we want to find for the given pairs of patterns, where $P_{11} = \langle \text{Home}\rangle\langle \text{Bank}, \text{Office}\rangle\langle \text{Stay point}\rangle$ and $P_{12} = \langle \text{Home}\rangle\langle \text{School}\rangle\langle \text{Office}\rangle$, while $P_{21} = \langle \text{Home}\rangle\langle \text{Bank}, \text{Office}\rangle\langle \text{Stay point}\rangle$ and $P_{22} = \langle \text{Home}\rangle\langle \text{Office}\rangle$. Figure 4 shows the spatial incidence of these patterns. We can construct a similarity matrix from them, as shown in Table 1.

Figure 4: User mobility pattern (patterns $P_{11}$ and $P_{12}$ of User 1 and $P_{21}$ and $P_{22}$ of User 2 over Office, School, Bank, and Stay point).

The overall similarity measure is then the sum of all extracted similarity weights divided by the total number of patterns; a high value indicates strong similarity between the two users, while a low value indicates dissimilarity.

On the basis of the above similarity measure, we can define the profile sharing measurement of the given users as

$$\text{Sharing value}\big(\mathrm{FUH}_1, \mathrm{FUH}_2, \mathrm{TH}_{\text{location area}}, \mathrm{TH}_{\text{transition}}\big) = \frac{\big|\{\, p \in \mathrm{FUH}_1 \mid \exists\, q \in \mathrm{FUH}_2 : \text{similarity}(p, q, \mathrm{CoL}) \,\}\big|}{|\mathrm{FUH}_1|} \qquad (11)$$
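Equation (11) and the overall similarity described above can be checked on the worked example. The sketch below takes the binary pattern similarities of Table 1 as given, folding $\mathrm{TH}_{\text{location area}}$, $\mathrm{TH}_{\text{transition}}$, and CoL into those precomputed values. It also interprets "total number of patterns" as $|\mathrm{FUH}_1| + |\mathrm{FUH}_2|$, with each cross-user pair counted once; that interpretation and the function names are assumptions.

```python
from typing import Dict, List, Tuple

# Binary pattern similarities from Table 1: (user 1 pattern, user 2 pattern) -> weight.
TABLE_1: Dict[Tuple[str, str], int] = {
    ("P11", "P21"): 1, ("P11", "P22"): 0,
    ("P12", "P21"): 0, ("P12", "P22"): 1,
}


def sharing_value(fuh1: List[str], fuh2: List[str], sim: Dict[Tuple[str, str], int]) -> float:
    """Eq. (11): fraction of patterns in FUH1 that have a similar pattern in FUH2."""
    if not fuh1:
        return 0.0
    shared = [p for p in fuh1 if any(sim.get((p, q), 0) > 0 for q in fuh2)]
    return len(shared) / len(fuh1)


def overall_similarity(fuh1: List[str], fuh2: List[str], sim: Dict[Tuple[str, str], int]) -> float:
    """Sum of cross-user similarity weights over the assumed total |FUH1| + |FUH2|."""
    total_patterns = len(fuh1) + len(fuh2)
    weights = sum(sim.get((p, q), 0) for p in fuh1 for q in fuh2)
    return weights / total_patterns if total_patterns else 0.0


fuh1, fuh2 = ["P11", "P12"], ["P21", "P22"]
print(sharing_value(fuh1, fuh2, TABLE_1))       # both user-1 patterns have a match
print(overall_similarity(fuh1, fuh2, TABLE_1))  # 2 matching pairs over 4 patterns
```

For the example, every pattern of user 1 has a similar pattern in user 2's history, so the sharing value is 1.0, while the assumed overall similarity is 0.5.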

4. Dataset

As mentioned earlier, the selected dataset is taken from the Reality Mining project group of MIT Media Lab [36]. This dataset was collected from 100 people, who are students, over a period of 9 months, with a total activity span of 350K hours. The data were logged on Symbian mobile phones (Nokia 6600), which have no GPS, so all information related to user location is identified by cell ID only. In this dataset the cell global identity header carries only partial information: only the LAC and cell ID are available for tracking the user's location. However, users provided semantic tags for the most important locations in their mobility history logs, although this semantic tag information varies considerably in annotation and usage from user to user. Among these users, only 94 gave complete information regarding the similarity measure through an online survey of social interactions. Of these 94 users, 7 have no cell logs and 10 have no cell annotation logs, so only 77 users are available for analysis and evaluation.

Table 1: Similarity matrix.

          P11   P12   P21   P22
   P11     -     -     1     0
   P12     -     -     0     1
   P21     1     0     -     -
   P22     0     1     -     -

5. Experiments and Results

As mentioned, we chose the Reality Mining dataset for our experiments. The results are as follows.

5.1. Retrieval of Location Information and Removal of Outliers Using Raw Cell ID

Because the data is quite old and mobile networks change frequently, outliers and missing values are to be expected in the dataset. We therefore applied our clustering approach [30] to the raw data to remove spatial outliers and retrieved the cells' location information through the Google APIs [30]. The result of this technique is shown in Table 2.

Figure 5(a) shows the cells retrieved from the Google API, Figure 5(b) shows the consolidated data without outliers, and Figure 5(c) shows the overall effect of spatial clustering on the data.

5.2. Observation of Semantic Tags in the Dataset

After extracting the outlier-free data points, we applied our spatial clustering techniques [30, 31] to the clean data to cluster them into stay points, which may or may not carry semantic tags. As Table 2 shows, each observed semantic location can carry multiple cell IDs, so these cells can be clustered together to define a common location. Because stays at known places are usually tagged, these semantic locations are more important than untagged stay points [31], as shown in Table 3.
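The grouping described above, in which all cell IDs observed under one semantic tag are merged into a common location, can be sketched in Python. The record layout (a hypothetical (LAC, cell ID) pair plus an optional tag), the helper names, and the tie-breaking rule are assumptions; untagged cells are simply left for the spatial clustering of [31], which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical cell key: (LAC, cell ID).
Cell = Tuple[int, int]


def group_cells_by_tag(records: Iterable[Tuple[Cell, Optional[str]]]) -> Dict[str, Set[Cell]]:
    """Collect every cell ID observed under each semantic tag."""
    groups: Dict[str, Set[Cell]] = defaultdict(set)
    for cell, tag in records:
        if tag:  # untagged records are left for spatial clustering
            groups[tag].add(cell)
    return dict(groups)


def cell_to_location(groups: Dict[str, Set[Cell]]) -> Dict[Cell, str]:
    """Map each tagged cell to one location label (alphabetically first tag wins on conflicts)."""
    mapping: Dict[Cell, str] = {}
    for tag in sorted(groups):
        for cell in groups[tag]:
            mapping.setdefault(cell, tag)
    return mapping


records = [((1, 101), "Home"), ((1, 102), "Home"), ((1, 101), "Home"),
           ((2, 201), "Office"), ((3, 301), None)]
groups = group_cells_by_tag(records)
print({tag: len(cells) for tag, cells in groups.items()})  # cells per semantic location
print(cell_to_location(groups)[(1, 102)])                  # Home
```

Counting the cells per tag in this way yields the kind of summary reported in Table 3.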

Figure 5: (a) Cells retrieved from the Google API; (b) consolidated data without outliers; (c) effect of spatial clustering on the data. Panels (a) and (b) plot latitude (deg) against longitude (deg).

Table 2: Locations retrieved with respect to user mobility.

                                                                   Number of cells   Percentage
   Total number of unique cells against subject X                        1744            100
   Cells' locations retrieved through Google API                          680             39
   Cells' locations retrieved through semantic tagged algorithm            88              3
   Total cells' locations retrieved                                       768             42

Table 3: Semantic locations with number of representative cells.

   Semantic location    Number of cells incidence
   Mgh                  5
   Office               4
   Airport              4
   Home                 4
   Greg's apt           4
   Grand parents        3
   Google               3
   Redhat               3
   Chicago O'Hare       2
   Topo hub             2

5.3. Cell Oscillation Resolution and Discovery of Stay Points

As defined previously, cell oscillation is a common phenomenon in GSM data, where a user is assigned multiple cell IDs while stationary, for load balancing. Table 2 shows that a semantic location can be identified with multiple cell IDs, which makes it clear that, besides the semantic tag information, we must use spatiotemporal analysis of location for pattern building and for finding similar users, since semantic tags are limited in the data. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation and, using overlapping-area analysis, identified stay points that the user had not otherwise tagged semantically. As per our previous assumption, GSM cells are distributed in bubble form so that they overlap with each other; this assumption is evident from the resulting Table 4.

Table 4: Overlapping of locations.

   Cells (LAC, Cell ID)    Number of locations represented
   C30                     7
   C40                     6
   C23                     6
   C14                     5
   C56                     4
   C44                     4
   C25                     3
   C47                     2
   C27                     2
   C39                     2
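As a minimal illustration of the effect of oscillation removal, the sketch below maps each cell to an already-known cluster of overlapping cells and merges consecutive log entries that fall in the same cluster, so an A, B, A, B flicker between overlapping cells becomes a single stay. The cluster assignment is taken as given and the cell names are hypothetical; this is not the overlapping-area technique of [31] itself.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float]          # (cell ID, timestamp in minutes)
Segment = Tuple[str, float, float]  # (location, first seen, last seen)


def collapse_oscillation(records: List[Record], cell_cluster: Dict[str, str]) -> List[Segment]:
    """Replace each cell by its cluster, then merge consecutive records of the same cluster."""
    segments: List[Segment] = []
    for cell, t in records:
        location = cell_cluster.get(cell, cell)  # unclustered cells stand alone
        if segments and segments[-1][0] == location:
            label, start, _ = segments[-1]
            segments[-1] = (label, start, t)  # extend the current stay
        else:
            segments.append((location, t, t))
    return segments


# Cells A and B overlap at home; C and D overlap at the office (hypothetical).
clusters = {"A": "Home", "B": "Home", "C": "Office", "D": "Office"}
log = [("A", 0), ("B", 5), ("A", 10), ("B", 15), ("C", 40), ("D", 50)]
print(collapse_oscillation(log, clusters))
```

Six raw cell changes reduce to two stays, Home from minute 0 to 15 and Office from minute 40 to 50, which removes the apparent movement caused by the oscillating cells.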

As a result, we retrieved all the locations as clustered cells, which represent the user's mobility history better than the raw cells. In Figure 6, the tagged locations are plotted over a geographical map, showing the vicinity of the tagged places. Figure 6 further shows that our assumption is correct: Chicago O'Hare airport is tagged twice by the user with overlapping cells, which shows the overlapping of cells and oscillation over the same place.

5.4. Discovery of Mobility Patterns

After extracting the mobility history in the form of clustered cells, where each cluster represents a stay point with or without a user-defined semantic tag, we applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3. We used a $\mathrm{TH}_{\text{stay time}}$ of 20 minutes along with a $\mathrm{TH}_{\text{transition}}$ of 10 minutes, values adopted from our work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique proposed in Section 3.3.3 to represent the most frequent mobility pattern. In Figure 7 we plotted the log time coverage of the user's mobility against pattern length, to analyze the most frequent pattern length.

Figure 6: Semantic locations plotted over a geographical map (label shown: Chicago O'Hare).

Figure 7: Mobility pattern length over log (x-axis: pattern length, 2 to 10; y-axis: log coverage).
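The two thresholds used in Section 5.4 can be illustrated with a short sketch. It assumes the time-stamped history is a list of (location, start, end) segments in minutes, keeps a segment as a stay when it lasts at least $\mathrm{TH}_{\text{stay time}}$ (20 minutes), and starts a new pattern when the gap between consecutive stays exceeds $\mathrm{TH}_{\text{transition}}$ (10 minutes). The second rule is our assumption about how $\mathrm{TH}_{\text{transition}}$ is applied; the paper's own definition is in Section 3.3.3, and the function names and example day are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (location, start, end) in minutes


def extract_stays(segments: List[Segment], th_stay: float = 20.0) -> List[Segment]:
    """Keep segments whose duration reaches the stay-time threshold."""
    return [s for s in segments if s[2] - s[1] >= th_stay]


def split_into_patterns(stays: List[Segment], th_transition: float = 10.0) -> List[List[str]]:
    """Assumed rule: start a new pattern when the gap between stays exceeds th_transition."""
    patterns: List[List[str]] = []
    current: List[str] = []
    last_end = 0.0
    for location, start, end in stays:
        if current and start - last_end > th_transition:
            patterns.append(current)
            current = []
        current.append(location)
        last_end = end
    if current:
        patterns.append(current)
    return patterns


day = [("Home", 0, 480), ("Road", 482, 487), ("Office", 488, 1000), ("Cafe", 1100, 1130)]
stays = extract_stays(day)         # the 5-minute "Road" segment is not a stay
print(split_into_patterns(stays))  # Home and Office are joined; Cafe starts a new pattern
```

The lengths of the resulting patterns are what Figure 7 summarizes against log coverage.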

Figure 7 shows that most of the patterns covering the user's mobility log are of length 3, after which the coverage declines sharply. This result informs the transition phase length set for LnCSS and the user's trend over time, which we use later in the similarity measure.

As the explored patterns carry all the basic information, such as location, semantic tag, and time stamp, we can plot the user's trend easily. In Figure 8, we plotted the user's location visit history against the log history period; due to space limitations, we selected only the top locations and only two months of user history. The figure shows that the user spent most of the time at known places and rarely explored new places. It also shows an important fact about mobility data: the user spent most of the time at the locations he tagged semantically, that is, home, MGH, office, and so forth. This satisfies our assumption in Section 3 about the trend of user mobility. The trend also shows that the user visits some locations, such as home, every day regardless of weekdays and weekends, and some locations, such as the office, only on weekdays, whereas locations such as Greg's home and Grand parents are usually visited once a week, on weekends.

Figure 8: Locations visit over log (axes: time spent and mobility coverage; locations shown: Home, Office, Grand parents, Mgh, Google, Greg's apartment).

For further analysis, we plotted the user's mobility ratio in terms of exploration over time in Figure 9, which shows that the user visits 2–5 places daily on average, while in exceptional cases the user visited more than 7 locations in a single day. We plotted the location visiting frequency over 30 consecutive days of data.

5.5. Similarity Measure between Users

To calculate the similarity measure, we constructed the similarity matrix on the basis of two main parameters, namely, semantic pattern and spatiotemporal similarity. For this we use LnCSS and set $\mathrm{TH}_{\text{location area}}$ to 10, as the Euclidean distance between two location points belonging to two different users, to determine whether they fall in the same area; further, we set $\mathrm{TH}_{\text{timecover}}$ to 20 minutes, as described in Section 3.3.4.
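The two thresholds above suggest a simple matching test between stays of two users. The sketch below assumes each stay is (x, y, start, end), that coordinates use the same unit as $\mathrm{TH}_{\text{location area}}$ (the paper gives no unit for the value 10), and that $\mathrm{TH}_{\text{timecover}}$ is the minimum overlap of the two stays' time intervals, in minutes of the day. These readings and the function name are assumptions.

```python
import math
from typing import Tuple

# Hypothetical stay: (x, y, start, end); coordinates share the unit of the
# location-area threshold, times are minutes of the day.
Stay = Tuple[float, float, float, float]


def stays_match(a: Stay, b: Stay, th_location_area: float = 10.0,
                th_timecover: float = 20.0) -> bool:
    """Match if the stays are within th_location_area (Euclidean) and overlap by th_timecover."""
    xa, ya, start_a, end_a = a
    xb, yb, start_b, end_b = b
    same_area = math.dist((xa, ya), (xb, yb)) <= th_location_area
    overlap = min(end_a, end_b) - max(start_a, start_b)  # negative if disjoint
    return same_area and overlap >= th_timecover


print(stays_match((0, 0, 0, 60), (6, 8, 30, 90)))  # distance 10, overlap 30 min
print(stays_match((0, 0, 0, 60), (6, 8, 50, 90)))  # overlap only 10 min
```

`math.dist` requires Python 3.8 or later.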

After formulating the similarity matrix, we plotted the similarity measure between user $x$ and other users, along with other user similarity measurement models, namely, spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37], for evaluation. Spatial cosine similarity is the similarity of the visitation frequencies of users $x$ and $y$, given by the cosine of the angle between their vectors of visit counts at each location. Extra-role colocation is based on the probability that two users $x$ and $y$ are colocated in the same hour at night or on weekends; this relationship serves as a strong indicator of friendship between two users.
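Spatial cosine similarity, as described above, can be computed from two users' location visit logs. The following is a minimal Python sketch, assuming each log is simply a list of location labels with one entry per visit; the function name and example data are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def spatial_cosine_similarity(visits_x: Iterable[Hashable], visits_y: Iterable[Hashable]) -> float:
    """Cosine of the angle between two users' per-location visit-count vectors."""
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm_x = math.sqrt(sum(v * v for v in cx.values()))
    norm_y = math.sqrt(sum(v * v for v in cy.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


x = ["Home"] * 3 + ["Office"] * 4
y = ["Home"] * 4 + ["Office"] * 3
print(spatial_cosine_similarity(x, y))  # vectors (3, 4) and (4, 3)
```

Because SCS uses only visit counts, it ignores when the visits happen, which is the gap that the colocation-based measures address.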

We compared the similarity measure between users obtained with our proposed methodology, named HA (hybrid approach), SCS, and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

In Figure 10 we plotted the user similarity based on the MAP metric, where MAP is defined as

$$\mathrm{MAP} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{r=1}^{K} P_i(r) \times \mathrm{rel}_i(r)}{|R_i|}, \qquad (12)$$

where $N$ is the number of test users, $|R_i|$ is the number of similar users of test user $U_i$, $r$ denotes the cut-off rank up to a maximum of $K$, $P_i(r)$ is the precision of $U_i$ at rank $r$, and $\mathrm{rel}_i(r)$ is a binary function equal to 1 if the user at rank $r$ is similar to $U_i$ and 0 otherwise.
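Equation (12) can be computed directly. The sketch below assumes each test user has a ranked list of candidate users produced by a similarity model and a ground-truth set $R_i$ of similar users; the names and data are illustrative.

```python
from typing import Dict, List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Inner term of eq. (12): sum over r of P_i(r) * rel_i(r), divided by |R_i|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:   # rel_i(r) = 1
            hits += 1
            total += hits / r  # P_i(r), precision at cut-off r
    return total / len(relevant)


def mean_average_precision(rankings: Dict[str, List[str]],
                           truth: Dict[str, Set[str]], k: int) -> float:
    """Eq. (12): average of the per-user terms over the N test users."""
    if not rankings:
        return 0.0
    return sum(average_precision_at_k(rankings[u], truth[u], k) for u in rankings) / len(rankings)


rankings = {"u1": ["b", "c", "d"], "u2": ["a", "c"]}
truth = {"u1": {"b", "d"}, "u2": {"c"}}
print(mean_average_precision(rankings, truth, k=3))
```

Here user u1 contributes (1 + 2/3)/2 = 5/6 and user u2 contributes 1/2, so MAP is 2/3.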

In Figure 11 we plotted the similarity measure between users using the NDCG metric, as defined in [38].
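Here is a matching sketch for NDCG with binary relevance. It follows the discounting of Jarvelin and Kekalainen [38], where ranks below the base $b$ are not discounted and the gain at rank $i \geq b$ is divided by $\log_b i$, with $b = 2$ as an assumed default. Many libraries instead divide by $\log_2(i + 1)$, which gives slightly different values; the function names are illustrative.

```python
import math
from typing import Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int, b: int = 2) -> float:
    """DCG: ranks below b are undiscounted; the gain at rank i >= b is divided by log_b(i)."""
    total = 0.0
    for i, gain in enumerate(gains[:k], start=1):
        total += gain if i < b else gain / math.log(i, b)
    return total


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int, b: int = 2) -> float:
    """nDCG with binary relevance: DCG of the ranking over DCG of an ideal ranking."""
    gains = [1.0 if user in relevant else 0.0 for user in ranked]
    ideal = [1.0] * len(relevant)  # every relevant user ranked first
    idcg = dcg_at_k(ideal, k, b)
    return dcg_at_k(gains, k, b) / idcg if idcg else 0.0


print(ndcg_at_k(["a", "b", "c"], {"b", "c"}, k=3))
```

In the example, the two similar users sit at ranks 2 and 3, giving DCG = 1 + 1/log2(3) against an ideal of 2.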

Figure 9: User's locations visit over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited).

Figure 10: MAP with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: MAP; methods: HA, ERCR, SCS).

Figure 11: nDCG with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: nDCG; methods: HA, ERCR, SCS).

Figures 10 and 11 show that, on the MIT real-world dataset, our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics.

6. Conclusion and Future Work

In this paper, we presented a two-phase methodology for building mobility profiles and finding user similarity. It is an unsupervised approach based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely solely on tags, spatial information, or spatiotemporal information. As discovering similar users can play a vital role in many location-based services (LBS), our approach is quite efficient in that semantic tags, together with spatiotemporal trends, serve as a precise basis for measuring similarity between users. Further, this methodology is a complete framework that resolves the mobility profiling issues of outlier detection, missing values retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies can examine user behavior in response to transient events, such as civil works on roads and special events, together with users' trajectories, to determine the exact behavior of users for real-world location-based service applications.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, “Data stream processing on real-time mobile advertisement,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, httpprodcatericssonsefrontendcategoryactioncode=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, “Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China,” Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, “Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM ’11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, “A personal route prediction system based on trajectory data mining,” Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] httpthenextwebcommobile20110703the-rise-of-the-mo-bile-social-network

[7] A. Harter and A. Hopper, “Distributed location system for the active office,” IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, “The anatomy of a context-aware application,” Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] “Location technologies for GSM, GPRS and WCDMA networks,” White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, “Analysis of random mobility models with pde’s,” in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., “Mobiscopes for human spaces,” IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., “iMAP: indirect measurement of air pollution with cellphones,” CSE Technical Reports, 2008.

[13] A. Pentland, “Automatic mapping and modeling of human networks,” Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, “Toward community sensing,” in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ’08), pp. 481–492, April 2008.

[15] K. Laasonen, “Clustering and prediction of mobile user routes from cellular data,” in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD ’05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, “Impact of communication infrastructure on forwarding in pocket switched networks,” in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM ’06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, “Instant message clustering based on extended vector space model,” in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA ’07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, “An exploration on mobile social networking: dodgeball as a case in point,” in Proceedings of the International Conference on Mobile Business (ICMB ’06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., “Cartel: a distributed mobile sensor computing system,” in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys ’06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., “Participatory sensing,” in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, “Health-OS: a position paper,” in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant MANETs,” in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, “User mobility modeling and characterization of mobility patterns,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, “Mobility modeling in third-generation mobile telecommunications systems,” IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, “The predictive user mobility profile framework for wireless multimedia networks,” IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, “Mobility models for systems evaluation: a survey,” in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, “Understanding individual human mobility patterns,” Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, “Identifying meaningful locations,” in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous ’06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, “Precise location acquisition of mobility data using cell-id,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, “Cell oscillation resolution in mobility profile building,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, “Mining mobility user profiles for car pooling,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering (ICDE ’01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, “Efficient mining of maximal sequential patterns using multiple samples,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, httprealitymediamitedudownloadphp.

[37] N. Eagle, A. Pentland, and D. Lazer, “Inferring friendship network structure by using mobile phone data,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, “IR evaluation methods for retrieving highly relevant documents,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), pp. 41–48, ACM, New York, NY, USA, 2000.


Figure 1: The proposed framework for users' similarity measure. Input cell IDs pass through retrieval of location information (cell IDs to lat/long points), outliers and missing values handling (retrieve missing values, remove outliers), cell oscillation resolution and stay points identification (clustered cells with projected stay points), user pattern discovery (user's frequent patterns), user profile building (user profile with frequent patterns), and users' similarity measure (users' similarity matrix and identification).

semantic information about the locations visited by the user can also be used for mobility extraction through its mapping to physical location coordinates.

As low-level mobility data cannot be used directly for high-level mobility applications, in this paper we introduce a complete framework describing how this information can be used to build a mobility profile with an unsupervised clustering approach. The paper thus presents the extraction of spatiotemporal mobility trends and a mobility profile building approach using low-level cellphone log data. The contributions of the paper are (1) extraction of missing values and removal of outliers using public cell ID and semantic information, (2) cell oscillation resolution and extraction of significant locations, (3) extraction of significant locations using overlapped areas over a time span, and (4) use of semantic information for final mobility profile building and users' similarity finding.

2. Related Work

Over recent years, mobility data has become a rich source of information about human life trends, and much work has been done on spatial information extraction. This motivation underlies many applications, such as city-wide sensing [10–14], where privately held sensors were used through a developed model, while personal sensors such as mobile phones and cameras are used for traffic monitoring [15] by capturing location information. Social behavior is studied in [16–18], where information is exploited by identifying significant places where users are active and then analyzing the similarity between them. Route prediction and recommendation are studied by Hull et al. [19] through GPS sensors installed in taxis, using the technique of opportunistic message forwarding. On the other hand, the work in [20–22] uses cell-based location awareness for user mobility analysis.

Zonoozi and Dassanayake [23] proposed a time optimization technique over cell residence for human mobility analysis, while Markoulidakis et al. [24] proposed a prediction model based on cell handover residence using a Markov model, introducing a Kalman filter for future visit prediction. Akyildiz et al. [25, 26] proposed a prediction model based on motion speed, position, and history. Musolesi and Mascolo [27] categorized mobility models into trace-based and synthetic models and suggested that trace-based models are easier to implement than synthetic ones because their data are publicly gathered. Gonzalez et al. [28] studied the spatiotemporal nature of user mobility through pattern analysis, extracting the top K locations from the mobility data of 100K users. Nurmi and Koolwaaij [29] proposed a clustering technique for extracting significant places using a graph-based transitional model over cell tower location data.

All of the above-mentioned work performs mobility analysis over complete location information, without considering missing values and changes in network structure, which fall under data preprocessing. It focuses on location extraction that either depends entirely on semantic awareness or ignores semantic information altogether. In our work, we tackle data preprocessing: outliers are eliminated and the data are consolidated for analysis, and the cell oscillation issue is resolved for complete mobility profile building. All this clustered information is then used for mobility profile building through a proposed clustering technique that combines semantic information with GSM network properties. Our work mainly focuses on successful mobility profile building based on a naive approach that mixes semantic information and raw location information. Prior to profile building, outlier extraction, retrieval of location information from the GSM cell global identity (CGI), and the cell oscillation phenomenon are dealt with through experiments on the MIT Reality Mining mobility dataset.

3. Methodology

Figure 1 shows the overall process of the proposed methodology.

Figure 2: Architecture of a functional GSM network (components: Internet, MSC, BSC, four BTSs, and four mobile stations (MS)).

Figure 3: (a) Network topology, hexagonal view; (b) cells, overlapped view.

3.1. Location Information Retrieval and Outlier Removal

The basic GSM structure is shown in Figure 2. The base transceiver station (BTS) is the basic unit, representing the location area in which multiple cells fall. This distribution depends on the mobile operator and is hidden from users and the public.

A mobile station (MS) moves in the network and gets its connection through a base station controller (BSC). The BSC is connected to a mobile services switching center (MSC), which connects different BSCs and MSCs over the network. One important identity is the location area, which is shared by all BSCs connected to a common MSC.

Conceptually, each cell has a polygonal shape (Figure 3(a)), but in practice it has an overlapping bubble shape, as shown in Figure 3(b).

There are two main concerns regarding the extraction of information from the dataset. First, as mentioned earlier, the dataset is taken from MIT Reality Mining, which has only partial cell global identity (CGI) information about user location, namely the LAC and cell ID, so it is hard to determine whether this partial information is enough for location extraction. Second, the GSM network changes over time, and LACs are reorganized or discarded by the operator, so missing values and outliers are to be expected in the dataset. In addition, the shift from GSM to 3G technologies prevents us from determining most of the location information for the MIT dataset collected in 2008.

We proposed a methodology to deal with all these problems in our work [30], where we used basic network information to solve the missing values issue and to resolve outliers, along with precise clustering of cells for later location extraction and mobility profiling. We proposed and used a clustering methodology in which the LAC and cell ID provide the full set of information for basic mobility profiling, through reverse engineering of the open source Google location API. For missing values, we used the semantic information provided by the users in the dataset for precise location extraction and mobility profile building.

3.2. Cell Oscillation Resolution and Role of Semantic Information

As stated earlier in the problem statement, cell oscillation is a common phenomenon in GSM networks, where a user can be assigned multiple cell IDs while static. Because the cell IDs change over time, this leads to fake mobility during mobility profile building. In our work [31], we presented a methodology for cell oscillation resolution that uses semantic tagging information and introduces a stay-time phenomenon, in which overlapping cells represent a location of interest rather than mobility. In that work we not only resolved the oscillation phenomenon successfully but also clustered the cells on the basis of the semantic information provided by the user (for example, home, lab, airport, club, and so forth) and of overlapping stay-time areas, so that this clustered information can later be used for stay location identification during mobility profiling. We used the overlapped location information to identify significant places, which can later be used for mobility profiling.

3.3. Mobility Profile Building

Mobility profile building is one of the key tasks in any location-based service (LBS), where mobility profiling is done by extracting significant places. Significant places can be defined as places, identified by geocoordinates, that are important to a user and where the user stays most of the time. The user usually tags these locations semantically, spends a significant amount of time at them, or visits them frequently over the observation period. Thus, a significant place can be a place where the user spends most of the time (home, work, etc.), a place the user visits frequently over a period of time (supermarket, club), or a place where the user spends significant time without frequent visits (conference, seminar, travel). This makes the discovery of significant places valuable for LBS, where user behavior is the main source of stimulation. As described in previous sections, the cell ID is the only viable solution for user profile building: its coverage is wide, its energy consumption is low, no data plan is required, and it is available on all kinds of mobile phones. User movement is therefore available in the data as a set of different cell IDs distributed over the network. By observing the precise transitions over these cells with an effective technique, this information can be used for mobility profile building. However, mobility profile building is a complex process, as it must deal with the following questions. When a user moves over thousands of cell IDs during a period of time, the locations of many of them cannot be retrieved through cell ID databases, for example, Google and Open Cell ID, so these cells lead to misinterpreted profiling. There are dark places where the user lost the connection or switched off the phone, which seem to be significant places because of the time spent there. And there are many cell oscillations during user movement, which look like mobility even when the user is static. We have divided the process into four parts: (1) clustering of the cells for path finding and their fingerprinting, (2) grouping of similar patterns for projection of trends, (3) profile building, and (4) similarity measurement between different users through the sharing property adopted from [32]. The whole process of mobility profiling can be elaborated as follows.

Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tower locations visited by the user during mobility. We are interested in identifying the pattern groups $PG = \{pg_1, pg_2, \ldots, pg_n\}$ that satisfy $TH_{\text{group count}}$, the group threshold. $PG$ is extracted from the frequent user mobility history (FUH), which is defined over the transition threshold $TH_{\text{transition}}$, the location area threshold $TH_{\text{location area}}$, and the semantic tag information SEM. FUH is in turn derived from the visiting history VH, which is obtained through the oscillation removal method [31] and time-stamping methods.

Algorithm 1 (frequent pattern discovery from a user's mobility history).

(1) Select the complete list of user mobility in terms of cell towers visited, $T$.

(2) Apply the cell oscillation removal technique to it and retrieve the clustered cells $C$.

(3) Apply proper time stamping to the clustered cells retrieved after oscillation removal, giving the user's visit history VH with complete spatiotemporal information.

(4) Identify the frequent user mobility patterns FUH, defined over $TH_{\text{location area}}$ and $TH_{\text{transition}}$. Group the identified patterns $G$ on the basis of their spatiotemporal nature and semantic tag information using the PrefixSpan algorithm.

(5) Repeat steps (i) and (ii) until only the supported groups of patterns $PG$ are identified:

(i) if the size of group $g$ satisfies the group support threshold $TH_{\text{group count}}$,

(ii) assign the group to the supported group pattern set.
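Steps (4) and (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the candidate patterns have already been produced (for example, by PrefixSpan, which is not reimplemented here), treats each day of the visit history VH as one sequence, and counts a pattern as present in a day when its stay points occur in order. The function names and the `days` data are hypothetical.

```python
from collections import defaultdict


def is_subsequence(pattern, history):
    """True if the stay points of `pattern` occur in `history` in order (gaps allowed)."""
    it = iter(history)
    # `any` consumes the shared iterator, so each match must come after the previous one.
    return all(any(stop == h for h in it) for stop in pattern)


def supported_groups(histories, candidates, th_group_count):
    """Group the histories containing each candidate pattern, then keep only
    the groups whose size reaches TH_group count (steps (5)(i)-(ii))."""
    groups = defaultdict(list)
    for idx, history in enumerate(histories):
        for pattern in candidates:
            if is_subsequence(pattern, history):
                groups[tuple(pattern)].append(idx)
    return {p: ids for p, ids in groups.items() if len(ids) >= th_group_count}


days = [
    ["Home", "Lab", "Home"],
    ["Home", "Lab", "Club", "Home"],
    ["Home", "Supermarket", "Home"],
]
candidates = [["Home", "Lab"], ["Home", "Supermarket"], ["Lab", "Club"]]
print(supported_groups(days, candidates, th_group_count=2))
# {('Home', 'Lab'): [0, 1]}
```

Only ⟨Home, Lab⟩ occurs on at least two days, so it is the only group that survives the support threshold.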

Let $U = \{u_1, u_2, \ldots, u_n\}$ be the set of users, with complete information on their pattern groups PGs. We are interested in building a matrix $M[i][j]$ on the basis of each pair of patterns $p_i$ and $q_j$, where the two patterns belong to two potentially similar users. The similarity between two patterns is calculated over LnCSS (longest common subsequence) and CoL (colocation probability measure).

Algorithm 2 (user similarity measurement).

(1) Select the users.

(2) Select the pattern groups of the two users.

(3) Repeat steps (i) and (ii) for each pair of patterns $p$ and $q$ belonging to the two different users:


(i) calculate the LnCSS and CoL properties of the two given patterns;

(ii) if the two patterns satisfy the minimum support of similarity, set $M(p, q) = 1$; otherwise set it to 0.

(4) Calculate the users' similarity based on the similarity matrix.

3.3.1. Clustering of Subsequences in Mobility History Data. As mentioned earlier, we retrieved the clustered cells by implementing the cell oscillation algorithm, which guarantees that the data has no oscillation problem. We adopted a naïve clustering approach that clusters the cells on the basis of circular subsequences. The baseline algorithm relies on the fact that common cell IDs can be merged together to form a circular subsequence, which can represent user behavior [31]. We implemented the two algorithms proposed in [31] for the resolution of cell oscillation and the retrieval of common clustered cells that represent the significant places of the user using the overlapped area; details of the algorithms are given in our previous work. It is also worth noting that, as mentioned earlier, the proposed algorithm also resolves the problem of missing values, which can occur due to network changes or the nonavailability of cell-ID information in open cell-ID databases. For example, in the mobility sequence $\langle C_1, C_2, C_3, C_4, C_5 \rangle$, $C_3$ and $C_4$ are not retrievable through the cell-ID database, which would obviously lead to a misleading mobility trace of the user; our algorithm replaces these two cells with the most likely cell in the cluster using a majority voting mechanism, so that the mobility remains traceable with the highest likelihood. The retrieved clusters carry information about significant places as semantic tags such as home, lab, club, and so forth. Since the derived algorithm can also infer a significant place from the time spent over the overlapped area between different cells, we tag such locations with the convention "stay point." After retrieving the clustered cells, we fingerprinted them to represent the percentage of time spent by the user on particular cells and on each cluster as a whole, that is, on stay points. Each cluster is represented as a sequence of cell IDs, that is,

$$
\begin{aligned}
\text{Cluster}_1 &= [C_1, C_2, C_{15}],\\
\text{Cluster}_2 &= [C_3, C_5, C_7, C_{19}],\\
&\;\;\vdots\\
\text{Cluster}_n &= [C_7, C_{18}, C_{24}].
\end{aligned}
\tag{1}
$$

Each cluster represents a stay point, which may or may not have a user-defined semantic tag associated with it.
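The majority-voting replacement of unresolvable cells can be illustrated with the sketch below. The voting rule is our assumption, since the details are deferred to [31]: each cluster votes with the observation counts of its resolvable cells, and every cell without a database location is replaced by its cluster's most frequently observed resolvable cell. The `cluster_of` and `resolvable` inputs are hypothetical stand-ins for the clustering output and the cell-ID database lookup.

```python
from collections import Counter


def fill_unresolved(sequence, cluster_of, resolvable):
    """Replace cells with no database location by the majority cell of their cluster.

    cluster_of: cell ID -> cluster ID (hypothetical clustering output)
    resolvable: set of cell IDs whose location the database returned
    """
    votes = {}
    for cell in sequence:
        if cell in resolvable:
            votes.setdefault(cluster_of[cell], Counter())[cell] += 1
    filled = []
    for cell in sequence:
        if cell in resolvable:
            filled.append(cell)
        else:
            counter = votes.get(cluster_of.get(cell))
            # Keep the cell unchanged if its cluster has no resolvable voters.
            filled.append(counter.most_common(1)[0][0] if counter else cell)
    return filled


seq = ["C1", "C2", "C3", "C4", "C5", "C1"]
cluster_of = {"C1": "A", "C2": "A", "C3": "A", "C4": "A", "C5": "B"}
resolvable = {"C1", "C2", "C5"}
print(fill_unresolved(seq, cluster_of, resolvable))
# ['C1', 'C2', 'C1', 'C1', 'C5', 'C1']
```

Here $C_3$ and $C_4$, as in the example above, are mapped to $C_1$, the most frequently observed resolvable cell of their cluster.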

3.3.2. Time Stamping of Clustered Cells. We attach to each cell in a cluster the percentage of time spent on it, together with the total time spent on that cluster and a time stamp. Time stamping is the most important part of fingerprinting, as it lets user behavior be determined easily by slicing over time and date. The structure of a fingerprint is

$$[C_1 T_1, C_2 T_2, \ldots, C_n T_n, \text{Total time}, \text{Time stamp}, \text{Semantic tag}] \tag{2}$$

where each cluster represents a stay point of the user, with the percentage of time $T_i$ spent on each cell, the date, and the semantic tag information if available. After this process, we have the set of all stay points, or significant places, the user visited over time; that is, we have converted the user's history, distributed over cell IDs, into stay points in the form of clusters:

$$\text{VH} = \{\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n\} \tag{3}$$

where VH represents the complete mobility history of the user in terms of stay points, in the form of the spatiotemporal clusters $\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n$. This information can be used to identify the trajectory patterns of a particular user and to build the mobility profile by grouping them over time. After extracting the patterns, we need to group them and discard infrequent patterns using a minimum support count; otherwise, for such a large amount of data the resulting patterns would number in the millions and become a burden on memory. We are therefore interested only in frequent patterns, identified through spatiotemporal information and support count. We define a group $G$ as a subset of VH that satisfies the following two rules:

(i) $P_1, P_2, \ldots, P_n \in g$ if $g.\text{Distance} \le TH_{\text{location area}}$ and $g.\text{Time} \le g.TH_{\text{stay time}}$;

(ii) $|g| \ge TH_{\text{group support}}$.
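The fingerprint of Eq. (2) and the two grouping rules can be sketched as below. This is an illustration under our own reading of the text, not the authors' code: observations are hypothetical (cell ID, minutes) pairs, $g.\text{Distance}$ is taken as the largest Euclidean distance between member stay points (given as 2-D coordinates), and $g.\text{Time}$ as each member's duration. All names are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple


@dataclass
class Fingerprint:
    """Eq. (2): per-cell time share, total time, time stamp, semantic tag."""
    cell_share: Dict[str, float]   # cell ID -> % of the cluster's time
    total_time: float              # minutes spent in the cluster
    time_stamp: str                # e.g. start of the visit
    semantic_tag: Optional[str] = None


def fingerprint(observations: List[Tuple[str, float]], time_stamp: str,
                tag: Optional[str] = None) -> Fingerprint:
    """Aggregate (cell ID, minutes) observations of one cluster visit."""
    minutes: Dict[str, float] = {}
    for cell, m in observations:
        minutes[cell] = minutes.get(cell, 0.0) + m
    total = sum(minutes.values())
    share = {cell: 100.0 * m / total for cell, m in minutes.items()}
    return Fingerprint(share, total, time_stamp, tag)


def satisfies_group_rules(centroids, durations, th_location_area,
                          th_stay_time, th_group_support) -> bool:
    """Rules (i)-(ii) under our reading: all member stay points lie within
    th_location_area of each other, every duration is within th_stay_time,
    and the group has at least th_group_support members."""
    close = all(dist(a, b) <= th_location_area
                for a in centroids for b in centroids)
    within_time = all(d <= th_stay_time for d in durations)
    return close and within_time and len(centroids) >= th_group_support


fp = fingerprint([("C1", 30), ("C2", 10), ("C1", 20)], "2004-10-05 09:00", "Office")
print(fp.total_time, round(fp.cell_share["C1"], 1))
# 60.0 83.3
```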

3.3.3. Extraction of Mobility Patterns from Clustered Stay Points. After retrieving all the stay points in the spatiotemporal history of a mobile user, we can transform it into the trajectory patterns that the user follows over time. We define a pattern $P$ as a trip over two or more consecutive stay points (SPs) in the ordered set VH, or a trip between a stay point SP and the first or last point of the user's mobility history VH:

$$
P =
\begin{cases}
\{SP_m, \ldots, SP_n\}, & 0 < m \le n, \quad P_i\left(\text{VH}_i.SP_m \cap \text{VH}_{i+1}.SP_n\right),\\
\{SP_1, \ldots, SP_m\}, & 0 < m \le o, \quad P_i\left(\text{VH}_i.SP_m\right),\\
\{SP_n, \ldots, SP_o\}, & 0 < n \le o, \quad P_i\left(\text{VH}_i.SP_n\right).
\end{cases}
\tag{4}
$$

Further, to extract true patterns from user mobility, we introduced a transition time threshold to ensure the continuity of a trip; this threshold ensures a smooth transition of the user from one stay point to another and differentiates one visiting pattern from another. To extract mobility patterns from the semantically arranged stay points, we applied the PrefixSpan algorithm [33] to the user's mobility history. The algorithm yields semantic patterns of the user over time, for example, ⟨Home, Lab, Google, Grand parents⟩, along with all of their subsequences. As the output of the algorithm contains redundant patterns, we adopted the maximal trajectory pattern technique [34] to represent user mobility patterns. The resulting extracted patterns are therefore true representatives of the frequent mobility of the user (FUM) over time:

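$$
\begin{aligned}
\text{FUM} = \{\, &\text{Ptn}_1\ \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point} \rangle,\\
&\text{Ptn}_2\ \langle \text{Home}, \text{Super market}, \text{Topo hub}, \text{Grand parents}, \text{Club} \rangle,\\
&\ldots,\\
&\text{Ptn}_n\ \langle \text{Office}, \text{Bank}, \text{Stay point}, \text{Park} \rangle \,\}
\end{aligned}
\tag{5}
$$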
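The role of the transition threshold and the maximal-pattern filter can be sketched as follows. This is a simplified illustration, not PrefixSpan or the exact method of [34]: it assumes each stay-point visit is a hypothetical (arrival, departure, label) triple in minutes, ends a trip whenever the gap between leaving one stay point and reaching the next exceeds $TH_{\text{transition}}$, and then keeps only patterns that are not contained in a longer one.

```python
def split_trips(visits, th_transition):
    """Split time-ordered (arrival_min, departure_min, label) visits into trips.
    A gap longer than th_transition minutes between stay points ends a trip."""
    trips, current, last_departure = [], [], None
    for arrival, departure, label in visits:
        if current and arrival - last_departure > th_transition:
            trips.append(current)
            current = []
        current.append(label)
        last_departure = departure
    if current:
        trips.append(current)
    return [t for t in trips if len(t) >= 2]  # a pattern spans >= 2 stay points


def is_subsequence(short, long):
    it = iter(long)
    return all(any(x == y for y in it) for x in short)


def maximal_patterns(patterns):
    """Drop every pattern that is a subsequence of a longer pattern."""
    unique = [list(p) for p in dict.fromkeys(map(tuple, patterns))]
    return [p for p in unique
            if not any(len(q) > len(p) and is_subsequence(p, q) for q in unique)]


visits = [(0, 480, "Home"), (490, 1000, "Lab"), (1005, 1060, "Google"),
          (1200, 1300, "Grand parents"), (1305, 1400, "Home")]
print(split_trips(visits, th_transition=10))
# [['Home', 'Lab', 'Google'], ['Grand parents', 'Home']]
```

The 140-minute gap after Google exceeds the 10-minute threshold (the value later used in Section 5.4), so it starts a new trip; `maximal_patterns` would likewise discard ⟨Home, Lab⟩ in favor of ⟨Home, Lab, Google⟩.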

3.3.4. Projection of Similarity between Different Users through Pattern Matching. After successfully extracting the user patterns, we determine whether two patterns are similar through the longest common subsequence (LnCSS) measure. For example, for the patterns $P = \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point} \rangle$ and $Q = \langle \text{Home}, \text{Office}, \text{Bank}, \text{Stay point} \rangle$, the LnCSS is $\langle \text{Home}, \text{Office}, \text{Stay point} \rangle$. We define the LnCSS ratio as

$$
\text{Ratio}\left(\text{LnCSS}(p, q), p\right) = \frac{\sum_{i=1}^{|p|} \sum_{j=1}^{|\text{LnCSS}(p,q)|} M\left(p_i, \text{LnCSS}_j\right)}{|p|},
\qquad
M\left(p_i, \text{LnCSS}_j\right) =
\begin{cases}
1, & \text{if } \text{LnCSS}_j \text{ matches } p_i,\\
0, & \text{otherwise.}
\end{cases}
\tag{6}
$$
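The LnCSS and the ratio of Eq. (6) can be computed with the classic dynamic-programming LCS, as in the sketch below. Matching labels by exact string equality is our simplifying assumption; the thresholds discussed next are not included. The function names are hypothetical.

```python
def lncss(p, q):
    """Longest common subsequence of two patterns (classic DP on suffixes)."""
    n, m = len(p), len(q)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if p[i] == q[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:          # walk forward to recover one LCS
        if p[i] == q[j]:
            out.append(p[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def lncss_ratio(p, q):
    """Eq. (6): count matches M(p_i, LnCSS_j) over all pairs, normalized by |p|."""
    common = lncss(p, q)
    return sum(1 for a in p for b in common if a == b) / len(p)


P = ["Home", "Stay point", "Office", "Google", "Stay point"]
Q = ["Home", "Office", "Bank", "Stay point"]
print(lncss(P, Q), lncss_ratio(P, Q))
# ['Home', 'Office', 'Stay point'] 0.8
```

Because Stay point occurs twice in $P$, the literal double sum counts it twice and yields 0.8, whereas $|\text{LnCSS}|/|p|$ would give 0.6; the text does not say which normalization is intended.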

However, the LnCSS measurement cannot rely on semantic tags alone, because the same place is tagged with different names by different users; for example, Media Lab is tagged as Lab, Media Lab, Work lab, and so forth. We therefore introduced a time threshold for the transition between two stay points, $TH_{\text{trans}}$ (as every semantically tagged location consists of cell towers), and a spatial threshold on location, $TH_{\text{location area}}$, so that together they capture the spatiotemporal property of user mobility.

We can then define the similarity between two users, given two frequent user mobility histories (FUH), $\text{FUH}_1 = \{\text{Pattern}_{11}, \text{Pattern}_{12}, \ldots, \text{Pattern}_{1n}\}$ and $\text{FUH}_2 = \{\text{Pattern}_{21}, \text{Pattern}_{22}, \ldots, \text{Pattern}_{2n}\}$, a time threshold $TH_{\text{time cover}}$, and a location area threshold $TH_{\text{location area}}$, as

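$$
\begin{aligned}
\text{Similarity}&\left(\text{FUH}_1, \text{FUH}_2, TH_{\text{time cover}}, TH_{\text{location area}}\right)\\
&= \text{LocationArea}\left(\text{Ptn}_{11}, \text{Ptn}_{2i}\right) + \text{Distance}\left(\text{Ptn}_{1n}, \text{Ptn}_{2j}\right) \le TH_{\text{location area}},\\
&\qquad \text{CoL}\left(\text{Ptn}_{11}, \text{Ptn}_{2i}\right) \le 1, \quad \text{where } i, j \in \mathbb{N},\ 0 < i \le j \le m,
\end{aligned}
\tag{7}
$$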

where CoL is the colocation property, which determines whether users $x$ and $y$ visit location $l$ at the same time, so as to exploit their spatiotemporal properties besides the semantic tags; as mentioned, semantic tags cannot assure an accurate similarity measure because different users use different conventions for the same place. The colocation rate idea is adopted from [35], where the most likely location of user $x$ is defined as

$$L(x) = \arg\max_{l \in \text{Loc}} P(x, l) \tag{8}$$

where $L(x)$ is the most likely location of user $x$, Loc is the set of cell towers or locations the user traversed over time, and $P(x, l)$ is the probability that user $x$ visits location $l$, defined as

$$P(x, l) = \frac{\sum_{i=1}^{n(x)} \delta\left(l, L_i(x)\right)}{n(x)} \tag{9}$$

where $\delta(x, y) = 1$ if $x = y$ and 0 otherwise, $n(x)$ is the number of location records of user $x$, and $L_i(x)$ and $T_i(x)$ are the location and time of the $i$th record. Further, the distance between users $x$ and $y$ can be defined as $d(x, y) = \text{dist}(L(x), L(y))$, representing the physical distance between their most frequent locations. On this basis, the colocation rate can be defined as follows:

$$
\text{CoL} = \frac{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - \left|T_i(x) - T_j(y)\right|\right) \delta\left(L_i(x), L_j(y)\right)}{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - \left|T_i(x) - T_j(y)\right|\right)}
\tag{10}
$$

As (10) shows, the colocation rate accounts for time and location simultaneously for two users $x$ and $y$, so it binds the spatial and temporal trends together. It is normalized by the number of record pairs of the two users that fall within $\Delta T$ of each other, where $\Theta$ is the Heaviside step function and $\Delta T$ equals $T_{\text{stay time}}$.
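Equations (8) to (10) can be sketched directly, as below. Records are assumed to be hypothetical (time in minutes, location label) pairs, and treating $\Theta(0)$ as 1 (records exactly $\Delta T$ apart count as simultaneous) is our assumption; the function names are hypothetical as well.

```python
from collections import Counter


def visit_probability(locations):
    """Eq. (9): P(x, l) = share of user x's records at location l."""
    n = len(locations)
    return {loc: count / n for loc, count in Counter(locations).items()}


def most_likely_location(locations):
    """Eq. (8): L(x) = argmax over l of P(x, l)."""
    probs = visit_probability(locations)
    return max(probs, key=probs.get)


def heaviside(z):
    # Theta(z); counting Theta(0) as 1 is our assumption.
    return 1 if z >= 0 else 0


def colocation_rate(records_x, records_y, delta_t):
    """Eq. (10); records are (time in minutes, location) pairs."""
    num = den = 0
    for t_x, l_x in records_x:
        for t_y, l_y in records_y:
            near = heaviside(delta_t - abs(t_x - t_y))  # records within delta_t
            den += near
            num += near * (l_x == l_y)                  # ... and at the same place
    return num / den if den else 0.0


x = [(0, "A"), (60, "B")]
y = [(10, "A"), (65, "C")]
print(most_likely_location(["A", "A", "B"]), colocation_rate(x, y, delta_t=20))
# A 0.5
```

Two record pairs fall within 20 minutes of each other, and only one of them is at the same location, giving a colocation rate of 0.5.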

On the basis of the above similarity measure, we can calculate the similarity between given user pairs by comparing their patterns over three basic units: semantic tag, spatial value, and temporal value. For this, we construct a similarity matrix that gives a clear picture of the similarity measure for every frequent pattern pair of two different users. After calculating the similarity between the patterns of two users, these values are used to calculate the overall similarity between the two users to infer whether they are related to each other. For example, consider two users with $\text{FUH}_1 = \{P_{11}, P_{12}\}$ and $\text{FUH}_2 = \{P_{21}, P_{22}\}$, whose patterns are $P_{11} = \langle \text{Home} \rangle \langle \text{Bank}, \text{Office} \rangle \langle \text{Stay point} \rangle$, $P_{12} = \langle \text{Home} \rangle \langle \text{School} \rangle \langle \text{Office} \rangle$, $P_{21} = \langle \text{Home} \rangle \langle \text{Bank}, \text{Office} \rangle \langle \text{Stay point} \rangle$, and $P_{22} = \langle \text{Home} \rangle \langle \text{Office} \rangle$. Figure 4 shows the spatial incidence of these patterns. From them we can construct the similarity matrix shown in Table 1.

Figure 4: User mobility pattern (locations Office, School, Bank, and Stay point; patterns $P_{11}$ and $P_{12}$ of User 1 and $P_{21}$ and $P_{22}$ of User 2).

The overall similarity measure is then the sum of all extracted similarity weights divided by the total number of patterns, where a high value indicates strong similarity between the two users and a low value indicates dissimilarity.
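Algorithm 2 and this aggregation can be sketched on the Figure 4 example. Two assumptions are ours and are labeled in the code. First, the pair test is a stand-in for the combined LnCSS and CoL check: the LCS length divided by the longer pattern's length must reach 0.6, with the itemsets flattened into label sequences. With this stand-in, the four patterns reproduce the 0/1 entries of Table 1. Second, "sum of weights over total patterns" is read as summing the symmetric Table 1, where each cross-user pair appears twice, and dividing by the patterns of both users.

```python
def lcs_len(p, q):
    """Length of the longest common subsequence of two label sequences."""
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(q):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similar(p, q, threshold=0.6):
    # Hypothetical stand-in for the LnCSS + CoL test: LCS length over the
    # longer pattern's length must reach `threshold`.
    return lcs_len(p, q) / max(len(p), len(q)) >= threshold


def similarity_matrix(fuh1, fuh2):
    """Algorithm 2, step (3): M(p, q) = 1 if the pair passes, else 0."""
    return [[int(similar(p, q)) for q in fuh2] for p in fuh1]


def user_similarity(matrix):
    """Sum of the weights of the symmetric Table 1 (every cross-user pair
    appears twice) divided by the total number of patterns of both users."""
    n1, n2 = len(matrix), len(matrix[0])
    return 2 * sum(map(sum, matrix)) / (n1 + n2)


fuh1 = [["Home", "Bank", "Office", "Stay point"], ["Home", "School", "Office"]]
fuh2 = [["Home", "Bank", "Office", "Stay point"], ["Home", "Office"]]
m = similarity_matrix(fuh1, fuh2)
print(m, user_similarity(m))
# [[1, 0], [0, 1]] 1.0
```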

On the basis of the above similarity measure, we can define the profile sharing measurement of the given users as

$$
\text{Sharing value}\left(\text{FUH}_1, \text{FUH}_2, TH_{\text{location area}}, TH_{\text{transition}}\right) = \frac{\left|\left\{ p \in \text{FUH}_1 \mid \exists q \in \text{FUH}_2 : \text{similarity}(p, q, \text{CoL}) \right\}\right|}{\left|\text{FUH}_1\right|}
\tag{11}
$$
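Equation (11) takes only a few lines once a pattern-pair test is available. In the sketch below, `same_ends` is a toy predicate for illustration only and is not the paper's similarity test; any predicate implementing the LnCSS and CoL check could be passed instead.

```python
def sharing_value(fuh1, fuh2, similar):
    """Eq. (11): fraction of FUH_1 patterns with at least one similar
    pattern in FUH_2. `similar` is any pattern-pair predicate."""
    if not fuh1:
        return 0.0
    hits = sum(1 for p in fuh1 if any(similar(p, q) for q in fuh2))
    return hits / len(fuh1)


def same_ends(p, q):
    # Toy predicate for illustration only: same first and last stay point.
    return p[0] == q[0] and p[-1] == q[-1]


fuh1 = [["Home", "Lab"], ["Home", "Club"], ["Office", "Park"]]
fuh2 = [["Home", "Bank", "Lab"], ["Office", "Home"]]
print(sharing_value(fuh1, fuh2, same_ends), sharing_value(fuh2, fuh1, same_ends))
# 0.3333333333333333 0.5
```

The two calls differ because (11) normalizes by $|\text{FUH}_1|$ only, so the sharing value is asymmetric in its two arguments.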

4. Dataset

As mentioned earlier, the selected dataset is taken from the Reality Mining project group of MIT Media Lab [36]. This dataset was collected from 100 people, who are students, over a period of 9 months, with a total activity span of 350K hours. The data was logged on Symbian phones (Nokia 6600), which have no GPS, so all information related to user location is identified by cell ID only. Moreover, in this dataset the cell global identity header has only partial information: only the LAC and cell ID are available for tracking the user's location. However, users provided semantic tags for the most important locations in their mobility history in the data logs, although this semantic tag information varies a lot in annotation and usage from user to user. Among these users, only 94 gave full information regarding similarity through an online survey of social interactions. Of these 94 users, 7 have no cell logs and 10 have no cell annotation logs, so only 77 users are available for analysis and evaluation.

Table 1: Similarity matrix.

$$
\begin{array}{c|cccc}
 & P_{11} & P_{12} & P_{21} & P_{22} \\
\hline
P_{11} & - & - & 1 & 0 \\
P_{12} & - & - & 0 & 1 \\
P_{21} & 1 & 0 & - & - \\
P_{22} & 0 & 1 & - & -
\end{array}
$$

5. Experiments and Results

As mentioned, we have chosen the Reality Mining dataset for our experiments. The results are as follows.

5.1. Retrieval of Location Information and Removal of Outliers Using Raw Cell ID. As the data is quite old and mobile networks change frequently, outliers and missing values are expected in the dataset, so we applied our clustering approach [30] to the raw data to remove spatial outliers and extract location information through Google APIs [30]. The result of the applied technique is shown in Table 2.

Figure 5(a) shows the cells retrieved from the Google API, Figure 5(b) shows the consolidated data without outliers, and Figure 5(c) shows the complete effect of spatial clustering over the data.

5.2. Observation of Semantic Tags in Dataset. After extracting outlier-free data points, we applied our spatial clustering techniques [30, 31] to the clean data to cluster them into stay points, which may or may not carry semantic tags. Each observed semantic location can carry multiple cell IDs, so these cells can be clustered together to define a common location. As most stays at known places are usually tagged, these semantic locations are of more importance than untagged stay points [31], as shown in Table 3.

5.3. Cell Oscillation Resolution and Discovery of Stay Points. As defined previously, cell oscillation is a common phenomenon in GSM datasets, in which a stationary user is assigned multiple cell IDs for load balancing. Table 3 shows that a semantic location can be identified with multiple cell IDs, which makes it clear that, besides the semantic tag information, we must use spatiotemporal analysis of location for pattern building and similar-user finding, as semantic tags are limited in the data. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation, and using overlapping area analysis we identified stay points that are otherwise not semantically tagged by the user. As per our previous assumption, GSM cells are distributed in a bubble form in which they overlap each other; this assumption is evident from Table 4.

Figure 5: (a) Cells retrieved from the Google API, (b) consolidated data without outliers, and (c) the complete effect of spatial clustering; panels (a) and (b) plot latitude (deg) against longitude (deg).

Table 2: Locations retrieved with respect to user mobility.

$$
\begin{array}{lrr}
 & \text{\# of cells} & \% \\
\hline
\text{Total number of unique cells for subject X} & 1744 & 100 \\
\text{\# of cells' locations retrieved through Google API} & 680 & 39 \\
\text{\# of cells' locations retrieved through semantic tagged algorithm} & 88 & 3 \\
\text{Total \# of cells' locations retrieved} & 768 & 42
\end{array}
$$

Table 3: Semantic locations with # of representative cells.

$$
\begin{array}{lc}
\text{Semantic location} & \text{Number of cells (incidence)} \\
\hline
\text{Mgh} & 5 \\
\text{Office} & 4 \\
\text{Airport} & 4 \\
\text{Home} & 4 \\
\text{Greg's apt} & 4 \\
\text{Grand parents} & 3 \\
\text{Google} & 3 \\
\text{Redhat} & 3 \\
\text{Chicago O'Hare} & 2 \\
\text{Topo hub} & 2
\end{array}
$$

Table 4: Overlapping of locations.

$$
\begin{array}{lc}
\text{Cells (LAC, Cell ID)} & \text{\# of locations represented} \\
\hline
C_{30} & 7 \\
C_{40} & 6 \\
C_{23} & 6 \\
C_{14} & 5 \\
C_{56} & 4 \\
C_{44} & 4 \\
C_{25} & 3 \\
C_{47} & 2 \\
C_{27} & 2 \\
C_{39} & 2
\end{array}
$$

As a result, we retrieved all locations as clustered cells that represent the user's mobility history, rather than raw cells. In Figure 6, tagged locations are plotted over a geographical map, which shows the vicinity of the tagged places. Figure 6 further confirms our assumption: Chicago O'Hare airport is tagged twice by the user with overlapping cells, which shows both the overlapping of cells and oscillation over the same place.

5.4. Discovery of Mobility Patterns. After extracting the mobility history in the form of clustered cells, where each cluster represents a stay point with or without a user-defined semantic tag, we applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3. We used a $TH_{\text{stay time}}$ of 20 minutes along with a $TH_{\text{transition}}$ of 10 minutes, which are adopted from our work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique described in Section 3.3.3 to represent the most frequent mobility patterns. In Figure 7, we plotted the log time coverage of the user's mobility against pattern length to analyze the most frequent pattern length.

Figure 6: Semantic locations plotted over a geographical map (the label for Chicago O'Hare shows n = 2).

Figure 7: Mobility pattern length over log (x-axis: pattern length, 2 to 10; y-axis: log coverage).

Figure 7 shows that most of the patterns covering the user's mobility log are of length 3, after which the coverage declines sharply. This result informs the transition-phase length set for LnCSS and the user's trend over time, which we use later in the similarity measure.

As the explored patterns carry all the basic information, such as location, semantic tag, and time stamp, we can plot the user's trend quite easily. In Figure 8, we plotted the user's location visit history against the log history period, selecting only the top locations because of space limitations and only two months of user history. The figure shows that the user spent most of his time at known places and rarely explored new places. It also shows an important fact about mobility data: the user spent most of his time at the locations he tagged semantically, that is, home, MGH, office, and so forth. This satisfies our assumption in Section 3 about the trend of user mobility. The trend also shows that the user visits some locations, such as home, every day regardless of weekday or weekend; some locations, such as the office, only on weekdays; and some locations, such as Greg's home and his grandparents', usually once a week and on weekends.

Figure 8: Locations visited over the log (mobility coverage against time spent for Home, Office, Grand parents, Mgh, Google, and Greg's apartment).

For further analysis, we plotted the user's mobility ratio in terms of exploration over time in Figure 9, which shows that the user visits 2–5 places daily on average, while in exceptional cases the user visited more than 7 locations in a single day. The location visiting frequency is plotted over 30 consecutive days.

5.5. Similarity Measure between Users. To calculate the similarity measure, we constructed the similarity matrix on the basis of two main parameters, namely, semantic pattern and spatiotemporal similarity. For this, we use LnCSS and set $TH_{\text{location area}}$ to 10, a Euclidean distance between two location points belonging to two different users, to determine whether they fall in the same area; further, we set $TH_{\text{time cover}}$ to 20 minutes, as described in Section 3.3.4.

After formulating the similarity matrix, we plotted the similarity measure between user $x$ and the other users, together with other user similarity measurement models, namely, spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37], for evaluation. Spatial cosine similarity is the similarity of the visitation frequencies of users $x$ and $y$, given by the cosine of the angle between their vectors of visit counts at each location. Extra-role colocation is based on the probability that two users $x$ and $y$ colocate in the same hour at night or on weekends; this relationship serves as a strong indicator of friendship between two users.
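The SCS baseline, as described above, can be sketched in a few lines: build each user's per-location visit-count vector and take the cosine of the angle between them. The location labels and function name are hypothetical, and this is not the reference implementation of [37].

```python
from collections import Counter
from math import sqrt


def spatial_cosine_similarity(visits_x, visits_y):
    """Cosine of the angle between the per-location visit-count vectors of x and y."""
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm = sqrt(sum(v * v for v in cx.values())) * sqrt(sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0


print(spatial_cosine_similarity(["A", "A", "B"], ["A", "B", "B"]))
```

With counts (2, 1) and (1, 2) the result is 4/5 up to floating-point rounding. Unlike the hybrid approach, SCS ignores visit order and timing entirely.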

We compared the similarity measures produced by our proposed methodology, named HA (hybrid approach), SCS, and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

In Figure 10, we plotted user similarity based on the MAP metric, where MAP is defined as

$$\text{MAP} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{r=1}^{K} P_i(r) \times \text{rel}_i(r)}{\left|R_i\right|} \tag{12}$$

where $N$ is the number of test users, $|R_i|$ is the number of similar users for test user $U_i$, $r$ denotes the cut-off rank, $P_i(r)$ is the precision at cut-off $r$ for $U_i$, and $\text{rel}_i(r)$ is a binary function equal to 1 if the user at rank $r$ is similar to $U_i$ and 0 otherwise.
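Equation (12) can be sketched as below. The inputs are assumptions for illustration: `rankings` maps each hypothetical test user to the candidate users ranked by a similarity model, and `ground_truth` maps each test user to the set of truly similar users (the survey-based $R_i$).

```python
def average_precision(ranked, relevant, k):
    """Inner term of Eq. (12) for one test user: sum over r <= k of
    P(r) * rel(r), divided by |R| (the number of truly similar users)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:          # rel(r) = 1
            hits += 1
            total += hits / r         # P(r): precision at cut-off r
    return total / len(relevant)


def mean_average_precision(rankings, ground_truth, k):
    """Eq. (12): mean of the per-user average precision over the N test users."""
    return sum(average_precision(rankings[u], ground_truth[u], k)
               for u in rankings) / len(rankings)


rankings = {"u1": ["b", "c", "d"], "u2": ["a", "b"]}
ground_truth = {"u1": {"b", "d"}, "u2": {"b"}}
print(mean_average_precision(rankings, ground_truth, k=3))
```

For `u1` the hits at ranks 1 and 3 contribute 1 and 2/3, so its AP is 5/6; for `u2` the single hit at rank 2 gives 1/2, and MAP is their mean, about 0.667.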

Figure 11 shows the similarity measure between users under the NDCG metric, as defined in [38].
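An nDCG@K computation can be sketched as follows. Note that this uses the widely used $\log_2(r + 1)$ rank discount, which may differ from the base-$b$ discount in the original formulation of [38]; the graded relevance values in `gain` are hypothetical.

```python
from math import log2


def dcg(gains):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / log2(r + 1) for r, g in enumerate(gains, start=1))


def ndcg_at_k(ranked, gain, k):
    """nDCG@k: DCG of the produced ranking divided by DCG of the ideal ranking.
    `gain` maps a candidate user to its graded relevance (missing = 0)."""
    actual = dcg([gain.get(u, 0) for u in ranked[:k]])
    ideal = dcg(sorted(gain.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0


gain = {"a": 2, "b": 1}
print(ndcg_at_k(["a", "b"], gain, k=2), ndcg_at_k(["b", "a"], gain, k=2))
```

The ideal order scores exactly 1.0, while swapping the two users lowers the score to about 0.86, because the more similar user is pushed to a more heavily discounted rank.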


Figure 9: User's locations visited over days (number of locations visited over a sample of 30 consecutive days).

Figure 10: MAP with respect to proposed methodology performance (HA, ERCR, and SCS for K = 1 to 6).

Figure 11: nDCG with respect to proposed methodology performance (HA, ERCR, and SCS for K = 1 to 6).

Figures 10 and 11 show that, on the real MIT dataset, our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics.

6. Conclusion and Future Work

In this paper, we presented a two-phase methodology for building mobility profiles and finding user similarity, an unsupervised approach based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely on tags alone, on spatial information alone, or on spatiotemporal information alone. As discovering similar users can play a vital role in many applications related to location-based services (LBS), our approach, in which semantic tags together with spatiotemporal trends yield a precise similarity measurement between users, is well suited to such applications. Further, this methodology is a complete framework that resolves the main mobility profiling issues, that is, outlier detection, missing value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies can examine user behavior under transient conditions, such as civil work on roads and special events, together with user trajectories, to determine user behavior more precisely for location-based services in real scenarios.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, "Data stream processing on real-time mobile advertisement," in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, "Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China," Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, "Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces," in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM '11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, "A personal route prediction system based on trajectory data mining," Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network.

[7] A. Harter and A. Hopper, "Distributed location system for the active office," IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, "The anatomy of a context-aware application," Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] "Location technologies for GSM, GPRS and WCDMA networks," White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, "Analysis of random mobility models with PDE's," in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., "Mobiscopes for human spaces," IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., "iMAP: indirect measurement of air pollution with cellphones," CSE Technical Reports, 2008.

[13] A. Pentland, "Automatic mapping and modeling of human networks," Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, "Toward community sensing," in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN '08), pp. 481–492, April 2008.

[15] K. Laasonen, "Clustering and prediction of mobile user routes from cellular data," in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, "Impact of communication infrastructure on forwarding in pocket switched networks," in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM '06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, "Instant message clustering based on extended vector space model," in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA '07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, "An exploration on mobile social networking: dodgeball as a case in point," in Proceedings of the International Conference on Mobile Business (ICMB '06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., "CarTel: a distributed mobile sensor computing system," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., "Participatory sensing," in Proceedings of ACM SenSys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, "Health-OS: a position paper," in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant MANETs," in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, "User mobility modeling and characterization of mobility patterns," IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, "Mobility modeling in third-generation mobile telecommunications systems," IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, "The predictive user mobility profile framework for wireless multimedia networks," IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, "Mobility models for systems evaluation. A survey," in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. González, C. A. Hidalgo, and A.-L. Barabási, "Understanding individual human mobility patterns," Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, "Identifying meaningful locations," in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous '06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, "Precise location acquisition of mobility data using cell-id," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, "Cell oscillation resolution in mobility profile building," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, "Mining mobility user profiles for car pooling," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., "PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth," in Proceedings of the 17th International Conference on Data Engineering (ICDE '01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, "Efficient mining of maximal sequential patterns using multiple samples," in Proceedings of the SIAM International Conference on Data Mining (SDM '05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabási, "Human mobility, social ties, and link prediction," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Järvelin and J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 41–48, ACM, New York, NY, USA, 2000.


The Scientific World Journal 3

Figure 2: Architecture of a functional GSM network (Internet, MSC, BSC, BTS, and MS).

Figure 3: (a) Network topology, hexagonal view. (b) Overlapped-cell view.

3.1. Location Information Retrieval and Outlier Removal

The basic GSM structure is shown in Figure 2. As shown, the base transceiver station (BTS) is the basic unit, representing the location area into which multiple cells fall. This distribution depends on the mobile operator and is hidden from users and the public.

A mobile station (MS) moves through the network and obtains its connection through a base station controller (BSC), while each BSC is connected to a mobile services switching center (MSC), which connects different BSCs and MSCs over the network. One important identity is the location area, which is shared by all BSCs connected to a common MSC.

Each cell conceptually has a polygonal shape (Figure 3(a)), but in practice it has an overlapping bubble shape, as shown in Figure 3(b).

There are two main concerns regarding the extraction of information from the dataset. First, as mentioned earlier, the dataset is taken from MIT Reality Mining, which holds only partial cell global identity (CGI) information about user location, namely, the LAC and Cell ID, so it is not obvious whether this partial information is enough for location extraction. Second, the GSM network changes over time, and LACs are reorganized or discarded by the operator, so missing values and outliers are inevitable in the dataset. In addition, the shift from GSM to 3G technologies prevents us from determining much of the location information in the MIT dataset collected in 2008.

We proposed a methodology to deal with all these problems in our work [30], where we used basic network information to resolve missing values and outliers, along with precise clustering of cells for later location extraction and mobility profiling. We proposed and used a clustering methodology in which the LAC and Cell ID provide all the information needed for basic mobility profile building, using reverse engineering of the open-source Google location API. For missing values, we utilized the semantic information provided by the user in the dataset for precise location extraction and mobility profile building.

3.2. Cell Oscillation Resolution and Role of Semantic Information

As stated earlier in the problem statement, cell oscillation is a common phenomenon in GSM networks: a user can be assigned multiple cell IDs while static, and the resulting change of cell IDs over time produces fake mobility during mobility profile building. In our work [31], we presented a methodology for cell oscillation resolution that uses the semantic tagging information and introduces a time-stay phenomenon, in which overlapping cells represent a location of interest rather than mobility. In that work, we not only resolved the oscillation phenomenon but also clustered the cells on the basis of the semantic information provided by the user (e.g., home, lab, airport, club) and the overlapping time-stay area, so that this clustered information can later be used for stay location identification during mobility profiling. We used the overlapped location information to identify significant places, which can later be utilized for mobility profiling.
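To make the oscillation effect concrete, the following Python sketch collapses a raw cell trace into candidate stay areas. It is not the algorithm of [31]: it assumes a simple rule of our own, namely that a cell reappearing within a hypothetical window `window_s` means every cell seen in between oscillated with it, and it merges such cells with a union-find structure. The function and parameter names are illustrative only.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (timestamp in seconds, cell ID)


def oscillation_groups(trace: List[Record], window_s: int = 300) -> List[List[str]]:
    """Collapse a cell trace into stay areas, treating A->B->A bounces as oscillation."""
    parent: Dict[str, str] = {}

    def find(cell: str) -> str:
        parent.setdefault(cell, cell)
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    last_index: Dict[str, int] = {}  # cell ID -> index of its latest occurrence
    for j, (ts, cell) in enumerate(trace):
        find(cell)  # register the cell
        i = last_index.get(cell)
        if i is not None and i < j - 1 and ts - trace[i][0] <= window_s:
            # The cell came back quickly: everything seen in between oscillated with it.
            for k in range(i + 1, j):
                union(trace[k][1], cell)
        last_index[cell] = j

    # Consecutive records whose cells share a root form one stay area.
    areas: List[Tuple[str, List[str]]] = []
    for _, cell in trace:
        root = find(cell)
        if areas and areas[-1][0] == root:
            if cell not in areas[-1][1]:
                areas[-1][1].append(cell)
        else:
            areas.append((root, [cell]))
    return [cells for _, cells in areas]
```

For the trace A, B, A, B (within a few minutes), then C, D, C, the sketch returns two stay areas, `[['A', 'B'], ['C', 'D']]`, instead of six apparent movements.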

3.3. Mobility Profile Building

Mobility profile building is a fundamental task in any location-based service (LBS), and mobility profiling is done through the extraction of significant places. Significant places can be defined as places that are important to a user over geocoordinates and where the user stays most of the time. Users usually tag these locations semantically, spend a significant amount of time at them, or visit them frequently over the observation period. Thus, a significant place can be a place where the user spends most of the time (home, work, etc.), a place the user visits frequently over a period of time (supermarket, club), or a place where the user spends significant time without frequent visits (conference, seminar, travel). This makes the discovery of significant places valuable for LBS, where user behavior is the main source of stimulation. As described in previous sections, the cell ID is the only viable basis for user profile building: its coverage is wide, its energy consumption is low, no data plan is required, and it is available on all kinds of mobile phones. User movement is therefore available in the data as a set of different cell IDs distributed over the network, and by observing the precise transitions between these cells with an effective technique, this information can be used for mobility profile building. However, constructing a mobility profile is a complex process, as it must deal with questions such as the following: when a user moves over thousands of cell IDs during a period of time, the locations of many of them cannot be extracted from cell-ID databases (e.g., Google, Open Cell ID), so these cells lead to misinterpreted profiling; there are dark places where the user lost the connection or switched off the phone, which seem to be significant places because of the time spent there; and there are many cell oscillations during user movement, which look like mobility even when the user is static. We have divided the process into four parts: (1) clustering of the cells for path finding and their fingerprinting, (2) grouping of similar patterns for projection of trends, (3) profile building, and (4) similarity measurement between different users through the sharing property adopted from [32]. The whole process of mobility profiling can be elaborated as follows.

Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tower locations visited by the user during mobility. We are interested in identifying the pattern group $PG = \{pg_1, pg_2, \ldots, pg_n\}$ that satisfies the group threshold $TH_{\text{group count}}$, where $PG$ is extracted from the frequent user mobility history (FUH) defined over the transition threshold $TH_{\text{transition}}$, the location area threshold $TH_{\text{location area}}$, and the semantic tag information SEM. FUH is retrieved from the visiting history VH, which is obtained through the oscillation removal method [31] and time-stamping methods.

Algorithm 1: Frequent pattern discovery from a user's mobility history.

(1) Select the complete list of user mobility in terms of cell towers visited, $T$.
(2) Apply the cell oscillation removal technique to it and retrieve the clustered cells $C$.
(3) Apply proper time stamping to the clustered cells retrieved after oscillation removal, giving the user's visit history VH with complete spatiotemporal information.
(4) Identify the frequent user mobility patterns FUH, defined over $TH_{\text{location area}}$ and $TH_{\text{transition}}$. Group the identified patterns $G$ on the basis of their spatiotemporal nature and semantic tag information using the PrefixSpan algorithm.
(5) Repeat steps (i) and (ii) until only the supported groups of patterns PG are identified:
   (i) if the size of group $g$ satisfies the group support threshold $TH_{\text{group count}}$,
   (ii) assign the group to the supported group pattern set.
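Steps (4) and (5) rely on PrefixSpan [33] with a support threshold. The sketch below is a minimal, illustrative PrefixSpan over sequences whose elements are single stay-point tags; `min_support` stands in for $TH_{\text{group count}}$ as an absolute count of sequences. It deliberately ignores the spatiotemporal thresholds and is an assumption-laden sketch, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[List[str]], min_support: int) -> Dict[Pattern, int]:
    """Minimal PrefixSpan for sequences of single items (e.g., stay-point tags).

    Returns every sequential pattern contained (in order, not necessarily
    contiguously) in at least `min_support` sequences, mapped to its support.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[List[str]]) -> None:
        # Count each item at most once per projected sequence.
        counts: Dict[str, int] = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Projected database: the suffix after the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, suffixes)

    mine((), db)
    return results
```

For instance, with the daily sequences ⟨Home, Lab, Home⟩, ⟨Home, Lab, Club⟩, and ⟨Home, Club⟩ and a support of 2, the sketch keeps ⟨Home, Lab⟩ and ⟨Home, Club⟩ but drops ⟨Lab, Club⟩, which occurs only once.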

Let $U = \{u_1, u_2, \ldots, u_n\}$ be the set of users with complete information on their pattern groups PG. We are interested in building a matrix $M[i][j]$ over each pair of patterns $p_i$ and $q_j$, where the two patterns belong to two potentially similar users. The similarity between two patterns is calculated using LnCSS (longest common subsequence) and CoL (colocation probability measure).

Algorithm 2: User's similarity measurement.

(1) Select the users.
(2) Select the pattern groups of the two users.
(3) Repeat steps (i) and (ii) for each $p$ and $q$ belonging to two different users:
   (i) calculate the LnCSS and CoL properties of the two given patterns;
   (ii) if the two patterns satisfy the minimum support of similarity, then set $M(p, q) = 1$; otherwise, set it to 0.
(4) Calculate the users' similarity based on the similarity matrix.

3.3.1. Clustering of Subsequences in Mobility History Data

As mentioned earlier, we retrieved the clustered cells by implementing the cell oscillation algorithm, which guarantees that the data has no oscillation problem. We adopted a naive clustering approach, which clusters the cells on the basis of circular subsequences. The baseline algorithm relies on the fact that common cell IDs can be merged into a circular subsequence that represents user behavior [31]. We implemented the two algorithms proposed in [31] for the resolution of cell oscillation and for the retrieval of common clustered cells that represent the user's significant places through the overlapped area; details of the algorithms are given in our previous work. It is also worth noting that, as mentioned earlier, the proposed algorithm also resolves the problem of missing values, which can occur due to network changes or the unavailability of cell-ID information in open cell-ID databases. For example, in the mobility sequence $\{C_1, C_2, C_3, C_4, C_5\}$, $C_3$ and $C_4$ are not retrievable through the cell-ID database, which would obviously give a misleading picture of the user's mobility; our algorithm replaces these two cells, using a majority voting mechanism, with the most likely cell in the cluster, so that mobility remains traceable with the highest likelihood. The retrieved clusters carry information about significant places as semantic tags such as home, lab, and club; since the derived algorithm can infer a significant place from the time spent over an overlapped area between different cells, we tagged such locations with the convention "stay point." After retrieving the clustered cells, we fingerprinted them to represent the percentage of time the user spent on particular cells and on each cluster as a whole, that is, on stay points. Each cluster is represented as a sequence of cell IDs, that is,

$$
\begin{aligned}
&\mathrm{Cluster}_1\,[C_1, C_2, C_{15}] \\
&\mathrm{Cluster}_2\,[C_3, C_5, C_7, C_{19}] \\
&\quad\vdots \\
&\mathrm{Cluster}_n\,[C_7, C_{18}, C_{24}]
\end{aligned}
\tag{1}
$$

Each cluster represents a stay point, which may or may not have a user-defined semantic tag associated with it.
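The majority-voting replacement of unretrievable cells described in this subsection can be sketched as follows. The voting rule is an assumption for illustration, not the exact mechanism of [31]: an unretrievable cell is replaced by the known cell of its cluster with the most observations. The inputs `clusters`, `known`, and `observations` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def fill_missing_cells(
    sequence: List[str],
    clusters: List[List[str]],
    known: Set[str],
    observations: Counter,
) -> List[Optional[str]]:
    """Replace cells with no database location by the most observed known cell of their cluster."""
    cluster_of: Dict[str, int] = {
        cell: k for k, members in enumerate(clusters) for cell in members
    }
    filled: List[Optional[str]] = []
    for cell in sequence:
        if cell in known:
            filled.append(cell)
            continue
        k = cluster_of.get(cell)
        candidates = [c for c in clusters[k] if c in known] if k is not None else []
        if candidates:
            # Majority vote by observation count; ties go to the earlier cell in the cluster.
            filled.append(max(candidates, key=lambda c: observations[c]))
        else:
            filled.append(None)  # no evidence available: keep the gap visible
    return filled
```

With the example sequence above, if $C_3$ falls in a cluster whose best-observed known cell is $C_2$, the sketch emits $C_2$ in its place, keeping the path traceable.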

3.3.2. Time Stamping of Clustered Cells

We bind a time percentage to each cell in the cluster, along with the total time spent in that cluster and a time stamp. Time stamping is the most important part of fingerprinting, as it allows user behavior to be determined easily through time/date slicing. The structure of a fingerprint is

$$
[C_1 T_1, C_2 T_2, \ldots, C_n T_n, \mathrm{Totaltime}, \mathrm{Time\ stamp}, \mathrm{Semantic\ tag}]
\tag{2}
$$

where each cluster represents a stay point of the user, along with the percentage of time $T$ spent on each of its cells, the date, and the semantic tag information, if available. After this process, we have the set of all stay points, or significant places, the user visited over time; that is, we converted the user's history, distributed over cell IDs, into stay points in the form of clusters:

$$
\mathrm{VH} = \{\mathrm{Cluster}_1, \mathrm{Cluster}_2, \ldots, \mathrm{Cluster}_n\}
\tag{3}
$$

where VH represents the complete mobility history of the user in terms of stay points in the form of spatiotemporal clusters, that is, $\mathrm{Cluster}_1, \mathrm{Cluster}_2, \ldots, \mathrm{Cluster}_n$. This information can be used to identify the trajectory patterns of a particular user and to build up the mobility profile by grouping them over time. After extracting the patterns, we need to group them and discard infrequent patterns using a minimum support count; otherwise, for such a large amount of data, the resulting patterns would number in the millions and become a memory burden. We are therefore interested only in frequent patterns, selected through spatiotemporal information and the support count. We can thus define a group $G$ as a subset of VH that satisfies the following two rules:

(i) $\{P_1, P_2, \ldots, P_n\} \in g$ if $g \cdot \mathrm{Distance} \le TH_{\text{location area}}$ and $g \cdot \mathrm{Time} \le g \cdot TH_{\text{stay time}}$;

(ii) $|g| \ge TH_{\text{group support}}$.
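A fingerprint of the form (2) can be computed from the raw visits inside one cluster, as in the following sketch. The `Fingerprint` class, the `(cell ID, minutes)` input format, and the opaque time-stamp string are assumptions for illustration; the paper does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Fingerprint:
    cell_share: Dict[str, float]  # cell ID -> percentage of the cluster's total time
    total_minutes: float          # "Totaltime" in (2)
    time_stamp: str               # kept as an opaque string in this sketch
    semantic_tag: Optional[str]   # user-defined tag, if any


def fingerprint(
    visits: List[Tuple[str, float]], time_stamp: str, semantic_tag: Optional[str] = None
) -> Fingerprint:
    """Build [C1 T1, ..., Cn Tn, Totaltime, Time stamp, Semantic tag] for one cluster."""
    per_cell: Dict[str, float] = {}
    for cell, minutes in visits:
        per_cell[cell] = per_cell.get(cell, 0.0) + minutes
    total = sum(per_cell.values())
    share = {cell: 100.0 * m / total for cell, m in per_cell.items()} if total else {}
    return Fingerprint(share, total, time_stamp, semantic_tag)
```

For a cluster such as $\mathrm{Cluster}_1$ in (1), 50, 10, and 40 minutes observed on $C_1$, $C_2$, and $C_{15}$ yield shares of 50%, 10%, and 40% of a 100-minute stay.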

3.3.3. Extraction of Mobility Patterns from Clustered Stay Points

After retrieving all the stay points in the given spatiotemporal history of a mobile user, we can transform them into the trajectory patterns the user follows over time. We define a pattern $P$ as a trip over two or more consecutive stay points (SPs) in the ordered set VH, or as a trip between a stay point SP and the first/last point of the user's mobility history VH. We can define the pattern $P$ as

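$$
\begin{aligned}
P &= \{\mathrm{SP}_m, \ldots, \mathrm{SP}_n\}, \quad 0 < m \le n, \quad P_i(\mathrm{VH}_i \cdot \mathrm{SP}_m \cap \mathrm{VH}_{i+1} \cdot \mathrm{SP}_n), \quad \text{or} \\
P &= \{\mathrm{SP}_1, \ldots, \mathrm{SP}_m\}, \quad 0 < m \le o, \quad P_i(\mathrm{VH}_i \cdot \mathrm{SP}_m), \quad \text{or} \\
P &= \{\mathrm{SP}_n, \ldots, \mathrm{SP}_o\}, \quad 0 < n \le o, \quad P_i(\mathrm{VH}_i \cdot \mathrm{SP}_n)
\end{aligned}
\tag{4}
$$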
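The first case of (4), a trip over consecutive stay points, can be sketched by cutting the ordered history wherever the move between two stay points takes too long. The limit `max_gap_min` plays the role of the transition threshold discussed in the text; the `(label, arrival, departure)` tuple format and the function name are hypothetical, and the boundary cases of (4) involving the first/last point of VH are not modeled.

```python
from typing import List, Optional, Tuple

StayPoint = Tuple[str, float, float]  # (label, arrival minute, departure minute)


def split_trips(history: List[StayPoint], max_gap_min: float) -> List[List[str]]:
    """Cut an ordered visit history into trips of consecutive stay points.

    A new trip starts whenever moving between two stay points takes longer than
    `max_gap_min`; trips with fewer than two stay points are discarded.
    """
    trips: List[List[str]] = []
    current: List[str] = []
    prev_departure: Optional[float] = None
    for label, arrival, departure in history:
        if prev_departure is not None and arrival - prev_departure > max_gap_min:
            trips.append(current)
            current = []
        current.append(label)
        prev_departure = departure
    if current:
        trips.append(current)
    return [trip for trip in trips if len(trip) >= 2]
```

With a 10-minute limit, Home → Lab → Club followed, hours later, by Home → Park yields the two trips ⟨Home, Lab, Club⟩ and ⟨Home, Park⟩.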

Further, to extract the true patterns from user mobility, we introduced a transition time threshold to ensure the continuity of a trip; this threshold ensures a smooth transition of the user from one stay point to another and distinguishes one visiting pattern from another. To extract mobility patterns from the semantically arranged stay points, we applied the PrefixSpan algorithm [33] to the user's mobility history. The algorithm yields the user's semantic patterns over time, for example, ⟨Home, Lab, Google, Grand parents⟩, along with all of their subsequences. Because this output contains redundant patterns, we adopted the maximal trajectory pattern technique [34] to represent the user's mobility patterns. The resulting extracted patterns are thus true representatives of the user's frequent mobility (FUM) over time:

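$$
\begin{aligned}
\mathrm{FUM} = \{\, &\mathrm{Ptn}_1\ \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point} \rangle, \\
&\mathrm{Ptn}_2\ \langle \text{Home}, \text{Super market}, \text{Topo hub}, \text{Grand parents}, \text{Club} \rangle, \\
&\ldots, \\
&\mathrm{Ptn}_n\ \langle \text{Office}, \text{Bank}, \text{Stay point}, \text{Park} \rangle \,\}
\end{aligned}
\tag{5}
$$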
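The reduction of PrefixSpan output to maximal patterns can be illustrated with a small filter: a pattern is kept only if it is not a subsequence of another mined pattern. This is a plain post-processing sketch under that definition, not the sampling-based miner of [34]; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` appears in `long` in order (not necessarily contiguously)."""
    it = iter(long)
    return all(item in it for item in short)  # `in` consumes the iterator as it scans


def maximal_patterns(patterns: List[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Keep only patterns that are not a proper subsequence of another pattern."""
    unique = list(dict.fromkeys(patterns))  # drop duplicates, keep first-seen order
    return [
        p for p in unique
        if not any(p != q and is_subsequence(p, q) for q in unique)
    ]
```

Applied to ⟨Home, Lab, Google, Grand parents⟩ together with its subsequences ⟨Home, Lab⟩ and ⟨Lab, Google⟩ and an unrelated ⟨Home, Club⟩, only the long pattern and ⟨Home, Club⟩ survive.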

3.3.4. Projection of Similarity between Different Users through Pattern Matching

After the successful extraction of user patterns, we determine whether two patterns are similar through the longest common subsequence (LnCSS) measure. For example, for the patterns $P$ = ⟨Home, Stay point, Office, Google, Stay point⟩ and $Q$ = ⟨Home, Office, Bank, Stay point⟩, the LnCSS is ⟨Home, Office, Stay point⟩, and we define the ratio as

$$
\begin{aligned}
\mathrm{Ratio}(\mathrm{LnCSS}(p, q), p) &= \frac{\sum_{i=1}^{|p|} \sum_{j=1}^{|\mathrm{LnCSS}(p, q)|} M(p_i, \mathrm{LnCSS}_j)}{|p|}, \\
M(p_i, \mathrm{LnCSS}_j) &= \begin{cases} \dfrac{p_i \in \mathrm{LnCSS}_j}{|p|} & \text{if } \mathrm{LnCSS}_j \text{ matches } p_i, \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{6}
$$
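The following sketch computes the LnCSS by standard dynamic programming and a ratio in the spirit of (6). Reading (6) as "each element of $p$ matched by the LnCSS counts once, normalized by $|p|$" is our interpretation, so `lcs_ratio` returns $|\mathrm{LnCSS}(p, q)| / |p|$; the function names are hypothetical.

```python
from typing import List, Sequence


def lcs(p: Sequence[str], q: Sequence[str]) -> List[str]:
    """One longest common subsequence of p and q (dynamic programming)."""
    n, m = len(p), len(q)
    # dp[i][j] = LCS length of the suffixes p[i:] and q[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if p[i] == q[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if p[i] == q[j]:
            out.append(p[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def lcs_ratio(p: Sequence[str], q: Sequence[str]) -> float:
    """|LnCSS(p, q)| / |p|, our reading of the ratio in (6)."""
    return len(lcs(p, q)) / len(p) if p else 0.0
```

For the example patterns $P$ and $Q$ above, `lcs` returns ⟨Home, Office, Stay point⟩ and `lcs_ratio(P, Q)` is 3/5 = 0.6.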

However, the LnCSS measurement is not based merely on semantic tags, because the same place is tagged with different names by different users; for example, Media Lab is tagged as Lab, Media Lab, Work lab, and so forth. We therefore introduced a time threshold for the transition between two stay points, $TH_{\text{trans}}$ (as every semantically tagged location consists of cell towers), and a spatial threshold on the location area, $TH_{\text{location area}}$, so that together they capture the spatiotemporal property of user mobility.

Given the frequent user mobility histories of two users, $\mathrm{FUH}_1 = \{\mathrm{Pattern}_{11}, \mathrm{Pattern}_{12}, \ldots, \mathrm{Pattern}_{1n}\}$ and $\mathrm{FUH}_2 = \{\mathrm{Pattern}_{21}, \mathrm{Pattern}_{22}, \ldots, \mathrm{Pattern}_{2n}\}$, a time threshold $TH_{\text{time cover}}$, and a location area threshold $TH_{\text{location area}}$, we can define the similarity between the users as

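$$
\begin{aligned}
\mathrm{Similarity}&(\mathrm{FUH}_1, \mathrm{FUH}_2, TH_{\text{time cover}}, TH_{\text{location area}}) \\
&= \mathrm{Location\ area}(\mathrm{Ptn}_{11}, \mathrm{Ptn}_{2i}) + \mathrm{Distance}(\mathrm{Ptn}_{1n}, \mathrm{Ptn}_{2j} \le TH_{\text{location area}}), \\
&\quad \mathrm{CoL}(\mathrm{Ptn}_{11}, \mathrm{Ptn}_{2i}) \le 1, \quad \text{where } i, j \in \mathbb{N} \mid 0 < i \le j \le m
\end{aligned}
\tag{7}
$$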

where CoL is the colocation property, which determines whether users $x$ and $y$ visit location $l$ at the same time, so as to exploit their spatiotemporal properties in addition to semantic tags; as mentioned, semantic tags alone cannot ensure an accurate similarity measure, because different users use different conventions for the same place. The colocation rate idea is adopted from [35], where the most likely location of user $x$ is defined as

$$
L(x) = \arg\max_{l \in \mathrm{Loc}} P(x, l)
\tag{8}
$$

where $L(x)$ is the most likely location of user $x$, Loc is the set of cell towers or locations the user traversed over time, and $P(x, l)$ is the probability of user $x$ visiting location $l$, defined as

$$
P(x, l) = \frac{\sum_{i=1}^{n(x)} \delta(l, L_i(x))}{n(x)}
\tag{9}
$$

where $n(x)$ is the number of location records of user $x$, $L_i(x)$ is the location of its $i$th record, and $\delta(x, y) = 1$ if $x = y$ and 0 otherwise. Further, the distance between users $x$ and $y$ can be defined as $d(x, y) = \mathrm{dist}(L(x), L(y))$, representing the physical distance between their most frequent locations. On this basis, the colocation rate can be defined as follows:

$$
\mathrm{CoL} = \frac{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - |T_i(x) - T_j(y)|\right) \delta\left(L_i(x), L_j(y)\right)}{\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - |T_i(x) - T_j(y)|\right)}
\tag{10}
$$

As (10) shows, the colocation rate accounts for time and location simultaneously for two users $x$ and $y$: it binds both spatiotemporal trends together and is normalized by the number of record pairs of the two users that fall within $\Delta T$ of each other, where $\Theta$ is the Heaviside step function, $T_i(x)$ is the time of the $i$th record of user $x$, and $\Delta T$ is equal to $T_{\text{stay time}}$.
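Equations (8) to (10) can be sketched directly over lists of time-stamped location records, as below. The `(time in minutes, location label)` record format is an assumption, and the Heaviside step is taken as 1 at $|T_i(x) - T_j(y)| = \Delta T$, a convention the paper does not state. The double loop mirrors the double sums in (10), so it costs $O(n(x)\,n(y))$.

```python
from collections import Counter
from typing import List, Tuple

Visit = Tuple[float, str]  # (time in minutes, location or cell-cluster label)


def visit_probability(visits: List[Visit], location: str) -> float:
    """P(x, l) of (9): share of user x's records that fall at location l."""
    if not visits:
        return 0.0
    return sum(1 for _, loc in visits if loc == location) / len(visits)


def most_likely_location(visits: List[Visit]) -> str:
    """L(x) of (8): the location with the highest P(x, l); visits must be non-empty."""
    return Counter(loc for _, loc in visits).most_common(1)[0][0]


def colocation_rate(vx: List[Visit], vy: List[Visit], delta_t: float) -> float:
    """CoL of (10): among record pairs within delta_t of each other, the share at the same place."""
    close = same = 0
    for tx, lx in vx:
        for ty, ly in vy:
            # Heaviside step Theta(delta_t - |tx - ty|), taken as 1 at equality (assumption).
            if abs(tx - ty) <= delta_t:
                close += 1
                same += int(lx == ly)
    return same / close if close else 0.0
```

With $\Delta T$ = 20 minutes, if three of the two users' records pair up in time and two of those pairs share a location, CoL = 2/3.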

On the basis of the above similarity value, we can calculate the similarity between given user pairs by comparing their patterns over three basic units: semantic tag, spatial value, and temporal value. For this, we construct a similarity matrix that gives a clear picture of the similarity measure for every frequent pattern pair of two different users. After the similarity between the patterns of two users is calculated, these values are used to calculate the overall similarity between the two users to infer whether they are related. For example, suppose two users have $\mathrm{FUH}_1 = \{P_{11}, P_{12}\}$ and $\mathrm{FUH}_2 = \{P_{21}, P_{22}\}$, and we want their similarity from the given pairs of patterns, where $P_{11}$ = ⟨Home⟩⟨Bank, Office⟩⟨Stay point⟩ and $P_{12}$ = ⟨Home⟩⟨School⟩⟨Office⟩, while $P_{21}$ = ⟨Home⟩⟨Bank, Office⟩⟨Stay point⟩ and $P_{22}$ = ⟨Home⟩⟨Office⟩. Figure 4 shows the spatial incidence of these patterns. From it, we can construct the similarity matrix shown in Table 1.

Figure 4: User mobility pattern (patterns $P_{11}$ and $P_{12}$ of User 1 and $P_{21}$ and $P_{22}$ of User 2, passing through Office, School, Bank, and a stay point).

The similarity between the two users can then be computed as

$$
\mathrm{Similarity} = \frac{\text{sum of all extracted similarity weights}}{\text{total number of patterns}},
$$

where a high value indicates strong similarity between the two users and a low value indicates dissimilarity.
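As a worked illustration of Algorithm 2 and this ratio, the sketch below rebuilds the 0/1 entries of Table 1 from the four example patterns. Several choices are ours, not the paper's. Each pattern is flattened to a plain tag sequence. $M(p, q)$ is set to 1 when the LnCSS covers at least a hypothetical 60% of the longer pattern, and CoL and the spatiotemporal thresholds are ignored. "Total number of patterns" is read as $|\mathrm{FUH}_1| + |\mathrm{FUH}_2|$, with each cross-user pair counted once. Under these assumptions, the cross-user entries match Table 1.

```python
from typing import List, Sequence


def lcs_len(p: Sequence[str], q: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(q):
            cur.append(prev[j] + 1 if a == b else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def pattern_match(p: Sequence[str], q: Sequence[str], min_ratio: float = 0.6) -> int:
    """Hypothetical M(p, q): 1 if the LnCSS covers at least min_ratio of the longer pattern."""
    longest = max(len(p), len(q))
    if longest == 0:
        return 0
    return int(lcs_len(p, q) / longest >= min_ratio)


def similarity_matrix(fuh1: List[List[str]], fuh2: List[List[str]]) -> List[List[int]]:
    """Cross-user block of the similarity matrix: rows are FUH_1 patterns, columns FUH_2."""
    return [[pattern_match(p, q) for q in fuh2] for p in fuh1]


def user_similarity(fuh1: List[List[str]], fuh2: List[List[str]]) -> float:
    """Sum of similarity weights divided by the total number of patterns."""
    total_patterns = len(fuh1) + len(fuh2)
    if total_patterns == 0:
        return 0.0
    return sum(map(sum, similarity_matrix(fuh1, fuh2))) / total_patterns


P11 = ["Home", "Bank", "Office", "Stay point"]
P12 = ["Home", "School", "Office"]
P21 = ["Home", "Bank", "Office", "Stay point"]
P22 = ["Home", "Office"]
print(similarity_matrix([P11, P12], [P21, P22]))  # [[1, 0], [0, 1]]
print(user_similarity([P11, P12], [P21, P22]))    # 0.5
```

The threshold matters: $P_{12}$ and $P_{22}$ share ⟨Home, Office⟩, which covers 2/3 of the longer pattern and passes, whereas $P_{11}$ and $P_{22}$ share the same pair but it covers only half of $P_{11}$.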

On the basis of the above similarity measure, we define the profile sharing measurement of the given users as

$$
\mathrm{Sharing\ value}(\mathrm{FUH}_1, \mathrm{FUH}_2, TH_{\text{location area}}, TH_{\text{transition}}) = \frac{\left|\{p \in \mathrm{FUH}_1 \mid \exists q \in \mathrm{FUH}_2 : \mathrm{similarity}(p, q, \mathrm{CoL})\}\right|}{|\mathrm{FUH}_1|}
\tag{11}
$$
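Equation (11) is the fraction of the first user's patterns that have at least one similar pattern in the second user's history. The sketch below takes the pattern-level predicate as a parameter; the `same_ends` predicate used in the example is purely hypothetical and stands in for the LnCSS/CoL test. Because (11) divides by $|\mathrm{FUH}_1|$, the value is not symmetric in the two users.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sharing_value(
    fuh1: Sequence[T], fuh2: Sequence[T], similar: Callable[[T, T], bool]
) -> float:
    """Eq. (11): fraction of FUH_1 patterns with at least one similar pattern in FUH_2."""
    if not fuh1:
        return 0.0
    shared = sum(1 for p in fuh1 if any(similar(p, q) for q in fuh2))
    return shared / len(fuh1)


def same_ends(p: Tuple[str, ...], q: Tuple[str, ...]) -> bool:
    """Hypothetical predicate: two trips start and end at the same stay point."""
    return p[0] == q[0] and p[-1] == q[-1]
```

For example, if one of three patterns in $\mathrm{FUH}_1$ finds a partner in $\mathrm{FUH}_2$, the sharing value is 1/3, while the reverse direction can still be 1.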

4. Dataset

As mentioned earlier, the selected dataset is taken from the Reality Mining project group of the MIT Media Lab [36]. The dataset was collected from 100 people, who are students, over a period of 9 months, with a total activity span of 350K hours. The data was logged on Symbian mobile phones (Nokia 6600), which have no GPS, so all information related to user location is identified by cell ID only. In this dataset, the cell global identity header carries only partial information: only the LAC and cell ID are available for tracking user location. However, users have provided semantic tags for the most important locations in their mobility history in the data logs, although this semantic tag information varies considerably in annotation and usage from user to user. Among these users, only 94 gave full information regarding similarity through an online survey on social interactions. Of these 94 users, 7 have no cell logs and 10 have no cell annotation logs, so only 77 users are available for analysis and evaluation.

Table 1: Similarity matrix.

         P11    P12    P21    P22
P11      -      -      1      0
P12      -      -      0      1
P21      1      0      -      -
P22      0      1      -      -

5. Experiments and Results

As mentioned, we chose the Reality Mining dataset for our experiments. The results are as follows.

5.1. Retrieval of Location Information and Removal of Outliers Using Raw Cell ID

Because the data is fairly old and mobile networks change frequently, outliers and missing values are inevitable in the dataset. We therefore applied our clustering approach [30] to the raw data to remove spatial outliers and extracted the cells' location information through the Google APIs [30]. The result of the applied technique is shown in Table 2.

Figure 5(a) shows the cells retrieved from the Google API, Figure 5(b) shows the consolidated data without outliers, and Figure 5(c) shows the complete effect of spatial clustering on the data.

5.2. Observation of Semantic Tags in Dataset

After extracting the outlier-free data points, we applied our spatial clustering techniques [30, 31] to the clean data to cluster them into stay points, which may or may not carry semantic tags. As Table 3 shows, each observed semantic location can carry multiple cell IDs, so these cells can be clustered together to define a common location. Because the places where users stay for long periods are usually tagged, these semantic locations are more important than untagged stay points [31].

5.3. Cell Oscillation Resolution and Discovery of Stay Points

As defined previously, cell oscillation is a phenomenon inherent in GSM datasets, where a user is assigned multiple cell IDs while stationary, for load balancing. Table 3 shows that a semantic location can be identified with multiple cell IDs, which makes it clear that, besides the semantic tag information, we must use spatiotemporal analysis of location for pattern building and for finding similar users, since semantic tags are limited in the data. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation and, using overlapping area analysis, identified stay points that are not otherwise semantically tagged by the user. Our earlier assumption that GSM cells are distributed in bubble form and overlap with each other is evident from the results in Table 4.

Figure 5: (a) Cells retrieved from the Google API; (b) consolidated data without outliers; (c) effect of spatial clustering (axes: longitude and latitude, in degrees).

Table 2: Locations retrieved with respect to user mobility.

                                                                # of cells     %
Total number of unique cells against subject X                       1744    100
Cells' locations retrieved through Google API                         680     39
Cells' locations retrieved through semantic tagged algorithm           88      3
Total cells' locations retrieved                                      768     42

Table 3: Semantic locations with # of representative cells.

Semantic location    Number of cells (incidence)
Mgh                  5
Office               4
Airport              4
Home                 4
Greg's apt           4
Grand parents        3
Google               3
Redhat               3
Chicago O'Hare       2
Topo hub             2

Table 4: Overlapping of locations.

Cells (LAC, Cell ID)    # of locations represented
C30                     7
C40                     6
C23                     6
C14                     5
C56                     4
C44                     4
C25                     3
C47                     2
C27                     2
C39                     2

As a result, we retrieved all locations as clustered cells, which represent the user's mobility history rather than the raw cells. In Figure 6, the tagged locations are plotted over a geographical map, which shows the vicinity of the tagged places. Figure 6 also supports our assumption: Chicago O'Hare airport is tagged twice by the user with overlapping cells, which shows the overlapping of cells and oscillation over the same place.

5.4. Discovery of Mobility Patterns

After extracting the mobility history in the form of clustered cells, where each cluster represents a stay point with or without a user-defined semantic tag, we applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3. We used a $TH_{\text{stay time}}$ of 20 minutes along with a $TH_{\text{transition}}$ of 10 minutes, which are adopted from our work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique described in Section 3.3.3 to represent the most frequent mobility patterns. Figure 7 plots the log time coverage of the user's mobility against pattern length to identify the most frequent pattern length.

Figure 6: Semantic location plotted over geographical map (labeled location: Chicago O'Hare, n = 2).

Figure 7: Mobility pattern length over log (x-axis: pattern length, 2–10; y-axis: log coverage).

The figure shows that most of the patterns covering the user's mobility log are of length 3, after which coverage declines sharply. This result informs the transition phase length set for LnCSS and the user's trend over time, which we use later for the similarity measure.

Because the explored patterns carry all the basic information, such as location, semantic tag, and time stamp, we can easily plot the user's trend. As shown in Figure 8, we plotted the user's location visit history against the log history period, selecting only the top locations and two months of user history due to space limitations. The figure shows that the user spent most of his time at known places and rarely explored new places. It also reveals an important fact about mobility data: the user spent most of his time at the locations he tagged semantically, that is, home, MGH, office, and so forth. This satisfies our assumption in Section 3 about the trend of user mobility. The trend also shows that the user visits some locations, such as home, every day regardless of weekdays and weekends; some locations, such as the office, on weekdays only; and some locations, such as Greg's home and his grandparents' home, usually once a week on weekends.

Figure 8: Locations visit over log (mobility coverage vs. time spent for Home, Office, Grand parents, Mgh, Google, and Greg's apartment).

Figure 9 plots the user's mobility ratio in terms of exploration over time. It shows that the user visits 2–5 places daily on average, while in exceptional cases the user visited more than 7 locations in a single day. The location visiting frequency is plotted over 30 consecutive days of data.

5.5. Similarity Measure between Users

To calculate the similarity measure, we constructed the similarity matrix on the basis of two main parameters: semantic pattern and spatiotemporal similarity. For this, we use LnCSS and set $TH_{\text{location area}}$ to 10, a Euclidean distance between two location points belonging to two different users, to determine whether they fall in the same area; further, we set $TH_{\text{time cover}}$ to 20 minutes, as described in Section 3.3.4.

After formulating the similarity matrix, we plotted the similarity measure between user $x$ and the other users, along with two other user similarity models, spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37], for evaluation. Spatial cosine similarity is the similarity of the visitation frequencies of users $x$ and $y$, given by the cosine of the angle between their vectors of visit counts at each location. Extra-role colocation is based on the probability that two users $x$ and $y$ colocate in the same hour at night or on weekends; this relationship serves as a strong indicator of friendship between two users.
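The SCS baseline, as described here, reduces to a cosine between two visit-count vectors indexed by location. The following sketch follows that description; using raw per-location visit counts as the vector entries is our assumption about the exact weighting.

```python
import math
from collections import Counter
from typing import Iterable


def spatial_cosine_similarity(visits_x: Iterable[str], visits_y: Iterable[str]) -> float:
    """Cosine of the angle between two users' per-location visit-count vectors."""
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm = math.sqrt(sum(v * v for v in cx.values())) * math.sqrt(sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0
```

For visit logs A, A, B and A, B, B, the count vectors are (2, 1) and (1, 2), giving 4/5 = 0.8; users with no location in common score 0.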

We compared the similarity measure of our proposed methodology, named HA (hybrid approach), with SCS and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

Figure 10 plots user similarity in terms of MAP, defined as

$$
\mathrm{MAP} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{r=1}^{K} \left(P_i(r) \times \mathrm{rel}_i(r)\right)}{|R_i|}
\tag{12}
$$

where $N$ is the number of test users, $|R_i|$ is the number of similar users for test user $U_i$, $r$ denotes the cut-off rank (up to $K$), $P_i(r)$ is the precision of $U_i$ at rank $r$, and $\mathrm{rel}_i(r)$ is a binary relevance function.
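A direct transcription of (12) is sketched below. Each test user contributes a ranked list of candidate users and the set $R_i$ of truly similar users, both hypothetical inputs here. $P_i(r)$ is the fraction of relevant users among the top $r$, and the sum is divided by $|R_i|$ exactly as written in (12), without the min$(K, |R_i|)$ variant some evaluations use.

```python
from typing import List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Inner term of (12): sum over r <= k of P_i(r) * rel_i(r), divided by |R_i|."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:  # rel_i(r) = 1
            hits += 1
            total += hits / r  # P_i(r): precision at cut-off r
    return total / len(relevant)


def mean_average_precision(
    rankings: List[Sequence[str]], relevant_sets: List[Set[str]], k: int
) -> float:
    """MAP of (12): average of the per-user scores over the N test users."""
    if not rankings:
        return 0.0
    scores = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(rankings)
```

For a user whose two similar users appear at ranks 1 and 3, the inner term is (1/1 + 2/3)/2 = 5/6.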

Figure 11 plots the similarity measure between users using the NDCG metric, as defined in [38].
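For reference, NDCG at cut-off $K$ can be sketched as follows, using the widely used $\log_2(r + 1)$ rank discount, which is a common variant of the discounted cumulative gain of [38] rather than necessarily the exact discount used in the experiments. The gains are assumed to be relevance scores of the ranked candidate users, and the ideal ranking is obtained by sorting those same gains.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the log2(r + 1) rank discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(list(gains)[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

A ranking with relevance gains 1, 0, 1 scores $1.5 / (1 + 1/\log_2 3) \approx 0.92$, while the perfectly ordered 1, 1, 0 scores 1.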

Figure 9: User's locations visit over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited).

Figure 10: MAP with respect to proposed methodology performance (MAP vs. $K$ = 1–6 for HA, ERCR, and SCS).

0123456789

10

nDCG

1 2 3 4 65119870

HAERCRSCS

Figure 11 nDCG with respect to proposed methodology perfor-mance

Figures 10 and 11 show that our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics. On the real MIT dataset, our methodology performs better than the other methodologies compared.

6. Conclusion and Future Work

In this paper we presented a two-phase methodology for building mobility profiles and finding user similarity. It is an unsupervised approach based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely on tag, spatial, or spatiotemporal information alone.

Discovering similar users can play a vital role in many location-based service (LBS) applications. In our approach, semantic tags together with spatiotemporal trends provide a precise basis for measuring similarity between users, which makes the approach well suited to such applications. Further, this methodology is a complete framework that addresses the main mobility profiling issues: outlier detection, missing value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies can examine user behavior under transient conditions, such as civil works on roads or special events, together with users' trajectories. Such studies would determine user behavior more exactly for real-world location-based services.

Acknowledgments

The work described in this paper was supported by grants from the following sources:

- the Natural Science Foundation of China (Grant no. 60775037);
- the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071);
- the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002);
- the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, “Data stream processing on real-time mobile advertisement,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008

[3] C. Yang, J. Yang, X. Luo, and P. Gong, “Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China,” Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, “Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM ’11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, “A personal route prediction system based on trajectory data mining,” Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network

[7] A. Harter and A. Hopper, “Distributed location system for the active office,” IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, “The anatomy of a context-aware application,” Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] “Location technologies for GSM, GPRS and WCDMA networks,” White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, “Analysis of random mobility models with PDE’s,” in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., “Mobiscopes for human spaces,” IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., “iMAP: indirect measurement of air pollution with cellphones,” CSE Technical Reports, 2008.

[13] A. Pentland, “Automatic mapping and modeling of human networks,” Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, “Toward community sensing,” in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ’08), pp. 481–492, April 2008.

[15] K. Laasonen, “Clustering and prediction of mobile user routes from cellular data,” in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD ’05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, “Impact of communication infrastructure on forwarding in pocket switched networks,” in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM ’06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, “Instant message clustering based on extended vector space model,” in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA ’07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, “An exploration on mobile social networking: dodgeball as a case in point,” in Proceedings of the International Conference on Mobile Business (ICMB ’06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., “CarTel: a distributed mobile sensor computing system,” in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys ’06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., “Participatory sensing,” in Proceedings of ACM SenSys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, “Health-OS: a position paper,” in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant MANETs,” in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, “User mobility modeling and characterization of mobility patterns,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, “Mobility modeling in third-generation mobile telecommunications systems,” IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, “The predictive user mobility profile framework for wireless multimedia networks,” IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, “Mobility models for systems evaluation. A survey,” in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, “Understanding individual human mobility patterns,” Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, “Identifying meaningful locations,” in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous ’06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, “Precise location acquisition of mobility data using cell-id,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, “Cell oscillation resolution in mobility profile building,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, “Mining mobility user profiles for car pooling,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering (ICDE ’01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, “Efficient mining of maximal sequential patterns using multiple samples,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php

[37] N. Eagle, A. Pentland, and D. Lazer, “Inferring friendship network structure by using mobile phone data,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, “IR evaluation methods for retrieving highly relevant documents,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), pp. 41–48, ACM, New York, NY, USA, 2000.


information to solve the missing-values issue and to resolve outliers, along with precise clustering of cells for future location extraction and mobility profiling. We proposed and used a clustering methodology in which LAC and cell ID provide the full set of information needed for basic mobility profile building. The locations are obtained by reverse engineering the open-source Google location API. For missing values, we used the semantic information provided by users in the dataset for precise location extraction and mobility profile building.

3.2. Cell Oscillation Resolution and Role of Semantic Information

As stated earlier in the problem statement, cell oscillation is a common phenomenon in GSM networks: a user can be assigned multiple cell IDs while static. During mobility profile building, these changes in cell ID over time produce fake mobility.

In our earlier work [31], we presented a methodology for cell oscillation resolution that uses semantic tagging information and introduces a time-stay phenomenon. Under this phenomenon, overlapping cells represent a location of interest rather than mobility. In that work we did more than resolve the oscillation phenomenon. We also clustered the cells using two sources: the semantic information provided by the user (for example, home, lab, airport, club, and so forth) and overlapping time-stay areas. This clustered information can later be used to identify stay locations during mobility profiling. We used the overlapped location information to identify significant places, which can then be used for mobility profiling.
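The actual resolution algorithm is given in [31]. As a rough illustration only, the sketch below uses a simpler, hypothetical heuristic. Whenever the trace ping-pongs between two cells (A, B, A) within a short window, the two cells are merged into one cluster with a union-find structure. The 300-second window, the trace format, and the function name are assumptions.

```python
def oscillation_clusters(trace, window_s=300):
    """Merge cells that ping-pong (A -> B -> A) within window_s seconds.

    Assumption: a simple ping-pong heuristic, not the algorithm of [31].
    trace: list of (timestamp_seconds, cell_id) sorted by time.
    Returns a dict mapping every observed cell to its cluster representative.
    """
    parent = {}

    def find(cell):
        parent.setdefault(cell, cell)
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as representative so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for _, cell in trace:
        find(cell)  # register every cell as its own cluster first

    # Slide over consecutive triples of the trace looking for A, B, A.
    for (t0, a), (_, b), (t2, c) in zip(trace, trace[1:], trace[2:]):
        if a == c and a != b and t2 - t0 <= window_s:
            union(a, b)

    return {cell: find(cell) for cell in list(parent)}
```

Cells in the same cluster can then be treated as one location, so a static user no longer appears to move.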

3.3. Mobility Profile Building

Mobility profile building is a nontrivial task in any location-based service (LBS). Mobility profiling is done by extracting significant places, which can be defined as the places, in terms of geocoordinates, that are important to a user and where the user spends most of the time. Users usually semantically tag these locations, spend a significant amount of time at them, or visit them frequently over the period of observation. A significant place can therefore be any of the following:

- a place where the user spends most of the time (home, work, etc.);
- a place the user visits frequently over a period of time (supermarket, club);
- a place where the user spends significant time without frequent visits (conference, seminar, travel).

This makes the discovery of significant places valuable for LBS, where user behavior is the main source of stimulation.

As described in previous sections, cell ID is the only viable solution for user profile building. Its coverage is wide, its energy consumption is low, no data plan is required, and it is available on all kinds of mobile phones. User movement is therefore available in the data as a set of different cell IDs distributed over the network. By observing the precise transitions over these cells with an effective technique, this information can be used for mobility profile building.

However, building a mobility profile is a complex process, because it must deal with the following issues:

- When a user moves over thousands of cell IDs during a period of time, many of them may have no location available in cell-ID databases (for example, Google, Open Cell ID), so these cells lead to misinterpreted profiles.
- There are dark places where the user lost the connection or switched off the phone. Because of the time spent there, these appear to be significant places.
- There are many cell oscillations during user movement, which appear to be mobility even when the user is static.

We have divided the process into four parts:

1. clustering of the cells for path finding, and their fingerprinting;
2. grouping of similar patterns for projection of trends;
3. profile building;
4. similarity measurement between different users through the sharing property, as adopted from [32].

The whole process of mobility profiling can be elaborated as follows.

Let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of tower locations visited by the user during mobility. We are interested in identifying a pattern group $\mathrm{PG} = \{pg_1, pg_2, \ldots, pg_n\}$ that satisfies the group threshold $TH_{\text{group count}}$. PG is extracted from the frequent user mobility history (FUH), which is defined over three quantities: the transition threshold $TH_{\text{transition}}$, the location area threshold $TH_{\text{location area}}$, and the semantic tag information SEM. FUH is retrieved from the visiting history VH, which is obtained through the oscillation removal method [31] and time stamping.

Algorithm 1: Frequent pattern discovery from user's mobility history.

(1) Select the complete list of user mobility in terms of cell towers visited, $T$.

(2) Apply the cell oscillation technique to it and retrieve the clustered cells $C$.

(3) Apply proper time stamping to the clustered cells retrieved after oscillation removal. The result is the user's visit history VH with complete spatiotemporal information.

(4) Identify the frequent user mobility patterns FUH defined over $TH_{\text{location area}}$ and $TH_{\text{transition}}$. Using the PrefixSpan algorithm, group the identified patterns $G$ according to their spatiotemporal nature and semantic tag information.

(5) Repeat steps (i) to (ii) until only the supported group of patterns PG is identified:

  (i) if the size of group $g$ satisfies the group support threshold $TH_{\text{group count}}$,

  (ii) assign the group to the supported group pattern set.
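Steps (4) and (5) can be sketched with a minimal PrefixSpan [33] over stay-point sequences. The sketch makes two simplifying assumptions:

- Each input sequence is one day of semantically labeled stay points, with a single item per position (no itemsets).
- A pattern's support, meaning the number of sequences that contain it, plays the role of the group size that is compared against $TH_{\text{group count}}$.

The function name is hypothetical.

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan for sequences with one item per position.

    Assumption: min_support stands in for TH_group count.
    Returns (pattern, support) pairs, where support is the number of
    input sequences that contain the pattern as a subsequence.
    """
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue  # infrequent: prune this branch entirely
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project each suffix on the first occurrence of item.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine([], [list(seq) for seq in sequences])
    return results
```

Pruning infrequent items before projecting is what keeps the number of candidate patterns manageable. This pruning corresponds to discarding unsupported groups in step (5).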

Let $U = \{u_1, u_2, \ldots, u_n\}$ be the set of users with complete information on their pattern groups (PGs). We are interested in building a matrix $M[i][j]$ from each pair of patterns $p_i$ and $q_j$, where the two patterns belong to two potentially similar users. The similarity between two patterns is calculated using LnCSS (longest common subsequence) and CoL (colocation probability measure).

Algorithm 2: User's similarity measurement.

(1) Select the users.

(2) Select the pattern groups of the two users.

(3) Repeat steps (i) to (ii) for each pair $p$ and $q$ belonging to the two different users:

  (i) calculate the LnCSS and CoL properties of the two given patterns;

  (ii) if the two patterns satisfy the minimum support of similarity, then set $M(p, q) = 1$; otherwise set it to 0.

(4) Calculate the users' similarity from the similarity matrix.
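A sketch of Algorithm 2 follows, with the pairwise test left pluggable. The hypothetical `is_similar` callback stands in for the combined LnCSS and CoL check of step (3). The step (4) score follows the rule stated later in this section: the sum of similarity weights divided by the total number of patterns. Here each cross-user pair is counted once. The names and that counting convention are assumptions.

```python
def similarity_matrix(fuh1, fuh2, is_similar):
    """Step (3): M[i][j] = 1 if fuh1[i] and fuh2[j] pass the test, else 0."""
    return [[1 if is_similar(p, q) else 0 for q in fuh2] for p in fuh1]


def user_similarity(matrix):
    """Step (4): sum of similarity weights over the total number of patterns.

    Assumption: total patterns = |FUH_1| + |FUH_2|, each pair counted once.
    """
    n_patterns = len(matrix) + (len(matrix[0]) if matrix else 0)
    return sum(map(sum, matrix)) / n_patterns if n_patterns else 0.0


# Usage with the patterns of Figure 4, flattened to plain label sequences.
fuh1 = [["Home", "Bank", "Office", "Stay point"], ["Home", "School", "Office"]]
fuh2 = [["Home", "Bank", "Office", "Stay point"], ["Home", "Office"]]


def same_final_stay(p, q):
    # Hypothetical toy stand-in for the LnCSS + CoL test: same final stay point.
    return p[-1] == q[-1]


m = similarity_matrix(fuh1, fuh2, same_final_stay)
score = user_similarity(m)
```

In this usage, the toy predicate happens to reproduce the cross-user entries of Table 1. The resulting score is 2 similarity weights over 4 patterns.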

3.3.1. Clustering of Subsequences in Mobility History Data

As mentioned earlier, we retrieved the clustered cells by implementing the cell oscillation algorithm, which guarantees that the data has no oscillation problem. We adopted a naïve clustering approach that clusters the cells on the basis of circular subsequences. The baseline algorithm relies on the fact that common cell IDs can be merged into a circular subsequence that represents user behavior [31]. We implemented the two algorithms proposed in [31] for two purposes: to resolve cell oscillation, and to retrieve common clustered cells that represent a significant place for the user using the overlapped area. Details of the algorithms are given in our previous work.

As mentioned earlier, the proposed algorithm also resolves the problem of missing values, which can occur because of network changes or the unavailability of cell-ID information in open cell-ID databases. For example, in the mobility sequence $C_1, C_2, C_3, C_4, C_5$, cells $C_3$ and $C_4$ cannot be retrieved through the cell-ID database, which would otherwise misrepresent the user's mobility. Our algorithm uses a majority voting mechanism to replace these two cells with the most likely cell in the cluster, so that mobility remains traceable.

The retrieved clusters carry information about significant places as semantic tags such as home, lab, club, and so forth. The derived algorithm can also infer a significant place from the time spent over an overlapped area between different cells, and we tagged such locations with the convention "stay point." After retrieving the clustered cells, we fingerprinted them to record the percentage of time the user spent on particular cells and on each cluster (stay point) as a whole. Each cluster is represented as a sequence of cell IDs, that is,

$$
\begin{aligned}
\text{Cluster}_1 &= [C_1, C_2, C_{15}] \\
\text{Cluster}_2 &= [C_3, C_5, C_7, C_{19}] \\
&\;\;\vdots \\
\text{Cluster}_n &= [C_7, C_{18}, C_{24}]
\end{aligned}
\tag{1}
$$

Each cluster represents a stay point, which may or may not have a user-defined semantic tag associated with it.
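The majority-voting replacement of unretrievable cells can be illustrated as follows. This is a hypothetical sketch and not the algorithm of [31]. It assumes that a mapping from cells to cluster IDs is already available. Each cell without a database location is replaced by the most frequent locatable cell of its cluster. The function name and parameters are illustrative only.

```python
from collections import Counter


def fill_missing_cells(sequence, cluster_of, locatable):
    """Replace unlocatable cells by the majority locatable cell of their cluster.

    Hypothetical inputs:
    sequence:   cell IDs in visiting order.
    cluster_of: dict mapping cell ID -> cluster ID (from oscillation clustering).
    locatable:  set of cell IDs whose location the cell-ID database returned.
    """
    # Count votes: how often each locatable cell occurs inside each cluster.
    votes = {}
    for cell in sequence:
        if cell in locatable and cell in cluster_of:
            votes.setdefault(cluster_of[cell], Counter())[cell] += 1

    filled = []
    for cell in sequence:
        if cell in locatable:
            filled.append(cell)
            continue
        counter = votes.get(cluster_of[cell]) if cell in cluster_of else None
        # Majority vote within the cluster; keep the cell if no vote is available.
        filled.append(counter.most_common(1)[0][0] if counter else cell)
    return filled
```

Cells with no cluster, or whose cluster contains no locatable cell, are left unchanged rather than guessed.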

3.3.2. Time Stamping of Clustered Cells

We attach to each cell in the cluster its percentage of time, and we also attach the total time spent in that cluster, together with a time stamp. Time stamping is the most important part of fingerprinting, because it lets user behavior be determined easily through time and date slicing. The structure of a fingerprint is as follows:

$$
\left[\, C_1 T_1,\; C_2 T_2,\; \ldots,\; C_n T_n,\; \text{Total time},\; \text{Time stamp},\; \text{Semantic tag} \,\right]
\tag{2}
$$

Here each cluster represents a stay point of the user, along with the percentage of time $T$ spent on it, the date, and semantic tag information if available. After this process we have the set of all stay points, or significant places, that the user visited over time. In this way we converted the user's history, originally distributed over cell IDs, into stay points in the form of clusters:

$$
\mathrm{VH} = \{\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n\}
\tag{3}
$$

Here VH represents the complete mobility history of the user in terms of stay points, in the form of spatiotemporal clusters $\text{Cluster}_1, \text{Cluster}_2, \ldots, \text{Cluster}_n$. This information can be used to identify the trajectory patterns of a particular user. Grouping those patterns over time then builds up the mobility profile.

After the patterns are extracted, we need to group them and discard infrequent patterns using a minimum support count. Otherwise, for such a large amount of data, the resulting patterns would number in the millions and burden memory. We are therefore interested only in frequent patterns, selected through spatiotemporal information and the support count. We define a group $G$ as a subset of VH that satisfies the following two rules:

(i) $P_1, P_2, \ldots, P_n \in g$ if $g.\mathrm{Distance} \le TH_{\text{location area}}$ and $g.\mathrm{Time} \le TH_{\text{stay time}}$;

(ii) $|g| \ge TH_{\text{group support}}$.
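The fingerprint of (2) and rules (i) and (ii) can be sketched as below. The following are assumptions made for illustration:

- the field names;
- representing a stay as (cell, minutes) pairs;
- the dictionary layout of a candidate group.

Rule (i) is applied literally, comparing the group's distance and time against global thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Fingerprint:
    cell_share: Dict[str, float]   # cell ID -> percentage of the cluster's total time (T_i)
    total_time: float              # minutes spent in the cluster
    time_stamp: str                # when the stay started (hypothetical format)
    semantic_tag: Optional[str] = None


def fingerprint(stays: List[Tuple[str, float]], time_stamp: str,
                semantic_tag: Optional[str] = None) -> Fingerprint:
    """Build the fingerprint (2) of one cluster from (cell ID, minutes) observations."""
    total = sum(minutes for _, minutes in stays)
    share: Dict[str, float] = {}
    for cell, minutes in stays:
        share[cell] = share.get(cell, 0.0) + 100.0 * minutes / total
    return Fingerprint(share, total, time_stamp, semantic_tag)


def supported_groups(groups, th_location_area, th_stay_time, th_group_support):
    """Keep candidate groups that satisfy rule (i) and rule (ii).

    Assumption: each group is a dict with 'distance', 'time', and 'patterns' keys.
    """
    return [
        g for g in groups
        if g["distance"] <= th_location_area        # rule (i), spatial part
        and g["time"] <= th_stay_time               # rule (i), temporal part
        and len(g["patterns"]) >= th_group_support  # rule (ii)
    ]
```

Repeated observations of the same cell are accumulated, so the shares in a fingerprint always sum to 100 percent.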

3.3.3. Extraction of Mobility Patterns from Clustered Stay Points

After all stay points in a mobile user's spatiotemporal history are retrieved, the history can be transformed into the trajectory patterns that the user follows over time. A pattern $P$ is either of the following:

- a trip over two or more consecutive stay points (SPs) in the ordered set VH;
- a trip between a stay point SP and the first or last point of the user's mobility history VH.

Formally, we define the pattern $P$ as

$$
\begin{aligned}
P &= \{SP_m, \ldots, SP_n\}, \quad 0 < m \le n, \quad P_i\left(VH_i.SP_m \cap VH_{i+1}.SP_n\right), \quad \text{or} \\
P &= \{SP_1, \ldots, SP_m\}, \quad 0 < m \le o, \quad P_i\left(VH_i.SP_m\right), \quad \text{or} \\
P &= \{SP_n, \ldots, SP_o\}, \quad 0 < n \le o, \quad P_i\left(VH_i.SP_n\right)
\end{aligned}
\tag{4}
$$

To extract the true patterns from user mobility, we introduced a transition time threshold that ensures the continuity of a trip. This threshold ensures a smooth transition of the user from one stay point to another and separates one visiting pattern from another.

To extract mobility patterns from the semantically arranged stay points, we applied the PrefixSpan algorithm [33] to the user's mobility history. The algorithm yields the semantic patterns of the user over time, for example ⟨Home, Lab, Google, Grand parents⟩, along with all of their subsequences. Because the algorithm produces redundant patterns, we adopted the maximal trajectory pattern technique [34] to represent user mobility patterns. The resulting patterns are therefore true representatives of the user's frequent mobility (FUM) over time:

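$$
\begin{aligned}
\mathrm{FUM} = \{\, & \mathrm{Ptn}_1: \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point} \rangle, \\
& \mathrm{Ptn}_2: \langle \text{Home}, \text{Super market}, \text{Topo hub}, \text{Grand parents}, \text{Club} \rangle, \\
& \;\;\vdots \\
& \mathrm{Ptn}_n: \langle \text{Office}, \text{Bank}, \text{Stay point}, \text{Park} \rangle \,\}
\end{aligned}
\tag{5}
$$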
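The two steps described above can be sketched as follows: cutting the stay-point history into trips with the transition threshold, and then keeping only maximal patterns. The stay-point tuple layout and minute-based times are assumptions. The maximality filter is a plain subsequence check, not the sampling-based algorithm of [34]. All function names are hypothetical.

```python
def split_trips(stay_points, th_transition):
    """Cut a time-ordered stay-point history into trips.

    Assumption: stay_points is a list of (label, arrival_min, departure_min).
    A gap longer than th_transition minutes between leaving one stay point
    and arriving at the next ends the current trip. Only trips over two or
    more stay points are kept, matching the definition of a pattern.
    """
    trips, current, prev_departure = [], [], None
    for label, arrival, departure in stay_points:
        if current and arrival - prev_departure > th_transition:
            if len(current) >= 2:
                trips.append(current)
            current = []
        current.append(label)
        prev_departure = departure
    if len(current) >= 2:
        trips.append(current)
    return trips


def is_subsequence(short, long):
    it = iter(long)
    return all(item in it for item in short)  # 'in' consumes the iterator


def maximal_patterns(patterns):
    """Drop duplicates and any pattern contained as a subsequence in another one."""
    unique = [list(p) for p in dict.fromkeys(tuple(p) for p in patterns)]
    return [p for p in unique
            if not any(p != q and is_subsequence(p, q) for q in unique)]
```

Keeping only maximal patterns removes the redundant subsequences that PrefixSpan reports alongside each longer pattern.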

3.3.4. Projection of Similarity between Different Users through Pattern Matching

After the user patterns have been extracted, we determine whether two patterns are similar through the longest common subsequence (LnCSS) measure. For example, take $P = \langle \text{Home}, \text{Stay point}, \text{Office}, \text{Google}, \text{Stay point} \rangle$ and $Q = \langle \text{Home}, \text{Office}, \text{Bank}, \text{Stay point} \rangle$. Their LnCSS is $\langle \text{Home}, \text{Office}, \text{Stay point} \rangle$, and we define

$$
\mathrm{Ratio}\left(\mathrm{LnCSS}(p, q), p\right) = \frac{\sum_{i=1}^{|p|} \sum_{j=1}^{|\mathrm{LnCSS}(p, q)|} M\left(p_i, \mathrm{LnCSS}_j\right)}{|p|},
\qquad
M\left(p_i, \mathrm{LnCSS}_j\right) =
\begin{cases}
1, & \text{if } \mathrm{LnCSS}_j \text{ matches } p_i, \\
0, & \text{otherwise.}
\end{cases}
\tag{6}
$$
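The sketch below computes LnCSS and the ratio in (6), treating $M$ as a 0/1 match indicator over stay-point labels. The function names are hypothetical. For the example patterns $P$ and $Q$ above, the LnCSS is ⟨Home, Office, Stay point⟩. Because "Stay point" occurs twice in $P$, the double sum counts four matches, which gives a ratio of 4/5 = 0.8.

```python
def lcs(p, q):
    """Longest common subsequence of two label sequences (dynamic programming)."""
    m, n = len(p), len(q)
    # dp[i][j] = LCS length of the suffixes p[i:] and q[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if p[i] == q[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if p[i] == q[j]:
            out.append(p[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def lcs_ratio(p, q):
    """Ratio(LnCSS(p, q), p) from (6), with M read as a 0/1 label match."""
    common = lcs(p, q)
    matches = sum(1 for a in p for b in common if a == b)  # the double sum
    return matches / len(p) if p else 0.0
```

Read literally, the double sum can exceed 1 when the LnCSS contains repeated labels. Counting each $p_i$ at most once is one way to keep the ratio in [0, 1].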

However, the LnCSS measure cannot rely merely on semantic tags, because different users tag the same place with different names. For example, Media Lab is tagged as Lab, Media Lab, Work lab, and so forth. We therefore introduced a time threshold $TH_{\text{transition}}$ for the transition between two stay points, since every semantically tagged location consists of cell towers. We also introduced a spatial threshold $TH_{\text{location area}}$, so that the measure combines the spatiotemporal properties of user mobility.

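Consider two frequent user mobility histories $\mathrm{FUH}_1 = \{\mathrm{Pattern}_{11}, \mathrm{Pattern}_{12}, \ldots, \mathrm{Pattern}_{1n}\}$ and $\mathrm{FUH}_2 = \{\mathrm{Pattern}_{21}, \mathrm{Pattern}_{22}, \ldots, \mathrm{Pattern}_{2n}\}$, a time threshold $TH_{\text{time cover}}$, and a location area threshold $TH_{\text{location area}}$. The similarity between the two users is defined as

$$
\begin{aligned}
\mathrm{Similarity}&\left(\mathrm{FUH}_1, \mathrm{FUH}_2, TH_{\text{time cover}}, TH_{\text{location area}}\right) \\
&= \mathrm{Location\ area}\left(\mathrm{Ptn}_{11}, \mathrm{Ptn}_{2i}\right) + \mathrm{Distance}\left(\mathrm{Ptn}_{1n}, \mathrm{Ptn}_{2j} \le TH_{\text{location area}}\right), \\
&\quad \mathrm{CoL}\left(\mathrm{Ptn}_{11}, \mathrm{Ptn}_{2i}\right) \le 1, \quad \text{where } i, j \in \mathbb{N} \mid 0 < i \le j \le m
\end{aligned}
\tag{7}
$$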

Here CoL is the colocation property, which determines whether users $x$ and $y$ visit location $l$ at the same time. It exploits the users' spatiotemporal properties in addition to semantic tags. As mentioned above, semantic tags alone cannot guarantee an accurate similarity measure, because different users use different conventions for the same place. The colocation rate idea is adopted from [35], where the most likely location of user $x$ is defined as

$$
L(x) = \arg\max_{l \in \mathrm{Loc}} P(x, l)
\tag{8}
$$

where $L$ is the most likely location of user $x$, Loc is the set of cell towers or locations the user traversed over time, and $P(x, l)$ is the probability that user $x$ visits location $l$, defined as

$$
P(x, l) = \frac{\sum_{i=1}^{n(x)} \delta\left(l, L_i(x)\right)}{n(x)}
\tag{9}
$$

where $n(x)$ is the number of location records of user $x$, and $L_i(x)$ is the location in the $i$-th record. The function $\delta(x, y)$ equals 1 if $x = y$ and 0 otherwise. The physical distance between the most frequent locations of users $x$ and $y$ is $d(x, y) = \mathrm{dist}\left(L(x), L(y)\right)$. On this basis, the colocation rate is defined as follows:

$$
\mathrm{CoL} = \frac{\displaystyle\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - \left|T_i(x) - T_j(y)\right|\right) \delta\left(L_i(x), L_j(y)\right)}{\displaystyle\sum_{i=1}^{n(x)} \sum_{j=1}^{n(y)} \Theta\left(\Delta T - \left|T_i(x) - T_j(y)\right|\right)}
\tag{10}
$$

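As (10) shows, the colocation rate accounts for time and location simultaneously for two users $x$ and $y$, which binds their spatial and temporal trends together:

- The numerator counts pairs of records that are both close in time and at the same location.
- The denominator normalizes by the number of record pairs that are close in time.

Here $T_i(x)$ is the time of the $i$-th record of user $x$, $\Theta$ is the Heaviside step function, and $\Delta T$ equals $T_{\text{stay time}}$.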
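Equations (8) to (10) translate directly into code. In this sketch, visits are (time in minutes, location ID) pairs. $\Theta(0)$ is taken as 1, so that visits exactly $\Delta T$ apart still count. Both choices, and the function names, are assumptions.

```python
from collections import Counter


def visit_probability(locations, l):
    """P(x, l) from (9): share of user x's location records that equal l."""
    return sum(1 for loc in locations if loc == l) / len(locations)


def most_likely_location(locations):
    """L(x) from (8): the location with the highest P(x, l)."""
    return Counter(locations).most_common(1)[0][0]


def heaviside(z):
    # Assumption: Theta(0) = 1, so visits exactly Delta T apart still count.
    return 1 if z >= 0 else 0


def colocation_rate(visits_x, visits_y, delta_t):
    """CoL from (10). visits_*: hypothetical lists of (time_minutes, location_id)."""
    close_in_time = 0  # denominator: record pairs within delta_t of each other
    colocated = 0      # numerator: such pairs that also share a location
    for t_x, l_x in visits_x:
        for t_y, l_y in visits_y:
            theta = heaviside(delta_t - abs(t_x - t_y))
            close_in_time += theta
            colocated += theta * (1 if l_x == l_y else 0)
    return colocated / close_in_time if close_in_time else 0.0
```

The nested loop is quadratic in the number of records. For long histories, sorting both lists by time and only comparing records inside a $\Delta T$ window would give the same counts.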

Using the above similarity value, we can calculate the similarity between given user pairs by comparing their patterns over three basic units: semantic tag, spatial value, and temporal value. For this we construct a similarity matrix, which gives a clear picture of the similarity measure for every pair of frequent patterns from two different users. The pattern-level similarities between two users are then combined into an overall similarity measure, which is used to infer whether the two users are related.

For example, consider two users with $\mathrm{FUH}_1 = \{P_{11}, P_{12}\}$ and $\mathrm{FUH}_2 = \{P_{21}, P_{22}\}$, whose similarity is to be found for the given pair of patterns. The patterns are:

- $P_{11} = \langle \text{Home} \rangle \langle \text{Bank}, \text{Office} \rangle \langle \text{Stay point} \rangle$;
- $P_{12} = \langle \text{Home} \rangle \langle \text{School} \rangle \langle \text{Office} \rangle$;
- $P_{21} = \langle \text{Home} \rangle \langle \text{Bank}, \text{Office} \rangle \langle \text{Stay point} \rangle$;
- $P_{22} = \langle \text{Home} \rangle \langle \text{Office} \rangle$.

Figure 4 shows the spatial incidence of these patterns. From it we can construct a similarity matrix, as shown in Table 1.

[Figure 4: User mobility pattern. Patterns $P_{11}$ and $P_{12}$ (User 1) and $P_{21}$ and $P_{22}$ (User 2) over the locations Office, School, Bank, and Stay point.]

The similarity measure is thus the sum of all extracted similarity weights divided by the total number of patterns. A high value indicates strong similarity between two users, while a low value indicates dissimilarity.

Based on this similarity measure, we define the profile sharing measurement of the given users as

$$
\mathrm{Sharing\ value}\left(\mathrm{FUH}_1, \mathrm{FUH}_2, TH_{\text{location area}}, TH_{\text{transition}}\right) = \frac{\left|\left\{\, p \in \mathrm{FUH}_1 \mid \exists\, q \in \mathrm{FUH}_2 : \mathrm{similarity}(p, q, \mathrm{CoL}) \,\right\}\right|}{\left|\mathrm{FUH}_1\right|}
\tag{11}
$$
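Equation (11) reduces to counting the rows of the binary pattern matrix that contain at least one match. The sketch assumes that matrix rows correspond to $\mathrm{FUH}_1$ and columns to $\mathrm{FUH}_2$. The function name is hypothetical. Consider the cross-user block of Table 1 ($P_{11}$ and $P_{12}$ against $P_{21}$ and $P_{22}$): both patterns of the first user find a match, so the sharing value is 1.

```python
def sharing_value(matrix):
    """(11): fraction of FUH_1 patterns with at least one similar FUH_2 pattern.

    Assumption: matrix[i][j] is M(p_i, q_j) in {0, 1}, with one row per
    FUH_1 pattern and one column per FUH_2 pattern.
    """
    if not matrix:
        return 0.0
    shared = sum(1 for row in matrix if any(row))
    return shared / len(matrix)


# Cross-user block of Table 1: rows P11, P12; columns P21, P22.
table1_block = [[1, 0],
                [0, 1]]
```

Unlike the averaged similarity score, this measure is asymmetric. It asks how much of the first user's profile is covered by the second user's.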

4. Dataset

As mentioned earlier, the selected dataset was collected by the Reality Mining project group of MIT Media Lab [36]. It covers 100 people, all students, over a period of 9 months, with a total activity span of 350K hours. The data was logged on Symbian mobile phones (Nokia 6600), which have no GPS, so all information about user location is identified by cell ID only. In this dataset the cell global identity header is only partially available: LAC and cell ID are the only fields available for tracking the user's location.

Users did provide semantic tags for the most important locations in their mobility history in the data logs. Overall, however, these semantic tags vary a lot from user to user in both annotation and usage. Only 94 of the users gave full information about similarity through an online survey of social interactions. Of these 94 users, 7 have no cell logs and 10 have no cell annotation logs, so 77 users remain available for analysis and evaluation.

Table 1: Similarity matrix.

|          | $P_{11}$ | $P_{12}$ | $P_{21}$ | $P_{22}$ |
|----------|----------|----------|----------|----------|
| $P_{11}$ | –        | –        | 1        | 0        |
| $P_{12}$ | –        | –        | 0        | 1        |
| $P_{21}$ | 1        | 0        | –        | –        |
| $P_{22}$ | 0        | 1        | –        | –        |

5. Experiments and Results

As mentioned, we chose the Reality Mining dataset for our experiments. The results are as follows.

5.1. Retrieval of Location Information and Removal of Outliers Using Raw Cell ID

The data is quite old, and mobile networks change frequently, so outliers and missing values are to be expected in the dataset. We therefore applied our clustering approach [30] to the raw data to remove spatial outliers. We then extracted location information through the Google APIs [30]. The result of the applied technique is shown in Table 2.

Figure 5(a) shows the cells retrieved from the Google API, Figure 5(b) shows the consolidated data without outliers, and Figure 5(c) shows the complete effect of spatial clustering over the data.

5.2. Observation of Semantic Tags in Dataset

After extracting outlier-free data points, we applied our spatial clustering techniques [30, 31] to the clean data to cluster them into stay points, which may or may not carry semantic tags. As Table 3 shows, each observed semantic location can span multiple cell IDs, so these cells can be clustered together to define a common location. Because stays at known places are usually tagged, these semantic locations are more important than untagged stay points [31].

5.3. Cell Oscillation Resolution and Discovery of Stay Points. As defined previously, cell oscillation is a common phenomenon in GSM datasets in which a stationary user is assigned multiple cell IDs for load balancing. Table 2 shows that a semantic location can be identified with multiple cell IDs. Besides the semantic tag information, we must therefore apply spatiotemporal analysis of location for pattern building and for finding similar users, because semantic tags are limited in the data. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation. Using overlapping-area analysis, we also identified stay points that the user had not tagged semantically. Under our earlier assumption, GSM cells are distributed in bubble form and therefore overlap with each other, and Table 4 bears this out.

Figure 5: Panels (a), (b), and (c), each plotting latitude (deg) against longitude (deg).

Table 2: Locations retrieved with respect to user mobility.

                                                    # of cells      %
Total number of unique cells for subject X                1744    100
Cells located through the Google API                       680     39
Cells located through the semantic-tag algorithm            88      5
Total number of cells located                              768     44

Table 3: Semantic locations with the number of representative cells.

Semantic location     Number of cells
Mgh                   5
Office                4
Airport               4
Home                  4
Greg's apt            4
Grand parents         3
Google                3
Redhat                3
Chicago O'Hare        2
Topo hub              2

Table 4: Overlapping of locations.

Cell (LAC, cell ID)   Number of locations represented
C30                   7
C40                   6
C23                   6
C14                   5
C56                   4
C44                   4
C25                   3
C47                   2
C27                   2
C39                   2
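Table 4 inverts the view of Table 3. For each cell, it counts how many locations the cell belongs to, and a count above one indicates overlapping coverage. The sketch below assumes each location is already represented by its set of cells. The cluster contents are hypothetical and not taken from the dataset.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def locations_per_cell(location_cells: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """For each cell, count the distinct locations (stay points) whose cluster contains it."""
    counts = Counter(cell for cells in location_cells.values() for cell in cells)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical clusters: C30 is shared by three locations, i.e. heavy overlap.
clusters = {
    "Home": {"C30", "C40", "C23"},
    "Office": {"C30", "C40"},
    "Mgh": {"C30", "C14"},
}
print(locations_per_cell(clusters))
# [('C30', 3), ('C40', 2), ('C14', 1), ('C23', 1)]
```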

As a result, we retrieved all locations as clustered cells, which represent the user's mobility history in place of the raw cells. In Figure 6, tagged locations are plotted on a geographical map, showing the vicinity of the tagged places. Figure 6 also supports our assumption: the user tagged Chicago O'Hare airport twice, with overlapping cells, which reflects both cell overlap and oscillation at the same place.

Figure 6: Semantic locations plotted over a geographical map (label: Chicago O'Hare, n = 2).

Figure 7: Mobility pattern length over log (x-axis: pattern length; y-axis: log coverage).

5.4. Discovery of Mobility Patterns. After extracting the mobility history as clustered cells, each cluster representing a stay point with or without a user-defined semantic tag, we applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3. We used TH_stay_time = 20 minutes and TH_transition = 10 minutes, the values adopted in our earlier work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique described in Section 3.3.3 to represent the most frequent mobility patterns. Figure 7 plots the log time coverage of the user's mobility against pattern length, to identify the most frequent pattern length.
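The sketch below shows one way to read the two thresholds of Section 5.4. Runs of the same clustered cell become visits. Visits lasting at least TH_stay_time become stay points. Consecutive stay points separated by at most TH_transition are chained into one pattern. This is an illustrative interpretation, not the implementation from [31]. It assumes integer minute timestamps and measures a visit's duration between its first and last observation. The observation log is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

TH_STAY_TIME = 20   # minutes (Section 5.4)
TH_TRANSITION = 10  # minutes (Section 5.4)


@dataclass
class Visit:
    cluster: str  # clustered cell after oscillation removal
    start: int    # minutes since the start of the log
    end: int      # time of the last observation in this run


def to_visits(observations: List[Tuple[int, str]]) -> List[Visit]:
    """Merge consecutive, time-ordered observations of the same cluster into visits."""
    out: List[Visit] = []
    for t, cluster in observations:
        if out and out[-1].cluster == cluster:
            out[-1].end = t
        else:
            out.append(Visit(cluster, t, t))
    return out


def stay_points(visits: List[Visit]) -> List[Visit]:
    """Keep visits that last at least TH_STAY_TIME minutes."""
    return [v for v in visits if v.end - v.start >= TH_STAY_TIME]


def split_patterns(stays: List[Visit]) -> List[List[str]]:
    """Chain consecutive stay points while the gap between them is at most TH_TRANSITION."""
    result: List[List[str]] = []
    for i, cur in enumerate(stays):
        if i == 0 or cur.start - stays[i - 1].end > TH_TRANSITION:
            result.append([cur.cluster])  # gap too long: start a new pattern
        else:
            result[-1].append(cur.cluster)
    return result


obs = [(0, "Home"), (30, "Home"), (35, "Road"), (38, "Office"),
       (120, "Office"), (200, "Gym"), (230, "Gym")]
print(split_patterns(stay_points(to_visits(obs))))  # [['Home', 'Office'], ['Gym']]
```

The short "Road" visit is dropped as a pass-through. The 80-minute gap before "Gym" exceeds TH_transition, so "Gym" starts a new pattern.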

The figure shows that most of the patterns covering the user's mobility log have length 3, after which coverage declines sharply. This result informs the transition-phase length set for LnCSS and characterizes the user's trend over time, both of which we use later in the similarity measure.
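The maximal-pattern step keeps only the frequent patterns that are not contained, in order, in a longer frequent pattern. This removes the redundant subsequences that prefix-span [33] also reports. The naive quadratic filter below illustrates the idea. It is not the sampling-based algorithm of [34], and the input list is hypothetical.

```python
from typing import List, Sequence


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` occurs in `long` in the same order (gaps allowed)."""
    it = iter(long)
    return all(any(x == y for y in it) for x in short)


def maximal_patterns(frequent: List[List[str]]) -> List[List[str]]:
    """Keep frequent patterns that are not a proper subsequence of another frequent pattern."""
    return [p for p in frequent
            if not any(p != q and is_subsequence(p, q) for q in frequent)]


frequent = [["Home"], ["Office"], ["Home", "Office"],
            ["Home", "Office", "Google"], ["Home", "Grand parents"]]
print(maximal_patterns(frequent))
# [['Home', 'Office', 'Google'], ['Home', 'Grand parents']]
```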

Because the extracted patterns carry all the basic information, such as location, semantic tag, and timestamp, the user's trends can be plotted easily. Figure 8 plots the user's location visit history against the log period. Because of space limitations, it shows only the top locations and only two months of the user's history. The figure shows that the user spent most of the time at known places and rarely explored new ones. It also reveals an important property of mobility data: the user spent most of his time at the locations he had tagged semantically, such as home, MGH, and office. This supports our assumption in Section 3 about the trend of user mobility. The trend also shows that the user visits some locations, such as home, every day regardless of weekday or weekend. Others, such as the office, he visits only on weekdays. Still others, such as Greg's home and the grandparents', he usually visits once a week, on weekends.
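The weekday and weekend regularities described above can be checked by counting, for each location, the distinct weekdays and weekend days on which it was visited. The sketch below assumes visits are already resolved to (date, location) pairs. The week of data is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def visit_days(visits: Iterable[Tuple[date, str]]) -> Dict[str, Dict[str, int]]:
    """For each location, count distinct weekdays and weekend days on which it was visited."""
    days: Dict[str, Dict[str, Set[date]]] = defaultdict(
        lambda: {"weekday": set(), "weekend": set()})
    for day, location in visits:
        kind = "weekend" if day.weekday() >= 5 else "weekday"  # Saturday=5, Sunday=6
        days[location][kind].add(day)
    return {loc: {kind: len(ds) for kind, ds in kinds.items()} for loc, kinds in days.items()}


# Hypothetical week: 1 January 2024 was a Monday.
week = [date(2024, 1, d) for d in range(1, 8)]
visits = ([(d, "Home") for d in week]
          + [(d, "Office") for d in week[:5]]
          + [(week[5], "Grand parents")])
print(visit_days(visits))
# {'Home': {'weekday': 5, 'weekend': 2}, 'Office': {'weekday': 5, 'weekend': 0},
#  'Grand parents': {'weekday': 0, 'weekend': 1}}
```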

Figure 8: Locations visited over the log (x-axis: time spent; y-axis: mobility coverage; locations: Home, Office, Grand parents, Mgh, Google, Greg's apartment).

For further analysis, Figure 9 plots the user's mobility in terms of exploration over time, that is, the location-visiting frequency over 30 consecutive days. On average, the user visits 2–5 places daily. In exceptional cases, he visited more than 7 locations in a single day.
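The per-day counts behind Figure 9 come down to counting distinct locations per calendar day. The sketch below assumes stay points are available as (timestamp, location) pairs. The two days of data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def locations_per_day(stays: Iterable[Tuple[datetime, str]]) -> Dict[str, int]:
    """Number of distinct locations visited on each calendar day (ISO date keys)."""
    per_day: Dict[str, Set[str]] = defaultdict(set)
    for ts, location in stays:
        per_day[ts.date().isoformat()].add(location)
    return {day: len(locs) for day, locs in sorted(per_day.items())}


# Hypothetical stay points for two days; repeated visits to Home count once per day.
stays = [
    (datetime(2024, 1, 1, 8), "Home"), (datetime(2024, 1, 1, 10), "Office"),
    (datetime(2024, 1, 1, 19), "Home"),
    (datetime(2024, 1, 2, 8), "Home"), (datetime(2024, 1, 2, 12), "Google"),
    (datetime(2024, 1, 2, 15), "Office"), (datetime(2024, 1, 2, 20), "Mgh"),
]
daily = locations_per_day(stays)
print(daily)                  # {'2024-01-01': 2, '2024-01-02': 4}
print(mean(daily.values()))   # daily average of distinct locations
print(max(daily.values()))    # busiest day
```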

5.5. Similarity Measure between Users. To calculate the similarity measure, we constructed the similarity matrix from two main parameters, semantic pattern similarity and spatiotemporal similarity. For this, we used LnCSS and set TH_location_area to 10, a Euclidean distance between two location points belonging to two different users, to determine whether they fall in the same area. We also set TH_time_cover to 20 minutes, as described in Section 3.3.4.
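The sketch below covers the two spatiotemporal checks of Section 5.5. It assumes each stay is reduced to planar coordinates and a single timestamp in minutes. The paper does not state the unit of the distance threshold of 10. Reading TH_time_cover as a bound on the difference between the two visit times is also an assumption, made in the spirit of the time window used by the colocation rate.

```python
import math
from dataclasses import dataclass

TH_LOCATION_AREA = 10.0  # Euclidean distance threshold from Section 5.5 (unit not stated)
TH_TIME_COVER = 20       # minutes


@dataclass
class Stay:
    x: float  # planar coordinates (assumed projection of latitude/longitude)
    y: float
    t: int    # minutes since the start of the log


def spatiotemporal_match(a: Stay, b: Stay,
                         th_area: float = TH_LOCATION_AREA,
                         th_time: int = TH_TIME_COVER) -> bool:
    """True if two users' stays fall in the same area and within th_time of each other."""
    close_in_space = math.dist((a.x, a.y), (b.x, b.y)) <= th_area
    close_in_time = abs(a.t - b.t) <= th_time
    return close_in_space and close_in_time


a = Stay(0, 0, 600)
print(spatiotemporal_match(a, Stay(3, 4, 615)))  # True: distance 5, 15 min apart
print(spatiotemporal_match(a, Stay(3, 4, 640)))  # False: 40 min apart
print(spatiotemporal_match(a, Stay(9, 9, 600)))  # False: distance about 12.7
```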

After formulating the similarity matrix, we plotted the similarity measure between user x and the other users. For evaluation, we compared it with two other user similarity models: spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37]. Spatial cosine similarity measures the similarity of the visitation frequencies of users x and y as the cosine of the angle between their vectors of visit counts per location. Extra-role colocation is based on the probability that two users x and y colocate in the same hour at night or on weekends. This relationship is a strong indicator of friendship between two users.
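The two baselines can be sketched as follows. Spatial cosine similarity follows the definition above directly. For the extra-role colocation rate, several details are assumptions for illustration rather than the exact formulation of [37]: the night window (20:00 to 08:00), the hourly discretization, and the denominator (hours in which both users have a record). The sample data are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def spatial_cosine_similarity(visits_x: Iterable[str], visits_y: Iterable[str]) -> float:
    """Cosine of the angle between the two users' visit-count vectors over locations."""
    fx, fy = Counter(visits_x), Counter(visits_y)
    dot = sum(fx[loc] * fy[loc] for loc in fx.keys() & fy.keys())
    norm = math.sqrt(sum(v * v for v in fx.values())) * math.sqrt(sum(v * v for v in fy.values()))
    return dot / norm if norm else 0.0


NIGHT_HOURS = set(range(0, 8)) | set(range(20, 24))  # assumed "night": 20:00-08:00


def is_extra_role(hour: datetime) -> bool:
    """Weekend (Saturday/Sunday) or night-time hour."""
    return hour.weekday() >= 5 or hour.hour in NIGHT_HOURS


def extra_role_colocation_rate(hourly_x: Dict[datetime, str],
                               hourly_y: Dict[datetime, str]) -> float:
    """Share of extra-role hours, observed for both users, spent at the same location.
    Keys are timestamps truncated to the hour; values are location IDs."""
    shared = [h for h in hourly_x.keys() & hourly_y.keys() if is_extra_role(h)]
    if not shared:
        return 0.0
    return sum(hourly_x[h] == hourly_y[h] for h in shared) / len(shared)


print(spatial_cosine_similarity(["Home", "Home", "Office"], ["Home", "Office", "Office"]))
# 6 January 2024 was a Saturday; 1 January 2024 was a Monday.
hx = {datetime(2024, 1, 6, 14): "Club", datetime(2024, 1, 6, 15): "Club",
      datetime(2024, 1, 1, 10): "Office"}
hy = {datetime(2024, 1, 6, 14): "Club", datetime(2024, 1, 6, 15): "Home",
      datetime(2024, 1, 1, 10): "Office"}
print(extra_role_colocation_rate(hx, hy))  # 0.5 (the weekday office hour is in-role)
```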

We compared the similarity measures produced by our proposed methodology, named HA (hybrid approach), with those of SCS and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

Figure 10 plots user similarity in terms of MAP, defined as

$$\mathrm{MAP} = \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{r=1}^{K} P_i(r)\times \mathrm{rel}_i(r)}{|R_i|}, \qquad (12)$$

where N is the number of test users, |R_i| is the number of users similar to test user U_i, r denotes the cut-off rank (r = 1, ..., K), P_i(r) is the precision of U_i's ranking at cut-off r, and rel_i(r) is a binary function equal to 1 if the user at rank r is similar to U_i and 0 otherwise.
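Equation (12) translates directly into code. The sketch below assumes each test user comes with a ranked list of candidate users and a set of ground-truth similar users. The user IDs are hypothetical.

```python
from typing import List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Inner term of Eq. (12): sum over r <= k of P(r) * rel(r), divided by |R|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:      # rel(r) = 1
            hits += 1
            total += hits / r     # P(r): precision at cut-off r
    return total / len(relevant)


def mean_average_precision(rankings: List[Sequence[str]],
                           truths: List[Set[str]], k: int) -> float:
    """Average of the per-user values over the N test users."""
    return sum(average_precision_at_k(rk, tr, k)
               for rk, tr in zip(rankings, truths)) / len(rankings)


rankings = [["u2", "u5", "u3", "u7"], ["u1", "u4"]]
truths = [{"u2", "u3"}, {"u4"}]
print(mean_average_precision(rankings, truths, k=4))  # (5/6 + 1/2) / 2
```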

Figure 11 plots the similarity measure between users in terms of NDCG, as defined in [38].
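A corresponding sketch of nDCG@K follows. It uses the discounting of Jarvelin and Kekalainen [38] with log base b = 2, so ranks below b are not discounted. Normalizing by the ideal reordering of the same judged list is a simplification. The graded gains in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float], k: int, b: int = 2) -> float:
    """DCG@k in the style of [38]: ranks i < b are undiscounted, rank i >= b is divided by log_b(i)."""
    return sum(g if i < b else g / math.log(i, b)
               for i, g in enumerate(list(gains)[:k], start=1))


def ndcg(gains: Sequence[float], k: int, b: int = 2) -> float:
    """DCG of the ranking divided by the DCG of the ideal (descending) order of the same gains."""
    ideal = dcg(sorted(gains, reverse=True), k, b)
    return dcg(gains, k, b) / ideal if ideal > 0 else 0.0


print(ndcg([1, 1, 0], k=3))  # 1.0: already in ideal order
print(ndcg([0, 1, 1], k=3))  # about 0.82
```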


Figure 9: User's locations visited over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited).

Figure 10: MAP with respect to proposed methodology performance (MAP against K = 1 to 6 for HA, ERCR, and SCS).

Figure 11: nDCG with respect to proposed methodology performance (nDCG against K = 1 to 6 for HA, ERCR, and SCS).

Figures 10 and 11 show that our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics. The experiments thus show that, on the real MIT dataset, our methodology performs better than the other methodologies considered.

6. Conclusion and Future Work

In this paper, we presented a two-phase methodology for building mobility profiles and finding user similarity. It is an unsupervised approach based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely on tags alone or on spatial or spatiotemporal information alone. Discovering similar users can play a vital role in many applications related to location-based services (LBS). Our approach, in which semantic tags combined with spatiotemporal trends give a precise measure of similarity between users, is therefore effective for such applications. Furthermore, the methodology is a complete framework that addresses the main mobility-profiling issues: outlier detection, missing-value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies can examine user behavior under transient conditions, such as roadworks and special events, together with users' trajectories, to determine user behavior more precisely for location-based services in real scenarios.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, "Data stream processing on real-time mobile advertisement," in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, "Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China," Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, "Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces," in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM '11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, "A personal route prediction system based on trajectory data mining," Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network.

[7] A. Harter and A. Hopper, "Distributed location system for the active office," IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, "The anatomy of a context-aware application," Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] "Location technologies for GSM, GPRS and WCDMA networks," White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, "Analysis of random mobility models with pde's," in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., "Mobiscopes for human spaces," IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., "iMAP: indirect measurement of air pollution with cellphones," CSE Technical Reports, 2008.

[13] A. Pentland, "Automatic mapping and modeling of human networks," Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, "Toward community sensing," in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN '08), pp. 481–492, April 2008.

[15] K. Laasonen, "Clustering and prediction of mobile user routes from cellular data," in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, "Impact of communication infrastructure on forwarding in pocket switched networks," in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM '06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, "Instant message clustering based on extended vector space model," in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA '07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, "An exploration on mobile social networking: dodgeball as a case in point," in Proceedings of the International Conference on Mobile Business (ICMB '06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., "Cartel: a distributed mobile sensor computing system," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., "Participatory sensing," in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, "Health-OS: a position paper," in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant MANETs," in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, "User mobility modeling and characterization of mobility patterns," IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, "Mobility modeling in third-generation mobile telecommunications systems," IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, "The predictive user mobility profile framework for wireless multimedia networks," IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, "Mobility models for systems evaluation: a survey," in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, "Understanding individual human mobility patterns," Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, "Identifying meaningful locations," in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous '06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, "Precise location acquisition of mobility data using cell-id," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, "Cell oscillation resolution in mobility profile building," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, "Mining mobility user profiles for car pooling," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., "PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth," in Proceedings of the 17th International Conference on Data Engineering (ICDE '01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, "Efficient mining of maximal sequential patterns using multiple samples," in Proceedings of the SIAM International Conference on Data Mining (SDM '05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, "Human mobility, social ties, and link prediction," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, "IR evaluation methods for retrieving highly relevant documents," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 41–48, ACM, New York, NY, USA, 2000.



As mentioned earlier the selected dataset is taken frommining project group of MIT Media labs [36] This datasetis collected from 100 people who are students for a period of9 month with total activity span of 350K hoursThe collecteddata is logged on Symbian mobile that is Nokia 6600 whichhas no GPS in it so all of the information related to userlocation is identified by cell ID only While in this datasetcell global identity header has partial information where onlyLAC and cell ID are available for location tracking of the userHowever users have provided semantic tagging informationto the most important locations over mobility history indata logs But overall this semantic tag information variesa lot in terms of annotations and usage from user to userAnd among these users only 94 gave their full informationregarding similaritymeasure through online survey for socialinteractions From these 94 users 7 of the users do not havecell logs and 10 have no cell annotation logs So for ouranalysis there are only 77 users available for analysis andevaluation

Table 1 Similarity matrix

11987511

11987512

11987521

11987522

11987511

mdash mdash 1 011987512

mdash mdash 0 111987521

1 0 mdash mdash11987522

0 1 mdash mdash

5 Experiments and Results

As mentioned we have chosen the reality mining dataset forour experimental purposes The results are as follows

51 Retrieval of Location Information and Removal of OutliersUsing Raw Cell ID As the data is quite old in its nature andthere are frequent changes in mobile network so presencesof outliers and missing values are obvious in the datasetso we applied our clustering approach [30] on the raw datato remove spatial outliers from the data and extract theirlocation information through Google APIs [30] The resultof applied technique is shown in Table 2

Figure 5(a) shows the cells retrieved from the GoogleAPI and Figure 5(b) shows the consolidated data withoutoutliers while Figure 5(c) shows the complete effect of spatialclustering over the data

5.2. Observation of Semantic Tags in the Dataset. After extracting the outlier-free data points, we applied our spatial clustering techniques [30, 31] to the clean data to group them into stay points, which may or may not carry semantic tags. As Table 3 shows, each observed semantic location can map to multiple cell IDs, so these cells can be clustered together to define a common location. Because stays at well-known places are usually tagged, these semantic locations are more important than untagged stay points [31].
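Grouping cell IDs under the semantic tag a user attached to them yields per-location cell counts of the kind listed in Table 3. The sketch below assumes the tagged log is available as ((LAC, cell ID), tag) pairs. The function names and the case-folding of tags are illustrative choices. Case-folding merges only trivially different spellings and cannot reconcile genuinely different names for the same place.

```python
from collections import defaultdict


def cells_per_tag(tagged_records):
    """Group cell IDs under the semantic tag the user attached to them.

    tagged_records: iterable of ((LAC, cell ID), tag) pairs; tag is None for
    untagged records, which are left to stay-point clustering.
    Returns {normalized tag: set of (LAC, cell ID)}.
    """
    groups = defaultdict(set)
    for cell, tag in tagged_records:
        if tag:
            groups[tag.strip().lower()].add(cell)
    return dict(groups)


def representative_cell_counts(groups):
    """Table-3-style summary: (tag, number of distinct cells), largest first."""
    return sorted(((tag, len(cells)) for tag, cells in groups.items()),
                  key=lambda item: (-item[1], item[0]))
```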

5.3. Cell Oscillation Resolution and Discovery of Stay Points. As defined previously, cell oscillation is a common phenomenon in GSM datasets: a stationary user is assigned multiple cell IDs for load balancing. Table 3 shows that a semantic location can be identified with multiple cell IDs. Because semantic tags are limited in the data, we must therefore use spatiotemporal analysis of location, in addition to the semantic tag information, for pattern building and for finding similar users. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation. Using overlapping area analysis, we also identified stay points that the user had not tagged semantically. Our earlier assumption was that GSM cells are distributed in bubble form and therefore overlap with each other. Table 4 supports this: a single cell can represent several locations, up to 7.

Figure 5: (a) Cells retrieved from the Google API; (b) consolidated data without outliers; (c) effect of spatial clustering. Axes: longitude (deg) vs. latitude (deg).

Table 2: Locations retrieved with respect to user mobility.

Measure | # of cells | %
Total number of unique cells for subject X | 1744 | 100
Cells whose location was retrieved through the Google API | 680 | 39
Cells whose location was retrieved through the semantic-tagged algorithm | 88 | 3
Total cells whose location was retrieved | 768 | 42

Table 3: Semantic locations with # of representative cells.

Semantic location | # of cells
Mgh | 5
Office | 4
Airport | 4
Home | 4
Greg's apt | 4
Grand parents | 3
Google | 3
Redhat | 3
Chicago O'Hare | 2
Topo hub | 2

Table 4: Overlapping of locations.

Cell (LAC, cell ID) | # of locations represented
C30 | 7
C40 | 6
C23 | 6
C14 | 5
C56 | 4
C44 | 4
C25 | 3
C47 | 2
C27 | 2
C39 | 2
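The overlap analysis of [31] is not reproduced here. To illustrate what resolving oscillation involves, the sketch below substitutes a common, simpler heuristic. Whenever the log switches from cell A to cell B and back to A within a short window, A and B are assumed to cover the same physical place and are merged with a union-find structure. The `window_s` default of 300 seconds and all names are hypothetical.

```python
def merge_oscillating_cells(events, window_s=300):
    """Merge cells that the log bounces between (A -> B -> A) within window_s seconds.

    events: time-sorted list of (timestamp_seconds, cell) pairs.
    Returns {cell: representative cell} for every cell seen.
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller ID becomes the representative, so results are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    for _, cell in events:
        find(cell)  # register every cell, including non-oscillating ones
    for (t0, a), (_, b), (t2, c) in zip(events, events[1:], events[2:]):
        if a == c and a != b and t2 - t0 <= window_s:
            union(a, b)
    return {cell: find(cell) for cell in list(parent)}
```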

As a result, we retrieved all locations as clustered cells, which represent the user's mobility history better than the raw cells. In Figure 6, the tagged locations are plotted on a geographical map, showing the vicinity of the tagged places. Figure 6 also supports our assumption: the user tagged Chicago O'Hare airport twice with overlapping cells, which shows both cell overlap and oscillation at the same place.

5.4. Discovery of Mobility Patterns. The mobility history was first extracted as clustered cells, where each cluster represents a stay point with or without a user-defined semantic tag. We then applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3, with $TH_{\text{stay time}}$ = 20 minutes and $TH_{\text{transition}}$ = 10 minutes, as adopted from our earlier work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique described in Section 3.3.3 to represent the most frequent mobility patterns. To identify the most frequent pattern length, Figure 7 plots the log-time coverage of user mobility against pattern length.

Figure 6: Semantic locations plotted on a geographical map (annotated: Chicago O'Hare, n = 2).

Figure 7: Mobility pattern length over log (x-axis: pattern length, 2 to 10; y-axis: log coverage, 0 to 60).
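The two thresholds can be illustrated with a small stay-point extractor. This sketch rests on an assumed reading of the thresholds and is not the exact procedure of Section 3.3.3. Consecutive observations of the same cluster are merged into one visit as long as the gap between them does not exceed $TH_{\text{transition}}$ (10 minutes). A visit counts as a stay point only if it lasts at least $TH_{\text{stay time}}$ (20 minutes). Input is assumed to be time-sorted (timestamp in seconds, cluster ID) pairs; `Stay` and `extract_stays` are hypothetical names.

```python
from dataclasses import dataclass

TH_STAY_S = 20 * 60        # TH_stay time = 20 minutes (Section 5.4)
TH_TRANSITION_S = 10 * 60  # TH_transition = 10 minutes (Section 5.4)


@dataclass
class Stay:
    cluster: str
    start: int  # seconds
    end: int    # seconds


def extract_stays(observations, th_stay=TH_STAY_S, th_transition=TH_TRANSITION_S):
    """observations: time-sorted (timestamp_seconds, cluster_id) pairs."""
    stays, current = [], None
    for t, cluster in observations:
        # Same cluster and a short enough gap: extend the ongoing visit.
        if current is not None and cluster == current.cluster and t - current.end <= th_transition:
            current.end = t
            continue
        # Otherwise close the ongoing visit, keeping it only if it lasted long enough.
        if current is not None and current.end - current.start >= th_stay:
            stays.append(current)
        current = Stay(cluster, t, t)
    if current is not None and current.end - current.start >= th_stay:
        stays.append(current)
    return stays
```

Reading off `[s.cluster for s in stays]` gives the ordered location sequence on which a sequential pattern miner such as PrefixSpan [33] can then run.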

Figure 7 shows that most of the patterns covering the user's mobility log have length 3, after which coverage drops sharply. This result determines the transition-phase length set for LnCSS and characterizes the user's trend over time; both are used later in the similarity measure.
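The maximal-pattern step keeps only those frequent patterns that are not contained, as an ordered subsequence, in another frequent pattern. This removes the redundant sub-patterns that a miner such as PrefixSpan also reports. For clarity, the sketch below is a naive quadratic post-filter over an already-mined pattern list. It is not the sampling-based algorithm of [34], and its function names are hypothetical.

```python
def is_subsequence(short, long):
    """True if all items of `short` occur in `long` in the same order (gaps allowed)."""
    it = iter(long)
    return all(item in it for item in short)  # `in` advances the shared iterator


def maximal_patterns(patterns):
    """Drop every pattern that is an ordered subsequence of another, distinct pattern."""
    unique = list(dict.fromkeys(tuple(p) for p in patterns))  # de-duplicate, keep order
    return [p for p in unique
            if not any(q != p and is_subsequence(p, q) for q in unique)]
```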

Because the extracted patterns carry all the basic information (location, semantic tag, and time stamp), the user's trend can be plotted easily. Figure 8 plots the user's location visit history against the log period over two months of history; due to space limitations, only the top locations are shown. The figure shows that the user spent most of his time at known places and rarely explored new ones. It also reveals an important property of mobility data: the user spent most of his time at the locations he had tagged semantically, such as home, MGH, and office. This is consistent with our assumption in Section 3 about the trend of user mobility. The trend further shows that the user visits some locations, such as home, every day regardless of weekday or weekend, and others, such as the office, only on weekdays. Still others, such as Greg's home and his grandparents', he usually visits about once a week, on weekends.

Figure 8: Locations visited over the log (series: Home, Office, Grand parents, Mgh, Google, Greg's apartment; axes: time spent, 0 to 12; mobility coverage, 0 to 60).

For further analysis, Figure 9 plots the user's mobility in terms of exploration over time, as the number of locations visited per day over 30 consecutive days. On average, the user visits 2–5 places per day, and in exceptional cases he visited more than 7 locations in a single day.
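A per-day count of distinct locations, the quantity plotted in Figure 9, can be computed from time-stamped stays as in the sketch below. It assumes stays are given as (Unix timestamp, location ID) pairs and groups them by UTC calendar day; a real analysis would use the user's local time zone. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def locations_per_day(stays):
    """Count distinct locations visited per UTC calendar day.

    stays: iterable of (unix_timestamp, location_id) pairs.
    Returns {date: number of distinct locations}, ordered by date.
    """
    per_day = defaultdict(set)
    for ts, location in stays:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        per_day[day].add(location)
    return {day: len(locs) for day, locs in sorted(per_day.items())}
```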

5.5. Similarity Measure between Users. To calculate the similarity measure, we constructed the similarity matrix from two main parameters: semantic pattern similarity and spatiotemporal similarity. For this, we use LnCSS and set $TH_{\text{location area}}$ to 10. This threshold is a Euclidean distance between two location points belonging to two different users, used to check whether the points fall in the same area. We also set $TH_{\text{time cover}}$ to 20 minutes, as described in Section 3.3.4.
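The sketch below shows one way LnCSS could combine semantic and spatiotemporal matching using the thresholds set above ($TH_{\text{location area}}$ = 10, $TH_{\text{time cover}}$ = 20 minutes). The matching rule is an assumption for illustration. Two stay points match if their tags agree after case-folding. Otherwise they match only if they lie within $TH_{\text{location area}}$ of each other (Euclidean distance on planar coordinates, in whatever unit the threshold uses) and their times of day differ by at most $TH_{\text{time cover}}$. Normalizing by the length of the first pattern is also an illustrative choice. `StayPoint`, `matches`, `lncss_length`, and `lncss_ratio` are hypothetical names.

```python
import math
from typing import NamedTuple, Optional

TH_LOCATION_AREA = 10.0  # Section 5.5; same (unstated) unit as the coordinates
TH_TIME_COVER_MIN = 20   # Section 5.5, minutes


class StayPoint(NamedTuple):
    tag: Optional[str]   # semantic tag, or None for an untagged stay point
    x: float             # planar coordinates of the stay point
    y: float
    minute_of_day: int   # arrival time, minutes after midnight


def matches(a, b, th_area=TH_LOCATION_AREA, th_time=TH_TIME_COVER_MIN):
    """Assumed rule: equal tags match; otherwise require spatial AND temporal closeness."""
    if a.tag and b.tag and a.tag.strip().lower() == b.tag.strip().lower():
        return True
    close = math.hypot(a.x - b.x, a.y - b.y) <= th_area
    concurrent = abs(a.minute_of_day - b.minute_of_day) <= th_time  # no midnight wrap-around
    return close and concurrent


def lncss_length(p, q):
    """Dynamic-programming longest common subsequence under the `matches` predicate."""
    dp = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i, a in enumerate(p, start=1):
        for j, b in enumerate(q, start=1):
            if matches(a, b):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(p)][len(q)]


def lncss_ratio(p, q):
    """Share of pattern p covered by the common subsequence."""
    return lncss_length(p, q) / len(p) if p else 0.0
```

Tags with different spellings for the same place (for example, "Lab" vs. "Media Lab") fall through to the spatiotemporal check, which is why both conditions are needed.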

After formulating the similarity matrix, we computed the similarity between user $x$ and the other users. For comparison, we also computed two other user similarity models: spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37]. SCS measures the similarity of the visitation frequencies of users $x$ and $y$ as the cosine of the angle between their vectors of visit counts per location. ERCR is based on the probability that users $x$ and $y$ are colocated in the same hour at night or on weekends; this relationship is a strong indicator of friendship between two users.
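Both baselines can be sketched briefly. SCS follows the definition above directly. For ERCR, the sketch implements only a simplified reading of "colocation in the same hour at night or on weekends": among the extra-role hours observed for both users, the share in which they were at the same location. It is not the exact estimator of [37]. The 20:00–08:00 night window, the hourly location maps, and all function names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime


def spatial_cosine_similarity(visits_x, visits_y):
    """Cosine of the angle between two users' per-location visit-count vectors.

    visits_*: iterables of location IDs, one entry per visit.
    """
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm = (math.sqrt(sum(v * v for v in cx.values()))
            * math.sqrt(sum(v * v for v in cy.values())))
    return dot / norm if norm else 0.0


def is_extra_role(hour: datetime, night_start=20, night_end=8):
    """Weekend (Saturday/Sunday) or night hour; the 20:00-08:00 window is hypothetical."""
    return hour.weekday() >= 5 or hour.hour >= night_start or hour.hour < night_end


def extra_role_colocation_rate(loc_x, loc_y):
    """Simplified ERCR: share of extra-role hours, observed for both users,
    in which both users were at the same location.

    loc_*: {datetime truncated to the hour: location ID}.
    """
    hours = [h for h in loc_x.keys() & loc_y.keys() if is_extra_role(h)]
    if not hours:
        return 0.0
    return sum(loc_x[h] == loc_y[h] for h in hours) / len(hours)
```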

We compared the user similarities produced by our proposed methodology, named HA (hybrid approach), with those of SCS and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

In Figure 10, we plotted user similarity based on the MAP metric, defined as

$$\mathrm{MAP} = \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{r=1}^{K} P_i(r)\times \mathrm{rel}_i(r)}{|R_i|}, \qquad (12)$$

where $N$ is the number of test users and $|R_i|$ is the number of similar users for test user $U_i$. The index $r$ denotes the cut-off rank, up to $K$, and $P_i(r)$ is the precision of $U_i$ at cut-off $r$. $\mathrm{rel}_i(r)$ is a binary function equal to 1 if the user at rank $r$ is relevant and 0 otherwise.
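Equation (12) translates directly into code. The sketch below assumes that each test user has a ranked list of candidate similar users (for instance, one produced by HA, SCS, or ERCR) and a set of truly similar users. Treating the survey responses described in Section 4 as that ground truth is an assumption for illustration. Precision at cut-off $r$ is computed as hits$/r$, so a term contributes only when $\mathrm{rel}_i(r) = 1$. Function names are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """Inner term of Eq. (12) for one test user U_i."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for r, candidate in enumerate(ranked[:k], start=1):
        if candidate in relevant:   # rel_i(r) = 1
            hits += 1
            total += hits / r       # P_i(r): precision at cut-off r
    return total / len(relevant)    # divide by |R_i|


def mean_average_precision(rankings, ground_truth, k):
    """Eq. (12): average of the per-user scores over the N test users."""
    if not rankings:
        return 0.0
    return sum(average_precision_at_k(ranked, ground_truth.get(user, set()), k)
               for user, ranked in rankings.items()) / len(rankings)
```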

Figure 11 plots the similarity between users using the NDCG metric as defined in [38].
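For nDCG, the sketch below uses discounted cumulative gain in the form given by Jarvelin and Kekalainen [38] with log base $b = 2$. Gains at ranks below $b$ are not discounted, and the gain at rank $i \ge b$ is divided by $\log_b i$. nDCG then divides by the DCG of the ideal ordering. Many later libraries instead discount by $\log_2(i + 1)$, which changes absolute values. How graded relevance gains are assigned to candidate users is not stated here, so the `gain_of` mapping is a hypothetical input, as are the function names.

```python
import math


def dcg_at_k(gains, k, base=2):
    """DCG after Jarvelin and Kekalainen: ranks below `base` are undiscounted,
    rank i >= base is divided by log_base(i)."""
    total = 0.0
    for i, g in enumerate(gains[:k], start=1):
        total += g if i < base else g / math.log(i, base)
    return total


def ndcg_at_k(ranked, gain_of, k, base=2):
    """ranked: candidate users in predicted order; gain_of: {user: graded relevance}."""
    gains = [gain_of.get(c, 0) for c in ranked]
    ideal = sorted(gain_of.values(), reverse=True)
    idcg = dcg_at_k(ideal, k, base)
    return dcg_at_k(gains, k, base) / idcg if idcg else 0.0
```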


Figure 9: User's locations visited over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited, 0 to 10).

Figure 10: MAP with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: MAP; series: HA, ERCR, SCS).

Figure 11: nDCG with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: nDCG; series: HA, ERCR, SCS).

Figures 10 and 11 show that, on the real-world MIT dataset, our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics.

6. Conclusion and Future Work

In this paper, we presented a two-phase unsupervised methodology for building mobility profiles and finding similar users, based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely solely on tags, spatial data, or spatiotemporal data. Discovering similar users can play a vital role in many location-based service (LBS) applications. Our approach, which combines semantic tags with spatiotemporal trends, can therefore serve as a precise method for measuring similarity between users. Furthermore, the methodology is a complete framework that addresses the main mobility profiling issues: outlier detection, missing-value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies could examine how users behave in response to transient events, such as roadworks and special events, together with their trajectories. This would help determine users' exact behavior when the approach is applied to real-world location-based services.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, “Data stream processing on real-time mobile advertisement,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, “Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China,” Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, “Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM ’11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, “A personal route prediction system based on trajectory data mining,” Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network.

[7] A. Harter and A. Hopper, “Distributed location system for the active office,” IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, “The anatomy of a context-aware application,” Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] “Location technologies for GSM, GPRS and WCDMA networks,” White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, “Analysis of random mobility models with PDE’s,” in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., “Mobiscopes for human spaces,” IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., “iMAP: indirect measurement of air pollution with cellphones,” CSE Technical Reports, 2008.

[13] A. Pentland, “Automatic mapping and modeling of human networks,” Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, “Toward community sensing,” in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ’08), pp. 481–492, April 2008.

[15] K. Laasonen, “Clustering and prediction of mobile user routes from cellular data,” in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD ’05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, “Impact of communication infrastructure on forwarding in pocket switched networks,” in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM ’06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, “Instant message clustering based on extended vector space model,” in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA ’07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, “An exploration on mobile social networking: Dodgeball as a case in point,” in Proceedings of the International Conference on Mobile Business (ICMB ’06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., “Cartel: a distributed mobile sensor computing system,” in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys ’06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., “Participatory sensing,” in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, “Health-OS: a position paper,” in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant MANETs,” in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, “User mobility modeling and characterization of mobility patterns,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, “Mobility modeling in third-generation mobile telecommunications systems,” IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, “The predictive user mobility profile framework for wireless multimedia networks,” IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, “Mobility models for systems evaluation: a survey,” in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, “Understanding individual human mobility patterns,” Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, “Identifying meaningful locations,” in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous ’06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, “Precise location acquisition of mobility data using cell-id,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, “Cell oscillation resolution in mobility profile building,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, “Mining mobility user profiles for car pooling,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering (ICDE ’01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, “Efficient mining of maximal sequential patterns using multiple samples,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, “Inferring friendship network structure by using mobile phone data,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, “IR evaluation methods for retrieving highly relevant documents,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), pp. 41–48, ACM, New York, NY, USA, 2000.



4 Dataset

As mentioned earlier the selected dataset is taken frommining project group of MIT Media labs [36] This datasetis collected from 100 people who are students for a period of9 month with total activity span of 350K hoursThe collecteddata is logged on Symbian mobile that is Nokia 6600 whichhas no GPS in it so all of the information related to userlocation is identified by cell ID only While in this datasetcell global identity header has partial information where onlyLAC and cell ID are available for location tracking of the userHowever users have provided semantic tagging informationto the most important locations over mobility history indata logs But overall this semantic tag information variesa lot in terms of annotations and usage from user to userAnd among these users only 94 gave their full informationregarding similaritymeasure through online survey for socialinteractions From these 94 users 7 of the users do not havecell logs and 10 have no cell annotation logs So for ouranalysis there are only 77 users available for analysis andevaluation

Table 1 Similarity matrix

11987511

11987512

11987521

11987522

11987511

mdash mdash 1 011987512

mdash mdash 0 111987521

1 0 mdash mdash11987522

0 1 mdash mdash

5 Experiments and Results

As mentioned we have chosen the reality mining dataset forour experimental purposes The results are as follows

51 Retrieval of Location Information and Removal of OutliersUsing Raw Cell ID As the data is quite old in its nature andthere are frequent changes in mobile network so presencesof outliers and missing values are obvious in the datasetso we applied our clustering approach [30] on the raw datato remove spatial outliers from the data and extract theirlocation information through Google APIs [30] The resultof applied technique is shown in Table 2

Figure 5(a) shows the cells retrieved from the GoogleAPI and Figure 5(b) shows the consolidated data withoutoutliers while Figure 5(c) shows the complete effect of spatialclustering over the data

52 Observation of Semantic Tags in Dataset After theextraction of outlier free data points we applied our spatialclustering techniques [30 31] on the clean data to cluster themin terms of stay points which may or may not have semantictags in them As Table 2 shows that each semantic locationobserved can carry multiple cell IDs so these cells can beclustered together to define a common location As most ofthe time stays at known places are usually tagged so thesesemantic locations are of more importance than untaggedstay points [31] as shown in Table 3

53 Cell Oscillation Resolution and Discovery of Stay PointsAs defined previously cell oscillation is a phenomenonobvious in GSM dataset where user is assigned multiple

8 The Scientific World Journal

Latitude (deg)

Longitude (deg)

(a)

Latitude (deg)

Longitude (deg)

(b)

(c)

Figure 5

Table 2 Locations retrieved with respect to user mobility

of cells Total number of unique cells against subject X 1744 100 of cellrsquos location retrieved through GoogleAPI 680 39

of cellrsquos location retrieved through semantictagged algorithm 88 3

Total of cellrsquos location retrieved 768 42

Table 3 Semantic location with of representative cells

Semantic location Number of cells incidenceMgh 5Office 4Airport 4Home 4Gregrsquos apt 4Grand parents 3Google 3Redhat 3Chicago OrsquoHare 2Topo hub 2

cell IDs while being stationary for load balancing Table 2shows that semantic location can be identified with multiplecell IDs which make it clear that beside the semantic taginformation we are bound to use spatiotemporal analysiswith location for pattern building and finding similar usersas this semantic tag is limited in the dataWe have applied ourspatial clustering technique [31] on the dataset for the removalof cell oscillation and using the overlapping area analysis weidentified stay pointswhich are not semantically tagged by the

Table 4 Overlapping of locations

Cells (LAC Cell ID) of locations represented11986230

711986240

611986223

611986214

511986256

411986244

411986225

311986247

211986227

211986239

2

user otherwise And as per our previous assumption GSMcells are distributed in bubble form that they overlap witheach other this assumption is evident from resultant Table 4

As result we retrieved all the locations as a clustered cellswhich are representative of user mobility history rather theraw cells In Figure 6 of the tagged locations are plotted overgeographical map which shows the vicinity of tagged placesfurther Figure 6 shows our assumption is correct as ChicagoOrsquoHare airport is tagged twice with overlapped cells by theuser which shows overlapping of cells and oscillation oversame place

54 Discovery of Mobility Patterns After the extractionof mobility history in form of clustered cells where eachcluster represents the stay point with or without user definedsemantic tag we applied time-stamping methodology onit defined in Section 332 On this time-stamped clusteredhistory we applied the mobility pattern extraction techniqueproposed in Section 333 We used the THstay time of 20minutes along with THtransition of 10 minutes which are

The Scientific World Journal 9

119899 2Chicago OrsquoHare

Figure 6 Semantic location plotted over geographical map

0

10

20

30

40

50

60

Pattern length

Log

cove

rage

2 3 4 5 6 7 8 9 100

Figure 7 Mobility pattern length over log

adopted formourwork [31] After extraction of thesemobilitypatterns we implemented the maximal trajectory patterntechnique proposed in Section 333 for the representationof the most frequent mobility pattern We plotted graphin Figure 7 representing the log time coverage of the usermobility with respect to the length of pattern to analyze themost frequent pattern length

The figure clearly shows that most of the patterns whichcover the log of user mobility are of length 3 after whichthe coverage is declined tremendously This result givesinformation about the transition phase length set for LnCSSand trend of user over time which we will be using laterduring similarity measure

As the explored patterns carry all the basic informationlike location semantic tag and time stamping we can plotthe user trend quite easily As shown in Figure 8 we plottedthe user location visit history against log history period andwe have selected top locations only due to of space limitationWe selected only two-month user history for this plottingThe figure clearly shows that user spent most of his time atknown places and rarely explored new places and this alsoshows one important fact about mobility data that user spentmost of his time on the locationwhich he tagged semanticallythat is home MGH office and so forth This satisfies ourassumption in Section 3 about trend of the use mobility Thistrend also shows some facts like that the user visits somelocations like home everyday regardless of weekdays andweekend and some locations like office every weekday onlywhereas some locations like Gregrsquos home and grandparentsuser usually visit once in a week and on weekends

Figure 8: Locations visit over log (x-axis: time spent; y-axis: mobility coverage, 0 to 60; locations: Home, Office, Grand parents, Mgh, Google, Greg's apartment).

For further analysis, Figure 9 plots the user's exploration over time, that is, the number of distinct locations visited per day over a sample of 30 consecutive days. On average, the user visits 2–5 places a day; in exceptional cases, the user visited more than 7 locations in a single day.
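
The per-day statistic plotted in Figure 9 can be computed with a short sketch, assuming the stay history is available as `(date, location label)` pairs; the function name `locations_per_day` is hypothetical. Days with no recorded visits count as zero so that the 30-day window stays complete.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def locations_per_day(visits: Iterable[Tuple[date, str]], start: date,
                      n_days: int) -> List[int]:
    """Number of distinct locations visited on each day of the window (0 if none)."""
    per_day: Dict[date, Set[str]] = {start + timedelta(days=d): set()
                                     for d in range(n_days)}
    for day, label in visits:
        if day in per_day:
            per_day[day].add(label)  # a set ignores repeated visits on the same day
    return [len(per_day[d]) for d in sorted(per_day)]


start = date(2024, 1, 1)
visits = [(date(2024, 1, 1), "Home"), (date(2024, 1, 1), "Office"), (date(2024, 1, 1), "Home"),
          (date(2024, 1, 2), "Home"),
          (date(2024, 1, 3), "Home"), (date(2024, 1, 3), "Office"),
          (date(2024, 1, 3), "Mgh"), (date(2024, 1, 3), "Google")]
counts = locations_per_day(visits, start, n_days=3)  # use n_days=30 for a Figure 9 window
print(counts)                       # [2, 1, 4]
print(sum(counts) / len(counts), max(counts))
```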

5.5. Similarity Measure between Users

To calculate the similarity measure, we constructed the similarity matrix from two main parameters: semantic pattern similarity and spatiotemporal similarity. For this we use LnCSS and set TH_location area to 10, the Euclidean distance threshold between two location points belonging to two different users that determines whether they fall in the same area. Further, we set TH_timecover to 20 minutes, as described in Section 3.3.4.
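
The two thresholds can be illustrated with a minimal colocation test between a stay of user x and a stay of user y. This is a sketch under stated assumptions: locations are planar coordinates in whatever units TH_location area is expressed in, and TH_timecover is read as the minimum overlap of the two stays' time intervals. Section 3.3.4 may define time cover differently; the `Stay` type and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stay:
    x: float    # planar coordinates, in the units TH_location area uses (assumption)
    y: float
    start: int  # minutes
    end: int    # minutes


TH_LOCATION_AREA = 10.0  # Euclidean distance threshold from the text
TH_TIMECOVER = 20        # minutes, from the text


def same_area(a: Stay, b: Stay, th_area: float = TH_LOCATION_AREA) -> bool:
    return math.hypot(a.x - b.x, a.y - b.y) <= th_area


def overlap_minutes(a: Stay, b: Stay) -> int:
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def colocated(a: Stay, b: Stay, th_area: float = TH_LOCATION_AREA,
              th_time: int = TH_TIMECOVER) -> bool:
    """Assumed rule: same area AND intervals overlap by at least th_time minutes."""
    return same_area(a, b, th_area) and overlap_minutes(a, b) >= th_time


x_stay = Stay(0.0, 0.0, 0, 60)
print(colocated(x_stay, Stay(3.0, 4.0, 30, 90)))  # True: distance 5, overlap 30 min
print(colocated(x_stay, Stay(3.0, 4.0, 50, 90)))  # False: overlap only 10 min
print(colocated(x_stay, Stay(20.0, 0.0, 0, 60)))  # False: distance 20
```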

After formulating the similarity matrix, we computed the similarity measure between user x and the other users, and compared it for evaluation with two other user similarity models: spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37]. Spatial cosine similarity measures the similarity of the visitation frequencies of users x and y as the cosine of the angle between their vectors of visit counts per location. Extra-role colocation is based on the probability that users x and y are colocated in the same hour at night or on weekends; this relationship is a strong indicator of friendship between two users.
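
The two baselines can be sketched as follows. SCS follows the description above directly. The ERCR function is a simplified stand-in, not necessarily the exact definition in [37]: it assumes one location sample per user per hour, treats Saturday, Sunday, and a hypothetical 20:00 to 08:00 night window as extra-role time, and reports the share of jointly observed extra-role hours in which both users have the same location.

```python
import math
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def spatial_cosine_similarity(visits_x: Iterable[str], visits_y: Iterable[str]) -> float:
    """Cosine of the angle between two users' per-location visit-count vectors."""
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm_x = math.sqrt(sum(c * c for c in cx.values()))
    norm_y = math.sqrt(sum(c * c for c in cy.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def is_extra_role(hour: datetime, night_start: int = 20, night_end: int = 8) -> bool:
    """Weekend (Sat/Sun) or night hours; the night window is an assumption."""
    return hour.weekday() >= 5 or hour.hour >= night_start or hour.hour < night_end


def extra_role_colocation_rate(loc_x: Dict[datetime, str],
                               loc_y: Dict[datetime, str]) -> float:
    """Share of jointly observed extra-role hours in which both users share a location."""
    hours = [h for h in loc_x.keys() & loc_y.keys() if is_extra_role(h)]
    if not hours:
        return 0.0
    return sum(loc_x[h] == loc_y[h] for h in hours) / len(hours)


print(spatial_cosine_similarity(["Home", "Home", "Office"],
                                ["Home", "Office", "Office"]))  # approximately 0.8
sat_afternoon = datetime(2024, 1, 6, 14)  # Saturday
tue_morning = datetime(2024, 1, 2, 10)    # Tuesday, working hours
tue_night = datetime(2024, 1, 2, 22)      # Tuesday, night
loc_x = {sat_afternoon: "Park", tue_morning: "Office", tue_night: "Bar"}
loc_y = {sat_afternoon: "Park", tue_morning: "Office", tue_night: "Home"}
print(extra_role_colocation_rate(loc_x, loc_y))  # 0.5
```

The Tuesday-morning match is ignored because it falls in working hours, which is the point of an extra-role measure.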

We compared the similarity measures produced by our proposed methodology, named HA (hybrid approach), with SCS and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

Figure 10 plots the user similarity results under MAP, which is defined as

$$\mathrm{MAP} = \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{r=1}^{K} P_i(r)\times \mathrm{rel}_i(r)}{|R_i|} \qquad (12)$$

where N is the number of test users, |R_i| is the number of users similar to test user U_i, r denotes the cut-off rank (r = 1, ..., K), and P_i(r) is the precision of U_i at cut-off r; rel_i(r) is a binary function that equals 1 if the user at rank r is similar to U_i and 0 otherwise.
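
Equation (12) can be checked with a short sketch. It assumes each test user has a ranked list of candidate users produced by a similarity model and a ground-truth set R_i of similar users (for example, from the survey data); the function names are hypothetical. Terms with rel_i(r) = 0 contribute nothing, so only relevant ranks are accumulated.

```python
from typing import List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Inner term of Eq. (12): sum over r <= K of P_i(r) * rel_i(r), divided by |R_i|."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:      # rel_i(r) = 1
            hits += 1
            total += hits / r     # P_i(r): precision of the top-r list
    return total / len(relevant)  # divide by |R_i|


def mean_average_precision(rankings: List[Sequence[str]],
                           truths: List[Set[str]], k: int) -> float:
    """Eq. (12): average of the per-user terms over the N test users."""
    return sum(average_precision(rk, t, k)
               for rk, t in zip(rankings, truths)) / len(rankings)


rankings = [["a", "b", "c", "d"], ["x", "y"]]
truths = [{"a", "c"}, {"y"}]
print(mean_average_precision(rankings, truths, k=4))  # about 0.667, i.e. (5/6 + 1/2) / 2
```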

Figure 11 plots the user similarity results under NDCG, as defined in [38].
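
A minimal NDCG@K sketch is shown below. It uses the widely used log2(r + 1) discount; the original formulation in [38] uses a discount with a configurable logarithm base, so values may differ slightly from those behind Figure 11. The graded relevance dictionary and function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the common log2(r + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked: Sequence[str], relevance: Dict[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering (0 if nothing is relevant)."""
    gains = [relevance.get(user, 0.0) for user in ranked]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


relevance = {"b": 2.0, "a": 1.0}  # graded similarity of candidate users (toy values)
print(ndcg_at_k(["b", "a", "c"], relevance, k=3))  # 1.0: ideal order
print(ndcg_at_k(["c", "a", "b"], relevance, k=3))  # below 1.0: best match ranked last
```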


Figure 9: User's locations visit over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited, 0 to 10).

Figure 10: MAP with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: MAP; methods: HA, ERCR, SCS).

Figure 11: nDCG with respect to proposed methodology performance (x-axis: K, 1 to 6; y-axis: nDCG; methods: HA, ERCR, SCS).

Figures 10 and 11 show that our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics. These experiments on the real MIT dataset indicate that our methodology outperforms the other methods considered.

6. Conclusion and Future Work

In this paper we presented a two-phase, unsupervised methodology for building mobility profiles and finding similar users, based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely on tags, spatial information, or spatiotemporal information alone. Discovering similar users can play a vital role in many applications of location-based services (LBS), and combining semantic tags with spatiotemporal trends provides a precise approach to measuring similarity between users. Further, this methodology is a complete framework that addresses the mobility profiling issues of outlier detection, missing value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future work could study how users' behavior changes in response to transient events, such as road works or special events, together with their trajectories, in order to determine user behavior more precisely for real-world location-based services.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, "Data stream processing on real-time mobile advertisement," in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, "Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China," Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, "Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces," in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM '11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, "A personal route prediction system based on trajectory data mining," Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network.

[7] A. Harter and A. Hopper, "Distributed location system for the active office," IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, "The anatomy of a context-aware application," Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] "Location technologies for GSM, GPRS and WCDMA networks," White Paper, SnapTrack (QUALCOMM), November 2001.

[10] M. Garetto and E. Leonardi, "Analysis of random mobility models with pde's," in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., "Mobiscopes for human spaces," IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., "iMAP: indirect measurement of air pollution with cellphones," CSE Technical Reports, 2008.

[13] A. Pentland, "Automatic mapping and modeling of human networks," Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, "Toward community sensing," in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN '08), pp. 481–492, April 2008.

[15] K. Laasonen, "Clustering and prediction of mobile user routes from cellular data," in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, "Impact of communication infrastructure on forwarding in pocket switched networks," in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM '06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, "Instant message clustering based on extended vector space model," in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA '07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, "An exploration on mobile social networking: dodgeball as a case in point," in Proceedings of the International Conference on Mobile Business (ICMB '06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., "Cartel: a distributed mobile sensor computing system," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., "Participatory sensing," in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, "Health-OS: a position paper," in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant MANETs," in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, "User mobility modeling and characterization of mobility patterns," IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, "Mobility modeling in third-generation mobile telecommunications systems," IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, "The predictive user mobility profile framework for wireless multimedia networks," IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, "Mobility models for systems evaluation. A survey," in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, "Understanding individual human mobility patterns," Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, "Identifying meaningful locations," in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous '06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, "Precise location acquisition of mobility data using cell-id," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, "Cell oscillation resolution in mobility profile building," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, "Mining mobility user profiles for car pooling," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., "PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth," in Proceedings of the 17th International Conference on Data Engineering (ICDE '01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, "Efficient mining of maximal sequential patterns using multiple samples," in Proceedings of the SIAM International Conference on Data Mining (SDM '05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, "Human mobility, social ties, and link prediction," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, "IR evaluation methods for retrieving highly relevant documents," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 41–48, ACM, New York, NY, USA, 2000.




[38] K Jarvelin and J Kekalainen ldquoIR evaluation methods forretrieving highly relevant documentsrdquo inProceedings of the 23rdAnnual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (SIGIR rsquo00) pp 41ndash48ACM New York NY USA 2000


Figure 5: panels (a) and (b) plot latitude (deg) against longitude (deg); panel (c).

Table 2: Locations retrieved with respect to user mobility.

| | Number of cells | % of cells |
|---|---|---|
| Total number of unique cells for subject X | 1744 | 100 |
| Cells whose location was retrieved through the Google API | 680 | 39 |
| Cells whose location was retrieved through the semantic-tagged algorithm | 88 | 3 |
| Total cells whose location was retrieved | 768 | 42 |

Table 3: Semantic locations with number of representative cells.

| Semantic location | Number of cells (incidence) |
|---|---|
| Mgh | 5 |
| Office | 4 |
| Airport | 4 |
| Home | 4 |
| Greg’s apt | 4 |
| Grand parents | 3 |
| Google | 3 |
| Redhat | 3 |
| Chicago O’Hare | 2 |
| Topo hub | 2 |

cell IDs while being stationary, for load balancing. Table 2 shows that a semantic location can be identified with multiple cell IDs. This makes clear that, besides the semantic tag information, we must use spatiotemporal analysis of locations for pattern building and for finding similar users, because semantic tags are limited in the data. We applied our spatial clustering technique [31] to the dataset to remove cell oscillation and, using overlapping-area analysis, identified stay points that the user had not tagged semantically. As per our earlier assumption, GSM cells are distributed in bubble form, so they overlap with each other; the results in Table 4 support this assumption.

Table 4: Overlapping of locations.

| Cell (LAC, Cell ID) | Number of locations represented |
|---|---|
| C30 | 7 |
| C40 | 6 |
| C23 | 6 |
| C14 | 5 |
| C56 | 4 |
| C44 | 4 |
| C25 | 3 |
| C47 | 2 |
| C27 | 2 |
| C39 | 2 |
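As a rough illustration of how overlap counts like those in Table 4 could be tallied, the Python sketch below counts, for each cell, how many distinct location clusters contain it. The input layout (a dict from a location label to a set of (LAC, cell_id) pairs), the function name `locations_per_cell`, and the toy data are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def locations_per_cell(location_clusters):
    """Count how many distinct location clusters each cell belongs to.

    location_clusters: hypothetical mapping from a location label to the
    set of (LAC, cell_id) pairs observed while the user stayed there.
    """
    counts = Counter()
    for cells in location_clusters.values():
        for cell in set(cells):  # count each cell at most once per location
            counts[cell] += 1
    return counts


# Toy data: cell (1, 30) is shared by three locations.
clusters = {
    "home": {(1, 30), (1, 40), (1, 23)},
    "office": {(1, 30), (1, 40)},
    "mgh": {(1, 30), (2, 14)},
}
counts = locations_per_cell(clusters)
print(counts.most_common(2))  # [((1, 30), 3), ((1, 40), 2)]
```

A cell that appears in many clusters is a sign of the bubble-shaped overlap described above.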

As a result, we retrieved all locations as clustered cells that represent the user’s mobility history, rather than as raw cells. In Figure 6, the tagged locations are plotted over a geographical map, which shows the vicinity of the tagged places. Figure 6 also supports our assumption: the user tagged Chicago O’Hare airport twice with overlapping cells, which indicates both cell overlap and oscillation over the same place.

5.4. Discovery of Mobility Patterns

After extracting the mobility history in the form of clustered cells, where each cluster represents a stay point with or without a user-defined semantic tag, we applied the time-stamping methodology defined in Section 3.3.2. On this time-stamped clustered history, we applied the mobility pattern extraction technique proposed in Section 3.3.3. We used a TH_stay time of 20 minutes and a TH_transition of 10 minutes, values adopted from our earlier work [31]. After extracting these mobility patterns, we applied the maximal trajectory pattern technique proposed in Section 3.3.3 to represent the most frequent mobility patterns. Figure 7 plots the log-time coverage of the user’s mobility against pattern length, in order to identify the most frequent pattern length.

Figure 6: Semantic locations plotted over a geographical map (label: Chicago O’Hare, n = 2).

Figure 7: Mobility pattern length over log (log coverage versus pattern length, 2 to 10).
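The text fixes TH_stay = 20 minutes and TH_transition = 10 minutes but does not restate the procedure of Section 3.3.2 here. The sketch below shows one plausible reading, which is an assumption: a stay is a run of consecutive observations of the same cluster whose internal gaps do not exceed TH_transition and whose total duration reaches TH_stay. The record format, the `Stay` type, and `extract_stays` are hypothetical.

```python
from dataclasses import dataclass

TH_STAY_MIN = 20        # minimum dwell time for a stay (value from the text)
TH_TRANSITION_MIN = 10  # maximum gap inside one stay (assumed interpretation)


@dataclass
class Stay:
    cluster: str
    start: int  # minutes since the start of the log
    end: int


def extract_stays(records, th_stay=TH_STAY_MIN, th_transition=TH_TRANSITION_MIN):
    """records: list of (minute, cluster_id) pairs sorted by time."""
    stays = []
    run_cluster, run_start, run_last = None, None, None

    def close_run():
        # Emit the current run only if it lasted at least th_stay minutes.
        if run_cluster is not None and run_last - run_start >= th_stay:
            stays.append(Stay(run_cluster, run_start, run_last))

    for t, cluster in records:
        if cluster == run_cluster and t - run_last <= th_transition:
            run_last = t  # same cluster, small gap: extend the run
        else:
            close_run()
            run_cluster, run_start, run_last = cluster, t, t
    close_run()
    return stays


records = [(0, "home"), (5, "home"), (15, "home"), (25, "home"), (30, "C7"),
           (32, "office"), (40, "office"), (50, "office"), (55, "office")]
stays = extract_stays(records)
# home lasts 25 min and office 23 min; the 0-minute pass through C7 is dropped.
```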

The figure shows that most of the patterns covering the user’s mobility log are of length 3, after which coverage declines sharply. This result informs the transition-phase length set for LnCSS and the user’s trend over time, which we use later in the similarity measure.
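To make the maximal-pattern step concrete, the sketch below keeps only frequent patterns that are not contained, as an order-preserving subsequence, in a longer frequent pattern (the usual notion in maximal sequential pattern mining [34]), then tallies the survivors by length, the quantity on the x-axis of Figure 7. Whether Section 3.3.3 uses exactly this containment rule is an assumption; the function names and sample patterns are hypothetical.

```python
from collections import Counter


def is_subsequence(p, q):
    """True if the items of p appear in q in the same order (gaps allowed)."""
    it = iter(q)
    return all(item in it for item in p)  # `in` consumes the iterator


def maximal_patterns(patterns):
    """Keep patterns that are not a subsequence of another, longer pattern."""
    pats = set(map(tuple, patterns))
    return sorted(
        p for p in pats
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in pats)
    )


frequent = [
    ("home", "office"),
    ("home", "office", "home"),
    ("office", "home"),
    ("home", "mgh"),
    ("mgh",),
]
maximal = maximal_patterns(frequent)
print(maximal)                           # [('home', 'mgh'), ('home', 'office', 'home')]
print(Counter(len(p) for p in maximal))  # Counter({2: 1, 3: 1})
```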

Because the explored patterns carry all the basic information, such as location, semantic tag, and time stamp, we can readily plot the user’s trend. In Figure 8, we plot the user’s location visit history against the log history period; owing to space limitations, only the top locations are shown, and only two months of user history are used. The figure shows that the user spent most of his time at known places and rarely explored new ones. It also reveals an important property of mobility data: the user spent most of his time at the locations he tagged semantically, that is, home, MGH, office, and so forth. This supports our assumption in Section 3 about the trend of user mobility. The trend further shows that the user visits some locations, such as home, every day regardless of weekday or weekend; others, such as the office, only on weekdays; and others, such as Greg’s home and his grandparents’ place, usually once a week, on weekends.

Figure 8: Location visits over the log (mobility coverage versus time spent for Home, Office, Grand parents, Mgh, Google, and Greg’s apartment).

For further analysis, Figure 9 plots the user’s mobility ratio in terms of exploration over time, as the location-visiting frequency over 30 consecutive days. It shows that the user visits 2–5 places per day on average, while in exceptional cases the user visited more than 7 locations in a single day.
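A per-day count like the one in Figure 9 can be sketched as the number of distinct locations visited on each calendar day. The (timestamp, location) input format, the function name, and the sample visits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def locations_per_day(visits):
    """visits: iterable of (datetime, location_label) pairs.

    Returns a dict mapping each calendar date to the number of distinct
    locations visited that day, in date order.
    """
    seen = defaultdict(set)
    for ts, loc in visits:
        seen[ts.date()].add(loc)
    return {day: len(locs) for day, locs in sorted(seen.items())}


visits = [
    (datetime(2013, 1, 7, 8, 0), "home"),
    (datetime(2013, 1, 7, 9, 30), "office"),
    (datetime(2013, 1, 7, 18, 0), "home"),  # repeat visit, counted once
    (datetime(2013, 1, 8, 8, 0), "home"),
    (datetime(2013, 1, 8, 12, 0), "mgh"),
    (datetime(2013, 1, 8, 14, 0), "office"),
]
daily = locations_per_day(visits)  # 2 distinct places on 7 Jan, 3 on 8 Jan
```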

5.5. Similarity Measure between Users

To calculate the similarity measure, we constructed the similarity matrix on the basis of two main parameters: semantic pattern similarity and spatiotemporal similarity. For this, we use LnCSS and set TH_location area to 10, the Euclidean distance between two location points belonging to two different users, to decide whether they fall in the same area; further, we set TH_timecover to 20 minutes, as described in Section 3.3.4.
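This section does not spell out LnCSS itself. Purely as an illustration, and assuming LnCSS is a longest-common-subsequence score normalized by the shorter sequence, the sketch below treats two stay points as matching when their Euclidean distance is at most TH_location (10) and their time intervals overlap by at least TH_timecover (20 minutes). The stay-point tuple layout, the normalization, and all names are assumptions.

```python
import math

TH_LOCATION = 10.0  # Euclidean distance threshold (value from the text)
TH_TIMECOVER = 20   # minimum temporal overlap in minutes (value from the text)


def match(a, b, th_loc=TH_LOCATION, th_time=TH_TIMECOVER):
    """a, b: hypothetical stay points (x, y, start_min, end_min)."""
    close = math.dist(a[:2], b[:2]) <= th_loc
    overlap = min(a[3], b[3]) - max(a[2], b[2])
    return close and overlap >= th_time


def lcss_similarity(seq_a, seq_b):
    """LCSS length under `match`, normalized by the shorter sequence length."""
    if not seq_a or not seq_b:
        return 0.0
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # classic LCS table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if match(seq_a[i - 1], seq_b[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m] / min(n, m)


user_x = [(0, 0, 480, 1020), (100, 100, 1080, 1200)]
user_y = [(3, 4, 500, 1000), (500, 500, 1100, 1150)]
score = lcss_similarity(user_x, user_y)  # only the first stays match -> 0.5
```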

After formulating the similarity matrix, we computed the similarity measure between user x and the other users, together with two other user similarity models, spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) [37], for evaluation. Spatial cosine similarity is the similarity of the visitation frequencies of users x and y, given by the cosine of the angle between their vectors of visit counts at each location. Extra-role colocation is based on the probability that two users x and y are colocated in the same hour at night or on weekends; this relationship serves as a strong indicator of friendship between two users.
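The two baselines can be sketched as follows. `spatial_cosine_similarity` follows the definition above directly (cosine of the per-location visit-count vectors). `extra_role_colocation_rate` is a simplified reading: the share of night or weekend hours, observed for both users, in which they are at the same location. The night window (before 08:00 or from 20:00), the hour-keyed input format, and the function names are assumptions, not the exact formulation of [37].

```python
import math
from collections import Counter
from datetime import datetime


def spatial_cosine_similarity(visits_x, visits_y):
    """Cosine of the angle between per-location visit-count vectors."""
    cx, cy = Counter(visits_x), Counter(visits_y)
    dot = sum(cx[loc] * cy[loc] for loc in cx.keys() & cy.keys())
    norm = math.sqrt(sum(v * v for v in cx.values())) * math.sqrt(
        sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0


def is_extra_role(ts):
    # Assumed window: Saturday/Sunday, or night hours before 08:00 or from 20:00.
    return ts.weekday() >= 5 or ts.hour < 8 or ts.hour >= 20


def extra_role_colocation_rate(obs_x, obs_y):
    """obs_*: dict mapping an hour-truncated datetime to a location label."""
    shared = [h for h in obs_x.keys() & obs_y.keys() if is_extra_role(h)]
    if not shared:
        return 0.0
    together = sum(obs_x[h] == obs_y[h] for h in shared)
    return together / len(shared)


obs_x = {datetime(2013, 1, 5, 14): "park",    # Saturday afternoon
         datetime(2013, 1, 7, 22): "bar",     # Monday night
         datetime(2013, 1, 7, 10): "office"}  # Monday morning (not extra-role)
obs_y = {datetime(2013, 1, 5, 14): "park",
         datetime(2013, 1, 7, 22): "home",
         datetime(2013, 1, 7, 10): "office"}
scs = spatial_cosine_similarity(["home", "home", "office"], ["home", "office", "office"])
ercr = extra_role_colocation_rate(obs_x, obs_y)  # 1 of 2 extra-role hours shared
```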

We compared the user similarity produced by our proposed methodology, named HA (hybrid approach), with SCS and ERCR using two metrics: mean average precision (MAP) and normalized discounted cumulative gain (NDCG).

In Figure 10, we plot user similarity using the MAP metric, where MAP is defined as

$$\mathrm{MAP} = \frac{1}{N}\sum_{i=1}^{N}\frac{\sum_{r=1}^{K} P_i(r)\times \mathrm{rel}_i(r)}{\lvert R_i\rvert} \tag{12}$$

where N is the number of test users, |R_i| is the number of similar users for test user U_i, r denotes the cut-off rank, K is the largest cut-off rank considered, P_i(r) is the precision for U_i at cut-off r, and rel_i(r) is a binary function equal to 1 if the user at rank r is similar to U_i and 0 otherwise.
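Equation (12) can be computed as below. This sketch assumes each test user’s candidates arrive as a ranked list and the ground truth as a set of similar users R_i; the function names and the toy rankings are hypothetical.

```python
def average_precision(ranked, relevant, k):
    """Inner term of Eq. (12): sum_{r<=k} P_i(r) * rel_i(r) / |R_i|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for r, user in enumerate(ranked[:k], start=1):
        if user in relevant:   # rel_i(r) = 1
            hits += 1
            total += hits / r  # P_i(r): precision at cut-off r
    return total / len(relevant)


def mean_average_precision(rankings, ground_truth, k):
    """Average of the per-user AP values over the N test users."""
    aps = [average_precision(rankings[u], ground_truth[u], k) for u in rankings]
    return sum(aps) / len(aps)


rankings = {"x": ["a", "b", "c"], "y": ["d", "e"]}
truth = {"x": {"a", "c"}, "y": {"e"}}
m = mean_average_precision(rankings, truth, k=3)  # (5/6 + 1/2) / 2 = 2/3
```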

In Figure 11, we plot the similarity measure between users using the NDCG metric, as defined in [38].
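For comparison, here is a minimal nDCG@K sketch using graded relevance gains. It uses the widely adopted log2(r + 1) rank discount, which may differ in detail from the discounting variant in [38]; the ideal ranking is taken as the same gains sorted in descending order, and the function names are hypothetical.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain with a log2(r + 1) discount, ranks from 1."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))


def ndcg(gains, k):
    """DCG of the produced ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


print(round(ndcg([0, 1], 2), 4))  # 0.6309: the only relevant user sits at rank 2
```

A perfectly ordered ranking scores 1.0, so values near 1 indicate that similar users are ranked near the top.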

Figure 9: User’s location visits over days (number of locations visited per day over a sample of 30 consecutive days).

Figure 10: MAP with respect to proposed methodology performance (MAP versus K = 1 to 6 for HA, ERCR, and SCS).

Figure 11: nDCG with respect to proposed methodology performance (nDCG versus K = 1 to 6 for HA, ERCR, and SCS).

Figures 10 and 11 show that our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on both metrics. These experiments indicate that, on the real MIT dataset, our proposed methodology performs better than the other methodologies considered.

6. Conclusion and Future Work

In this paper, we presented a two-phase methodology for building mobility profiles and finding user similarity. It is an unsupervised approach based on semantic tag information together with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms earlier methodologies that rely solely on tags, spatial information, or spatiotemporal information. As discovering similar users can play a vital role in many location-based services (LBS), our approach, in which semantic tags together with spatiotemporal trends support precise similarity measurement between users, is effective for such applications. Furthermore, this methodology is a complete framework that addresses the main mobility profiling issues: outlier detection, missing-value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies could examine how user behavior responds to transient events, such as road works and special events, together with user trajectories, in order to determine user behavior more precisely for real-world location-based services.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, “Data stream processing on real-time mobile advertisement,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, “Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China,” Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, “Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces,” in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM ’11), pp. 3–6, June 2011.


[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, “A personal route prediction system based on trajectory data mining,” Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network/.

[7] A. Harter and A. Hopper, “Distributed location system for the active office,” IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, “The anatomy of a context-aware application,” Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] “Location technologies for GSM, GPRS and WCDMA networks,” White Paper, SnapTrack, A QUALCOMM, November 2001.

[10] M. Garetto and E. Leonardi, “Analysis of random mobility models with pde’s,” in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., “Mobiscopes for human spaces,” IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., “iMAP: indirect measurement of air pollution with cellphones,” CSE Technical Reports, 2008.

[13] A. Pentland, “Automatic mapping and modeling of human networks,” Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, “Toward community sensing,” in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ’08), pp. 481–492, April 2008.

[15] K. Laasonen, “Clustering and prediction of mobile user routes from cellular data,” in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD ’05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, “Impact of communication infrastructure on forwarding in pocket switched networks,” in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM ’06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, “Instant message clustering based on extended vector space model,” in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA ’07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, “An exploration on mobile social networking: dodgeball as a case in point,” in Proceedings of the International Conference on Mobile Business (ICMB ’06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., “Cartel: a distributed mobile sensor computing system,” in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys ’06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., “Participatory sensing,” in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, “Health-OS: a position paper,” in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant MANETs,” in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc ’07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, “User mobility modeling and characterization of mobility patterns,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, “Mobility modeling in third-generation mobile telecommunications systems,” IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, “The predictive user mobility profile framework for wireless multimedia networks,” IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, “Mobility models for systems evaluation: a survey,” in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, “Understanding individual human mobility patterns,” Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, “Identifying meaningful locations,” in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous ’06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, “Precise location acquisition of mobility data using cell-id,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, “Cell oscillation resolution in mobility profile building,” International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, “Mining mobility user profiles for car pooling,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering (ICDE ’01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, “Efficient mining of maximal sequential patterns using multiple samples,” in Proceedings of the SIAM International Conference on Data Mining (SDM ’05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, “Human mobility, social ties, and link prediction,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, “Inferring friendship network structure by using mobile phone data,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, “IR evaluation methods for retrieving highly relevant documents,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’00), pp. 41–48, ACM, New York, NY, USA, 2000.


[25] I F Akyildiz and W Wang ldquoThe predictive usermobility pro-file framework for wireless multimedia networksrdquo IEEEACMTransactions on Networking vol 12 no 6 pp 1021ndash1035 2004

[26] A Chaintreau P Hui J Crowcroft C Diot R Gass and JScott ldquoImpact of human mobility on opportunistic forwardingalgorithmsrdquo IEEE Transactions onMobile Computing vol 6 no6 pp 606ndash620 2007

[27] M Musolesi and C Mascolo ldquoMobility models for systemsevaluation A surveyrdquo inMiddleware for Network Eccentric andMobile Applications Springer Berlin Germany 2009

[28] M C Gonzalez C A Hidalgo and A L Barabasi ldquoUnder-standing individual human mobility patternsrdquo Nature vol 453no 7196 pp 779ndash782 2008

[29] P Nurmi and J Koolwaaij ldquoIdentifying meaningful locationsrdquoin Proceedings of the 3rd Annual International Conferenceon Mobile and Ubiquitous Systems Networking and Services(MobiQuitous rsquo06) IEEE Computer Society Sun Jose CalifUSA July 2006

[30] S A Shad and E Chen ldquoPrecise location acquisition ofmobilitydata using cell-idrdquo International Journal of Computer ScienceIssues vol 9 no 3 pp 222ndash231 2012

[31] S A Shad E Chen and T Bao ldquoCell oscillation resolutionin mobility profile buildingrdquo International Journal of ComputerScience Issues vol 9 no 3 pp 205ndash213 2012

[32] R Trasarti F Pinelli M Nanni and F Giannotti ldquoMiningmobility user profiles for car poolingrdquo in Proceedings of the17th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining pp 1190ndash1198 ACM 2011

[33] J Pei J Han B Mortazavi-Asl et al ldquoPrefixSpan min-ing sequential patterns efficiently by prefix-projected patterngrowthrdquo in Proceedings of the 17th International Conference onData Engineering (ICDE rsquo01) pp 215ndash224 April 2001

[34] C Luo and S Chung ldquoEfficient mining of maximal sequentialpatterns using multiple samplesrdquo in Proceeding of the SIAMInternational Conference on Data Mining (SDM rsquo05) pp 415ndash426 Newport Beach Calif USA 2005

[35] D Wang D Pedreschi C Song F Giannotti and A -LBarabasi ldquoHuman mobility social ties and link predictionrdquo inProceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining San Diego CalifUSA August 2011

[36] 2012 httprealitymediamitedudownloadphp[37] N Eagle A Pentland and D Lazer ldquoInferring friendship

network structure by using mobile phone datardquo Proceedings ofthe National Academy of Sciences of the United States of Americavol 106 no 36 pp 15274ndash15278 2009

[38] K Jarvelin and J Kekalainen ldquoIR evaluation methods forretrieving highly relevant documentsrdquo inProceedings of the 23rdAnnual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval (SIGIR rsquo00) pp 41ndash48ACM New York NY USA 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Unsupervised User Similarity Mining in ...

10 The Scientific World Journal

Figure 9: User's location visits over days (x-axis: sample of 30 consecutive days; y-axis: number of locations visited).

Figure 10: MAP with respect to the proposed methodology's performance (x-axis: K, from 1 to 6; y-axis: MAP; series: HA, ERCR, SCS).

Figure 11: nDCG with respect to the proposed methodology's performance (x-axis: K, from 1 to 6; y-axis: nDCG; series: HA, ERCR, SCS).

Figures 10 and 11 show that our proposed methodology outperforms spatial cosine similarity (SCS) and extra-role colocation rate (ERCR) on the indicated metrics, MAP and nDCG. These experiments on the real MIT dataset show that our proposed methodology outperforms the other methodologies mentioned.
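Figures 10 and 11 compare the methods by MAP and nDCG at increasing cut-offs K. This section does not restate how either metric is computed. The following Python sketch shows one common formulation of each for illustration. It is not the authors' evaluation code.

The sketch makes several assumptions:
- Each query user receives a ranked list of candidate similar users.
- MAP uses binary relevance: a candidate either is or is not a ground-truth similar user.
- AP@K is normalized by min(K, number of relevant users).
- nDCG uses the widely used gain / log2(rank + 1) discount, a variant of the formulation in [38].

The function names and user IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K with binary relevance (assumed variant): the sum of precision@i over
    ranks i <= k that hold a relevant user, divided by min(k, |relevant|)."""
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, user in enumerate(ranked[:k], start=1):
        if user in relevant:
            hits += 1
            precision_sum += hits / i  # precision at rank i
    return precision_sum / min(k, len(relevant))


def mean_average_precision_at_k(rankings: Dict[str, List[str]],
                                truth: Dict[str, Set[str]], k: int) -> float:
    """MAP@K: AP@K averaged over every query user in `rankings`."""
    scores = [average_precision_at_k(ranked, truth.get(query, set()), k)
              for query, ranked in rankings.items()]
    return sum(scores) / len(scores) if scores else 0.0


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the common gain / log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """nDCG@K: DCG of the returned ranking divided by DCG of the ideal ranking."""
    actual = dcg([gains.get(user, 0.0) for user in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: candidates ranked for query user "u0",
    # of which "u1" and "u2" are the ground-truth similar users.
    ranked = ["u3", "u1", "u7", "u2"]
    relevant = {"u1", "u2"}
    print(mean_average_precision_at_k({"u0": ranked}, {"u0": relevant}, k=4))  # 0.5
    print(round(ndcg_at_k(ranked, {u: 1.0 for u in relevant}, k=4), 4))  # 0.6509
```

In the toy ranking, the two relevant users sit at ranks 2 and 4. This gives AP@4 = (1/2 + 2/4) / 2 = 0.5 and nDCG@4 ≈ 0.6509. Averaging AP@K over all query users yields MAP@K, and sweeping K from 1 to 6 would produce one curve of the kind plotted in Figures 10 and 11.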

6. Conclusion and Future Work

In this paper, we presented a two-phase methodology for building mobility profiles and finding user similarity. It is an unsupervised approach based on semantic tag information along with spatiotemporal trends. Because our methodology uses both semantic and spatiotemporal trends, it outperforms the previously defined methodologies that rely solely on tag, spatial, or spatiotemporal information. Discovering similar users can play a vital role in many potential applications related to location-based services (LBS). Our approach is efficient in this setting, since semantic tags combined with spatiotemporal trends provide a precise basis for measuring similarity between different users. Further, this methodology is a complete framework that resolves the main mobility profiling issues, that is, outlier detection, missing value retrieval, cell oscillation, user trajectory profiling, and similarity measurement.

Future studies can examine how users behave in response to transient events, such as civil works on roads or special events, together with their trajectories. Such studies would determine user behavior more precisely for real-world location-based services.

Acknowledgments

The work described in this paper was supported by grants from the Natural Science Foundation of China (Grant no. 60775037), the National Major Special Science and Technology Projects (Grant no. 2011ZX04016-071), the HeGaoJi National Major Special Science and Technology Projects (Grant no. 2012ZX01029001-002), and the Research Fund for the Doctoral Program of Higher Education of China (20093402110017, 20113402110024).

References

[1] M. Couceiro, D. Suarez, D. Manzano, and L. Lafuente, "Data stream processing on real-time mobile advertisement," in Proceedings of the 12th IEEE International Conference on Mobile Data Management, pp. 313–320, June 2011.

[2] Ad Orchestrator, http://prodcat.ericsson.se/frontend/category.action?code=FGD2010120008.

[3] C. Yang, J. Yang, X. Luo, and P. Gong, "Use of mobile phones in an emergency reporting system for infectious disease surveillance after the Sichuan earthquake in China," Bulletin of the World Health Organization, vol. 87, no. 8, pp. 619–623, 2009.

[4] S. S. Kanhere, "Participatory sensing: crowdsourcing data from mobile smartphones in urban spaces," in Proceedings of the 12th IEEE International Conference on Mobile Data Management (MDM '11), pp. 3–6, June 2011.

[5] L. Chen, M. Lv, Q. Ye, G. Chen, and J. Woodward, "A personal route prediction system based on trajectory data mining," Information Sciences, vol. 181, no. 7, pp. 1264–1284, 2011.

[6] http://thenextweb.com/mobile/2011/07/03/the-rise-of-the-mobile-social-network.

[7] A. Harter and A. Hopper, "Distributed location system for the active office," IEEE Network, vol. 8, no. 1, pp. 62–70, 1994.

[8] A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, "The anatomy of a context-aware application," Wireless Networks, vol. 8, no. 2-3, pp. 187–197, 2002.

[9] "Location technologies for GSM, GPRS and WCDMA networks," White Paper, SnapTrack, a QUALCOMM company, November 2001.

[10] M. Garetto and E. Leonardi, "Analysis of random mobility models with PDE's," in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '06), pp. 73–84, Florence, Italy, May 2006.

[11] T. Abdelzaher, Y. Anokwa, P. Boda et al., "Mobiscopes for human spaces," IEEE Pervasive Computing, vol. 6, no. 2, pp. 20–29, 2007.

[12] M. Demirbas et al., "iMAP: indirect measurement of air pollution with cellphones," CSE Technical Reports, 2008.

[13] A. Pentland, "Automatic mapping and modeling of human networks," Physica A: Statistical Mechanics and Its Applications, vol. 378, no. 1, pp. 59–67, 2007.

[14] A. Krause, E. Horvitz, A. Kansal, and F. Zhao, "Toward community sensing," in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN '08), pp. 481–492, April 2008.

[15] K. Laasonen, "Clustering and prediction of mobile user routes from cellular data," in Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD '05), pp. 569–576, 2005.

[16] A. Lindgren, C. Diot, and J. Scott, "Impact of communication infrastructure on forwarding in pocket switched networks," in Proceedings of the SIGCOMM Workshop on Challenged Networks (SIGCOMM '06), pp. 261–268, 2006.

[17] L. Wang, Y. Jia, and W. Han, "Instant message clustering based on extended vector space model," in Proceedings of the 2nd International Conference on Advances in Computation and Intelligence (ISICA '07), pp. 435–443, 2007.

[18] N. D. Ziv and B. Mulloth, "An exploration on mobile social networking: dodgeball as a case in point," in Proceedings of the International Conference on Mobile Business (ICMB '06), Copenhagen, Denmark, June 2006.

[19] V. Bychkovsky, K. Chen, M. Goraczko et al., "Cartel: a distributed mobile sensor computing system," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems (SenSys '06), pp. 125–138, November 2006.

[20] J. Burke, D. Estrin, M. Hansen et al., "Participatory sensing," in Proceedings of ACM Sensys World Sensor Web Workshop, 2006.

[21] D. Kirovski, N. Oliver, M. Sinclair, and D. Tan, "Health-OS: a position paper," in Proceedings of the ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, pp. 76–78, June 2007.

[22] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant MANETs," in Proceedings of the 8th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '07), pp. 32–40, September 2007.

[23] M. M. Zonoozi and P. Dassanayake, "User mobility modeling and characterization of mobility patterns," IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1239–1252, 1997.

[24] J. G. Markoulidakis, G. L. Lyberopoulos, D. F. Tsirkas, and E. D. Sykas, "Mobility modeling in third-generation mobile telecommunications systems," IEEE Personal Communications, vol. 4, no. 4, pp. 41–51, 1997.

[25] I. F. Akyildiz and W. Wang, "The predictive user mobility profile framework for wireless multimedia networks," IEEE/ACM Transactions on Networking, vol. 12, no. 6, pp. 1021–1035, 2004.

[26] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 606–620, 2007.

[27] M. Musolesi and C. Mascolo, "Mobility models for systems evaluation: a survey," in Middleware for Network Eccentric and Mobile Applications, Springer, Berlin, Germany, 2009.

[28] M. C. Gonzalez, C. A. Hidalgo, and A. L. Barabasi, "Understanding individual human mobility patterns," Nature, vol. 453, no. 7196, pp. 779–782, 2008.

[29] P. Nurmi and J. Koolwaaij, "Identifying meaningful locations," in Proceedings of the 3rd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous '06), IEEE Computer Society, San Jose, Calif, USA, July 2006.

[30] S. A. Shad and E. Chen, "Precise location acquisition of mobility data using cell-id," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 222–231, 2012.

[31] S. A. Shad, E. Chen, and T. Bao, "Cell oscillation resolution in mobility profile building," International Journal of Computer Science Issues, vol. 9, no. 3, pp. 205–213, 2012.

[32] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, "Mining mobility user profiles for car pooling," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198, ACM, 2011.

[33] J. Pei, J. Han, B. Mortazavi-Asl et al., "PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth," in Proceedings of the 17th International Conference on Data Engineering (ICDE '01), pp. 215–224, April 2001.

[34] C. Luo and S. Chung, "Efficient mining of maximal sequential patterns using multiple samples," in Proceedings of the SIAM International Conference on Data Mining (SDM '05), pp. 415–426, Newport Beach, Calif, USA, 2005.

[35] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabasi, "Human mobility, social ties, and link prediction," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, Calif, USA, August 2011.

[36] 2012, http://reality.media.mit.edu/download.php.

[37] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 36, pp. 15274–15278, 2009.

[38] K. Jarvelin and J. Kekalainen, "IR evaluation methods for retrieving highly relevant documents," in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 41–48, ACM, New York, NY, USA, 2000.
