Location-Centric Analysis For Advanced Indoor Mobility...

7
Location-Centric Analysis For Advanced Indoor Mobility Models: Nodes Temporal Density Predictions and Distribution Mimonah Al Qathrady, Ahmed Helmy Computer and Information Science and Engineering Department Gainesville, Florida, USA {mimonah,helmy}@ufl.edu ABSTRACT Building’s density, as its number of nodes at a specific period, is a significant parameter that affects smart and mobile applications per- formances and evaluations. Consequently, the buildings’ temporal density predictions and their nodes spatial distribution modeling have to follow real-world scenarios to a realistic evaluation. How- ever, there is lack of real-buildings density studies that examine these aspects thoroughly. As a result, this work presents a data- driven study that investigates the temporal density predictability and spatial density distributions of more than 100 real buildings with ten different categories, over 150 days across three semesters. The study covers the buildings nodes’ temporal modeling and pre- dictions, and their spatial distributions in the building. Seasonal predictive models are utilized to predict hour-by-hour density for a variable length of consequent periods using training data with different lengths. The models include Seasonal Naive, Holt-Winter additive seasonal, TBATS, and ARIMA-seasonal. The results show that the Seasonal Naive model is usually selected as the best predic- tive model when using data of the shorter period of time to predict the following periods. For example, Seasonal Naive predicted with the least error in 73%, 63% and 57% of cases in Summer, Spring and Fall respectively when using only one week to predict its consecu- tive five weeks. However, when using five weeks of data to predict the sixth week, TBATS model predicted with the least error in 60%, 54% and 43% of cases in Fall, Spring and Summer respectively. When investigating the spatial density distributions, power law and log-logistic and lognormal distributions are usually selected as the first best-fit nodes distributions for 82%, 65%, 62% of buildings in the Summer, Spring and Fall respectively. 1 INTRODUCTION Buildings density is defined as the number of nodes in the area during a specific period. It is an individual behavioral pattern of a location. The density is an important parameter in the indoor modeling scenarios since it affects many wireless network func- tions and characteristics such as capacity, connectivity, and routing algorithm. Moreover, many indoor operations depend on update information about the population density, for example, system man- agement of pedestrian flow inside crowd buildings [16]. Therefore, understanding the different aspects of indoor density, and repro- ducing them is requisite for realistic modeling and smart service designing and evaluation. The density has always been set as a static parameter that does not change during the simulation. However, in real life, buildings densities are dynamic and change from one period to another as nodes keep coming to the buildings and then leaving from one pe- riod to another. It is necessary to measure the population number change over time as it is reflected in the real traces. The dynamic change of density is essential to be considered when modeling the user mobility. This is the first time, to the best of our knowledge, where the building level multi-temporal dynamic spatial density of population has been investigated. The study processes the trace data in days and hours respectively. From time-series density data, we observed clear peak hours during the day for all of the build- ings. Then, the spectrum analyses were applied to investigate the frequency domain of the population density. Discovering the domi- nant density cycle facilitates reconstructing of the building density in indoor mobility models. The analysis of this paper involves more than 98 million mobile records from more than 100 buildings during three different semesters: Spring, Summer, and Fall. It covers several buildings categories including academic, museums, libraries, labs, administrations offices, sports facilities, dining, theaters, housing and health-care facilities. Moreover, population density prediction is an important mechanism when designing future indoor services or evaluating them. For example, predicting the crowd density in a crit- ical situation such as emergency evacuation helps to implement the right and efficient evacuation plan for the building. Consequently, we have examined four seasonal models to predict hour density in the building: Seasonal Naive, Holt-Winter Seasonal Additive, TBATS, and ARIMA-seasonal. More than 5 million hour prediction operations are performed to establish the validity of these models and their ability to predict future population density at the building level. After that, building level density distribution is studied across different buildings. Previously, The nodes are usually distributed uniformly in the mobility models. However, this does not represent the reality of nodes distributions in real buildings. Density distribu- tions for outdoor environments have been analyzed before, and it has been found to follow a power law distributions [9, 15]. In this work, we investigate the density at the building level where the number of nodes among different access points is examined. In summary, the contributions of this work are as follow: Multi-temporal dynamic densities of more than hundred buildings during different semesters are studied for the first time. The Fourier Transform analyses are used to analyze the density frequency domains and discover the cycle. The study shows 24-hour as the dominant cycle for most of the buildings. The prediction of density per hour is studied extensively for all possible combination of modeling and testing windows. Several seasonal modeling is investigated: Seasonal Naive, Additive Holt-Winter, TBATS, and ARIMA. The result shows the seasonal naive prediction error is the least in most cases,

Transcript of Location-Centric Analysis For Advanced Indoor Mobility...

Page 1: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

Location-Centric Analysis For Advanced Indoor MobilityModels: Nodes Temporal Density Predictions and Distribution

Mimonah Al Qathrady, Ahmed HelmyComputer and Information Science and Engineering Department

Gainesville, Florida, USA{mimonah,helmy}@ufl.edu

ABSTRACTBuilding’s density, as its number of nodes at a specific period, is asignificant parameter that affects smart and mobile applications per-formances and evaluations. Consequently, the buildings’ temporaldensity predictions and their nodes spatial distribution modelinghave to follow real-world scenarios to a realistic evaluation. How-ever, there is lack of real-buildings density studies that examinethese aspects thoroughly. As a result, this work presents a data-driven study that investigates the temporal density predictabilityand spatial density distributions of more than 100 real buildingswith ten different categories, over 150 days across three semesters.The study covers the buildings nodes’ temporal modeling and pre-dictions, and their spatial distributions in the building. Seasonalpredictive models are utilized to predict hour-by-hour density fora variable length of consequent periods using training data withdifferent lengths. The models include Seasonal Naive, Holt-Winteradditive seasonal, TBATS, and ARIMA-seasonal. The results showthat the Seasonal Naive model is usually selected as the best predic-tive model when using data of the shorter period of time to predictthe following periods. For example, Seasonal Naive predicted withthe least error in 73%, 63% and 57% of cases in Summer, Spring andFall respectively when using only one week to predict its consecu-tive five weeks. However, when using five weeks of data to predictthe sixth week, TBATS model predicted with the least error in60%, 54% and 43% of cases in Fall, Spring and Summer respectively.When investigating the spatial density distributions, power law andlog-logistic and lognormal distributions are usually selected as thefirst best-fit nodes distributions for 82%, 65%, 62% of buildings inthe Summer, Spring and Fall respectively.

1 INTRODUCTIONBuildings density is defined as the number of nodes in the areaduring a specific period. It is an individual behavioral pattern ofa location. The density is an important parameter in the indoormodeling scenarios since it affects many wireless network func-tions and characteristics such as capacity, connectivity, and routingalgorithm. Moreover, many indoor operations depend on updateinformation about the population density, for example, systemman-agement of pedestrian flow inside crowd buildings [16]. Therefore,understanding the different aspects of indoor density, and repro-ducing them is requisite for realistic modeling and smart servicedesigning and evaluation.The density has always been set as a static parameter that doesnot change during the simulation. However, in real life, buildingsdensities are dynamic and change from one period to another as

nodes keep coming to the buildings and then leaving from one pe-riod to another. It is necessary to measure the population numberchange over time as it is reflected in the real traces. The dynamicchange of density is essential to be considered when modeling theuser mobility. This is the first time, to the best of our knowledge,where the building level multi-temporal dynamic spatial densityof population has been investigated. The study processes the tracedata in days and hours respectively. From time-series density data,we observed clear peak hours during the day for all of the build-ings. Then, the spectrum analyses were applied to investigate thefrequency domain of the population density. Discovering the domi-nant density cycle facilitates reconstructing of the building densityin indoor mobility models. The analysis of this paper involves morethan 98 million mobile records frommore than 100 buildings duringthree different semesters: Spring, Summer, and Fall. It covers severalbuildings categories including academic, museums, libraries, labs,administrations offices, sports facilities, dining, theaters, housingand health-care facilities. Moreover, population density prediction isan important mechanism when designing future indoor services orevaluating them. For example, predicting the crowd density in a crit-ical situation such as emergency evacuation helps to implement theright and efficient evacuation plan for the building. Consequently,we have examined four seasonal models to predict hour densityin the building: Seasonal Naive, Holt-Winter Seasonal Additive,TBATS, and ARIMA-seasonal. More than 5 million hour predictionoperations are performed to establish the validity of these modelsand their ability to predict future population density at the buildinglevel. After that, building level density distribution is studied acrossdifferent buildings. Previously, The nodes are usually distributeduniformly in the mobility models. However, this does not representthe reality of nodes distributions in real buildings. Density distribu-tions for outdoor environments have been analyzed before, and ithas been found to follow a power law distributions [9, 15]. In thiswork, we investigate the density at the building level where thenumber of nodes among different access points is examined.

In summary, the contributions of this work are as follow:

• Multi-temporal dynamic densities of more than hundredbuildings during different semesters are studied for the firsttime. The Fourier Transform analyses are used to analyzethe density frequency domains and discover the cycle. Thestudy shows 24-hour as the dominant cycle for most of thebuildings.• The prediction of density per hour is studied extensively forall possible combination of modeling and testing windows.Several seasonal modeling is investigated: Seasonal Naive,Additive Holt-Winter, TBATS, and ARIMA. The result showsthe seasonal naive prediction error is the least in most cases,

Page 2: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

≤ 0.3 on average, while other models improve when thetraining data size is increased.• The temporal encounter events at the building level are alsostudied and correlated with the temporal density. Whilelibraries buildings usually have a strong positive correlation,a lab has a weak correlation, which points to the importanceof considering the building category and total density duringthe modeling process.• The density distribution of the population in different cat-egories of buildings are studied. The distributions of a daymostly follow a power-law distribution or log-logistic.

To the best of our knowledge, this building level temporal dy-namic density prediction and spatial distributions study is the firstof its kind in term of data or level of analysis. Besides, we believethat the result of this study will have a beneficial impact on futuremobility modeling, evaluation, and designing of IoT applicationsincluding the ones related to crowd management or tracking.

The remainder of this paper is organized as follows: The back-ground and related works are provided in section 2. The densityanalysis framework is presented in section 3. Then, the results arediscussed in section 4. The conclusion and future work are providedin 5.

2 BACKGROUND AND RELATEDWORKVarious researchers have taken Data-driven mobility models ap-proach. Therefore, real data traces have been collected and analyzedto build the models. Traces analysis are categorized as user-centricor location-centric [5].

2.1 User-Centric Vs. Location CentricUser-centric approaches to study the mobility behavior of the user,by tracking a user history, and their statistical patterns. Most ofthe mobility studies are user-centric where they focused on themobile user individual, pairwise or collective behavior [17]. Indi-vidual behaviors such as temporal and spatial preferences, a useraverage speed and duration have been analyzed [11]. Pairwise pat-terns include encounter studies and modeling [10], while collectivebehaviors detect communities and model them [17].Location-centric analysis approaches focus on the location andsummarize its statistical characteristics of its visitors regardlessof the identity or mobility histories of their users. Examples arestudying the pair-wise characteristics of location based on the flowof users between locations [5]. Another study is the location densityand density distributions as an individual location pattern charac-teristics.

2.2 Density Analysis and ModelingSince they are significant factors in mobility models, several previ-ous studies analyzed the spatial density distributions over outsidearea from different traces environment, and a mobility model thatpreserve density distributions is proposed [9, 15]. Most of the pre-vious studies were not designed to be used for modeling at thebuilding level. They usually cover a wide range of area and trackpedestrian movement outdoors. On the other side, this paper hasfocused on the spatial distributions of users at the building level.

Table 1: Wireless trace information of buildings categories

Semester From To Records Mac# ApsSpring 12-Mar 30-Apr 40,975,015 100,871 1727Summer 22-Jun 10-Aug 14,106,673 66,839 1938Fall 6-Sep 25-Oct 43,976,833 103,216 1974

Also, the study tracks the density change over time at the multi-granularity temporal level. The change of density from one periodto another is necessary to be captured and modeled in realisticindoor mobility.

Moreover, this study considers the building categories, compareand contrast between buildings based on their category.

3 DATA-DRIVEN BUILDING LEVEL DENSITYANALYSIS AND MODELING FRAMEWORK

Data Processing 

Buildings'  Categories

Data Filtering

Academic  Libraries

Administration

Housing Dining

Health

Museum 

Theater

Sport

Labs

Reduce Ping Pong

Session Duration

Phones Vs. Laptop

Density Analysis & Modeling Temporal Density

Predictions per Hour

Seasonal Naive

Holt-Winter-Additive Auto.ARIMA

Spatial Density

Building Level DistributionsT

rain

ing

an

d T

estin

g D

ata

Win

dow

s

Pre

dic

tio

ns A

sse

ssm

en

t

TBATS

Figure 1: Data Driven Density Analysis Framework

The analysis framework is presented in figure 1. The processstarts with data processing and building categorization. Then, thedensity analysis consists of two parts: temporal and spatial. Themulti-temporal analysis tracks the hour by hour and day by daydensity, while spatial analysis focuses on the distributions of thedensity in the buildings.

3.1 DataThis section consists of two parts: description of the wireless datathat is used in this study, and the data processing.

3.1.1 Wireless data. They are collected from more than 100buildings on the university campus. The mobile records includewhen a user starts and ends an association with an access point.The buildings belong to ten different categories. The categoriesinclude museums, academic, libraries, labs, administration, sportsfacilities, dining, and housing. Each access point is tagged withthe buildingsćode and the room number. Information about thebuildings categories data such as the number of buildings, users,and records are presented in table 2. The traces are for 150 days forthree semesters: Spring, Semester and Fall.

Page 3: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

3.1.2 Records Filters. We filtered out the mobile records for twopurposes: reducing the effect of ping pong, and filtering based onthe device type.

Filtering phone devices. Our traces contain the mobile data ofboth phones and laptops. Most of the people always have theirSmartphone at their side day and night [1] and they are more likelyto use it while they are moving than laptops users. As a result, weconcentrate on phone mobile data and filter out the laptop data.The device mac address in the traces and their website visitationare used to identify if the device is a laptop or phone. More detailabout device type filtering can be found in [6, 14].

Filters to reduce the ping-pong Effect. The ping-pong effect occurswhen the wireless user is at the edge of the two access points andhope between them. Here the user does not change its location.Indeed, the user is being directed back and forth between the accesspoints. Then, the user appears to be repeatedly associated with afixed number of access points. Due to incomplete information andthe ambiguity in its interpretation [18], there is no perfect solutionto this problem. To reduce the ping-pong effect, we filtered out therecords that have association back and forth between two accesspoints in less than Θ. In our filters, we assign Θ =10. IncreasingΘ values risks deleting records that do not result from ping-pongeffects. This process filtered out less than 0.01 of the records.

Filtering very short session duration. Since the analysis done atthe building level, and the people that are suspected to be passingoutdoor are to be discarded. As a result, records with session dura-tion that are less than Theta are to be ignored. In this part, Thetaare selected to be one second. We are cautious here to not throwrecords that happens inside the building.

3.1.3 Assigning Buildings Category. Our data includes severalbuildings categories. The Access points are linked to their buildingsusing online information about buildings code. Then, the buildingshave been categorized. The categories are academic, administration,labs, dining, housing, sports facilities, museum, libraries, theatersand auditorium and health care. To identify which buildings belongto what category, we use the following means:

(1) Campus Map: We used online university campus map [4] toidentify several buildings categories. The map offer answersto queries such as the dining, libraries or housing buildings.

(2) Academic and classrooms buildings: pictures and informa-tion about classrooms on the campus can be found in [2].

(3) Specific Building Information: there are web pages that intro-duce a building history and what the building have been usedfor, especially the named facility, as in [3]. We were able tocategorize more than 102 buildings into a specific category.The categories include Academic, Administration, Sports,Labs, Libraries, Museums, Dining, Theaters and Health Care.Some buildings have several functionalities such as a build-ing consist of classrooms on one floor and a museum on thesecond floor. Hence, we do not consider them to belong to aspecific category.

3.2 Density Analysis and ModelingThe density analysis covers two parts: one is analyzing the temporalchange, and the other is to study the density distribution inside thebuilding.

3.2.1 Temporal Density Analysis and prediction. We targetedtwo temporal density analysis: day by day and hour by hour. Ineach building, the number of population per hour is computed. Theanalysis focuses on two parts: the spectrum analysis of density data,and correlate the density with encounter events.

Spectrum analysis of density. The frequency of the density dataare analyzed to answer the following questions:

(1) Is the population density has a dominant cycle that keepsrepeating again and again?

(2) If there is a dominant cycle, what is it?The frequency magnitude in the spectrum. The Fast Fourier Trans-form is used to calculate the periodogram. If γh is the autocovari-ance function for the data at time lag h, then at frequency ω, thespectral density f (ω) can be calculated as:

f (ω) =∑h=∞h=−∞ γ (h)e−2π iωh

We concerned about the frequency between 0 and 1/2. If thefrequency f corresponds to the maximum spectrum, then our domi-nant cycle will be 1/f. The periodogram graph shows the dominantcycle

3.3 Temporal Density PredictionThis section describes the training and testing period windows,prediction models and assessment.

3.3.1 Prediction Testing and Training Windows. We have triedall possible combinations of training modeling weeks and testingpredictions. For example, One week of training model, and the nextone week of predictions. For example, the next figure is the first-week training models; It will be tested for its ability to predict thenext week, 2weeks,...5 weeks.

After that, the training data become two weeks, and its abilityto predict the next period is tested for the next week, the next2weeks. The number of prediction operations for each buildingfor each semester will be: We will have 15 tries when we use oneweek for modeling, and 10 when using two weeks. In total, wewill have 35 tries for each building for each semester. We have 103buildings in fall and summer and 100 buildings in Spring. In fact,there are 32130 predictions operations for each model that havebeen completed. Each operation predicts hour by hour for a periodrange from one week to five weeks. Since we are comparing fourforecasting prediction models, 128,520 predictions operations areevaluated. They contain density predictions for 539,784,000 hours(4200 hours for each model for each building for each semester).

3.3.2 Prediction Models. Four seasonal predictive models areused to model hour by hour density and predict the future density:Seasonal Naive, Holt-Winters (additive seasonal), auto ARIMA andTBATS. The following paragraphs will describe them briefly. Moredetail can be found in [12].Seasonal Naive: the predicted value is equal to the last value from

Page 4: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

the same season.

yT+h |T = yT+h−km (1)

wherem is the seasonal period and k = ⌊ h−1m ⌋+1. In case of densitystudy, a power spectrum analysis to hour by hour density revealsthat one day is the dominant cycle in most of the buildings. As aresult, one day is considered as a season in the model.

Holt-Winter Seasonal Model: it comprises the forecast equationand three smoothing equations: one for the level ℓt , one for trendbt ,and one for the seasonal component denoted by st , with smoothingparameters α , β∗ and γ . m is used to denote the period of theseasonality In this study, we used the additive seasonal method sincethe seasonal variation is roughly constant. The seasonal componentis expressed in absolute terms in the scale of the observed series, andin the level equation, the series is seasonally adjusted by subtractingthe seasonal component. The additive method components can beexpressed as:

yt+h |t = ℓt + hbt + st−m+h+mℓt = α (yt − st−m ) + (1 − α ) (ℓt−1 + bt−1)

bt = β∗ (ℓt − ℓt−1) + (1 − β∗)bt−1st = γ (yt − ℓt−1 − bt−1) + (1 − γ )st−m ,

(2)

where h+m = ⌊(h − 1) mod m ⌋ +1. The level equation shows aweighted average between the seasonally adjusted observation(yt -st−m ) and the non-seasonal forecast (ℓt−1 + bt−1) for time t . Theseasonal equation shows a weighted average between the currentseasonal index, (yt - ℓt - bt−1) , and the seasonal index of the sameseason (i.e.,m time periods ago). More detail about this model canbe found in [12].TBATS Model: It is Trigonometric regressor with Box Cox Transfor-mations, ARMA errors, T rend and Seasonality. It is used to modelseries exhibiting multiple complex seasonalities [8]. It uses a combi-nation of Fourier terms with an exponential smoothing state spacemodel and a Box-Cox transformation, in a completely automatedmanner. As with any automated modeling framework, there maybe cases where it gives poor results, but it can be a useful approachin some circumstances. A TBATS model differs from the dynamicharmonic regression in that the seasonality is allowed to changeslowly over time in a TBATS model, while harmonic regressionterms force the seasonal patterns to repeat periodically withoutchanging. One drawback of TBATS models, however, is that theycan be slow to estimate, especially with longtime seriesARIMA Model: It stands for Autoregressive Integrated Moving Av-erage. It aims to describe the autocorrelations in data. A seasonalARIMA model is formed by including additional seasonal terms inthe ARIMA models. It is written as follows:ARIMA (p,d,q)(P ,D,Q )m, where m= number of periods per season. The uppercase notationis used for the seasonal parts of the model, and lowercase notationfor the non-seasonal parts of the model. The seasonal part of themodel consists of terms that are very similar to the non-seasonalcomponents of the model, but they involve backshift of the seasonalperiod. For example, d is the order of first differencing, and D isthe order of seasonal differencing. Autoregressive AR(p) impliescurrent values depend on its p-previous values. Moving average

MA(q) means the current deviation from the mean depends onq-previous deviations, where q is the order ofMA process.

In this part, we used auto.arima model [13], which return thebest ARIMA model according to information criterion (AIC), butit is not necessary to be the best in term of prediction error. Theorder of differencing d is based on the KPSS test, while the order ofseasonal differencing is based on OCSB test.

3.3.3 Prediction Assessments. The mean absolute error havebeeen usedwidely for prediction assessment.MAE can be computedas:

MAE = T−1T∑t=1|yt − yt |, (3)

where yt denote the tth hour density and yt denote its predictionvalues, where yt ≥ 0. However,MAE is scale dependent. Therefore,the mean error is not a useful metric to compare between differentbuildings with different density. To makeMAE scale independent,theMAE is then divided by the mean density per hour in the testingperiod.

3.4 The density and encounter eventsThe encounter is critical events since they give the opportunityfor sedimentation of messages or even infections. To demonstratethe importance of considering density, we computed the numberof encounter events that occurred at each hour. Then, the relationbetween encounter and density are computed for each daily andhourly time series data.

0e+00

1e+07

2e+07

3e+07

0 10 20 30 40 50day

Spectrum

Figure 2: Spectrum Analysis of day by day data in CISE

0.0e+00

5.0e+06

1.0e+07

1.5e+07

0 250

500

750

1000

1250

hour

Spectrum

24-hourcycle

Figure 3: Spectrum Analysis of hour by hour data in CISE

3.5 The density distributionsTo investigate the density distributions, we investigate the numberof users that have associatedwith an access point through a day. Sev-eral distributions are used to fit the data. Then, eleven distributions

Page 5: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

are used to fit the density using the maximum likelihood methods.The distributions are: Power-law (Pl), Weibull(W), gamma (G), log-normal (Ln), Pareto(Pr), Normal(N), Exponential(Ex), Uniform(U),Cauchy(C), Beta(B) and log-logistic(LG). The Kolmogorov-Smirnovstatistic is used to evaluate the fitted distributions. The three best fitis selected. Also, the percentage of distributions that have KS-statless than 0.075 or 0.10 are reported.

4 RESULT AND DISCUSSIONThis section shows the spectrum analysis result, hour-by- hour pre-dictions, density distributions and temporal density and encountercorrelations.

4.1 Spectrum analysis of temporal densityTo discover the cycle in the density, we analyze its frequency do-main as prescribed in section 3. This work used [7] to analyze thespectrum of buildings density. The daily cycle is shown to be oneweek in most of the building, with exception to a few buildings.However, all kinds of buildings have a 24-hour cycle. Figure 2 showsthe daily spectrum CSE building and figure 3 shows the hourly spec-trum of the same building. Most of the other buildings have similarspectrum analysis result, where 24 hour is the dominant cycle inhour by hour data and seven days for daily density data.

4.2 Prediction ModelsThis section discusses the density predictions from several dimen-sions: The prediction models, the predicted period length (testingdata size), the training model length. Figure 4 shows the mean Nor-malizedMAE for predicting density per hour for one, two, three andfour weeks using different lengths of training models during FallSemester, using different density algorithms, Spring and Summershows similar trends too. Interestingly, the mean error across allbuildings shows the seasonal naive is the best to model and predictthe temporal hour by hour data, since it predicts with less errorpercentage leq 0.3 on average. Also, the training period lengthdoes not affect the seasonal naive result as much as other mod-els. TBATS, Seasonal ARIMA, and Holt-Winter Additive Seasonalimproved when the training data period increases. For instance,TBATS specifically improved when using five weeks to predict thenext week and slightly outperforms the Seasonal Naive. Figure 5shows the result of predicting the next five weeks using one weekof data in the training models. Also, seasonal naive outperformedother models when predicting the hour by hour density for a week.The result shows it represents the most cases. However, there areexceptional cases when seasonal naive is not the best, especiallywhen having complex seasonality where TBATS outperform Sea-sonal Naive. The result in this section summarizes the case in mostof the buildings. In future, we will examine the exceptional cases,and categorize the buildings based on their best prediction models.

Note that, to remove outliers, we exclude the errors that are morethan 1, which turn out to be rare cases.

4.3 Density DistributionsTo investigate the density distributions, we consider buildings thathave more than 20 Access point during a day in summer wherebuildings are occupied the most. The distributions of users among

the access points in a day are investigated. The buildings have adense population in that day, more than most of the days in thetrace. Several distributions are used to fit the density using the max-imum likelihood methods. The distributions are: Power-law (Pl),Weibull(W), gamma (G), lognormal (Ln), Pareto(Pr), Normal(N), Ex-ponential(Ex), Uniform(U), Cauchy(C), Beta(B) and log-logistic(LG).The Kolmogorov-Smirnov statistic is used to evaluate the fitteddistributions. The three best fit are selected and presented in ta-ble 4. Interestingly, the power-law is only selected as the best fitfrom 29% of the buildings in spring to 35% of buildings in summer,other distributions such as log-logistic have been selected as thebest fit distribution for several buildings. Besides, the percentage ofbuildings that have KS-stat less than 0.10 in their fitted log-logisticdistributions are 53% in Spring, while 29% only are power-law.This result is significant since the previous investigation at thecampus level or park level have found that the power law densitydistributions are the best fit distribution for the population density.However, the data at the building level do not follow the sameresult. Example of the buildings density distribution is Florida Gym.It consists of 48 Access points and more than 2.6K reports. Thethree best-fit distributions do not include power-law distribution.Figure 6 shows the empirical and theoretical CDFs of the FloridaGym. The parameters and the KS-tests are listed in table 4. In thefuture, we plan to investigate the density distributions for each dayin the whole period, and then investigate the relation of the bestfit distribution in a building with other factors such as the level ofbuilding density and building category.

4.4 Correlation Between Temporal Density andEncounters

The encounter per hour and density per hour data are measured. Asexpected, in most cases, the encounter rate is following the densityrate as their correlations are strongly positive. Figure 7 and 8 showthe result of density and encounter correlation in different buildingscategories.

5 CONCLUSIONS AND FUTUREWORKWe have analyzed a real trace data at the building level towardseeking a better indoor mobility model and smart applications. Wehave selected buildings from various categories and have focusedon density, as a critical parameter for indoor modeling and inves-tigated hour density prediction and spatial density distributionin the buildings. The density is covered as an individual locationpattern. The paper findings are beneficial for IoT applications thatrequire information about density, relies on the crowd and affectedby the crowd number. This study is built with the purpose of creat-ing an advanced indoor mobility model that represents the mostimportant observations, and enable accurate simulation of smartindoor applications by recreating real scenarios of population den-sity. The mobility model will combine the statistical data from realtraces with other contextual information such as buildings layout,constraints and vertical movement between floors.

ACKNOWLEDGMENTSThis work was partially funded by Najran University, Saudi Arabia,and NSF 1320694.

Page 6: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

WeeksofTrainingData WeeksofTrainingData WeeksofTrainingData WeeksofTrainingData

PredictingOneWeek PredictingTwoWeeks PredictingThreeWeeks PredictingFourWeeks

Weeks of Training Data

Mea

n N

orm

aliz

ed M

AE

1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0 Seasonal Naive

Holt-Winter-SeasonalTBATSARIMA-Seasonal

Weeks of Training Data

1 2 3 4

0.0

0.2

0.4

0.6

0.8

1.0

Weeks of Training Data

1 2 3

0.0

0.2

0.4

0.6

0.8

1.0

Weeks of Training Data

1 2

0.0

0.2

0.4

0.6

0.8

1.0

Figure 4: NormalizedMAEof predictions for hour byhour density for various length ofweeks using different length of trainingdata (Fall Semester)

Table 2: Percentage of the Best Prediction Model Per Semester, S: Seasonal Naive, T:TBATS, Ar: Arima, H:Holt-Winter

Training Predicted Period LengthData Length Semester One Two Three Four Five

Spring S:[52%], T:[36%] S:[57%], T:[37%] S:[53%], T:[43%] S:[58%], T:[40%] S:[57%], T:[38%]One Summer S:[47%], T:[32%] S:[58%], T:[32%] S:[66%], T:[27%] S:[68%], T:[27%] S:[73%], T:[20%]

Fall S:[56%], T:[37%] S:[59%], T:[36%] S:[59%], T:[37%] S:[56%], H:[41%] S:[63%], H:[35%]Spring T:[46%], A:[28%] T:[43%], A:[29%] T:[45%], A:[28%] T:[50%], S:[25%] -

Two Summer T:[34%], S:[28%] S:[33%], T:[33%] S:[33%], T:[33%] T:[40%], H:[37%] -Fall T:[40%], A:[27%] T:[44%], S:[28%] T:[47%], S:[31%] T:[52%], S:[33%] -

Spring T:[58%], A:[17%] T:[58%], S:[18%] T:[66%], S:[12%] - -Three Summer T:[39%], S:[24%] T:[37%], S:[35%] T:[40%], S:[33%] - -

Fall T:[50%], S:[26%] T:[56%], S:[24%] T:[58%], S:[23%] - -Spring T:[66%], S:[15%] T:[72%], S:[16%] - - -

Four Summer T:[40%], S:[25%] T:[47%], S:[33%] - - -Fall T:[61%], S:[18%] T:[57%], S:[25%] - - -

Spring T:[54%], H:[22%] - - - -Five Summer T:[43%], S:[23%] - - - -

Fall T:[60%], S:[15%] - - - -

Table 3: Modeling empirical density distributions of population density in 23 buildings.W:Weibull, Pl: Power Law, G: Gamma,Ll: Log Logistic, Pr: Pareto, Ln: Log Normal, N: Normal, B: Beta, E: Exponential

Semester 1st best Fit 2nd best Fit 3rd best Fit ≤7.5%[KS_stat] ≤10% [KS_stat]Fall Pl[31%], Ll[31%], W[15%] Ln[26%], Ll[21%], Pl[11%] G[26%], W[21%], Ll[16%] Ll[5%] Pl[31%], W[%26], C[%16],

C[10%] W[11%], G[11%], C[11%] Pl[11%], C[11%] Ll[%16], G[11%], Ln[11%]Summer Pl[35%], Ll[29%], Ln[18%] Ll[24%], C[18%], Pl[18%] W[29%], Ll[29%], G[12%] Ll[18%], Pl[12%] Ll[%41], Pl[29%], Ln[%29],

W[12%] W[12%], G[12%], Ln[12%] Ln[12%], E[12%] Ln[12%] G[%29], W[%24], C[%12]Spring Pl[29%], Ll[18%], Ln[18%] Ll[41%], G[18%], Ln[18%] Ln[24%], Ll[24%], Pl[24%] Ll[11%] Ll[%53], G[%35], Ln[%35]

G[12%] Pl[12%] W[12%], G[12%] , Pl[29%], W[11%]

REFERENCES[1] 2016. 16 mobile market statistics you should know in 2016. (2016).

https://deviceatlas.com/blog/16-mobile-market-statistics-you-should-know-2016.

[2] 2017. UF Campus classrooms. (2017). https://classrooms.at.ufl.edu/classroom-info/pictures-and-info/prettyPhoto.

[3] 2017. UFNamed Facilities. (2017). https://www.uff.ufl.edu/Facilities/ListingName.asp.[4] 207. UF Campus Map. (207). https://campusmap.ufl.edu/.[5] Mimonah Al Qathrady and Ahmed Helmy. 2017. Location-Centric Flow Flux for

Improved Indoor Mobility Models. In Proceedings INFOCOM. IEEE.

[6] Babak Alipour, Leonardo Tonetto, Aaron Yi Ding, Roozbeh Ketabi, Joerg Ott, andAhmed Helmy. 2018. Flutes vs. Cellos: Analyzing Mobility-Traffic Correlationsin Large WLAN Traces. In To appear in IEEE INFOCOM 2018. IEEE.

[7] Peter J Brockwell and Richard A Davis. 2013. Time series: theory and methods.Springer Science & Business Media.

[8] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. 2011. Forecasting timeseries with complex seasonal patterns using exponential smoothing. J. Amer.Statist. Assoc. 106, 496 (2011), 1513–1527.

[9] Danielle Lopes Ferreira, Bruno AA Nunes, and Katia Obraczka. 2014. On theheavy tail properties of spatial node density for realistic mobility modeling. InSensing, Communication, and Networking (SECON), 2014 Eleventh Annual IEEEInternational Conference on. IEEE, 504–512.

Page 7: Location-Centric Analysis For Advanced Indoor Mobility ...qathrady/reports/BuildingsDensityAnalysis.pdf · the modeling process. •The density distribution of the population in different

0.25

0.50

0.75

Fall Spring SummerSemester

NMAE

Model

S

H

T

A

Figure 5: Normalized MAE of predictions for hour by hourdensity for five weeks using one week. For Fall, Spring andSummer

Table 4: Prameters and KS-statistics for Florida Gym best fitdensity distirubtions.

Distributions Parameters KS-stat1st fit Log-logistic Shape=2.382299 0.08265504

scale=19.2981082nd fit Log-normal meanLog=2.9636519 0.10329857

sdLog=0.75995393rd fit Gamma Scale=14.215032 0.11363913

Shape=1.833448

0 20 40 60 80 100 120

0.0

0.2

0.4

0.6

0.8

1.0

Empirical and theoretical CDFs

data

CD

F

Log−LogisticLog−NormalGamma

Figure 6: Florida Gym three best population density fit dis-tributions: 1st: Log Logistic, 2nd: Log Normal, 3rd is Gamma.Models parameters and KS-stat are listed in Table 4

[10] Wei-jen Hsu and Ahmed Helmy. 2010. On nodal encounter patterns in wirelessLAN traces. IEEE Transactions on Mobile Computing 9, 11 (2010), 1563–1577.

[11] Wei-Jen Hsu, Thrasyvoulos Spyropoulos, Konstantinos Psounis, and AhmedHelmy. 2009. Modeling spatial and temporal dependencies of user mobility inwireless mobile networks. IEEE/ACM Transactions on Networking (ToN) (2009).

[12] Rob J Hyndman and George Athanasopoulos. 2014. Forecasting: principles andpractice. OTexts.

[13] Rob J Hyndman, Yeasmin Khandakar, et al. 2007. Automatic time series for fore-casting: the forecast package for R. Number 6/07. Monash University, Departmentof Econometrics and Business Statistics.

0.00

0.25

0.50

0.75

1.00

Academic

Administration

Dining

Health

Housing

Lab

Libraries

Meuseum

Sports

Theater

Categories

Cor

rela

tion

betw

een

daily

en

coun

ters

and

dai

ly d

ensi

ty

SemesterFallSpringSummer

Figure 7: Correlation Between Daily Density and Daily En-counter Events

0.25

0.50

0.75

1.00

Academic

Administration

Dining

Health

Housing

Lab

Libraries

Meuseum

Sports

Theater

CategoriesC

orre

latio

n be

twee

n ho

urly

en

coun

ters

and

hou

rly d

ensi

ty

SemesterFallSpringSummer

Figure 8: Correlation Between Hourly Density and HourlyEncounter Events

[14] Udayan Kumar, Jeeyoung Kim, and Ahmed Helmy. 2013. Changing patterns ofmobile network (wlan) usage: Smart-phones vs. laptops. In Wireless Communica-tions and Mobile Computing Conference (IWCMC), 2013 9th International. IEEE,1584–1589.

[15] Bruno AA Nunes and Katia Obraczka. 2011. On the invariance of spatial nodedensity for realistic mobility modeling. InMobile adhoc and sensor systems (MASS),2011 IEEE 8th International Conference on. IEEE, 322–331.

[16] Lorenz Schauer, Martin Werner, and Philipp Marcus. 2014. Estimating crowddensities and pedestrian flows using wi-fi and bluetooth. In Proceedings of the11th International Conference on Mobile and Ubiquitous Systems: Computing, Net-working and Services. ICST (Institute for Computer Sciences, Social-Informaticsand Telecommunications Engineering), 171–177.

[17] Gautam S Thakur and AhmedHelmy. 2013. COBRA: A framework for the analysisof realistic mobility models. In INFOCOM, 2013. IEEE.

[18] Jungkeun Yoon, Brian D Noble, Mingyan Liu, and Minkyong Kim. 2006. Buildingrealistic mobility models from coarse-grained traces. In Proceedings of the 4thinternational conference on Mobile systems, applications and services. ACM, 177–190.