
Page 1: Crowd-Sourced Web Survey for Household Travel Diariesfaculty.iitr.ac.in/~amitfce/pdfs/VardhanEtc2020_submitted_2903202… · Harsh Vardhan, Ishan Rai, Nidhi Kathait, Amit Agarwal

Crowd-Sourced Web Survey for Household Travel Diaries

Harsh Vardhan, Ishan Rai, Nidhi Kathait, Amit Agarwal∗

Department of Civil Engineering, Indian Institute of Technology (IIT) Roorkee, Roorkee-247667, India

Abstract

Collecting travel data in the field is always a challenging task. It is equally burdensome to respondents whether the data is collected using face-to-face personal interviews or self-completion surveys. To reduce the burden on respondents and to collect time stamps and locations precisely, a few fully automated survey approaches have been proposed, with limited success, mainly when the required sample rate is high in a large urban agglomeration. This study presents an open-source, web-based, self-completion and/or personal-interview survey platform, namely Travel Survey as a Service (TSaaS), which currently hosts two different survey types. This study proposes to use the TSaaS platform as a crowd-sourced data collection approach for household travel diaries. TSaaS provides the flexibility to conduct multiple surveys for different purposes/locations simultaneously using a web-survey format. For better control of the data collection process, multiple survey links for household travel diaries (or any other survey) in a region can be created and, eventually, the collected data can be processed jointly or separately as required. The data is recorded in an efficient data structure, mainly in three tables, which hold family, member and trip information. Personal information and location are neither asked for nor tracked using devices or otherwise. To assist in recalling the activity locations, a location-search field is provided and integrated with a map. The permanent address, trip origin and trip destination are recorded as the nearest landmark, and the location is shown as a marker on the map. The marker can be adjusted to correct the location if required. A pilot study was conducted in Jaipur in which three different data collection approaches were attempted. The approaches are compared in terms of survey completion rate, survey completion time and time-cost. The crowd-sourced web survey turns out to be the most efficient in terms of the time-cost per completed survey record and the most suitable for collecting a large number of survey records in an urban agglomeration.

Keywords: travel survey, household survey, trip diaries, activity-trip chain, person trip survey, crowd-sourced web survey

1. Introduction

Reliable traffic information is a key factor for effective planning, operation and management of traffic. In general, such information is collected using various travel surveys.

∗ Corresponding author. Email address: [email protected] (Amit Agarwal)


A travel survey is a detailed investigation of the transportation system in a specific area and a data collection exercise in which the captured data reflects real-world traffic conditions. The objectives of a travel survey are (i) to analyze the issues and characteristics of the existing transportation system in the study area, (ii) to quantify the spatio-temporal variations, (iii) to assess the potential for future development/extensions, etc.

A few examples of popular travel/traffic surveys are the inventory of the network, person trip surveys, vehicle count surveys, turning movement counts, origin-destination surveys, travel speed surveys, parking surveys, etc. In the past, the majority of surveys were manual and relied heavily on pen and paper. With advances in technology, pen-paper based surveys are being replaced by videographic surveys, web/app-based surveys, global positioning system (GPS) based surveys, mobile-phone based surveys, etc. However, many developing countries still rely mainly on pen-paper based surveys. Some surveys need human inputs to collect a variety of traffic data; the data can be collected at home, on-site, during the trip, etc. Based on the approach used to collect the data, these surveys are categorized as personal interview based surveys, postal surveys, telephonic surveys, application based surveys, etc. These surveys are required to study travel behavior, regional transport models, travel demand, origin-destination patterns, etc. Other surveys, which do not need inputs from respondents, include classified vehicle counts at an intersection or at a mid-block section, license plate surveys, transport facility surveys, etc. In recent years, image-processing and sensor/GPS based approaches have become more common for many of these surveys.

In the last couple of decades, there has been a sharp increase in travel demand in parallel with economic growth. Given the complex transportation systems and large urban transportation networks, various analytical and/or simulation models have been developed for traffic modeling, planning, and analysis (e.g. activity-based models, agent-based models, etc.). These simulation models are data intensive, i.e. a variety of data is collected to synthesize/generate the scenario and to calibrate and validate the models (Agarwal, 2017; Agarwal et al., 2019). Typically, socio-economic and travel (origin, destination, trip purpose, travel mode, trip length, etc.) characteristics of households are required. Such data is included in a person-trip diary or household survey.

The prevailing approach for trip diary surveys is manual and error-prone. Typically, the data is recorded on site using pen and paper, and data entry is carried out afterward to make the data usable. With time, based on the need, these approaches have been extended/improved by telephonic surveys, postal surveys, smartphone-based surveys, etc. Online and smartphone-based surveys are becoming more common due to lower cost, convenience and ease of access to the internet. The present study compares past survey techniques for collecting person-trip diaries and proposes a crowd-sourced web-based survey to collect activity trip-chain diaries. For this, an open-source, web-based travel survey platform is first proposed, which is suitable for self-completion and/or personal interviews.

To begin, Section 2 reviews the existing literature on the different traffic survey techniques and lists their limitations. Section 3 discusses two countermeasures in support of the proposed approach and the ideas for conducting a crowd-sourced web survey. Section 4 presents and demonstrates the travel survey platform in detail. Section 5 presents the pilot study in Jaipur and its results. Finally, the study is concluded in Section 6.


2. Literature Review

In the past, pen (or pencil) and paper were commonly used in various traffic surveys to collect traffic characteristics, trip patterns, etc. (Hurst, 1969; McClintock, 1927). In such methods, a surveyor stands at the roadside/transit stop and collects the information by observing (e.g. counting vehicles, passengers, etc.). For person-trip diary surveys, an interviewer has to go door-to-door to collect the travel information or interview respondents at intercept points along major roadways and transit routes (Griffiths et al., 2000). Even today, personal interview surveys are used at many places because an interviewer can (i) explain/reformulate unclear parts, (ii) use maps and pictures to help respondents understand, (iii) translate the questionnaire into the regional/local language, (iv) fill out the questionnaire for users who are unable to complete it on their own, etc. (Zalewski et al., 2019).

In the 1980s and early 1990s, the use of telephones for collecting data started to become popular (Hitlin et al., 1987). For this, an interviewer is trained so that he/she can explain the aim of the research, the design of the questionnaire and the importance of data collection over a phone call. The responses are recorded by the interviewers in the desired format (Richardson et al., 1995). Compared to face-to-face personal interviews, telephonic surveys (i) can maintain the anonymity of the respondent, (ii) offer higher geographical coverage, and (iii) are efficient in terms of costs and benefits (Ampt, 1989; Richardson et al., 1995). However, telephonic surveys lack visual support, which decreases the trust between interviewer and respondent. Additionally, they are limited to short-duration surveys and are likely to have survey bias. Thus, a drop in the response rate of telephonic surveys is reported.

Given the cost involved in personal interview surveys, self-completion questionnaire surveys emerged as an alternative. These are surveys that a respondent completes without the assistance of an interviewer. In these surveys, the questionnaire is delivered to the respondent by mail or by post and, after completing it, the respondent mails it back or it is collected from the respondent (Richardson et al., 1995). Such surveys are about 3.5 times cheaper per completed survey than telephonic surveys (Hitlin et al., 1987). However, the lower response rate of self-completion surveys leads to a higher cost per returned questionnaire.

In the 1990s and 2000s, computer-administered interviews started to gain momentum over face-to-face interviews and telephonic surveys. Self-completion surveys (e.g. mail-back) were replaced with computer-assisted telephone interviews (CATI) and computer-assisted personal interviews (CAPI). The use of a computer-assisted data collection approach for personal interviews (i) reduces the time required to complete the survey, (ii) improves the data quality by validating the data for possible errors during entry, and (iii) saves time in data entry and thus reduces costs (Gravlee, 2002). It also facilitates more complex questionnaire designs than pen-paper surveys.

With the increasing use of the internet, web surveys (also known as internet surveys), in which the questionnaire is sent primarily over the internet, have become prevalent. Compared to other data collection approaches, the main advantages of web surveys are (a) the low cost, (b) greater potential to engage and interact with the participants, and (c) automated data collection, etc. (Greaves et al., 2015; Bourbonnais and Morency, 2013). Auld et al. (2009) present a web survey for household travel with lower chances of under-reporting of activities and trips. In order to get accurate locations and times, a GPS logger is used. Auld et al. (2012) propose a web survey to record the responses of users to hypothetical no-notice emergencies which vary in terms of size, hazard level, time of day, etc. Similarly, Greaves et al. (2015) present the development and deployment of a web-based travel diary and an optional smartphone app to collect travel data in inner-city Sydney. Although continuous tracking increases the accuracy of trip reporting and the efficiency of data feeding, it also raises serious privacy concerns and suffers from gaps in the GPS logs inside dense urban areas. Kazemzadeh et al. (2020) use a web survey and an in-field personal interview survey to study the perception of cyclists. The users are more positive and optimistic when answering the web-based questionnaire. From an operational point of view, the web survey is more comfortable than in-field personal interview surveys.

During the early introduction of smartphones, the use of handheld devices (e.g. tablet PCs, iPads, smartphones, etc.) became a common trend. In the beginning, it was of the personal interview type, in which an electronic questionnaire was filled in front of a respondent on site; this is called computer-aided personal interview (CAPI) (Sowa et al., 2015). With the increasing use of the internet, online questionnaires have become a popular way of collecting information. Computer-assisted self-interview (CASI) or self-completion techniques are gaining popularity. In this approach, respondents directly input their responses into the devices. The associated software on the devices can play recorded audio voice-overs and show graphics for a better understanding of the survey. The recorded data is directly available for further processing. Online questionnaires are a subset of a wider range of online research methods. For instance, computer-assisted web interviewing (CAWI) is an internet surveying technique in which the interviewee follows a script provided in the form of a website (Sowa et al., 2015). In short, a questionnaire is created as a program for the web interviews. It may consist of pictures, audio and video clips, links to other websites, etc. The flow of questions is designed based on the responses and the existing information in the questionnaire. The major advantages of computer-aided surveys (e.g. CAWI, CAPI, CASI) are that they (i) reduce costs and the required human resources, (ii) reduce the burden on respondents, and (iii) maintain anonymity and privacy, provided location is not tracked and personal information is not collected, etc. (Brown et al., 2008; Bayart and Bonnel, 2015). Further, CASI is cheaper than CAPI because it does not require handheld devices (e.g. smartphones, tablets, etc.). On the other hand, such surveys are biased because they are restricted to the segment of the population which has access to such devices and the internet (Mol, 2017).

Improvements in remote sensing technologies such as vehicle instrumentation and GPS, and their integration with geographic information system (GIS) databases, offer many opportunities to enhance the detail and accuracy of the data collected by travel surveys in the 21st century (Griffiths et al., 2000). GPS is very helpful for accurately identifying the locations of origins and destinations (Wolf et al., 1999). In this case, the locations of the travelers are continuously tracked using the GPS of a standalone device (Auld et al., 2009) or of a smartphone (Hood et al., 2011; Stipancic et al., 2017). Thus, it is possible to gather data streams of individual travelers' trajectories throughout the day. Together with the time stamps from the devices, not only locations but also trip times and durations can be recorded accurately. Thus, more reliable data can be collected while significantly reducing the response time and cost of the survey (Mol, 2017; Prelipcean et al., 2018). In a fully automated data collection approach, some information (e.g. trip purpose, preferences, etc.) is not explicitly available from the survey data. Similar to web surveys, the chances of biased results are higher in these types of surveys. Additionally, it is a matter of concern (i) whether enough respondents would be comfortable providing information about daily activities and precise locations and (ii) whether travelers may change their travel behavior under the impression of being tracked (Griffiths et al., 2000). The use of GPS technology in the survey increases the burden on the respondent in terms of significant battery depletion and the cost of internet for transferring the recorded data at intervals (Safi et al., 2013). Lee et al. (2016) and Rieser-Schussler (2012) present literature reviews of emerging data collection technologies for travel demand modeling, such as mobile-positioning systems, GPS and Bluetooth re-identification, and automatic number plate recognition. However, the practical applications of these technologies are very limited (Lee et al., 2016).

Some past studies explore options to generate trip diaries using various data sources. For instance, smart card (automatic fare collection system) data is used to detect trip direction, boarding time, home locations, etc. (Bagchi and White, 2005; Zou et al., 2016; Chen and Fan, 2018). In a similar direction, call data records (CDR) can also be used to reproduce trips in an urban area (Colak et al., 2015; Zilske and Nagel, 2014). Vehicle occupancy can be evaluated by detecting WiFi devices using wireless routers (Gore et al., 2019) and, with the help of link counts and Bluetooth data, the origin-destination (OD) matrix can be estimated (Michau et al., 2019). Some other techniques may not be used directly to synthesize trip diaries but rather to validate the model. For instance, Prajapati et al. (2020) present a computer-vision technique which records the number of vehicles and their trajectories under mixed traffic conditions. However, due to various reasons (e.g. privacy, permissions, availability of the data), these advanced technologies cannot be used everywhere and typical trip diaries must be collected using one of the survey approaches.

From the foregoing discussion, it is clear that different survey techniques are used with a good mix of technologies and objectives. The present study focuses on (i) quick completion of the survey, (ii) a common, open-source web survey to collect the data, (iii) use of a database to manage the survey data, (iv) privacy concerns, (v) consumption of battery, (vi) assistance in recalling the activity locations, etc. Therefore, an open-source web survey platform is proposed to collect the data from various travel surveys simultaneously. This study focuses only on the development and deployment of the web survey related to activity-trip chains of households. The web survey is suitable for self-completion as well as personal interview type approaches. The proposed survey addresses the aforementioned limitations.

3. Countermeasures

3.1. Coverage

As discussed in the previous section, technology is advancing steadily and the use of computers and mobile phones (specifically smartphones) in various travel surveys is becoming common. In India too, smartphones have become affordable and within reach of almost everyone. As per 2016 data, there are about 0.96 billion mobile subscriptions for a population of about 1.3 billion (Kanungo, 2017). Out of 1.3 billion persons, roughly 27% are under 14 and unlikely to have a personal mobile phone. This means that, on average, there is about one mobile subscription for every person older than 14 years. Further, the market penetration of smartphones was 0.468 billion in 2017 (Assocham, accessed 2019), i.e. roughly every other person older than 14 years has a smartphone. In addition, the use of the internet is continuously rising due to low-rate data plans (Rajkumar et al., 2016).¹ This highlights the feasibility of better coverage using computer-aided self-completion surveys.
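The coverage argument above is simple arithmetic, and it can be checked in a few lines. The sketch below uses only the figures quoted in the text and, like the text, treats one subscription as one person, which is a simplifying assumption (a person may hold several SIM cards). The variable names are illustrative.

```python
# Figures quoted in the text (Kanungo, 2017; Assocham, accessed 2019).
population = 1.3e9             # total population
mobile_subscriptions = 0.96e9  # mobile subscriptions (2016 data)
smartphones = 0.468e9          # smartphone market penetration (2017)
share_under_14 = 0.27          # rough share of persons under 14

# Persons older than 14, i.e. those likely to own a personal phone.
population_over_14 = population * (1 - share_under_14)

# Assumption (as in the text): one subscription or smartphone per person.
subscriptions_per_person = mobile_subscriptions / population_over_14
smartphones_per_person = smartphones / population_over_14

print(f"persons older than 14: {population_over_14 / 1e9:.3f} billion")
print(f"subscriptions per person older than 14: {subscriptions_per_person:.2f}")
print(f"smartphones per person older than 14: {smartphones_per_person:.2f}")
```

Under these assumptions the ratios come out close to one subscription per person older than 14 and close to one smartphone for every two such persons, which is what the paragraph states.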

¹ An example of a data plan: it costs less than 150 Indian rupees for 28 days for unlimited incoming and outgoing calls, 100 SMS per day and 1 GB of 4G data per day (July 2019).


Figure 1: Jaipur road network (in gray), ward (zone) boundaries (in red) and locations of institutes and colleges (in blue)

3.2. Collection of data using crowd-sourced web-based self-completion survey

Given the high market penetration of smartphones and the low cost of internet, a web-based survey is proposed which is a combination of self-completion and personal interview surveys. In the former, a survey link is distributed to users and they are requested to complete the survey within a time-frame (typically 1-2 weeks). In order to reach the maximum number of persons, the proposal is to approach students of various high schools/institutes/colleges in a city and ask each student to complete the activity trip-chain diary of all members of the family. For instance, Figure 1 shows the locations of the institutes/colleges in Jaipur city. It is highly likely that every student in these institutes/colleges carries a smartphone; if not, students can carry the QR (Quick Response) code with them and complete the survey at home with the available devices. The same approach is also applicable to secondary and senior-secondary schools; however, in this case, students are unlikely to have a smartphone. Therefore, students of the schools are expected to come to the computer laboratories and complete the survey for all members of the family. Clearly, in this approach, households in which no one is studying in these schools/institutes/colleges are missed from the survey; they can be captured using door-to-door personal interviews in each of the wards. For instance, in 2011 about 4500 families (≈0.3% of the total population of Jaipur district) lived on footpaths in Jaipur (Census, accessed 2019), and they cannot be covered using self-completion surveys.

4. TSaaS: Travel Survey as a Service

4.1. Overview

TSaaS (Travel Survey as a Service) is an open-source platform which facilitates web/mobile-based self-completion or personal interview type surveys. Currently, two types of surveys are linked with it, i.e. a household trip diary survey and a public transport survey to understand the behavior of metro users. The survey types are listed on the homepage. Selecting a survey type leads to the landing page of that survey type (see Figure 5(a)), from which only a demo survey can be taken. The focus of the present study is to create a trip diary survey to record the activity-trip chain diaries of all members of the family, and thus this survey is explained here in detail.

The source code of the project is hosted on GitHub² and a demo survey can be started at https://tsaas.iitr.ac.in/hhs. For the present study, version ‘v0 2’ is used. The recorded data is saved on a secure server in JSON (JavaScript Object Notation) format. The design of the back-end database is described in Section 4.2 and the terminology used is explained in Table 1.

Table 1: Terminology for the household trip diary under TSaaS

term            description
admin           a person who controls the back-end admin panel
respondent      a person who enters responses in the survey
surveyor        a person who conducts the survey (e.g. door-to-door)
supervisor      a person who supervises the group surveys
survey type     a survey with a distinct objective (e.g. household travel, public transport)
survey format   the predefined survey questionnaire for each survey type
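The overview states that the recorded data is saved in JSON format. As a rough illustration only, one household response could be nested as a family with its members and their trips. The nesting shown here is an assumption and not the actual TSaaS payload; the field names are borrowed from the database design in Figure 2, and all values are made up.

```python
import json

# Hypothetical household record: the nesting (family -> members -> trips)
# is an assumed layout; field names follow Figure 2, values are invented.
record = {
    "family": {
        "noOfMembers": 2,
        "noOfCars": 1,
        "landmark": "Example landmark",
        "lat": 26.9,
        "lng": 75.8,
    },
    "members": [
        {
            "age": 21,
            "respondent": True,
            "trips": [
                {
                    "originPlace": "home",
                    "destinationPlace": "education",
                    "departureTime": "08:30",
                    "arrivalTime": "09:00",
                    "modes": ["walk", "bus"],
                }
            ],
        },
        {
            "age": 48,
            "respondent": False,
            "trips": [
                {
                    "originPlace": "home",
                    "destinationPlace": "work",
                    "departureTime": "09:15",
                    "arrivalTime": "09:50",
                    "modes": ["car"],
                }
            ],
        },
    ],
}

# Serialize to JSON text and read it back, as a server and a client would.
payload = json.dumps(record, indent=2)
restored = json.loads(payload)

# Count the trips reported for the whole household.
trip_count = sum(len(member["trips"]) for member in restored["members"])
print(trip_count)
```

One respondent fills in the trips of every family member, which is why the trips hang off each member rather than off the family.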

4.2. Design of the database

Figure 2: The relational database design of household trip diary. Tables and their fields:

SurveyType: surveyTypeID, surveyFormat, surveyURL
SurveyList: surveyID, surveyType
ResponseTime: responseTimeID, surveyStartTime, surveyEndTime, surveyID
CollegeList: collegeID, collegeName, collegeURL, constrainField, surveyTypeID
Family: familyID, collegeID, surveyID, noOfMembers, currentCount, noOfCars, noOfCycles, noOfTwoWheelers, familyIncome, country, homeState, landmark, lat, lng, nameOfDistrict
Member: familyID, memberID, created_at, gender, age, educationalQualification, monthlyIncome, maritialStatus, differentlyAbled, principalSourceofIncome, stayAtHome, householdHead, respondent, twoWheelerLicense, simCards, fourWheelerLicense, dataWhileDriving, bluetooth, wifi
Trip: tripID, memberID
OriginDestination: tripID, originDestinationID, originLandmark, originLat, originLng, originPlace, destinationLandmark, destinationLat, destinationLng, destinationPlace, fare, travelDistance, departureTime, arrivalTime
Mode: tripID, modeID, modeName, modeIndex
Feedback: feedback, feedback_time

Technical details of the back-end. In the back-end, Django³ is used, which is open-source, has a clean Pythonic structure, follows a Model-View-Template (MVT) architecture, has a built-in admin panel and is capable of handling heavy traffic. The admin panel is customized for easy monitoring of trip profiles and is equipped with custom filters for a quick overview of the recorded data. To record the travel diaries of multiple persons simultaneously in an urban agglomeration, a database is required which can handle a range of workloads, from single machines to many concurrent users. For this, various relational databases which can interact with the default object-relational mapping (ORM) layer, such as SQLite, PostgreSQL, MySQL and Oracle, were compared and, eventually, PostgreSQL was selected. PostgreSQL is chosen because it is a powerful, open-source, object-relational database system which is reliable, robust and has good performance⁴. To make database bootstrapping easier for testing, SQLite is used, which is bundled with Python. It is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine⁵. In short, SQLite is used for local development and PostgreSQL in production. Two settings files, ‘production settings.py’ and ‘local setting.py’, are incorporated for production and for testing and development, respectively. A REpresentational State Transfer (REST) software architectural style is followed and a RESTful API is created using the Django REST framework.⁶ The web APIs allow web systems to request information from the database or to create a new piece of information and store it in the database. Further, Apache2⁷ is used to set up a reverse proxy server on a local server.

² See https://github.com/teg-iitr/tsaas-frontend for the front-end and https://github.com/teg-iitr/tsaas-backend for the back-end of the TSaaS project.
³ https://www.djangoproject.com/
⁴ https://www.postgresql.org/
⁵ https://www.sqlite.com/index.html
⁶ https://www.django-rest-framework.org/
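The split between SQLite for local development and PostgreSQL in production can be pictured as two variants of Django's `DATABASES` setting. The sketch below is not taken from the TSaaS settings files; the helper `database_settings`, the environment variable names and the default values are hypothetical. Only the two `ENGINE` strings are standard Django back-end names.

```python
import os


def database_settings(production: bool) -> dict:
    """Return a Django-style DATABASES mapping (illustrative helper)."""
    if production:
        # PostgreSQL in production; credentials are read from hypothetical
        # environment variables so that they stay out of the source tree.
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": os.environ.get("TSAAS_DB_NAME", "tsaas"),
                "USER": os.environ.get("TSAAS_DB_USER", "tsaas"),
                "PASSWORD": os.environ.get("TSAAS_DB_PASSWORD", ""),
                "HOST": os.environ.get("TSAAS_DB_HOST", "localhost"),
                "PORT": os.environ.get("TSAAS_DB_PORT", "5432"),
            }
        }
    # SQLite for local development and testing: a single file, no server.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "db.sqlite3",
        }
    }


# A local settings module would use the SQLite variant.
DATABASES = database_settings(production=False)
print(DATABASES["default"]["ENGINE"])
```

Keeping the two variants in separate settings modules, as the text describes, means the application code stays identical in both environments because it only talks to the ORM.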

Relational database. The design of the database is shown in Figure 2. The idea behind TSaaS is to provide services for multiple types of surveys from one platform efficiently. The important tables of the database are explained briefly below.

• In the table ‘SurveyType’, an id is auto-generated, survey format is defined and247

survey URL is created. This facilitates the use of absolute URL (e.g. https://tsaas.248

iitr.ac.in) for multiple survey types (e.g. for household trip diary surveys − https:249

//tsaas.iitr.ac.in/hhs; for public transport surveys − https://tsaas.iitr.ac.in/pts).250

• The ‘ResponseTime’ table records the survey number, survey start and end times251

of each survey irrespective of the survey type.252

• The ‘CollegeList’ table is the key table; it facilitates the creation of the final survey URLs for each survey type using the field ‘collegeURL’. For instance, to collect household trip diaries, two different URLs are designed for two different target groups: https://tsaas.iitr.ac.in/hhs/stiitr and https://tsaas.iitr.ac.in/hhs/civilPhd. Similarly, more such URLs can be created simultaneously for each survey type using its common, predefined survey format. Although the same database stores the survey data of different target groups, the collected data can be analyzed together or separately using the college id, which is common to the ‘Family’ and ‘CollegeList’ tables. A short custom message is also recorded here and is displayed on the landing page of the survey (see bold text in Figure 5(b)). In addition, to shorten the survey completion time and to avoid recording information outside the scope of the study region, a ‘constrainField’ is also used in this table. For instance, a city can be entered in this field to restrict the trip information collected from the users to that city only. Leaving this field empty does not restrict the trips of any member.

• The tables ‘Family’, ‘Member’, ‘Trip’, ‘Mode’ and ‘OriginDestination’ record the information that their names indicate. They form a hierarchy, i.e. all members of a family have the same family id, all trips of a member have the same member id, and so on.

• The ‘Feedback’ table is not connected with any other table in the database.
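The table hierarchy described above can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names follow the text, but every column name other than ‘collegeURL’ and ‘constrainField’, the custom messages and the constraint value are assumptions, and the actual TSaaS schema (Figure 2) may differ.

```python
import sqlite3

# Hypothetical columns; only the table names and the id hierarchy
# (survey type -> survey link -> family -> member -> trip) follow the text.
SCHEMA = """
CREATE TABLE SurveyType (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    surveyFormat TEXT NOT NULL,
    surveyURL TEXT NOT NULL             -- e.g. 'hhs' or 'pts'
);
CREATE TABLE CollegeList (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    surveyTypeId INTEGER NOT NULL REFERENCES SurveyType(id),
    collegeURL TEXT NOT NULL,           -- e.g. 'stiitr'
    customMessage TEXT,
    constrainField TEXT                 -- NULL or empty: no restriction
);
CREATE TABLE Family (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    collegeId INTEGER NOT NULL REFERENCES CollegeList(id)
);
CREATE TABLE Member (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    familyId INTEGER NOT NULL REFERENCES Family(id)
);
CREATE TABLE Trip (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    memberId INTEGER NOT NULL REFERENCES Member(id)
);
"""


def build_demo_db():
    """Create an in-memory database with one survey type and two links."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO SurveyType (surveyFormat, surveyURL) VALUES (?, ?)",
        ("household trip diary", "hhs"),
    )
    conn.executemany(
        "INSERT INTO CollegeList "
        "(surveyTypeId, collegeURL, customMessage, constrainField) "
        "VALUES (1, ?, ?, ?)",
        [
            ("stiitr", "placeholder message A", "Roorkee"),  # made-up values
            ("civilPhd", "placeholder message B", None),
        ],
    )
    return conn


def survey_link(conn, college_id, base="https://tsaas.iitr.ac.in"):
    """Compose the final survey URL from the survey type and the link row."""
    survey_url, college_url = conn.execute(
        "SELECT s.surveyURL, c.collegeURL FROM CollegeList c "
        "JOIN SurveyType s ON s.id = c.surveyTypeId WHERE c.id = ?",
        (college_id,),
    ).fetchone()
    return f"{base}/{survey_url}/{college_url}"
```

With this layout, adding a new target group only requires one more row in ‘CollegeList’; the survey format itself is untouched.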

4.3. Design of the front-end

The front-end design of the household trip diary survey is shown in Figure 3. The information collected through each page is explained in Section 4.4. Brief details of the front-end design are presented in this section.

The landing page (or home page) of TSaaS provides entry to all available travel surveys on the platform. Selecting any one of them leads to the demo page of that survey type, where the demo survey can be taken to get an idea of the format of the survey type. For the actual surveys, a survey URL is created using the admin panel

⁷ httpd.apache.org/


[Figure 3: flowchart. Recoverable labels: Home (TSaaS); Public Transport Survey; Household Survey (HHS); Survey begins (survey ID allotted, survey start time noted); Family Information; Family information submitted (family ID allotted); Member Information; Member information submitted; (Trip ID allotted); Does the member stay at home? (Yes/No); Trips Information; Is trip beyond study area? (Yes/No); Trip Information (1. Origin location, 2. Origin place, 3. Destination location, 4. Destination place, 5. Departure time, 6. Arrival time, 7. Sequence of travel modes, 8. etc.); Add Trip; Proceed; Completed all trips of member? (Yes/No); Added all members? (Yes/No); Survey end time updated; Finish Survey.]

Figure 3: Front-end design of the household trip diary


of the back-end (see Section 4.2) and a custom message is displayed in bold on the landing page (see Figure 5(b)).

As soon as the survey is started, a survey id is created at the back-end and the survey start time is recorded. First, the family information is asked (see Figure 6), including the number of members. On submitting the family page, the family data is posted to the server and a family id is assigned. On the next page, member information is asked; this is repeated until the current member index equals the number of members entered on the family page. On submitting the member page, a member id is generated at the back-end and the member information is posted to the database. If the member stays at home, the member page is displayed again with an increased member counter; otherwise, the trip information page is displayed. On the trip information page, the state and district of the trip are asked and checked against the study area defined in the back-end (see ‘CollegeList’ in Section 4.2). If the trip is beyond the study area, the next member page is displayed and the counter is increased, which reduces the response time of the survey. If the trip is in the study area, further trip information is asked until the respondent clicks on ‘Proceed’ (see Figure 8(f)) and confirms that all trips are added for the member. If all trips of the last member have been added, this also ends the survey and the survey end time is posted to the database.
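The page flow described above can be summarized in a few lines of Python. This is a simplified sketch under assumptions, not the TSaaS front-end code: the class `MemberInput`, the function `page_sequence` and the page labels are hypothetical, and the study-area check is reduced to a comparison of district names.

```python
from dataclasses import dataclass, field


@dataclass
class MemberInput:
    """Hypothetical answers of one family member."""
    stays_home: bool = False
    trip_district: str = ""
    trips: list = field(default_factory=list)


def page_sequence(members, study_area=""):
    """Return the ordered list of pages a respondent would see."""
    pages = ["family"]
    for index, member in enumerate(members, start=1):
        pages.append(f"member {index}")
        if member.stays_home:
            continue  # the member page is shown again for the next member
        pages.append(f"trip-area check {index}")
        # An empty study area (empty 'constrainField') restricts nothing.
        if study_area and member.trip_district != study_area:
            continue  # trips beyond the study area are not asked
        for trip_no, _ in enumerate(member.trips, start=1):
            pages.append(f"trip {index}.{trip_no}")
    pages.append("finish")  # the survey end time is posted here
    return pages
```

Members who stay at home or travel outside the study area contribute only one or two pages each, which is where the reduction in response time comes from.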

Figure 4: Trip information to check if trip is made in the study area

4.4. Structure of TSaaS and data recording procedure

The design of the front-end is demonstrated in Figure 3 and discussed in Section 4.3. The start page of the household travel survey is shown in Figure 5. The household travel survey collects information in three main categories: family, member and trip information. The data fields and the process for each category are explained next.

Family information: On the family page (see Figure 6), a respondent first enters the number of members in the family and the motorized and non-motorized vehicle ownership. Afterwards, he/she selects one of the monthly income categories from the drop-down menu. The income categories are nil, less than 5000 ₹, 5000 - 10000 ₹, 10000 - 50000 ₹, 50000 - 1 lakh ₹⁸, 1 lakh - 2 lakh ₹, 2 - 5 lakh ₹ and more than 5 lakh ₹; they are used to capture the distribution of income and the choices the households make.

The current survey format supports only locations in India; however, it can be transferred to any other country with a few changes in the source code. Further, a

⁸ 10 lakh = 1 million


(a) Start page for demo survey (b) Start page for IIT Roorkee

Figure 5: Landing pages for the demo survey and for the survey configured for the students of IIT Roorkee.

(a) Data about the vehicle ownership and monthly income

(b) Permanent address (c) Integration of map for locating landmark

Figure 6: Information collected through family page of TSaaS.


respondent selects the state and district of the permanent address and enters the initial letters of the nearest landmark (see Figure 6(b)). This gives a list of options, one of which can be selected as shown in Figure 6(c). This is performed by integrating the Places API by Here Maps⁹. For the selected landmark, a marker and the corresponding latitude and longitude are shown on the map. As instructed, the respondent can adjust the marker to change the nearest landmark, which also changes the coordinates. After clicking on the ‘Submit’ button, the latitude and longitude of the landmark are sent to the server together with the entered information, and the member information page is displayed.

(a) Basic information of the member

(b) Income and mobile phone related information

(c) Licensing and other information

Figure 7: Information collected through member page of TSaaS.

Family member information: Figure 7 shows the information collected for each member. On the first screen (see Figure 7(a)), socio-demographic characteristics and income information are required. The income categories are the same as on the family page. On the next screen (see Figure 7(b)), information about the number of SIM cards, data (internet) or phone usage during driving/traveling, and Bluetooth and WiFi activation is required. This information helps in identifying the market penetration of mobile phones, internet usage and the number of Bluetooth and WiFi devices on the road. Such information is required when (i) various sensors are used to detect the number of devices and then generate/validate trips (Gore et al., 2019) or (ii) call data records (CDR) are used to generate/validate the trip information (Colak et al., 2015). The last screen of the member page (see Figure 7(c)) contains only radio buttons; information

⁹ See https://developer.here.com/documentation/places/topics/what-is.html. As of Mar. 2020, it provides about 250,000 transactions per month under ‘Freemium’ licensing.


about two-wheeler and four-wheeler licensing is required. Since one member is supposed to complete the survey for all members of the family, it is asked explicitly which member is the respondent and/or the head of the family. This is important if data for the respondent and/or family head needs to be processed separately. In contrast to past studies, it is asked whether a member stays at home for the whole day (e.g. babies and/or old/sick persons) in order to capture all persons and obtain a true indicator of trip rates. If a member stays at home, the submit button starts the member page again for the next member of the family; otherwise, it redirects to the trip information page.

(a) Trip district and landmark (b) Location of trip origin on the map

(c) Trip purpose and departure time

(d) Travel modes (e) Trip characteristics (f) Adding and removing trip

Figure 8: Information collected through trip page of TSaaS.

Trip information: Figure 8 shows the various screens for trip information. A member may stay in a city which is not the same as that of the permanent address. Therefore, at the beginning of the trip page (i.e. once per member; see Figure 4), the state and district are asked so that the search space for the landmark can be narrowed. Similar to the permanent address, a landmark can be entered for the trip origin and the marker can be adjusted on the map (see Figure 8(b)). Afterwards, the type of location and the departure time are queried. The former field provides the trip purpose. Similar to the trip origin,


the trip destination details are entered. After the trip destination, travel mode information is added. Since public transport trips have access and egress travel modes, and the likelihood of multi-modal trips is positively correlated with income (Blumenberg and Pierce, 2013), this study records multi-modal trips. The respondent is asked to enter the travel modes in the order of usage (see Figure 8(d)). Many trip diary surveys do not include information about access/egress modes or the chain of travel modes, which is likely to impact the results of mode choice models. Lastly, the characteristics of the trip (e.g. travel distance and cost) are required (see Figure 8(e)). After this, the respondent is supposed to click on the ‘Add Trip’ button.

Adding the nth trip starts the (n + 1)th trip, and the origin of the (n + 1)th trip is automatically set to the destination of the nth trip. Afterwards, the same process is repeated for all trips. From the 2nd trip onward, the page shows two additional functions (see Figure 8(f)). The user is supposed to click ‘Proceed’ when all trips of the member are added. A warning then pops up to make sure that all the trips are added; on confirming it, a member page is shown where the information of the next member can be added. The survey terminates after all trips of the last member are added.
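The chaining rule for consecutive trips (the origin of trip n + 1 equals the destination of trip n) can be illustrated as follows. This is a minimal sketch, assuming trips are kept as dictionaries; the function names `start_next_trip` and `add_trip` and the keys are hypothetical.

```python
def start_next_trip(trips):
    """Return a blank trip whose origin is pre-filled from the last trip."""
    new_trip = {"origin": None, "destination": None}
    if trips:
        new_trip["origin"] = trips[-1]["destination"]
    return new_trip


def add_trip(trips, destination, origin=None):
    """Append a trip; only the first trip needs an explicit origin."""
    trip = start_next_trip(trips)
    if trip["origin"] is None:
        trip["origin"] = origin  # first trip: origin entered by respondent
    trip["destination"] = destination
    trips.append(trip)
    return trips
```

Because the origin is never typed twice, the trip chain of a member is continuous by construction.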

4.5. Data processing and time-savings

The TSaaS platform provides full flexibility in the design, which reduces the burden on the respondent. These design features are briefly described here.

• From the second trip onward, the origin is auto-filled as the destination of the previous trip.

• To help recall the activity locations, a search option for the nearest landmark is provided and the marker on the map can be adjusted to set the location. In addition, the latitude and longitude of the activity locations are extracted directly and stored on the server while the responses are entered, which saves time in post-processing and reduces the chances of errors.

• In contrast to web survey forms such as Google Forms¹⁰, TSaaS supports conditional fields in the form as well as auto-completion. For instance, if a member’s age is entered as two years, the fields for education qualification, marital status, monthly income, source of income, SIM cards, driving license, etc. are auto-completed. This reduces the response time of the survey.

• The respondent is asked to enter the trip information only if the trip is made in the study area.
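The conditional auto-completion mentioned in the list above could look like the following sketch. The age threshold, the field names and the default values are assumptions made for illustration; the text only gives the example of a two-year-old member and does not state the actual rule.

```python
# Hypothetical threshold and defaults; not taken from the TSaaS source code.
AUTO_FILL_BELOW_AGE = 5
CHILD_DEFAULTS = {
    "education": "not applicable",
    "marital_status": "unmarried",
    "monthly_income": "nil",
    "income_source": "none",
    "sim_cards": 0,
    "driving_license": "no",
}


def fields_to_ask(age, all_fields):
    """Split the member form into pre-filled values and fields still to ask."""
    prefilled = dict(CHILD_DEFAULTS) if age < AUTO_FILL_BELOW_AGE else {}
    remaining = [name for name in all_fields if name not in prefilled]
    return prefilled, remaining
```

For a two-year-old member, only the fields that are not covered by the defaults remain on the form.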

Similar to other web-surveys, manual data entry (data-feeding) is not required. On the TSaaS platform, the data is maintained in JSON format and continuously stored on a secure server. To reduce the chances of error in completing the survey or in data entry, all fields except the landmarks and the number of SIM cards are drop-downs or radio buttons. Additionally, only numbers are allowed in the SIM card fields and the landmark is immediately displayed on the map, which leaves negligible scope for data-entry errors. Moreover, to avoid distracting the respondent, the survey pages (Figures 6 to 8) are kept simple, are free from advertisements and do not show the menu bar (shown only on the home page). Although it is possible to close the survey without completing it, no explicit button to end the survey

¹⁰ https://www.google.com/forms/about/


is provided. For household travel surveys, in contrast to smartphone-based apps, the proposed survey approach allows one family member to complete the chain of activities and trips for all members of the family. This increases the number of trips per respondent significantly, i.e. fewer respondents are required to record a fixed number of trips in a study area.
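To make the JSON storage and the family, member and trip hierarchy more concrete, the sketch below parses one nested record and flattens it into three row lists. The record layout, all key names and the coordinate values are assumptions for illustration only; the paper does not specify the JSON structure used by TSaaS.

```python
import json

# Illustrative record only: keys and values are placeholders.
RECORD_JSON = """
{
  "familyId": 1,
  "identifier": "insti",
  "landmark": {"lat": 26.91, "lng": 75.79},
  "members": [
    {"memberId": 1, "staysAtHome": false,
     "trips": [
       {"tripId": 1,
        "origin": {"lat": 26.91, "lng": 75.79},
        "destination": {"lat": 26.92, "lng": 75.80},
        "modes": ["walk", "bus", "walk"]}
     ]},
    {"memberId": 2, "staysAtHome": true, "trips": []}
  ]
}
"""


def flatten(record):
    """Split one nested record into family, member and trip rows."""
    family = {k: v for k, v in record.items() if k != "members"}
    members, trips = [], []
    for member in record["members"]:
        row = {k: v for k, v in member.items() if k != "trips"}
        members.append({"familyId": record["familyId"], **row})
        for trip in member["trips"]:
            trips.append({"memberId": member["memberId"], **trip})
    return family, members, trips


family, members, trips = flatten(json.loads(RECORD_JSON))
```

Each trip row carries its member id and each member row its family id, so the three lists can be joined again during post-processing.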

4.6. Simultaneous surveys

In order to use the same format of a survey type for multiple objectives/locations, a unique identifier code is generated and appended to the end of the URL (e.g. https://tsaas.iitr.ac.in/hhs/insti/). A unique URL (with a custom message on the landing page) can be created for each objective, location, target group, etc. The distinctive feature is that, for a survey type, all of these URLs can be used simultaneously, and this can happen for multiple survey types at a time. The identifier code is used to retrieve the data-set from the server. The technical details of this feature are given in Section 4.2. Based on this, two different landing pages of the same survey are shown in Figure 5. This provides the flexibility to process the data separately for each identifier (i.e. equivalent to multiple simultaneous surveys) as well as to combine the data of all identifiers (i.e. equivalent to only one survey). An added advantage is that, by analyzing the data corresponding to each identifier in a survey exercise, it is possible to identify the source of errors and redo the survey only for that particular region or target group. This also highlights if a surveyor (in the door-to-door approach) or a supervisor (if the person is supervising respondents in a group) is being careless or faking the data.
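Processing the data per identifier or jointly amounts to a simple grouping step. The following sketch assumes that each exported record carries the identifier of the survey URL it came from under a hypothetical key `identifier`; the function names are also hypothetical.

```python
from collections import defaultdict


def group_by_identifier(records):
    """Group records by the identifier embedded in their survey URL."""
    groups = defaultdict(list)
    for record in records:
        groups[record["identifier"]].append(record)
    return dict(groups)


def select(records, identifiers=None):
    """Return the records of the given identifiers, or all if None."""
    if identifiers is None:
        return list(records)  # joint processing: one combined survey
    wanted = set(identifiers)
    return [r for r in records if r["identifier"] in wanted]
```

Calling `select(records, ["insti"])` isolates one target group, for example to re-check or redo a suspicious subset, while `select(records)` treats all links as a single survey.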

4.7. Privacy and battery-depletion concerns

As discussed in the literature review, fully automated surveys track the locations of the travelers throughout the trip; this lowers the response rate (fewer people agree to install the application) and/or leads respondents to alter their travel behavior. Additionally, continuous use of GPS drains the battery significantly. The present study considers these aspects carefully. TSaaS neither asks for nor records personal information or exact locations, nor does it use the GPS function of the device. Hence, privacy and battery-depletion issues are avoided at the small cost of less precise locations and a marginal burden on the respondents. Since many transportation planning models require information about the origin and destination zones, this trade-off (use of landmark zones rather than precise locations of the activities) is acceptable.

5. Pilot Study

In order to test the performance and productivity of the TSaaS platform, a pilot study was conducted in Jaipur city.

5.1. Survey methods

For the comparison, three different types of survey methodologies are tested. As discussed in Section 4.6, various survey URLs are created using one survey format. The three types of surveys were conducted from 13th Jan. to 17th Jan.

1. Traditional web-survey: In this method, the survey URLs were sent to residents of two different areas. The request to complete the survey was made by a resident of the area itself. The survey link was active for about two weeks and, during this time, two reminders were sent to complete the survey.


2. Door-to-door: Two surveyors were sent to four different localities on four different days. Two smart-tablets were used to collect the data, i.e. the surveyors used the survey URLs on these tablets to record the data. The surveyors asked the questions to a family member and entered the information. To get a good mix of data in Jaipur with the door-to-door approach, areas with different demographic characteristics were selected. These locations are the area around Benar road, Kacchi Basti (slums) of Jawahar Nagar, Chandpole Market area (walled city) and Vidhyadhar Nagar (comparatively high-income households). In total, about 34 hours were invested by both surveyors.

3. Crowd-sourced web-survey: As the name suggests, in this approach a group of persons was asked to complete the survey on the spot. For this, two supervisors went to two different institutions, students were asked to come to the computer lab (turn by turn) and a demonstration was given. The students completed the survey on the computers and on smart-tablets with the help of the supervisors. In total, about 12 hours were invested by both supervisors.

5.2. Results

A summary of the number of responses, survey completion rate and survey completion time is shown in Table 2. The survey completion rate is defined as the ratio of the number of completed surveys to the number of surveys started. The survey completion time is the difference between the survey end time and the survey start time.

Table 2: Survey completion rate and time for the pilot study

survey method             invested time  number of responses  completion  completion
                          (man-hours)    recorded    valid    rate        time (min)
traditional web-survey    NA             13          3        23.1%       32.33
door-to-door survey       34             79          67       84.8%       11.67
crowd-sourced web-survey  12             214         147      68.7%       17.07
total                                    306         217      70.9%


In the traditional web-survey, it is difficult to explain the survey to respondents and convince them to complete it. As a result, only 13 persons started the survey and only 3 completed it, which results in a very low survey completion rate. In the door-to-door survey approach, the surveyors can explain the importance of the survey and the respondents’ role in it. Consequently, it has the highest survey completion rate. Although almost no respondent refused to complete the survey, the door-to-door survey completion rate is less than 100%, which is a consequence of poor network connectivity in the field. The survey completion rate of the crowd-sourced web-survey is significantly higher than that of the traditional web-survey, which results from the ability to convince respondents face-to-face by explaining the importance of the survey. However, the value (68.7%) is somewhat lower than expected because the internet was not functional in the computer lab at the beginning of the survey. Overall, the survey completion rate turns out to be 70.9%.

From the operational perspective, the time-cost is about 0.51 h/completed-survey (=34/67) for the door-to-door survey and about 0.08 h/completed-survey (=12/147) for the crowd-sourced web-survey. Even though about 15-20 min was spent to explain the


(a) Survey Completion time (b) Number of family members in a family

Figure 9: Box plots for survey completion time and number of members in a family

survey to each batch (the batch size depends on the availability of working computers), the time-cost of the crowd-sourced survey approach is about one sixth of the time-cost of the door-to-door survey. This is encouraging for real-world exercises in which a large number of survey records is required in a short period of time.
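The completion rates and time-costs quoted above follow directly from the counts in Table 2. The short sketch below reproduces the arithmetic; the dictionary layout and function names are only illustrative, while the numbers are copied from the table.

```python
# Values copied from Table 2; hours is None where the table reports NA.
TABLE_2 = {
    "traditional web-survey": {"hours": None, "recorded": 13, "valid": 3},
    "door-to-door survey": {"hours": 34, "recorded": 79, "valid": 67},
    "crowd-sourced web-survey": {"hours": 12, "recorded": 214, "valid": 147},
}


def completion_rate(row):
    """Completed (valid) surveys as a percentage of surveys started."""
    return 100.0 * row["valid"] / row["recorded"]


def time_cost(row):
    """Invested man-hours per completed survey, or None if hours are NA."""
    if row["hours"] is None:
        return None
    return row["hours"] / row["valid"]


for method, row in TABLE_2.items():
    cost = time_cost(row)
    cost_text = "NA" if cost is None else f"{cost:.2f} h"
    print(f"{method}: {completion_rate(row):.1f}% completed, "
          f"{cost_text} per completed survey")
```

The ratio of the two time-costs, (12/147)/(34/67), is about 0.16, which is the "about one sixth" stated in the text.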

The survey completion time for the traditional web-survey is 32.33 min, which is very high compared to the other two approaches and points to difficulties faced by respondents in understanding the survey. Further, the door-to-door survey has not only the highest survey completion rate but also the lowest survey completion time. This is because of the better explanations provided by the surveyors. The crowd-sourced web-survey has a survey completion time of about 17 min, which is about 5.4 min higher than that of the door-to-door survey. The higher completion time is explained by the additional time respondents require to understand the questionnaire. Figure 9 shows the box plots for survey completion time and number of family members in a family for the crowd-sourced web-survey and the door-to-door survey.¹¹ The variability in survey completion time for the door-to-door survey is lower than for the crowd-sourced web-survey (Figure 9(a)), which shows that surveyors are able to explain the questionnaire quickly (mainly in the local language) compared to the crowd-sourced

¹¹ Due to insufficient data for the traditional web-survey, it is excluded from this comparison.


web-survey, where the burden of understanding and completing the questionnaire is on the respondents (or they ask the supervisor in the laboratory). Further, the completion time also depends on the number of family members who stay at home and the number of infants/children, for whom some of the data is auto-completed. Although the door-to-door survey approach was attempted in four different areas of Jaipur (Section 5.1) which differ significantly in income level, the variability in household size for the door-to-door survey is lower than for the crowd-sourced survey data. This suggests that the crowd-sourced survey can capture the variability in household size better than the door-to-door survey.

6. Conclusions

In the direction of efficient data collection techniques, the present study proposed the TSaaS (Travel Survey as a Service) platform. It is an open-source, web-based survey platform which is suitable for crowd-sourced, self-completion and/or personal-interview type survey techniques. Currently, it hosts two surveys; however, the scope of the present study is limited to the household travel diary survey. In order to lower the response time, the proposed survey collects limited information that nevertheless covers everything important for synthesizing a large-scale agent-based and/or activity-based model. In contrast to state-of-the-art surveys, additional information about the use of mobile phones, the use of data for various purposes (e.g. navigation), Bluetooth, WiFi, etc. is also collected to estimate the market penetration rates. This will be helpful in sampling and/or correcting bias in models which generate trips from call data records (CDRs) or similar datasets. The collected data is immediately sent to the server, where it is recorded in JSON format for post-processing. The presented approach has an edge over completely automated surveys, in which users are uncomfortable sharing their locations throughout the trip or are inclined to change their travel behavior under the impression that information is being recorded. Since locations are recorded in terms of the nearest landmark using a search option integrated with a map and a marker on it, issues related to battery consumption due to GPS usage are not present. The latitude and longitude are also recorded together with the location search, which is useful in post-processing. An important advantage is that a web-survey can be used for multiple locations simultaneously using various survey URLs. Each group of surveys is assigned a unique identifier (embedded in the URL) such that subsets can be processed independently or jointly as required. To verify the approach, a pilot study was conducted in Jaipur city using three different survey approaches. The survey completion rate for the crowd-sourced web-survey was 68.7%, which is significantly higher than that of the traditional web-survey (23.1%) and lower than that of the door-to-door survey (84.8%). The time-cost per valid recorded survey is lowest for the crowd-sourced web-survey, which is desirable for collecting a large sample of household trip diaries in an urban agglomeration. Further, the variability in household size is captured better in the crowd-sourced web-survey, which is also desirable for such studies. In future, the pilot study will be extended to collect about 1-2% of the trips of Jaipur city.

Acknowledgments

The authors wish to thank the Indian Institute of Technology (IIT) Roorkee for providing financial support to set up the infrastructure, Mr. Piyush Anand for assistance in designing the graphics and survey forms, and Dr. I. P. Meel from SBCET, Jaipur and Mr. Anurag Thombre, IIT Roorkee for assistance in the pilot study.


Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: A. Agarwal; front-end: H. Vardhan; back-end: I. Rai; pilot study results analysis and draft manuscript preparation: N. Kathait, A. Agarwal. All authors reviewed the results and approved the final version of the manuscript.

References

A. Agarwal. Mitigating negative transport externalities in industrialized and industrializing countries. PhD thesis, TU Berlin, Berlin, 2017.

A. Agarwal, D. Ziemke, and K. Nagel. Calibration of choice model parameters in a transport scenario with heterogeneous traffic conditions and income dependency. Transportation Letters: The International Journal of Transportation Research, 2019. doi:10.1080/19427867.2019.1633788.

E. S. Ampt. Comparison of self-administered and personal interview methods for the collection of 24-hour travel diaries. In Transport Policy, Management and Technology Towards 2001: Selected Proceedings of the Fifth World Conference on Transport Research, volume 4, 1989.

Assocham. India to have 859 million smartphones users in 2022: ASSOCHAM-PwC. website, accessed, 2019. URL https://www.assocham.org/newsdetail.php?id=7099.

J. Auld, C. Williams, A. Mohammadian, and P. Nelson. An automated GPS-based prompted recall survey with learning algorithms. Transportation Letters, 1(1):59–79, January 2009. doi:10.3328/tl.2009.01.01.59-79.

J. Auld, V. Sokolov, A. Fontes, and R. Bautista. Internet-based stated response survey for no-notice emergency evacuations. Transportation Letters, 4(1):41–53, January 2012. doi:10.3328/tl.2012.04.01.41-53.

M. Bagchi and P. R. White. The potential of public transport smart card data. Transport Policy, 12(5):464–474, September 2005. doi:10.1016/j.tranpol.2005.06.008.

C. Bayart and P. Bonnel. How to combine survey media (web, telephone, face-to-face): lyon and rhone-alps case study. Transportation Research Procedia, 11:118–35, 2015. doi:10.1016/j.trpro.2015.12.011.

E. Blumenberg and G. Pierce. Multimodal travel and the poor: evidence from the 2009 national household travel survey. Transportation Letters, 6(1):36–45, December 2013. doi:10.1179/1942787513y.0000000009.

Pierre-Leo Bourbonnais and Catherine Morency. Web-based travel survey: a demo. In Transport Survey Methods: Best Practice for Decision Making, pages 207–224. Emerald Group Publishing Limited, January 2013. doi:10.1108/9781781902882-010.

J. L. Brown, P. A. Vanable, and M. D. Eriksen. Computer-assisted self-interviews: a cost effectiveness analysis. Behavior Research Methods, 40(1):1–7, February 2008. doi:10.3758/brm.40.1.1.


Census. Jaipur district : census 2011-2019 data, accessed, 2019. URL https://www.565

census2011.co.in/census/district/435-jaipur.html.566

Z. Chen and W. Fan. Extracting bus transit boarding stop information using smart567

card transaction data. Journal of Modern Transportation, 26(3):209–219, June 2018.568

doi:10.1007/s40534-018-0165-y.569

S. Colak, L. P. Alexander, B. G. Alvim, S. R. Mehndiratta, and M. C. Gonzalez. Analyzing570

cell phone location data for urban travel. Transportation Research Record: Journal of571

the Transportation Research Board, 2526(1):126–135, January 2015. doi:10.3141/2526-572

14.573

N. Gore, S. Arkatkar, G. Joshi, and A. Bhaskar. A novel methodology to derive vehicle574

occupancy using Wi-Fi sensors under heterogenous traffic conditions. In Transportation575

Research Board 98th Annual Meeting, number 19-03838, 2019.576

C. C. Gravlee. Mobile computer-assisted personal interviewing with handheld computers.577

Field Methods, 14(3):322–336, August 2002. doi:10.1177/1525822X0201400305.578

S. Greaves, A. Ellison, R. Ellison, D. Rance, C. Standen, C. Rissel, and M. Crane. A web-579

based diary and companion smartphone app for travel/activity surveys. Transportation580

Research Procedia, 11:297–310, 2015. ISSN 2352-1465. doi:10.1016/j.trpro.2015.12.026.581

R. Griffiths, A. J. Richardson, and M. E. H. Lee-Gosselin. Travel surveys. Transportation582

in the New Millennium, pages 1–7, February 2000. URL http://onlinepubs.trb.org/583

onlinepubs/millennium/00135.pdf.584

R. A. Hitlin, F. Spielberg, E. Barber, and S. J. Andrle. A comparison of telephone and585

door-to-door survey results for transit market research. Transportation Research Record,586

(1144), 1987.587

J. Hood, E. Sall, and B. Charlton. A GPS-based bicycle route choice588

model for San Francisco, California. Transportation Letters, (3):63–75, 2011.589

doi:10.3328/TL.2011.03.01.63-75.590

E. Hurst. The structure of movement and household travel behaviour. Urban Studies, 6591

(1):70–82, 1969. doi:10.1080/00420986920080051.592

A. Kanungo. Smartphone penetration in india. Technical Report 1–9, November 2017.593

K. Kazemzadeh, R. Camporeale, C. D'Agostino, A. Laureshyn, and L. W. Hiselius. Same questions, different answers? a hierarchical comparison of cyclists’ perceptions of comfort: in-traffic vs. online approach. Transportation Letters, pages 1–9, March 2020. doi:10.1080/19427867.2020.1737373.

R. J. Lee, I. N. Sener, and J. A. Mullins. An evaluation of emerging data collection technologies for travel demand modeling: from research to practice. Transportation Letters, 8(4):181–193, January 2016. doi:10.1080/19427867.2015.1106787.

M. McClintock. The traffic survey. The ANNALS of the American Academy of Political and Social Science, 133(1):8–18, 1927.


G. Michau, N. Pustelnik, P. Borgnat, P. Abry, A. Bhaskar, and E. Chung. Combining traffic counts and Bluetooth data for link-origin-destination matrix estimation in large urban networks: the Brisbane case study. Technical report, 2019.

C. V. Mol. Improving web survey efficiency: the impact of an extra reminder and reminder content on web survey response. International Journal of Social Research Methodology, 20(4):317–327, 2017. doi:10.1080/13645579.2016.1185255.

A. K. Prajapati, A. Gora, A. Agarwal, and I. Ghosh. Use of computer vision to automatize traffic data collection under mixed traffic condition. In 2nd ASCE India Conference on Challenges of Resilient and Sustainable Infrastructure Development in Emerging Economies, 2020.

A. C. Prelipcean, G. Gidofalvi, and Y. O. Susilo. MEILI: a travel diary collection, annotation and automation system. Computers, Environment and Urban Systems, 70:24–34, July 2018. doi:10.1016/j.compenvurbsys.2018.01.011.

D. Rajkumar, K. Sharmila, and S. Rebello. A study on mobile usage and data penetration in India using predictive analytics. International Journal of Latest Trends in Engineering and Technology, special issue SACAIM:260–265, November 2016. ISSN 2278-621X. URL https://www.ijltet.org/journal/148263102744.T705.pdf.

A. Richardson, E. Ampt, and A. Meyburg. Survey methods for transport planning. Eucalyptus Press, 1995. ISBN 0-646-21439-X.

N. Rieser-Schussler. Capitalising modern data sources for observing and modelling transport behaviour. Transportation Letters, 4(2):115–128, April 2012. doi:10.3328/tl.2012.04.02.115-128.

H. Safi, M. Mesbah, and L. Ferreira. ATLAS project – developing a mobile-based travel survey. In Proceedings of the Australian Transportation Research Forum, Brisbane, QLD, Australia, pages 2–4, 2013.

P. Sowa, B. Pedzinski, M. Krzyzak, D. Maslach, S. Wojcik, and A. Szpak. The computer-assisted web interview method as used in the national study of ICT. Studies in Logic, Grammar and Rhetoric, 43(1):137–146, December 2015. doi:10.1515/slgr-2015-0046.

J. Stipancic, L. Miranda-Moreno, A. Labbe, and N. Saunier. Measuring and visualizing space–time congestion patterns in an urban road network using large-scale smartphone-collected GPS data. Transportation Letters, pages 1–11, 2017. doi:10.1080/19427867.2017.1374022.

J. Wolf, S. Hallmark, M. Oliviera, R. Guensler, and W. Sarasua. Accuracy issues with route choice data collection by using global positioning system. Transportation Research Record, 1999. doi:10.3141/1660-09.

A. Zalewski, D. Sonenklar, A. Cohen, J. Kressner, and G. Macfarlane. Public transit rider origin–destination survey methods and technologies. Technical Report TCRP Synthesis 138, Transportation Research Board, 2019.

M. Zilske and K. Nagel. Studying the accuracy of demand generation from mobile phone trajectories with synthetic data. Procedia Computer Science, 32:802–807, 2014. ISSN 1877-0509. doi:10.1016/j.procs.2014.05.494.


Q. Zou, X. Yao, P. Zhao, H. Wei, and H. Ren. Detecting home location and trip purposes for cardholders by mining smart card transaction data in Beijing subway. Transportation, 45(3):919–944, December 2016. doi:10.1007/s11116-016-9756-9.
