Demo Soofi
CrowdSkippr
Wafa Soofi
Yosemite trip, May 2011
We had a great time…
Though it might have been better if the scenery had looked
less like this
alanak
and more like this.
Gianluca Vegetti
The problem
I want to go hiking at a time/day that works for me, but that also
minimizes the size of the crowds.
I would like to predict the crowd size for a specific location and a
range of future dates.
Then I can use that prediction to make an intelligent choice about
when to take my trip.
How do we predict crowds right now?
Government data
Often aggregated
Not always immediately accessible
Check-ins
Sparse coverage
Prior knowledge/Intuition
Not always validated
There’s another way.
We can crowdsource this problem!
CrowdSkippr: Inner workings
From flickr.com, extract the total
number of photos taken at a given
time/place.
Extract data on temperatures
from NOAA.gov for
a given time/place.
Using this information, create a prediction of how heavy the crowds will be at a given
future time/place.
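As a minimal sketch of the data-preparation step described above, the snippet below counts photos per day and joins them with daily temperatures. The timestamp format, the function names, and the sample values are assumptions made for illustration; the slides do not show how the Flickr or NOAA data was actually retrieved or stored.

```python
from collections import Counter
from datetime import date, datetime


def daily_photo_counts(taken_timestamps):
    """Count photos per calendar day from 'YYYY-MM-DD HH:MM:SS' strings (assumed format)."""
    days = (
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        for ts in taken_timestamps
    )
    return Counter(days)


def build_rows(photo_counts, daily_temps):
    """Join counts and temperatures on date; days with no photos count as 0."""
    return [
        {"date": day, "temp": temp, "photos": photo_counts.get(day, 0)}
        for day, temp in sorted(daily_temps.items())
    ]


# Made-up sample data, for illustration only.
timestamps = [
    "2011-05-28 09:15:00",
    "2011-05-28 13:40:10",
    "2011-05-29 08:05:30",
]
temps = {
    date(2011, 5, 28): 18.5,
    date(2011, 5, 29): 21.0,
    date(2011, 5, 30): 19.0,
}

rows = build_rows(daily_photo_counts(timestamps), temps)
for row in rows:
    print(row["date"], row["temp"], row["photos"])
```

Each row then holds one day's temperature and photo count for a single location, ready to be turned into predictors and a response.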
Gradient Boosting Regression
Predictors
Day of week (Flickr)
Holiday flag (Flickr)
Day of year (Flickr)
Daily temperature (NOAA)
Response
Number of photos taken (Flickr)
(proxy for size of crowd)
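To make the model setup concrete, here is a small, dependency-free sketch of gradient boosting for squared error using one-split regression stumps, fitted on the four predictors listed above. It is not the CrowdSkippr implementation: the data is synthetic, the hyperparameters are arbitrary, and a real project would more likely use a library implementation such as scikit-learn's GradientBoostingRegressor.

```python
from datetime import date, timedelta


def features(day, temp, holidays):
    """Predictors from the slide: day of week, holiday flag, day of year, temperature."""
    return [day.weekday(), 1 if day in holidays else 0, day.timetuple().tm_yday, temp]


def fit_stump(X, residuals):
    """Best one-split stump (feature, threshold, left mean, right mean) by squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # skip max so the right side is non-empty
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]


def fit_gbr(X, y, n_rounds=100, learning_rate=0.1):
    """Boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split
            break
        stumps.append(stump)
        j, t, lm, rm = stump
        pred = [p + learning_rate * (lm if x[j] <= t else rm) for x, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lm if x[j] <= t else rm for j, t, lm, rm in stumps)


# Synthetic training data (not real Flickr or NOAA values): busier on weekends and holidays.
holidays = {date(2011, 5, 30)}
days = [date(2011, 5, 1) + timedelta(days=i) for i in range(31)]
temps = [15.0 + (i % 5) for i in range(31)]
X = [features(d, t, holidays) for d, t in zip(days, temps)]
y = [20 + 80 * (d.weekday() >= 5) + 60 * (d in holidays) + 2 * t for d, t in zip(days, temps)]

model = fit_gbr(X, y)
mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)
print(f"baseline MSE {mse_baseline:.1f}, boosted MSE {mse_model:.1f}")
```

Stumps keep the example short; deeper trees would be needed to capture interactions between predictors, such as a warm holiday weekend.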
Wait: Is # photos a good proxy for # visitors?
[Figure: photos (or visitors) per month, normalized by total. Series: Photos, Visitors.]
[Figure: scatter plot of number of visitors against number of photos. R² = 0.89.]
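A rough sketch of how such a proxy check could be computed: normalize each monthly series by its total, then take the squared Pearson correlation between the two. The monthly numbers below are invented for illustration and are not the data behind the R² reported on the slide.

```python
def normalize(series):
    """Scale a series so that its values sum to 1 (share of the yearly total)."""
    total = sum(series)
    return [v / total for v in series]


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Invented monthly totals (January to December), for illustration only.
photos = [300, 350, 600, 1200, 2500, 4000, 5200, 5000, 3000, 1800, 600, 400]
visitors = [90e3, 100e3, 150e3, 250e3, 400e3, 550e3, 650e3, 640e3, 450e3, 300e3, 120e3, 100e3]

photo_share = normalize(photos)
visitor_share = normalize(visitors)
r2 = r_squared(photos, visitors)
print(f"R^2 = {r2:.2f}")
```

Because correlation is unaffected by rescaling, the R² is the same whether it is computed on raw counts or on the normalized shares.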
[Figure: relative feature importance (scale 0 to 1), Yosemite National Park. Features shown: day of year, temperature, day of week, holiday.]
Thanks for your time!
I’m Wafa.
Validation: Rocky Mountain National Park
[Figure: predicted crowd size and actual crowd size (test data).]
For all 28-day windows in a given year, the median difference between crowd size on predicted
and actual best days is 4.6%.
(On the days that are predicted to have the lowest crowds, the crowd size is 29% of the worst possible
crowds within that window.)
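One way the sliding-window validation could be computed is sketched below. The slides do not define how the percentages are normalized, so this sketch assumes both are expressed relative to the largest actual crowd in each window; the function name and the sample series are hypothetical.

```python
from statistics import median


def window_scores(predicted, actual, window=28):
    """For every sliding window, compare the predicted-best day with the actual-best day.

    Returns (median gap, median ratio), both relative to the worst actual
    crowd in the window (an assumed normalization). Assumes actual values > 0.
    """
    gaps, ratios = [], []
    for start in range(len(actual) - window + 1):
        idx = range(start, start + window)
        pred_best = min(idx, key=lambda i: predicted[i])   # day the model recommends
        actual_best = min(idx, key=lambda i: actual[i])    # truly quietest day
        worst = max(actual[i] for i in idx)
        gaps.append((actual[pred_best] - actual[actual_best]) / worst)
        ratios.append(actual[pred_best] / worst)
    return median(gaps), median(ratios)


# Made-up daily crowd sizes, for illustration only.
actual = [10 + (i * 7) % 13 for i in range(60)]
noisy_prediction = [a + (i % 3) for i, a in enumerate(actual)]

perfect_gap, perfect_ratio = window_scores(actual, actual)
noisy_gap, noisy_ratio = window_scores(noisy_prediction, actual)
print(perfect_gap, perfect_ratio, noisy_gap, noisy_ratio)
```

A perfect predictor gives a gap of zero, and the gap can never be negative because the actual best day has the smallest crowd in its window by definition.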