Digital Trails Dave King 1 5 10 Part 1 D3
Digital Traces and Trails: Extracting Intelligence from the Collective Interactions
of Web and Mobile Users
Dave King
HICSS-44 Tutorial
January 5, 2010
Agenda
• Digital Traces & Trails
  – Some examples
  – Expansion in the volume and types of traces
• Mining Digital Traces & Trails
• Some Comments about Privacy
Pop Quiz
Fact                Matching Letter   Answer
1   259 thousand    ____              a  No. of mobile phones in world
2   3 million       ____              b  No. of YouTube videos viewed per day
3   4.2 million     ____              c  No. of CCTV cameras in the UK
4   5 million       ____              d  No. of songs downloaded from iTunes by 2008
5   10 million      ____              e  No. of riders on Oyster Transit system (UK) per day
6   30 million      ____              f  No. of blogs indexed by Technorati since 2002
7   100 million     ____              g  No. of Tweets per day
8   100 million     ____              h  Approximate number of Google searches daily
9   133 million     ____              i  No. of users who log on to Facebook at least once each day
10  150 million     ____              j  No. of bookmarks entered in Del.icio.us web site
11  1.7 billion     ____              k  No. of credit cards worldwide
12  2 billion       ____              l  Estimated number of emails per day
13  4.2 billion     ____              m  Total number of Wikipedia articles in all languages
14  5 billion       ____              n  Size of worldwide GPS market in US$
15  210 billion     ____              o  No. of public wi-fi hotspots worldwide
Background
• Dave King, SVP of Product Development, Product Management - JDA Software
• Experience
  – 6 Years with JDA Software
  – 27 Years - Enterprise Software
  – 15 Years as a University Professor
• Education
  – Ph.D. in Sociology and Statistics from University of North Carolina at Chapel Hill (long time ago)
Background
• 12 years as Co-Chair of the Internet & Digital Economy Track (HICSS)
• Long Time Interest in various aspects of E-Commerce & Business Intelligence
• Tutorial topic reflects a personal interest in
  – The data produced by various networks and network devices,
  – The examination of those data with advanced analytical techniques
  – And some of the social issues and problems associated with that analysis.
Proliferation of Traces & Trails
Our lives have been leaving increasingly complete and detailed traces in cyberspace as two-way electronic communications devices have proliferated and diversified. Telephones were the first such devices to find widespread use; they soon yielded telephone billing data – records of when, where and by whom calls were made. Then bank ATM machines and point-of-sale terminals began to produce transaction records. As personal computers were plugged into commercial online networks, they too began to create electronic trails… There is more of this to come. As switched video networks become extensively used for everyday purposes – shopping, banking, selecting movies, social contact, political assembly – they potentially will grab and keep much more detailed portraits of private lives than have ever been made before. And wearable devices – ones that continuously monitor your medical condition, for example, or perhaps a cybersex suit that some journalists have avidly imagined – may construct the most up-close and intimate records.
Where there's data …
Data mining technologies are pervasive in our society. They are designed to capture, aggregate, and analyze our digital footprints, such as purchases, Internet search strings, blogs, and travel patterns in an attempt to profile individuals for a variety of applications (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009).
Digital Trails: Their Value & Misuse
The value of this data is unprecedented in the history of mankind. If you consider the sum of your online searching, mapping, communicating, blogging, news reading, shopping, and browsing, you should realize that you've revealed a very complete picture of yourself and placed it on the servers of a select few online companies.
The thin veneer of anonymity on the web is insufficient to protect you from revealing your identity. If you aren't even a little concerned, you should be. The value of this information is staggering and ripe for misuse.
Digital Traces and/or Trails: Informal Definitions
• Digital traces refer to the traces of activities and behaviors that people leave when they interact in digital environments (en.wikipedia.org/wiki/Digital_traces).
• Digital trails refer to the associations or interconnections of these traces with other traces and with other sources of information.
Digital Traces & Trails: Intention
[Diagram: a spectrum running from Unintentional to Intentional, with Everyday Acts, Interactions on Social Media/Networks, and Lifelogging placed along it]
Lifelogging
Steve Mann (the world’s first cyborg) – Cyborglogging (wearcam.org/netcam.html)
Jennifer Ringley – Lifecasting thru the JenniCam (1996-2003)
Mitch Maddox (aka DotComGuy) – 2002
Lisa Emily Batey – Lifecasting from Tokyo thru Justin.tv (2007)
Daniel P.W. Ellis – Audio Lifelogging (2005-2007)
Lifelogging: MyLifeBits – Gordon Bell
“I’m losing my mind… By the way so are you.”
“Soon… you will have the capacity for Total Recall. You will be able to summon up everything you have ever seen, heard, or done. And you will be in total control, able to retrieve as much or as little as you want at any given time.”
Total Recall: Reminiscent of the Memex
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” Vannevar Bush (1945)
Total Recall: SenseCam
research.microsoft.com/en-us/um/cambridge/projects/sensecam/
www.businessweek.com/magazine/content/09_37/b4146051036364.htm
MyLifeBits: The Research Project
[Diagram: MyLifeBits architecture. Components shown:]
• MyLifeBits store (database, files)
• MyLifeBits Shell
• Voice annotation tool
• Text annotation tool
• Telephone capture tool
• TV capture tool
• TV EPG download tool
• Radio capture & EPG
• PocketPC transfer tool
• PocketRadio player
• Import files
• Legacy applications
• Browser tool
• Internet
• IM capture
• MAPI interface
• Legacy email client
• GPS import & Map display
• SenseCam
• Screen saver
Total Recall: Gordon Bell’s Trove
Total Recall: The 1 TB Life
• 1 TB gives you 65+ years of:
  – 100 email messages a day (5 KB each)
  – 100 web pages a day (50 KB each)
  – 5 scanned pages a day (100 KB each)
  – 1 book every 10 days (1 MB each)
  – 10 photos per day (400 KB JPEG each)
  – 8 hours per day of sound - e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
  – 1 new music CD every 10 days (45 min each at 128 Kb/s)
• It will take you 5 years to fill up your 80 GB drive
• Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
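A rough check of the arithmetic on this slide, as a sketch only. It assumes decimal units (1 KB = 1,000 bytes) and reads "Kb/s" as kilobits per second; neither is stated on the slide. Under those assumptions the daily total comes to about 43 MB, which works out to roughly 63 years per terabyte, about 5 years per 80 GB, and just under 1 TB per year for the video case. Binary units would push the first figure past 65 years.

```python
# Storage budget from the "1 TB Life" slide.
# Assumptions (not from the slide): 1 KB = 1,000 bytes, "Kb/s" = kilobits/s.
SECONDS_PER_HOUR = 3600
KB_PER_GB = 1_000_000
KB_PER_TB = 1_000_000_000


def kbits_per_s_to_kb(kbits_per_s, seconds):
    """Convert a bit rate in kilobits/s sustained for `seconds` into kilobytes."""
    return kbits_per_s / 8 * seconds


# Kilobytes generated per day by each item on the slide.
daily_kb = {
    "email": 100 * 5,
    "web pages": 100 * 50,
    "scanned pages": 5 * 100,
    "books": 1000 / 10,                                   # 1 MB book every 10 days
    "photos": 10 * 400,
    "sound": kbits_per_s_to_kb(8, 8 * SECONDS_PER_HOUR),  # 8 hours at 8 Kb/s
    "music CDs": kbits_per_s_to_kb(128, 45 * 60) / 10,    # 45-min CD every 10 days
}

total_kb_per_day = sum(daily_kb.values())
kb_per_year = total_kb_per_day * 365

years_per_tb = KB_PER_TB / kb_per_year
years_per_80gb = 80 * KB_PER_GB / kb_per_year

# Video case: 4 hours a day at 1.5 Mb/s (1,500 kilobits/s), in GB per year.
video_gb_per_year = kbits_per_s_to_kb(1500, 4 * SECONDS_PER_HOUR) * 365 / KB_PER_GB

print(f"{total_kb_per_day / 1000:.1f} MB per day")
print(f"{years_per_tb:.1f} years per TB, {years_per_80gb:.1f} years per 80 GB")
print(f"{video_gb_per_year:.1f} GB of video per year")
```

Sound dominates the non-video budget, which is why the total is so sensitive to how the 8 Kb/s figure is read.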
Total Recall: The Benefits
• Ability to recover particular events, names, faces, and words
  – A log of your vital statistics and medical history
  – A digital memory of people you met, conversations you had, places you visited, and events you participated in.
  – A complete archive of your work and play, and your work habits.
• Ability to sort and sift through your digital memories to uncover patterns in your life
  – Your life can be chronicled, condensed, cross-correlated, and plotted out for you in useful and illuminating ways
  – Something you could never have gleaned with your unaided brain
Life Recorders
“Life Recorders May Be This Century’s Wrist Watch.” TechCrunch.com, Arrington, M. (Sept 6, 2009)
Digital Traces: How much is a free smartphone worth?
In the fall of 2008, 100 undergraduate students living in Random Hall at M.I.T. agreed to swap their privacy for free smartphones for one year by participating in an MIT study aimed at understanding the impact of social interaction on social diffusion.
When the participating students dialed other students, sent e-mails, or listened to songs the researchers knew… Every moment the students had their Windows Mobile smartphones with them, the researchers knew where they were and who was nearby.
[Translated into 350,000 hours of data – e.g. 65,000 phone calls, 25,000 SMS messages, 3.3 million scanned bluetooth devices and 2.5 million scanned 802.11 WLAN APs]
Reality Mining: Human Dynamics Group
Sociometric Badges: Modeling Workplace Interaction
MIT researchers deployed their Sociometric badge platform for a period of one month (20 working days) at a Chicago-area data server configuration firm that consisted of 28 employees, with 23 participating in the study. Each employee was instructed to wear a Sociometric badge every day from the moment they arrived at work until they left their office.
Sociometric badges enable MIT researchers to track daily human activities, to extract speech features and non-linguistic signals in real time, to locate individuals in the workplace (within 1.5 meters), to detect other workers in close proximity, and to capture face-to-face (f2f) interaction time.
In total we collected 1,900 hours of data, with a median of 80 hours per employee.
hd.media.mit.edu/badges/
Reality Mining: Results
• Mobile phone features can be used to accurately identify relationships between individuals and to predict the sharing of music within a social network.
• In the workplace, complex problems are best solved with f2f interaction.
senseable.mit.edu/engagingdata/papers/ED_SIII_Reality_Mining_and_Personal_Privacy.pdf
Social Media Landscape
http://www.briansolis.com/2008/08/introducing-conversation-prism/
[Diagram: a spectrum running from Unintentional to Intentional, with Everyday Acts, Interactions on Social Media/Networks, and Lifelogging placed along it]
Universal McCann Survey of Social Media Usage, Attitudes & Interests
Wave     Countries   Internet Users   Date
Wave 1   15          7,500            9/06
Wave 2   21          10,000           6/07
Wave 3   29          17,000           3/08
Wave 4   38          23,000           3/09
• Survey representative of the Active Internet Universe between 16-54 (at least every other day)
• “Online applications, platforms and media, which aim to facilitate interaction, collaboration and the sharing of content”
Universal McCann Survey: All Social Media have grown over the 4 Waves
Universal McCann Survey: Social Networks
Note: There are variations in the absolute numbers of users
Universal McCann Survey Trends
• Social networks continue to grow. Nearly two-thirds of active internet users have now joined a social network site, up from 57% in Wave 3.
• Social networks are now a regular part of the online experience with 64.1% of active internet users spending time managing their profile.
• Wave 4 reveals that social networks are becoming the dominant platform for content creation and content sharing. Users are starting to focus their digital life around the likes of Facebook, MySpace and Orkut.
Type of Profile Information (e.g. Facebook)
• Basic Information:
  – Networks
  – Sex
  – Birthday
  – Hometown
  – Relationship Status
  – Looking for
• Education and Work:
  – Grad School
  – College
  – High School
  – Employer (Name, Time Period, Location)
  – Friends
• Personal Information:
  – Activities
  – Interests
  – Favorite Music
  – Favorite TV Shows
  – Favorite Movies
  – Favorite Books
  – Contact Information:
  – Email
  – Current City
Information Revelation on Facebook (4000 CMU Students)
Information Revelation and Privacy in Online Social Networks. ACM Workshop on Privacy in the Electronic Society 2005. Ralph Gross - Alessandro Acquisti.
Key Social Network Activities
1% Rule (or something like that)
It's an emerging rule of thumb that suggests that if you get a group of 100 people online then one will create content, 10 will "interact" with it (commenting or offering improvements) and the other 89 will just view it.
The Evidence of the 1% Rule
• YouTube -- each day there are 100 million downloads and 65,000 uploads - which is 1,538 downloads per upload - and 20m unique users per month.
  – That puts the "creator to consumer" ratio at just 0.5%, but it's early days yet; not everyone has discovered YouTube (and it does make downloading much easier than uploading, because any web page can host a YouTube link).
• Wikipedia -- 50% of all article edits are done by 0.7% of users, and more than 70% of all articles have been written by just 1.8% of all users.
• Yahoo Groups discussion lists -- "1% of the user population might start a group; 10% of the user population might participate actively, and actually author content, whether starting a thread or responding to a thread-in-progress; 100% of the user population benefits from the activities of the above groups," Bradley Horowitz of Yahoo noted on his blog (www.elatable.com/blog/?p=5) in February.
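A small sketch of the arithmetic behind the rule of thumb and the YouTube figures quoted above. The function name `split_by_rule` and its default shares are illustrative choices, not part of any cited source. The slide's 0.5% "creator to consumer" figure is not derived here, because the basis for it is not stated.

```python
# YouTube figures quoted on the slide.
downloads_per_day = 100_000_000
uploads_per_day = 65_000

# Downloads for every upload (the slide rounds this to 1,538).
downloads_per_upload = downloads_per_day / uploads_per_day

# Uploads as a share of downloads, expressed as a percentage.
uploads_share_pct = uploads_per_day / downloads_per_day * 100


def split_by_rule(group_size, creators=0.01, interactors=0.10):
    """Split a group into creators, interactors, and viewers (1% / 10% / rest)."""
    n_creators = round(group_size * creators)
    n_interactors = round(group_size * interactors)
    n_viewers = group_size - n_creators - n_interactors
    return n_creators, n_interactors, n_viewers


print(int(downloads_per_upload), f"{uploads_share_pct:.3f}%")
print(split_by_rule(100))
```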
Day in the life … Recording without even trying
http://newsinitiative.org/story/2006/08/15/digital_trails
We log onto computers at school and work, use our debit cards to buy lunch, scan our membership cards at the gym; the list goes on and on. With each of these everyday acts we leave a digital bread crumb that enables others to track our movements. But how often do we stop and wonder, who is following these virtual trails?
[Diagram: a spectrum running from Unintentional to Intentional, with Everyday Acts, Interactions on Social Media/Networks, and Lifelogging placed along it]
Day in the life …
What’s in the Trail?
• Type of trail
• Time of trail
• Initiated from (location)
• Collected by
• Data captured
• Where stored
• Accessible by (w/o sale)
• Sold to
• Privacy constraints
• Government access
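The attributes listed above can be read as the fields of a record. A minimal sketch follows; the class name `TrailRecord`, the field names, and every value in the example are hypothetical and chosen only to mirror the bullet list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrailRecord:
    """One digital trail, described by the attributes on the slide."""
    trail_type: str                   # Type of trail
    time: str                         # Time of trail
    initiated_from: str               # Initiated from (location)
    collected_by: str                 # Collected by
    data_captured: List[str]          # Data captured
    stored_at: str                    # Where stored
    accessible_by: List[str] = field(default_factory=list)  # Accessible by (w/o sale)
    sold_to: List[str] = field(default_factory=list)        # Sold to
    privacy_constraints: str = "unknown"                    # Privacy constraints
    government_access: str = "unknown"                      # Government access


# Made-up example: a debit card purchase at lunch.
lunch = TrailRecord(
    trail_type="debit card purchase",
    time="2010-01-05T12:10",
    initiated_from="campus cafeteria",
    collected_by="card issuer",
    data_captured=["amount", "merchant", "timestamp"],
    stored_at="issuer data center",
    accessible_by=["issuer", "merchant"],
)
print(lunch.trail_type, lunch.sold_to, lunch.privacy_constraints)
```

Leaving the last four fields optional reflects that, for many trails, the person who left them does not know the answers.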
Sample Trail – Internet Activity
Sample Trail – Cell Phone
Web Trails: From IT to Marketing
Hits, Pages, Visits, Visitors

Segment                    Example
Demographic segments       The country from which the visitor arrived
Customer segments          First-time versus returning buyers
Technographic segments     Visitors using Macintosh versus Windows operating systems
“Surfographic” segments    Those visitors who surf daily versus those who only do so occasionally
Campaign segments          Those visitors who saw one proposition or offer versus another
Promotion types            Those who saw a banner ad versus a paid search
Referral segments          Those visitors who came from one blog versus those who came from another
Content segments           Those visitors who saw one page layout versus another
(http://www.internetworldstats.com/stats.htm).
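A hedged sketch of what segmenting a web trail can look like in practice: counting visits by one attribute at a time. The visit records, field names, and the helper `segment_counts` are all invented for illustration; a real analytics tool would derive such fields from logs or page tags.

```python
from collections import Counter

# Made-up visit records; each field corresponds to one kind of segment.
visits = [
    {"visitor": "v1", "country": "US", "os": "Macintosh", "returning": False, "promotion": "banner ad"},
    {"visitor": "v2", "country": "UK", "os": "Windows", "returning": True, "promotion": "paid search"},
    {"visitor": "v3", "country": "US", "os": "Windows", "returning": True, "promotion": "banner ad"},
]


def segment_counts(records, key):
    """Count visits per value of one segmenting attribute."""
    return Counter(record[key] for record in records)


print(segment_counts(visits, "os"))         # technographic segment
print(segment_counts(visits, "returning"))  # customer segment
print(segment_counts(visits, "promotion"))  # promotion type
```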
Web Trails: Any Guesses
• June 30, 1998
• Lou Montulli
• Netscape Communications Corp (US)
• Persistent client state in a hypertext transfer protocol based client-server system
• U.S. Patent No. 5,774,670
• Answer – Cookie (aka tracking cookie, browser cookie, HTTP cookie)
Web Trails: Cookies
• Text stored on a user's computer by a web browser.
• A cookie consists of one or more name-value pairs (e.g. user preferences, shopping cart contents, session identifier…)
• Sent as an HTTP header by a web server to a web browser and then sent back unchanged by the browser each time it accesses that server.
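The round trip described above can be sketched with Python's standard `http.cookies` module. The cookie names and values (`session_id`, `theme`) are made up for illustration, and the "browser" step is simulated by joining the pairs into a `Cookie` header by hand.

```python
from http.cookies import SimpleCookie

# Server side: build Set-Cookie headers carrying name-value pairs.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["theme"] = "dark"
outgoing["session_id"]["path"] = "/"

# One Set-Cookie header value per cookie.
set_cookie_headers = [morsel.OutputString() for morsel in outgoing.values()]

# Browser side (simulated): the pairs are sent back unchanged in a Cookie header.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in outgoing.values())

# Server side again: parse the returned header back into name-value pairs.
incoming = SimpleCookie()
incoming.load(cookie_header)
returned = {name: morsel.value for name, morsel in incoming.items()}

print(set_cookie_headers)
print(cookie_header)
print(returned)
```

Because the session identifier comes back on every request to that server, the server can link those requests into a single trail.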
Web Trails: Tracking across Multiple Sites
Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
Web Trails: What’s in a name
• Web Bug, Web Beacon, Tracking Bug, Tracking Pixel, Pixel Tag, 1×1 GIF, Clear GIF, Transparent GIF
• This is what the user sees:
Pixel is here
What’s in a name
• Examples
  – <img src="http://ad.doubleclick.net/ad/pixel.quicken/NEW" width=1 height=1 border=0>
  – <img width=1 height=1 border=0 src="http://media.preferences.com/ping?ML_SD=IntuitTE_Intuit_1x1_RunOfSite_Any&db_afcr=4B31-C2FB-10E2C&event=reghome&group=register&time=1999.10.27.20.56.37">
• What information can be tracked? Some examples
  – The IP address of the computer that fetched the Web Bug
  – The URL of the page that the Web Bug is located on
  – The URL of the Web Bug image
  – The time the Web Bug was viewed
  – The type of browser that fetched the Web Bug image
  – A previously set cookie value
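A sketch of how the server hosting a Web Bug could assemble the items listed above from a single image request. The function name `describe_web_bug_hit` and all example values (the address, URLs, and cookie) are hypothetical; the header names `Referer`, `User-Agent`, and `Cookie` are standard HTTP request headers.

```python
from datetime import datetime, timezone


def describe_web_bug_hit(client_ip, image_url, headers):
    """Collect what a server can record when a 1x1 tracking image is fetched."""
    return {
        "ip_address": client_ip,                   # computer that fetched the image
        "page_url": headers.get("Referer"),        # page the Web Bug sits on
        "image_url": image_url,                    # URL of the Web Bug itself
        "viewed_at": datetime.now(timezone.utc).isoformat(),
        "browser": headers.get("User-Agent"),      # type of browser
        "cookie": headers.get("Cookie"),           # previously set cookie, if any
    }


# Made-up request, using reserved example addresses and domains.
hit = describe_web_bug_hit(
    client_ip="203.0.113.7",
    image_url="http://tracker.example.com/pixel.gif?event=reghome",
    headers={
        "Referer": "http://shop.example.com/register",
        "User-Agent": "ExampleBrowser/1.0",
        "Cookie": "visitor_id=42",
    },
)
print(hit)
```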
Web Trails: Page Tag (JavaScript)
Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
Web Trails: Ad Clicks & Analysis
Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)
Everyday Activities: Magnitude
When it comes to producing data, we’re prolific. Those of us wielding cell phones, laptops, and credit cards fatten our digital dossiers every day, simply by living…
In a single month, Yahoo alone gathers 110 billion pieces of data about its customers… Each person visiting sites in Yahoo’s network of advertisers leaves behind, on average, a trail of 2,520 clues.
Everyday Activities: Magnitude
In a given year a conservative estimate of twenty digital transactions a day means that more than 7,000 transactions become associated with a particular individual – upwards of a half million in a lifetime (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009).
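The estimate above is simple multiplication. A short sketch, where the 70-year span is an assumption added here to reproduce the "half million" figure; the source gives only the daily rate.

```python
TRANSACTIONS_PER_DAY = 20          # the conservative estimate quoted above
ASSUMED_LIFESPAN_YEARS = 70        # assumption, not stated in the source

per_year = TRANSACTIONS_PER_DAY * 365
per_lifetime = per_year * ASSUMED_LIFESPAN_YEARS

print(per_year)       # transactions tied to one person in a year
print(per_lifetime)   # transactions over the assumed lifespan
```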
Creating Digital Trails: The Internet of Things
• Traffic Cameras
• Electronic Tolls
• Transit Cards
• Passports
• Security Badges
• Time Clocks
• Payment Cards
• Loyalty Cards
• Membership Cards
• Digital Cameras
• Video Recorders
• Voice Recorders
• Health Recorders
• Sleep Recorders
• Wireless Scales
• Mobile Phones
• GPS Trainers
• GPS Devices
• Photo Geotagging
• WPS Devices
• RFID Tags
• Event RFID Tags
• Ubisense Sensors
Digital Trails: Location-Based Systems
• Networked (e.g. mobile phone triangulation)
• Handset (e.g. GPS trilateration)
• Hybrid (e.g. A-GPS or XPS)
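To make the trilateration idea concrete, here is a simplified two-dimensional sketch: a position is recovered from its distances to three known points. This is an illustration only. Real GPS solves in three dimensions and also estimates a receiver clock error, which is omitted here, and the anchor coordinates are invented.

```python
import math


def trilaterate(anchors, distances):
    """Return (x, y) given three known points and the distance to each.

    Subtracting the first circle equation from the other two turns the
    problem into two linear equations a*x + b*y = c, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("anchors are collinear; the position is not unique")

    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Made-up example: three anchors and a device actually located at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.dist(true_position, anchor) for anchor in anchors]

estimate = trilaterate(anchors, distances)
print(estimate)
```

With noisy distance measurements the three circles no longer meet at a single point, so practical systems use more than three references and a least-squares fit.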
Digital Trail: Skyhookwireless.com (and Loki.com)
Digital Trail: Skyhookwireless.com
Creating Digital Trails: RFID
Creating Digital Trails: RFID Critics
Tracking and Identifying:
• Vehicles and Commuters
• Animals
• Product Inventory
• People
Creating Digital Trails: Some Examples
RFID Tag Sensor
Company                       Task                                                Tag Location
William Ashley Retailer       Locating sales people in store                      Name tag
MS Retail                     Locating shoppers' children in store playground     Lion toy arm band
Landfill Workers              Locating workers in hazardous waste areas           Construction helmet
University of South Florida   Tracking inhabitants with dementia                  Wrist band
Conference attendees          Tracking visits to trade show booths                Attendee badge
IntEqTec                      Tracking rider/horse location to activate cameras   Rider's garments/horse bridle
Ubisense.Net
What can you do with location data?
Pop Quiz
Fact                Matching Letter   Answer
1   259 thousand    ____              a  No. of mobile phones in world
2   3 million       ____              b  No. of YouTube videos viewed per day
3   4.2 million     ____              c  No. of CCTV cameras in the UK
4   5 million       ____              d  No. of songs downloaded from iTunes by 2008
5   10 million      ____              e  No. of riders on Oyster Transit system (UK) per day
6   30 million      ____              f  No. of blogs indexed by Technorati since 2002
7   100 million     ____              g  No. of Tweets per day
8   100 million     ____              h  Approximate number of Google searches daily
9   133 million     ____              i  No. of users who log on to Facebook at least once each day
10  1.7 billion     ____              j  No. of credit cards worldwide
11  2 billion       ____              k  Estimated number of emails per day
12  4.2 billion     ____              l  Total number of Wikipedia articles in all languages
13  5 billion       ____              m  Size of worldwide GPS market in US$
14  210 billion     ____              n  No. of public wi-fi hotspots worldwide