Digital Trails Dave King 1 5 10 Part 1 D3

Digital Traces and Trails: Extracting Intelligence from the Collective Interactions of Web and Mobile Users Dave King HICSS-44 Tutorial January 5, 2010

description

HICSS Tutorial - part 1

Transcript of Digital Trails Dave King 1 5 10 Part 1 D3

Page 1: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Traces and Trails: Extracting Intelligence from the Collective Interactions of Web and Mobile Users

Dave King

HICSS-44 Tutorial

January 5, 2010

Page 2: Digital Trails   Dave King   1 5 10   Part 1 D3

Agenda

• Digital Traces & Trails
  – Some examples
  – Expansion in the volume and types of traces
• Mining Digital Traces & Trails
• Some Comments about Privacy

Page 3: Digital Trails   Dave King   1 5 10   Part 1 D3

Pop Quiz

Fact | Matching Letter | Answer
1. 259 thousand | ___ | a. No. of mobile phones in world
2. 3 million | ___ | b. No. of YouTube videos viewed per day
3. 4.2 million | ___ | c. No. of CCTV cameras in the UK
4. 5 million | ___ | d. No. of songs downloaded from iTunes by 2008
5. 10 million | ___ | e. No. of riders on Oyster Transit system (UK) per day
6. 30 million | ___ | f. No. of blogs indexed by Technorati since 2002
7. 100 million | ___ | g. No. of Tweets per day
8. 100 million | ___ | h. Approximate number of Google searches daily
9. 133 million | ___ | i. No. of users who log on to Facebook at least once each day
10. 150 million | ___ | j. No. of bookmarks entered in Del.icio.us web site
11. 1.7 billion | ___ | k. No. of credit cards worldwide
12. 2 billion | ___ | l. Estimated number of emails per day
13. 4.2 billion | ___ | m. Total number of Wikipedia articles in all languages
14. 5 billion | ___ | n. Size of worldwide GPS market in US$
15. 210 billion | ___ | o. No. of public wi-fi hotspots worldwide

Page 4: Digital Trails   Dave King   1 5 10   Part 1 D3

Background

• Dave King, SVP of Product Development, Product Management - JDA Software
• Experience
  – 6 Years with JDA Software
  – 27 Years - Enterprise Software
  – 15 Years as a University Professor
• Education
  – Ph.D. in Sociology and Statistics from University of North Carolina at Chapel Hill (long time ago)

Page 5: Digital Trails   Dave King   1 5 10   Part 1 D3

Background

• 12 years as Co-Chair of the Internet & Digital Economy Track (HICSS)
• Long-time interest in various aspects of E-Commerce & Business Intelligence
• Tutorial topic reflects a personal interest in
  – The data produced by various networks and network devices,
  – The examination of those data with advanced analytical techniques
  – And some of the social issues and problems associated with that analysis.

Page 6: Digital Trails   Dave King   1 5 10   Part 1 D3

Proliferation of Traces & Trails

Our lives have been leaving increasingly complete and detailed traces in cyberspace as two-way electronic communications devices have proliferated and diversified. Telephones were the first such devices to find widespread use; they soon yielded telephone billing data – records of when, where and by whom calls were made. Then bank ATM machines and point-of-sale terminals began to produce transaction records. As personal computers were plugged into commercial online networks, they too began to create electronic trails… There is more of this to come. As switched video networks become extensively used for everyday purposes – shopping, banking, selecting movies, social contact, political assembly – they potentially will grab and keep much more detailed portraits of private lives than have ever been made before. And wearable devices – ones that continuously monitor your medical condition, for example, or perhaps a cybersex suit that some journalists have avidly imagined – may construct the most up-close and intimate records.

Page 7: Digital Trails   Dave King   1 5 10   Part 1 D3

Where there's data …

Data mining technologies are pervasive in our society. They are designed to capture, aggregate, and analyze our digital footprints, such as purchases, Internet search strings, blogs, and travel patterns in an attempt to profile individuals for a variety of applications (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009).

Page 8: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Trails: Their Value & Misuse

The value of this data is unprecedented in the history of mankind. If you consider the sum of your online searching, mapping, communicating, blogging, news reading, shopping, and browsing, you should realize that you've revealed a very complete picture of yourself and placed it on the servers of a select few online companies.

The thin veneer of anonymity on the web is insufficient to protect you from revealing your identity. If you aren't even a little concerned, you should be. The value of this information is staggering and ripe for misuse.

Page 9: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Traces and/or Trails: Informal Definitions

• Digital traces refer to the traces of activities and behaviors that people leave when they interact in digital environments (en.wikipedia.org/wiki/Digital_traces).

• Digital trails refer to the associations or interconnections of these traces with other traces and with other sources of information

Page 10: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Traces & Trails: Intention

[Figure: continuum from Unintentional to Intentional traces. Labels: Lifelogging, Everyday Acts, Interactions on Social Media/Networks]

Page 11: Digital Trails   Dave King   1 5 10   Part 1 D3

Lifelogging

Steve Mann (the world’s first cyborg) – Cyborglogging (wearcam.org/netcam.html)

Jennifer Ringley – Lifecasting thru the JenniCam (1996-2003)

Mitch Maddox (aka DotComGuy) – 2002

Lisa Emily Batey – Lifecasting from Tokyo through Justin.tv (2007)

Daniel P.W. Ellis – Audio Lifelogging (2005-2007)

Page 12: Digital Trails   Dave King   1 5 10   Part 1 D3

Lifelogging: MyLifeBits – Gordon Bell

“I’m losing my mind… By the way so are you.”

“Soon… you will have the capacity for Total Recall. You will be able to summon up everything you have ever seen, heard, or done. And you will be in total control, able to retrieve as much or as little as you want at any given time.”

Page 13: Digital Trails   Dave King   1 5 10   Part 1 D3

Total Recall: Reminiscent of the Memex

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.” Vannevar Bush (1945)

Page 14: Digital Trails   Dave King   1 5 10   Part 1 D3

Total Recall: SenseCam

research.microsoft.com/en-us/um/cambridge/projects/sensecam/

www.businessweek.com/magazine/content/09_37/b4146051036364.htm

Page 15: Digital Trails   Dave King   1 5 10   Part 1 D3

MyLifeBits: The Research Project

[Architecture diagram: the MyLifeBits store (database and files) and the MyLifeBits Shell, linked to the following components: voice annotation tool; text annotation tool; telephone capture tool; TV capture tool; TV EPG download tool; radio capture & EPG; PocketPC transfer tool; PocketRadio player; import files; legacy applications; browser tool; Internet; IM capture; MAPI interface; legacy email client; GPS import & map display; SenseCam; screen saver]

Page 16: Digital Trails   Dave King   1 5 10   Part 1 D3

Total Recall: Gordon Bell’s Trove

Page 17: Digital Trails   Dave King   1 5 10   Part 1 D3

Total Recall: The 1 TB Life

• 1 TB gives you 65+ years of:
  – 100 email messages a day (5 KB each)
  – 100 web pages a day (50 KB each)
  – 5 scanned pages a day (100 KB each)
  – 1 book every 10 days (1 MB each)
  – 10 photos per day (400 KB JPEG each)
  – 8 hours per day of sound - e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
  – 1 new music CD every 10 days (45 min each at 128 Kb/s)
• It will take you 5 years to fill up your 80 GB drive
• Want video? Buy more cheap drives (1 TB/year lets you record 4 hours/day of 1.5 Mb/s video)
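
The storage figures on this slide can be checked with a little arithmetic. The sketch below is a rough check and not Bell's own calculation; it assumes decimal units (1 KB = 1,000 bytes), 8 bits per byte, and 365 days per year, none of which the slide states.

```python
# Back-of-the-envelope check of the "1 TB life" figures.
# Assumptions (not from the slide): decimal units, 8 bits per byte, 365 days per year.
KB, MB, GB, TB = 1e3, 1e6, 1e9, 1e12


def stream_bytes(bits_per_second, hours):
    """Bytes produced by a constant-bitrate stream running for the given hours."""
    return bits_per_second / 8 * hours * 3600


# Daily volume of each item listed on the slide, in bytes.
daily_bytes = {
    "email": 100 * 5 * KB,
    "web pages": 100 * 50 * KB,
    "scanned pages": 5 * 100 * KB,
    "books": 1 * MB / 10,                        # one 1 MB book every 10 days
    "photos": 10 * 400 * KB,
    "sound": stream_bytes(8_000, 8),             # 8 hours at 8 Kb/s
    "music": stream_bytes(128_000, 0.75) / 10,   # one 45-minute CD every 10 days
}

total_per_day = sum(daily_bytes.values())
total_per_year = total_per_day * 365
years_per_tb = TB / total_per_year
years_per_80gb = 80 * GB / total_per_year
video_per_year = stream_bytes(1_500_000, 4) * 365  # 4 hours/day at 1.5 Mb/s

print(f"per day:        {total_per_day / MB:.2f} MB")
print(f"per year:       {total_per_year / GB:.2f} GB")
print(f"1 TB lasts:     {years_per_tb:.1f} years")
print(f"80 GB lasts:    {years_per_80gb:.1f} years")
print(f"video per year: {video_per_year / GB:.1f} GB")
```

Under these assumptions the non-video items come to about 43 MB per day, which fills 80 GB in roughly 5 years and 1 TB in roughly 63 years, in the same range as the slide's "65+ years". The video stream alone comes to just under 1 TB per year.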

Page 18: Digital Trails   Dave King   1 5 10   Part 1 D3

Total Recall: The Benefits

• Ability to recover particular events, names, faces, and words
  – A log of your vital statistics and medical history
  – A digital memory of people you met, conversations you had, places you visited, and events you participated in.
  – A complete archive of your work and play, and your work habits.
• Ability to sort and sift through your digital memories to uncover patterns in your life
  – Your life can be chronicled, condensed, cross-correlated, and plotted out for you in useful and illuminating ways
  – Something you could never have gleaned with your unaided brain

Page 19: Digital Trails   Dave King   1 5 10   Part 1 D3

Life Recorders

“Life Recorders May Be This Century’s Wrist Watch.” TechCrunch.com, Arrington, M. (Sept 6, 2009)

Page 20: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Traces: How much is a free smartphone worth?

In the fall of 2008, 100 undergraduate students living in Random Hall at M.I.T. agreed to swap their privacy for free smartphones for one year by participating in an MIT study aimed at understanding the impact of social interaction on social diffusion.

When the participating students dialed other students, sent e-mails, or listened to songs, the researchers knew… Every moment the students had their Windows Mobile smartphones with them, the researchers knew where they were and who was nearby.

[Translated into 350,000 hours of data – e.g. 65,000 phone calls, 25,000 SMS messages, 3.3 million scanned Bluetooth devices and 2.5 million scanned 802.11 WLAN APs]

Page 21: Digital Trails   Dave King   1 5 10   Part 1 D3

Reality Mining: Human Dynamics Group

Page 22: Digital Trails   Dave King   1 5 10   Part 1 D3

Reality Mining: Human Dynamics Group

Page 23: Digital Trails   Dave King   1 5 10   Part 1 D3

Sociometric Badges: Modeling Workplace Interaction

MIT researchers deployed their Sociometric badge platform for one month (20 working days) at a Chicago-area data server configuration firm with 28 employees, 23 of whom participated in the study. Each employee was instructed to wear a Sociometric badge every day from the moment they arrived at work until they left their office.

Sociometric badges enable MIT researchers to track daily human activities, extract speech features and non-linguistic signals in real time, locate individuals in the workplace (within 1.5 meters), detect other workers in close proximity, and capture face-to-face (f2f) interaction time.

In total, the study collected 1,900 hours of data, with a median of 80 hours per employee.

hd.media.mit.edu/badges/

Page 24: Digital Trails   Dave King   1 5 10   Part 1 D3

Reality Mining: Results

• Mobile phone features can be used to accurately identify relationships between individuals and to predict the sharing of music within a social network.

• In the workplace, complex problems are best solved with f2f interaction.

senseable.mit.edu/engagingdata/papers/ED_SIII_Reality_Mining_and_Personal_Privacy.pdf

Page 25: Digital Trails   Dave King   1 5 10   Part 1 D3

Social Media Landscape

http://www.briansolis.com/2008/08/introducing-conversation-prism/

[Figure: continuum from Unintentional to Intentional traces. Labels: Lifelogging, Everyday Acts, Interactions on Social Media/Networks]

Page 26: Digital Trails   Dave King   1 5 10   Part 1 D3

Universal McCann Survey of Social Media Usage, Attitudes & Interests

Wave | Countries | Internet Users | Date
Wave 1 | 15 | 7,500 | 9/06
Wave 2 | 21 | 10,000 | 6/07
Wave 3 | 29 | 17,000 | 3/08
Wave 4 | 38 | 23,000 | 3/09

• Survey is representative of the Active Internet Universe: users aged 16-54 who go online at least every other day

• Social media defined as: “Online applications, platforms and media which aim to facilitate interaction, collaboration and the sharing of content”

Page 27: Digital Trails   Dave King   1 5 10   Part 1 D3

Universal McCann Survey: All Social Media have grown over the 4 Waves

Page 28: Digital Trails   Dave King   1 5 10   Part 1 D3

Universal McCann Survey: Social Networks

Note: There are variations in the absolute numbers of users

Page 29: Digital Trails   Dave King   1 5 10   Part 1 D3

Universal McCann Survey Trends

• Social networks continue to grow. Nearly two-thirds of active internet users have now joined a social network site, up from 57% in Wave 3.

• Social networks are now a regular part of the online experience with 64.1% of active internet users spending time managing their profile.

• Wave 4 reveals that social networks are becoming the dominant platform for content creation and content sharing. Users are starting to focus their digital life around the likes of Facebook, MySpace and Orkut.

Page 30: Digital Trails   Dave King   1 5 10   Part 1 D3

Type of Profile Information (e.g. Facebook)

• Basic Information:
  – Networks
  – Sex
  – Birthday
  – Hometown
  – Relationship Status
  – Looking for
• Education and Work:
  – Grad School
  – College
  – High School
  – Employer (Name, Time Period, Location)
  – Friends
• Personal Information:
  – Activities
  – Interests
  – Favorite Music
  – Favorite TV Shows
  – Favorite Movies
  – Favorite Books
• Contact Information:
  – Email
  – Current City

Page 31: Digital Trails   Dave King   1 5 10   Part 1 D3

Information Revelation on Facebook (4000 CMU Students)

Source: Gross, R. and Acquisti, A. Information Revelation and Privacy in Online Social Networks. ACM Workshop on Privacy in the Electronic Society, 2005.

Page 32: Digital Trails   Dave King   1 5 10   Part 1 D3

Key Social Network Activities

Page 33: Digital Trails   Dave King   1 5 10   Part 1 D3

1% Rule (or something like that)


It's an emerging rule of thumb that suggests that if you get a group of 100 people online then one will create content, 10 will "interact" with it (commenting or offering improvements) and the other 89 will just view it.

Page 34: Digital Trails   Dave King   1 5 10   Part 1 D3

The Evidence of the 1% Rule

• YouTube -- each day there are 100 million downloads and 65,000 uploads - which is 1,538 downloads per upload - and 20m unique users per month.
  – That puts the "creator to consumer" ratio at just 0.5%, but it's early days yet; not everyone has discovered YouTube (and it does make downloading much easier than uploading, because any web page can host a YouTube link).

• Wikipedia -- 50% of all article edits are done by 0.7% of users, and more than 70% of all articles have been written by just 1.8% of all users.

• Yahoo Groups discussion lists -- "1% of the user population might start a group; 10% of the user population might participate actively, and actually author content, whether starting a thread or responding to a thread-in-progress; 100% of the user population benefits from the activities of the above groups," Bradley Horowitz of Yahoo noted on his blog (www.elatable.com/blog/?p=5) in February.
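
The YouTube ratio and the 1/10/89 split quoted on these slides are simple arithmetic. The sketch below reproduces them; the function name `one_percent_split` is a hypothetical helper, and the rule is only a rule of thumb, so the output is an illustration and not a prediction.

```python
# Figures quoted on the slide.
downloads_per_day = 100_000_000
uploads_per_day = 65_000

# Downloads for every upload (the slide rounds this down to 1,538).
downloads_per_upload = downloads_per_day / uploads_per_day


def one_percent_split(group_size):
    """Split a group into (creators, interactors, viewers) by the 1/10/89 rule of thumb."""
    creators = group_size // 100
    interactors = group_size // 10
    viewers = group_size - creators - interactors
    return creators, interactors, viewers


print(int(downloads_per_upload))
print(one_percent_split(100))
```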

Page 35: Digital Trails   Dave King   1 5 10   Part 1 D3

Day in the life … Recording without even trying

http://newsinitiative.org/story/2006/08/15/digital_trails

We log onto computers at school and work, use our debit cards to buy lunch, scan our membership cards at the gym; the list goes on and on.  With each of these everyday acts we leave a digital bread crumb that enables others to track our movements. But how often do we stop and wonder, who is following these virtual trails?

[Figure: continuum from Unintentional to Intentional traces. Labels: Lifelogging, Everyday Acts, Interactions on Social Media/Networks]

Page 36: Digital Trails   Dave King   1 5 10   Part 1 D3

Day in the life …

Page 37: Digital Trails   Dave King   1 5 10   Part 1 D3

What’s in the Trail?

• Type of trail
• Time of trail
• Initiated from (location)
• Collected by
• Data captured
• Where stored
• Accessible by (w/o sale)
• Sold to
• Privacy constraints
• Government access
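
One way to make the attributes above concrete is to treat each trail as a record with one field per attribute. The sketch below is a minimal illustration: the class name `TrailRecord`, the field names, and every value in the example are hypothetical and do not come from the tutorial.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List


@dataclass
class TrailRecord:
    """One digital trail, with a field for each attribute listed on the slide."""
    trail_type: str
    time: datetime
    initiated_from: str          # location
    collected_by: str
    data_captured: List[str]
    where_stored: str
    accessible_by: List[str]     # parties with access without a sale
    sold_to: List[str]
    privacy_constraints: str
    government_access: str


# Hypothetical example: paying for lunch with a debit card.
lunch_purchase = TrailRecord(
    trail_type="debit card purchase",
    time=datetime(2010, 1, 5, 12, 30),
    initiated_from="campus cafeteria",
    collected_by="card issuer",
    data_captured=["amount", "merchant", "timestamp"],
    where_stored="issuer transaction database",
    accessible_by=["issuer", "merchant"],
    sold_to=[],
    privacy_constraints="unknown",
    government_access="unknown",
)

for name, value in asdict(lunch_purchase).items():
    print(f"{name}: {value}")
```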

Page 38: Digital Trails   Dave King   1 5 10   Part 1 D3

Sample Trail – Internet Activity

Page 39: Digital Trails   Dave King   1 5 10   Part 1 D3

Sample Trail – Cell Phone

Page 40: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: From IT to Marketing

Hits, Pages, Visits, Visitors

Segment | Example
Demographic segments | The country from which the visitor arrived
Customer segments | First-time versus returning buyers
Technographic segments | Visitors using Macintosh versus Windows operating systems
“Surfographic” segments | Those visitors who surf daily versus those who only do so occasionally
Campaign segments | Those visitors who saw one proposition or offer versus another
Promotion types | Those who saw a banner ad versus a paid search
Referral segments | Those visitors who came from one blog versus those who came from another
Content segments | Those visitors who saw one page layout versus another

(http://www.internetworldstats.com/stats.htm).
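
To illustrate how visit records turn into the segments listed on this slide, the sketch below counts a handful of visits by four of the segment types. The visit records, field names, and classification rules are all hypothetical; real analytics tools use far richer data and rules.

```python
from collections import Counter

# Hypothetical visit records; every field name and value is illustrative.
visits = [
    {"country": "US", "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6)",
     "referrer": "blog-a.example", "prior_purchases": 0},
    {"country": "UK", "user_agent": "Mozilla/5.0 (Windows NT 6.1)",
     "referrer": "blog-b.example", "prior_purchases": 3},
    {"country": "US", "user_agent": "Mozilla/5.0 (Windows NT 5.1)",
     "referrer": "blog-a.example", "prior_purchases": 1},
]


def technographic_segment(user_agent):
    """Crude operating-system guess from the User-Agent string."""
    if "Macintosh" in user_agent:
        return "Macintosh"
    if "Windows" in user_agent:
        return "Windows"
    return "Other"


def customer_segment(prior_purchases):
    """First-time versus returning, based on earlier purchases."""
    return "returning" if prior_purchases > 0 else "first-time"


segments = {
    "demographic": Counter(v["country"] for v in visits),
    "technographic": Counter(technographic_segment(v["user_agent"]) for v in visits),
    "customer": Counter(customer_segment(v["prior_purchases"]) for v in visits),
    "referral": Counter(v["referrer"] for v in visits),
}

for name, counts in segments.items():
    print(name, dict(counts))
```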

Page 41: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: Any Guesses

• June 30, 1998
• Lou Montulli
• Netscape Communications Corp (US)
• Persistent client state in a hypertext transfer protocol based client-server system
• Answer – Cookie (aka tracking cookie, browser cookie, HTTP cookie)

5774670

Page 42: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: Cookies


• Text stored on a user's computer by a web browser.

• A cookie consists of one or more name-value pairs (e.g. user preferences, shopping cart contents, session identifier…)

• Sent as an HTTP header by a web server to a web browser and then sent back unchanged by the browser each time it accesses that server.
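
The round trip described in these bullets can be sketched with Python's standard `http.cookies` module. This is a simplified illustration, not a real browser or server: the cookie names and values are hypothetical, and the "browser" here keeps only names and values, whereas a real one also tracks domain, path, and expiry.

```python
from http.cookies import SimpleCookie

# 1. Server side: attach name-value pairs to the response as Set-Cookie headers.
response_cookies = SimpleCookie()
response_cookies["session_id"] = "abc123"   # hypothetical session identifier
response_cookies["theme"] = "dark"          # hypothetical user preference
set_cookie_headers = response_cookies.output(header="Set-Cookie:").split("\r\n")

# 2. Browser side: store the pairs received from the server.
browser_jar = SimpleCookie()
for header in set_cookie_headers:
    browser_jar.load(header.split(":", 1)[1].strip())

# 3. Next request to the same server: send the stored pairs back unchanged.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)

# 4. Server side again: parse the Cookie header to recognize the visitor.
returned = SimpleCookie()
returned.load(cookie_header)

print(set_cookie_headers)
print("Cookie:", cookie_header)
print("Recognized session:", returned["session_id"].value)
```

Because the same identifier comes back on every request, the server can link otherwise separate page views into a single trail.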

Page 43: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: Tracking across Multiple Sites

Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)

Page 44: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: What’s in a name

• Web Bug, Web Beacon, Tracking Bug, Tracking Pixel, Pixel Tag, 1×1 Gif, Clear Gif, Transparent Gif

• This is what the user sees:

Pixel is here

Page 45: Digital Trails   Dave King   1 5 10   Part 1 D3

What’s in a name

• Examples
  – <img src="http://ad.doubleclick.net/ad/pixel.quicken/NEW" width=1 height=1 border=0>
  – <img width=1 height=1 border=0 src="http://media.preferences.com/ping?ML_SD=IntuitTE_Intuit_1x1_RunOfSite_Any&db_afcr=4B31-C2FB-10E2C&event=reghome&group=register&time=1999.10.27.20.56.37">

• What information can be tracked? Some examples
  – The IP address of the computer that fetched the Web Bug
  – The URL of the page that the Web Bug is located on
  – The URL of the Web Bug image
  – The time the Web Bug was viewed
  – The type of browser that fetched the Web Bug image
  – A previously set cookie value
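
The sketch below shows how a tracking server might turn one fetch of the second example image into a log entry with the six items listed above. It assumes the spaces inside the example URL are line-wrap artifacts; the request metadata (IP address, Referer, User-Agent, Cookie) and the `log_entry` helper are hypothetical, made up only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Example pixel URL from the slide (spaces removed, assumed to be wrap artifacts).
pixel_url = (
    "http://media.preferences.com/ping?"
    "ML_SD=IntuitTE_Intuit_1x1_RunOfSite_Any&db_afcr=4B31-C2FB-10E2C"
    "&event=reghome&group=register&time=1999.10.27.20.56.37"
)

parts = urlsplit(pixel_url)
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Hypothetical metadata that accompanies the browser's request for the image.
request = {
    "remote_addr": "203.0.113.7",
    "received_at": "1999-10-27T20:56:37",
    "headers": {
        "Referer": "http://www.example.com/register.html",
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
        "Cookie": "id=8a3f",
    },
}


def log_entry(request, url):
    """Collect the items a tracking server can record for one pixel fetch."""
    headers = request["headers"]
    return {
        "ip_address": request["remote_addr"],
        "page_url": headers.get("Referer"),
        "pixel_url": url,
        "viewed_at": request["received_at"],
        "browser": headers.get("User-Agent"),
        "cookie": headers.get("Cookie"),
    }


entry = log_entry(request, pixel_url)
print(params)
print(entry)
```

The query string carries whatever the embedding page chooses to pass (here an event and a group name), while the rest comes for free from the HTTP request itself.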

Page 46: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: Page Tag (JavaScript)

Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)

Page 47: Digital Trails   Dave King   1 5 10   Part 1 D3

Web Trails: Ad Clicks & Analysis

Source: Croll & Power. Complete Web Monitoring. O’Reilly (2009)

Page 48: Digital Trails   Dave King   1 5 10   Part 1 D3

Everyday Activities: Magnitude

When it comes to producing data, we’re prolific. Those of us wielding cell phones, laptops, and credit cards fatten our digital dossiers every day, simply by living…

In a single month, Yahoo alone gathers 110 billion pieces of data about its customers… Each person visiting sites in Yahoo’s network of advertisers leaves behind on average, a trail of 2,520 clues.

Page 49: Digital Trails   Dave King   1 5 10   Part 1 D3

Everyday Activities: Magnitude

In a given year a conservative estimate of twenty digital transactions a day means that more than 7,000 transactions become associated with a particular individual – upwards of a half million in a lifetime (Jason Millar, Problem for Predictive Data Mining, Lessons from Identity Trail, 2009).
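
The estimate above follows from simple multiplication. In the sketch below the 70-year lifetime is an assumption added for illustration; the source gives only the daily rate and the rounded totals.

```python
transactions_per_day = 20          # figure quoted on the slide
days_per_year = 365
lifetime_years = 70                # assumption, not stated in the source

transactions_per_year = transactions_per_day * days_per_year
transactions_per_lifetime = transactions_per_year * lifetime_years

print(transactions_per_year)       # "more than 7,000" per year
print(transactions_per_lifetime)   # "upwards of a half million" in a lifetime
```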

Page 50: Digital Trails   Dave King   1 5 10   Part 1 D3

Creating Digital Trails: The Internet of Things

• Traffic Cameras, Electronic Tolls, Transit Cards, Passports, Security Badges, Time Clocks
• Payment Cards, Loyalty Cards, Membership Cards, Digital Cameras, Video Recorders, Voice Recorders
• Health Recorders, Sleep Recorders, Wireless Scales, Mobile Phones, GPS Trainers, GPS Devices
• Photo Geotagging, WPS Devices, RFID Tags, Event RFID Tags, Ubisense Sensors

Page 51: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Trails: Location-Based Systems

• Networked (e.g. mobile phone triangulation)
• Handset (e.g. GPS trilateration)
• Hybrid (e.g. A-GPS or XPS)
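
Trilateration, mentioned above for handset-based positioning, recovers a position from distances to known reference points. The sketch below is a simplified two-dimensional version with exact distances; real GPS works in three dimensions and must also solve for receiver clock error, so this is only an illustration of the geometry. The function name `trilaterate_2d` and the sample coordinates are hypothetical.

```python
import math


def trilaterate_2d(anchors, distances):
    """Locate a point from its distances to three known 2D anchor points.

    Subtracting the first circle equation from the other two leaves a pair of
    linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances

    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2

    det = a * e - b * d
    if det == 0:
        raise ValueError("anchors are collinear; the position is not unique")
    return (c * e - b * f) / det, (a * f - c * d) / det


# Hypothetical anchors (e.g. transmitters at known positions) and a test point.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
distances = [
    math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors
]

estimate = trilaterate_2d(anchors, distances)
print(estimate)
```

With noisy distance measurements the three circles rarely meet at a single point, which is why practical systems use more reference points and a least-squares fit.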

Page 52: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Trail: Skyhookwireless.com (and Loki.com)

Page 53: Digital Trails   Dave King   1 5 10   Part 1 D3

Digital Trail: Skyhookwireless.com

Page 54: Digital Trails   Dave King   1 5 10   Part 1 D3

Creating Digital Trails: RFID

Page 55: Digital Trails   Dave King   1 5 10   Part 1 D3

Creating Digital Trails: RFID Critics

Tracking and Identifying:
• Vehicles and Commuters
• Animals
• Product Inventory
• People

Page 56: Digital Trails   Dave King   1 5 10   Part 1 D3

Creating Digital Trails: Some Examples

RFID Tag Sensor

Company | Task | Tag Location
William Ashley Retailer | Locating sales people in store | Name tag
MS Retail | Locating shoppers' children in store playground | Lion toy arm band
Landfill Workers | Locating workers in hazardous waste areas | Construction helmet
University of South Florida | Tracking inhabitants with dementia | Wrist band
Conference attendees | Tracking visits to trade show booths | Attendee badge
IntEqTec | Tracking rider/horse location to activate cameras | Rider's garments/horse bridle

Ubisense.Net

Page 57: Digital Trails   Dave King   1 5 10   Part 1 D3

What can you do with location data?

Page 58: Digital Trails   Dave King   1 5 10   Part 1 D3

Pop Quiz

Fact | Matching Letter | Answer
1. 259 thousand | ___ | a. No. of mobile phones in world
2. 3 million | ___ | b. No. of YouTube videos viewed per day
3. 4.2 million | ___ | c. No. of CCTV cameras in the UK
4. 5 million | ___ | d. No. of songs downloaded from iTunes by 2008
5. 10 million | ___ | e. No. of riders on Oyster Transit system (UK) per day
6. 30 million | ___ | f. No. of blogs indexed by Technorati since 2002
7. 100 million | ___ | g. No. of Tweets per day
8. 100 million | ___ | h. Approximate number of Google searches daily
9. 133 million | ___ | i. No. of users who log on to Facebook at least once each day
10. 1.7 billion | ___ | j. No. of credit cards worldwide
11. 2 billion | ___ | k. Estimated number of emails per day
12. 4.2 billion | ___ | l. Total number of Wikipedia articles in all languages
13. 5 billion | ___ | m. Size of worldwide GPS market in US$
14. 210 billion | ___ | n. No. of public wi-fi hotspots worldwide