Data Analysis - laulima.hawaii.edu › access › content › user › smosier › ics › 10… ·...

Post on 09-Jun-2020


Data Analysis

iClicker Question: I know a lot about analyzing data.

A. Strongly agree

B. Agree

C. Don’t agree or disagree

D. Disagree

E. Strongly disagree

Overview

• What is data analysis?

• Data collection methods

• Big data

• Practical examples

• Analyzing data

• How can you get started?

Workspace 1

What comes to mind when you hear the words “analyze data”?

Data analysis

• Data: information collected

– Can be numerical or lexical

– Datum: single piece of information

• Analysis: careful study of information or a close look at information

• Data analysis: Careful study of information collected to create new information

Data collection methods

• Surveys

• Experiments

• Web clicks

• Search engine searches

• Purchases at a store

• Any possible recordable action

Big Data

Very large, complex sets of data that are not easily analyzed.

Big data examples

• Google searches

– Over 3 billion searches per day

– Ensure high ad revenue

• Retail stores

– Predicting sales trends instead of reacting to them

– Identify hot holiday sales items

– Market to people based on geolocation, social media, Web browsing patterns, customer loyalty programs

Practice problems

• What is the difference between data, datum, analysis, and data analysis?

• How can data be collected?

• What is big data?

• How do search engine companies and retail stores use big data?

• Think of other possible examples where big data can be applied.

Data analysis examples

Courtesy of Tony Felix

Car insurance – Thought questions

• Why do some states have variable car insurance rates?

• Which factors affect car insurance rates and why?

– Men

– Women

– Age

– Having children

Car insurance

• High insurance rates typically reflect a higher probability of accidents

– Teenagers and the elderly typically have higher rates

• Individuals in the age range 25-50 typically have lower rates

– Mature and typically drive more safely

• Men typically have higher rates than women

• Women with children typically have the lowest rates, attributed to the safest driving

• Prior tickets received are also a factor

Supermarkets

• Club cards are used to collect data about customers

– Receive coupons for similar or related items

– May receive a coupon for creamer if you purchase coffee

• Question: What do you think data analysts found regarding their customers on Thursday evenings?

Supermarkets

• There was a correlation for men between beer and diapers.

– Typically purchased beer for the weekend

– Needed diapers for their children

• How did supermarkets respond? How would you respond?

– Move beer to a physical location that is visible from the diapers section

– Increase the cost of diapers or ensure they were not on sale
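One way to check whether two items really are bought together more often than chance would predict is to count baskets. The sketch below is only an illustration: the baskets are invented, and "lift" and "confidence" are common market-basket measures that the slides themselves do not name.

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
    {"bread", "milk"},
    {"coffee", "creamer"},
]


def support(items, all_baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in all_baskets) / len(all_baskets)


p_beer = support({"beer"}, baskets)
p_diapers = support({"diapers"}, baskets)
p_both = support({"beer", "diapers"}, baskets)

# Lift above 1 means the two items appear together more often than
# they would if purchases were independent.
lift = p_both / (p_beer * p_diapers)

# Confidence: among baskets with diapers, the share that also contain beer.
confidence = p_both / p_diapers

print(f"lift = {lift:.2f}, confidence = {confidence:.2f}")
```

With these made-up baskets the lift is above 1, which is the kind of signal that might prompt a store to rethink where the two products are shelved.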

Amazon.com and Electronic Arts

Thought question: Who has richer data to mine regarding their customers?

iClicker Question: Who has “richer” data?

A. Amazon.com

B. Electronic Arts

Amazon.com and Electronic Arts

Amazon.com

• Knows what customers purchase

• When customers purchase items

• Frequency at which replacement items are purchased

• Customer reviews of items purchased

• Broad view of customer preferences based on wide product variety

Electronic Arts

• Knows which video games customers own (by EA)

• Collects information regarding actions taken

• Subsequent games purchased

• Links between purchases and actions

Amazon.com does not know what you do with a product after you purchase it.

Practice problems

• What factors affect car insurance rates?

• Why does each factor affect car insurance rates?

• How can supermarkets collect information about customers?

• What can supermarkets do with information they collect regarding customers?

• How do supermarkets use customer information to increase sales?

• What are the differences between the types of data Amazon.com and Electronic Arts collect from their customers?

• Who (Amazon.com or EA) has richer data and why?

Analyzing data

Statistics

• Mean: average

• Mode: most frequent number

• Median: middle number

• Many more (online lecture): variance, standard deviation, kurtosis, skewness, etc.

5 10 1 3 2 3 4 5 10 2

Statistics

• Mean: 4.5 (the ten values sum to 45)

• Mode: 2, 3, 5, and 10 (each appears twice)

• Median: 3.5

5 10 1 3 2 3 4 5 10 2

[Figure: histogram of the values; x-axis: rating (0 to 12), y-axis: # of people]
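The three statistics can be checked with Python's standard `statistics` module. This is a minimal sketch that uses the ten values exactly as they are listed on the slide; `multimode` is used instead of `mode` because it reports every value tied for most frequent.

```python
import statistics

# The ten values as listed on the slide.
ratings = [5, 10, 1, 3, 2, 3, 4, 5, 10, 2]

mean = statistics.mean(ratings)      # sum of the values / number of values
median = statistics.median(ratings)  # average of the two middle values once sorted
modes = sorted(statistics.multimode(ratings))  # all values tied for most frequent

print("mean:", mean)
print("median:", median)
print("modes:", modes)
```

For these values the script reports a mean of 4.5, a median of 3.5, and a four-way tie for the mode (2, 3, 5, and 10), a reminder that a data set does not always have a single mode.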

Bell curve

[Figure: plots of # of people against rating. One plot marks a single point labeled “mean, mode, median”; another marks “mode”, “median”, and “mean” as separate points.]
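The relationship between the three measures can be seen with two small data sets. This sketch assumes the figure contrasts a symmetric distribution with a skewed one; both lists are invented for illustration and are not the data behind the slides.

```python
import statistics

# Hypothetical ratings, invented for illustration only.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # balanced around 3
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]     # a few high ratings form a long right tail


def summarize(values):
    """Return the mean, median, and (single) mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


symmetric_summary = summarize(symmetric)
skewed_summary = summarize(skewed)

print("symmetric:", symmetric_summary)
print("skewed:   ", skewed_summary)
```

In the symmetric list all three measures equal 3. In the skewed list the mode is 2, the median is 3, and the mean is 4, because the few high ratings pull the mean upward while leaving the mode and median largely unchanged.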

Net Promoter Score (NPS)

• Customer loyalty metric (Reichheld)

• Calculated based on a single question (0-10 scale): “How likely is it that you would recommend our company/product/service to a friend or colleague?”

• Respondents are classified into 3 groups (promoter, passive, detractor)

iClicker Question: What score is needed to be considered a promoter?

A. 10

B. 9-10

C. 8-10

D. 7-10

E. 6-10

Net Promoter Score (NPS)

• Promoter: 9-10

• Passive: 7-8

• Detractor: 0-6
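The groupings above translate directly into code. The sketch below classifies each response and then computes the score with the usual NPS formula, percentage of promoters minus percentage of detractors. That formula is not stated on the slides, and the function names and sample responses are hypothetical.

```python
def classify(score):
    """Map a 0-10 response to its NPS group."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores):
    """Percentage of promoters minus percentage of detractors (-100 to 100)."""
    groups = [classify(s) for s in scores]
    promoters = groups.count("promoter")
    detractors = groups.count("detractor")
    return 100 * (promoters - detractors) / len(groups)


# Hypothetical survey responses, invented for illustration only.
responses = [10, 9, 8, 7, 6, 3, 10, 9, 0, 8]
nps = net_promoter_score(responses)

print("NPS:", nps)
```

Passives count toward the total number of respondents but are neither added nor subtracted, so a survey made up largely of 7s and 8s ends up with a score near zero.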

Practice problems

• What are mean, mode, and median?

• How are mean, mode, and median related?

• How can mean, mode, and median be interpreted (or incorrectly interpreted)?

• What is NPS?

• What are the groupings for NPS?

• How can you use the different NPS groups?

How can you get started?

• Look at different types of data and consider how you can analyze them

• Think about how you can summarize data to make sense to others

• Determine different ways to create new information from data you have

• Excel is a powerful tool that can get you started

• We will be covering more details in the next lecture

Online lecture

• Additional statistics

• Data analysis tools

• NPS usage

iClicker Question: I learned a lot about analyzing data.

A. Strongly agree

B. Agree

C. Neutral

D. Disagree

E. Strongly disagree

Please do not pack up yet. We have a short iClicker midterm review on the next few slides.