ACIS 1504 - Introduction to Data Analytics & Business Intelligence Text Mining Data Cleaning.

Post on 20-Jan-2016


ACIS 1504 - Introduction to Data Analytics & Business Intelligence

Text Mining

Data Cleaning

Concept Map: Text Mining

Implementation

Mixed Cell References

Design: Accuracy

Random

Search, Left, Right, Mid, Len, &

Paste Values

Objectives

• Define Text Mining

• Demonstrate Excel features that support text mining.

Segment A: Text Mining

Text Analytics / Text Mining

• Software that searches vast amounts of textual data (unstructured) identifying patterns.

Nestle

• Nestle processes Social Media

http://uk.reuters.com/article/video/idUKBRE89P07Q20121026?videoId=238680321

Segment B: Text Functions

Text Mining

• Search: SEARCH

• Parse: LEFT, MID, RIGHT, LEN

• Concatenate: &

Name Example

Open Grades Textfile.xlsx.

Divide Last Name, First Name into two separate columns.

1. Locate the comma (SEARCH)

2. Extract all characters to left of comma (LEFT)

3. Locate end of full name (LEN)

4. Extract almost all characters between comma and end of name (RIGHT)
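The four steps above can be sketched in Python with small helpers that imitate the Excel functions. This is only an illustration under stated assumptions: the sample name, the helper names, and the cell reference A2 in the comments are hypothetical and not taken from Grades Textfile.xlsx, and the `search` helper ignores the wildcard matching that Excel's SEARCH supports.

```python
# Helpers that imitate the Excel text functions (1-based positions, as in Excel).
def search(needle: str, text: str) -> int:
    """Case-insensitive, 1-based position of needle in text, similar to SEARCH."""
    pos = text.lower().find(needle.lower())
    if pos < 0:
        raise ValueError("text not found")  # Excel would show #VALUE! here
    return pos + 1


def left(text: str, count: int) -> str:
    """First `count` characters, similar to LEFT."""
    return text[:count]


def right(text: str, count: int) -> str:
    """Last `count` characters, similar to RIGHT."""
    return text[-count:] if count > 0 else ""


def length(text: str) -> int:
    """Number of characters, similar to LEN."""
    return len(text)


full_name = "Smith, John"  # hypothetical value, imagined in cell A2

# Step 1: =SEARCH(",", A2)
comma_pos = search(",", full_name)

# Step 2: =LEFT(A2, SEARCH(",", A2) - 1)
last_name = left(full_name, comma_pos - 1)

# Step 3: =LEN(A2)
total_len = length(full_name)

# Step 4: =RIGHT(A2, LEN(A2) - SEARCH(",", A2) - 1)
# The extra "- 1" skips the space after the comma, which is why the slide
# says "almost all" characters.
first_name = right(full_name, total_len - comma_pos - 1)
```

With the sample value, the comma is found at position 6 and the full name is 11 characters long, so LEFT takes 5 characters and RIGHT takes 4.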

SEARCH Function

LEFT Function

LEN or Length Function

RIGHT Function

MID Function

Extract the first initial of first name.
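A minimal sketch of the same idea in Python, assuming the name is stored as "Last, First" with one space after the comma. The sample value and the cell reference A2 are hypothetical.

```python
def mid(text: str, start: int, count: int) -> str:
    """`count` characters beginning at 1-based position `start`, similar to MID."""
    return text[start - 1:start - 1 + count]


full_name = "Smith, John"  # hypothetical value, imagined in cell A2

# 1-based position of the comma; stands in for =SEARCH(",", A2)
comma_pos = full_name.find(",") + 1

# =MID(A2, SEARCH(",", A2) + 2, 1)
# "+ 2" moves past the comma and the space to the first letter of the first name.
first_initial = mid(full_name, comma_pos + 2, 1)
```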

Concatenate

• Combine First Name, space and Last Name.

• & is the concatenate symbol

• Quotes are required around constant strings of text
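As a rough Python analogue, `+` plays the role of `&`, and the space is a quoted constant string in both. The column layout in the comment (last name in A2, first name in B2) is an assumption made for illustration.

```python
last_name = "Smith"   # hypothetical value, imagined in cell A2
first_name = "John"   # hypothetical value, imagined in cell B2

# Excel equivalent under the assumed layout: =B2 & " " & A2
# The " " is a constant string, so it needs quotes in both Excel and Python.
display_name = first_name + " " + last_name
```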

Student ID Example

Extract each student’s PID from their email address.

Create a new student identifier by combining the first three letters of the last name with the last four digits of the student ID number.
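One possible way to sketch both tasks in Python. It assumes that the PID is the part of the email address before the "@", which the slides do not state explicitly. The sample values and the cell references in the comments are hypothetical.

```python
email = "jsmith42@example.edu"   # hypothetical value, imagined in cell A2
last_name = "Smith"              # hypothetical value, imagined in cell B2
student_id = "905123456"         # hypothetical value, kept as text, imagined in cell C2

# Assumption: the PID is everything to the left of the "@".
# Excel equivalent: =LEFT(A2, SEARCH("@", A2) - 1)
at_pos = email.find("@") + 1     # 1-based position, like SEARCH
pid = email[:at_pos - 1]

# First three letters of the last name plus last four digits of the ID.
# Excel equivalent: =LEFT(B2, 3) & RIGHT(C2, 4)
new_identifier = last_name[:3] + student_id[-4:]
```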

Segment C: Data Cleaning & Generation

Data Cleaning

• Delete Unnecessary Columns & Rows

• Resize Columns

• Format Numeric Values

• Separate Distinct Values

• Shorten Lengthy Values

• Data Validation for Future Entries

• Generate Values

Favorite Pie Example

1. Ensure pie flavor data is consistent.

2. Replace confidential clicker ID # with randomly generated 6-digit number.

3. Ensure new ID number is static and unique.
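Step 1 could be sketched in Python as follows. The slides do not say how consistency is achieved, so the normalization rule here (trim extra spaces, then capitalize each word) is an assumption, and the sample responses are invented. In Excel, the built-in TRIM and PROPER functions behave roughly like this.

```python
def normalize_flavor(raw: str) -> str:
    """Collapse extra whitespace and capitalize each word."""
    return " ".join(raw.split()).title()


# Hypothetical survey responses with inconsistent spacing and capitalization.
responses = ["apple", " Apple ", "APPLE", "pumpkin  pie", "Pumpkin Pie"]

cleaned = [normalize_flavor(r) for r in responses]

# After cleaning, only the distinct flavors remain.
distinct_flavors = sorted(set(cleaned))
```

Sorting the distinct values makes any remaining inconsistencies, such as misspellings, easier to spot by eye.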

Favorite Pie Example: Original, Sorted, Consistent

Random Number Functions

• =RAND()

• =RANDBETWEEN(low#, high#)
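A minimal Python sketch of generating replacement IDs, assuming a 6-digit ID means a whole number from 100000 to 999999. The clicker IDs are invented and the seed is only there to make the sketch repeatable. Drawing each ID independently, as `=RANDBETWEEN(100000, 999999)` does, can produce duplicates, so this sketch samples without replacement instead.

```python
import random

# Hypothetical confidential clicker IDs to be replaced.
clicker_ids = ["A1B2C3", "D4E5F6", "0A1B2C", "77FF10"]

rng = random.Random(1504)  # fixed seed, assumed only for repeatability

# Sample without replacement so every new ID is unique.
new_ids = rng.sample(range(100000, 1000000), k=len(clicker_ids))

# Store the result once; unlike a RAND or RANDBETWEEN cell, these values
# do not change again unless the code is rerun.
id_map = dict(zip(clicker_ids, new_ids))
```

In Excel, RAND and RANDBETWEEN recalculate whenever the sheet recalculates, which is why the generated numbers need to be pasted as values to make them static.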

Paste Special - Values

Mac: Edit Menu, Paste Special

Exam Feedback Example

Open Exam Feedback.xlsx