Data Mining and Data Visualization – Tools to Allow Students to do BIG STUFF with BIG DATA - Course Technology Computing Conference

Transcript
Page 1:

Students Doing BIG STUFF with BIG DATA Dan Matthews – Trine University

Page 2:

Trine University – Angola, Indiana

Department of Informatics and Cybersecurity

Page 3:

INFORMATICS – OUR WAY

“The success of computing is in the resolution of problems, found in areas that are predominately outside of computing.”

Page 4:

Data Mining AKA:

Information Harvesting

Knowledge Mining

Knowledge Discovery

Data Dredging

Data Pattern Processing

Data Archaeology

Database Mining

Siftware Analytics

Business Intelligence

And more…

Page 5:

A DECENT DEFINITION

• The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of stored data, using pattern recognition technologies and statistical and mathematical techniques.
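As a small illustration of the "correlations" part of that definition, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson` and the two sample lists are assumptions made up for this example; they are not from the talk, and a real course would more likely use a statistics library or a tool such as WEKA.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not real measurements.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

r = pearson(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. A strong correlation is a pattern worth investigating, not proof of cause.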

Page 6:

A number of technology skills are needed:

Data Mining

Database Management

Machine Learning

Artificial Intelligence

Analysis of Algorithms

Statistics

Visualization

Data Warehousing

Security

Technology Ethics

Page 7:

“In order to discover anything, you must be looking for something.”

Laws of Serendipity

Page 8:

I had to mine this data the hard way.

Page 9:

Topics I won’t talk about today, though these concepts are important to learn in a class on data mining.

Page 10:

Having fun “playing” with and mining data!

Page 11:

Visualization to gain insight and knowledge

David McCandless Data Visualization TED Talk

Page 12:

WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)

• Used for research, education, and applications

• Complements “Data Mining” by Witten & Frank

• Main features:

– Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods

– Graphical user interfaces (incl. data visualization)

– Environment for comparing learning algorithms
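To give a feel for what "comparing learning algorithms" means, here is a minimal sketch in plain Python, not WEKA code. It builds a majority-class baseline (the idea behind WEKA's ZeroR) and a simplified one-rule learner restricted to a single chosen feature (in the spirit of OneR, which would also search for the best feature), then scores both on held-out rows. The function names and the toy rows are assumptions invented for illustration.

```python
from collections import Counter, defaultdict


def zero_r(rows, target):
    """Majority-class baseline: always predict the most common class."""
    majority = Counter(r[target] for r in rows).most_common(1)[0][0]
    return lambda row: majority


def one_rule(rows, feature, target):
    """Simplified one-rule learner on ONE given nominal feature.

    For each value of the feature, predict the class seen most often
    with that value; unseen values fall back to the majority class.
    """
    by_value = defaultdict(Counter)
    for r in rows:
        by_value[r[feature]][r[target]] += 1
    rule = {value: counts.most_common(1)[0][0]
            for value, counts in by_value.items()}
    fallback = zero_r(rows, target)
    return lambda row: rule.get(row[feature], fallback(row))


def accuracy(model, rows, target):
    """Fraction of rows whose predicted class matches the true class."""
    return sum(model(r) == r[target] for r in rows) / len(rows)


# Made-up toy data for illustration only.
train = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
test = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "no"},
]

baseline_acc = accuracy(zero_r(train, "play"), test, "play")
one_rule_acc = accuracy(one_rule(train, "outlook", "play"), test, "play")
print(f"baseline: {baseline_acc:.2f}  one-rule: {one_rule_acc:.2f}")
```

Scoring on rows the models never saw during training is the key design choice. A tool like WEKA automates this kind of comparison (for example with cross-validation) across many algorithms and data sets.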

Page 13:

Page 14:

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
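To show how little structure a "flat" ARFF file has, here is a minimal sketch of a reader in plain Python. The helper `parse_arff` is hypothetical and is not WEKA's own loader. It assumes only numeric and nominal attributes, comma-separated rows, and `?` for a missing value, which is enough for the heart-disease sample on this slide (the trailing `...` line is left out).

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Supports numeric and nominal attributes and '?' for missing values.
    Quoted values, sparse rows, dates and string attributes are
    deliberately not handled in this sketch.
    """
    relation = None
    attributes = []  # list of (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or comment
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"wrong number of values in row: {line!r}")
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row[name] = None  # missing value
                elif kind == "numeric":
                    row[name] = float(value)
                elif value in kind:
                    row[name] = value
                else:
                    raise ValueError(f"{value!r} is not a legal value for {name}")
            rows.append(row)
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"unsupported attribute type: {spec!r}")
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), "attributes,", len(rows), "rows")
```

Each row becomes one dictionary keyed by attribute name, with the missing cholesterol value stored as `None`. Data held in several related tables would have to be joined into one such table before WEKA could use it.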

Page 15:

Tableau 8

Visual Analytics

Business Integration

Any Data

Fast Performance

Web & Mobile Authoring

Page 16:

Tableau 8

Visual Analytics

Business Integration

Any Data

Fast Performance

Web & Mobile Authoring

Forecasting

Sets and visual groups

Shared Filters

Treemaps, bubble charts, word clouds

New marks card

Freeform dashboards

Data Blending improvements

Parallelized dashboards

Faster quick filters

Data Engine & Extract performance

Fast graphics and calculations

Performance recorder

Salesforce.com

Google Analytics & Google BigQuery

Cloudera Impala, Cassandra, HortonWorks, Hadapt, Karmasphere

SAP HANA

Data Extract API

JavaScript API

Data Server Security

Server Auditing

Distributed Data Engine

Web Authoring

iPad and Android authoring

Local rendering

Subscriptions

Page 17:

Tableau for Academia

Page 18:

Time to play!

Page 19:

Dan Matthews – Associate Professor – Trine University – [email protected]