INTRODUCTION TO DATA SCIENCE - Burapha Universitykomate/889500... · DATA SCIENCE … •...

INTRODUCTION TO DATA SCIENCE, CHAPTER 1
by K. AMPHAWAN



OUTLINE

• Defining data science

• Defining data science by its key components


WHAT IS DATA SCIENCE?

• We now live in the era of “DATA”: computers, mobile devices, cameras, sensors, watches, and wearable technologies generate it through social media interactions, files, pictures, and queries

• This data comes from both the digital and the physical worlds!


WHAT IS DATA SCIENCE?

• In the past decade …

• Data engineers: finding innovative and powerful new ways to capture, collate (arrange appropriately), and condense (shorten, reduce) massive volumes of data.

• Now …

• Data scientists: deriving valuable and actionable insights from that data.


DATA SCIENCE …

• represents process and resource optimization

• produces “data insights”: insights you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life

• you can find data science methods to help you know and predict the most direct route from where you are to where you want to be…


RECALL …

• Data science: the practice of using computational methods to derive valuable and actionable insights from raw datasets.

• Data engineering: an engineering domain that’s dedicated to overcoming data-processing bottlenecks and data-handling problems for applications that utilize large volumes, varieties, and velocities of data.


DATA VARIETIES…

• Data science and data engineering work with …

• Structured data: stored, processed, and manipulated in a relational database management system (RDBMS)

• Unstructured data: generated from human activities; doesn’t fit into a structured database format

• Semi-structured data: doesn’t fit into a structured data system, but is nonetheless structured by tags that create a form of order and hierarchy in the data
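
The three varieties above can be illustrated with a minimal sketch that uses only the Python standard library. The `customers` table, the JSON document, and the review text are hypothetical sample data chosen for illustration; they are not from the course material.

```python
import json
import sqlite3

# Structured data: rows with a fixed schema, stored in a relational database.
# The "customers" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ann", 34), (2, "Bob", 41)],
)
structured_rows = conn.execute(
    "SELECT name, age FROM customers ORDER BY id"
).fetchall()
conn.close()

# Semi-structured data: no fixed table schema, but keys (tags) still impose
# order and hierarchy. JSON is one common example.
semi_structured = json.loads(
    '{"customer": {"name": "Ann", "orders": [{"item": "book", "qty": 2}]}}'
)
first_item = semi_structured["customer"]["orders"][0]["item"]

# Unstructured data: free text produced by human activity, with no schema or tags.
unstructured = "Loved the book, but delivery took far too long!"
word_count = len(unstructured.split())

print(structured_rows)
print(first_item)
print(word_count)
```

Structured rows can be queried by column name, semi-structured data is navigated by its keys, and unstructured text must first be processed (here, a simple word split) before anything can be measured.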


PIECES OF DATA SCIENCE

• To practice data science

• you need the analytical know-how of math and statistics, coding skills, and an area of subject-matter expertise


DISCIPLINES USING DATA SCIENCE

• Ad Tech Data Scientist

• Director of Banking Digital Analyst

• Clinical Data Scientist

• Geo-Engineering Data Scientist

• Geospatial Analytics Data Scientist

• Retail Personalization Data Scientist

• Clinical Informatics Analyst in Pharmacometrics


ROLE OF DATA SCIENTISTS

• sometimes take on the role of data engineers: they collect, query, and consume data during the analysis process

• a DATA SCIENTIST can …

• work off of several datasets that are stored in one database, or even in several different data warehouses.

• query data: write commands to extract relevant datasets from a data storage system
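
Querying data, as described above, often means writing SQL. The following is a minimal sketch using Python's built-in `sqlite3` module; the `orders` table, its columns, and the helper name `top_customers` are assumptions made up for this example.

```python
import sqlite3


def top_customers(conn, min_total):
    """Return (customer, total) pairs whose summed order amount is at least min_total."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
    """
    return conn.execute(query, (min_total,)).fetchall()


# Hypothetical in-memory database standing in for a real data storage system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ann", 120.0), ("Bob", 40.0), ("Ann", 80.0), ("Cai", 150.0)],
)

result = top_customers(conn, 100.0)
conn.close()
print(result)
```

The `?` placeholder passes the threshold as a parameter instead of pasting it into the SQL string, which is the safer habit when a query takes user input.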


REQUIRED SKILLS …

• MATHEMATICS and STATISTICS: needed to understand data and its significance; used for predictive forecasting, decision modeling, and hypothesis testing

• mathematics: deterministic numerical methods and deductive reasoning to form a quantitative description of the world

• statistics: stochastic methods, probabilities, and inductive reasoning to form a quantitative description of the world


APPLYING MATHEMATICAL MODELING TO DATA SCIENCE TASKS…

• Data scientists use mathematical methods to build decision models, to generate approximations, and to make predictions about the future.
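
As one illustration of a deterministic decision model, the sketch below uses break-even analysis. This technique is chosen here only as a simple example and is not named in the slides; the function names and all figures are hypothetical.

```python
def break_even_units(fixed_cost, price, variable_cost):
    """Units that must be sold before revenue covers total cost."""
    margin = price - variable_cost  # contribution of each unit sold
    if margin <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_cost / margin


def should_launch(forecast_units, fixed_cost, price, variable_cost):
    """Decision rule: launch only if forecast demand exceeds the break-even point."""
    return forecast_units > break_even_units(fixed_cost, price, variable_cost)


# Hypothetical product: 10,000 fixed cost, sold at 25, costing 5 per unit to make.
print(break_even_units(10_000, 25, 5))
print(should_launch(650, 10_000, 25, 5))
```

The model is deterministic: the same inputs always give the same decision, which is what separates it from the statistical methods discussed next.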


USING STATISTICAL METHODS TO DERIVE INSIGHTS

• Statistical methods are useful for getting a better understanding of your data’s significance, for validating hypotheses, for simulating scenarios, and for making predictive forecasts of future events.

• linear regression, ordinary least squares regression, Monte Carlo simulations, time series analysis
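
Two of the methods listed above can be sketched in a few lines of plain Python: ordinary least squares for a single predictor, and a Monte Carlo simulation. The data points, the function names, and the choice of estimating pi as the simulation target are assumptions made for illustration.

```python
import random


def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one predictor)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def monte_carlo_pi(n_samples, seed=0):
    """Estimate pi from the share of random points that fall inside a quarter circle."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = ols_fit([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept)
print(monte_carlo_pi(20_000))
```

The regression result is exact for this toy data, while the Monte Carlo estimate is only approximate and gets closer to the true value as the number of samples grows.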


CODING IN DATA SCIENCE

• Coding is unavoidable when you’re working on data science

• Coding—to manipulate, analyze and visualize your data

• Python, R—data manipulation, analysis, and visualization

• D3.js—interactive web-based data visualization
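
The manipulate, analyze, and visualize steps named above can be sketched with the Python standard library alone. The `sales` records, the helper names, and the text-based bar chart are hypothetical stand-ins; in practice a plotting library would usually handle the visualization step.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Analyze: average of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


def text_bar_chart(values_by_label, scale=1.0):
    """Visualize: one line of '#' marks per label, sorted by label."""
    return [
        f"{label:<6} {'#' * round(value * scale)} {value}"
        for label, value in sorted(values_by_label.items())
    ]


# Hypothetical raw data with one missing value.
sales = [
    {"region": "north", "amount": 10.0},
    {"region": "north", "amount": 20.0},
    {"region": "south", "amount": 30.0},
    {"region": "south", "amount": None},
]

# Manipulate: drop records whose amount is missing.
clean = [r for r in sales if r["amount"] is not None]

averages = mean_by_group(clean, "region", "amount")
chart = text_bar_chart(averages, scale=0.2)
for line in chart:
    print(line)
```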


REQUIRED SKILLS …

• Communicating data insights…

• Data scientists …

• must have sharp oral and written communication skills.

• need to be able to explain data insights in a way that staff members can understand.

• need to be able to produce clear and meaningful data visualizations and written narratives.


– …

“Data science is nothing new! It’s just another name for what we’ve been doing all along.”


APPLYING DATA SCIENCE TO YOUR SUBJECT AREA

• Data scientists generate deep insights and then use their domain-specific expertise to understand exactly what those insights mean with respect to the area in which they’re working…


APPLYING DATA SCIENCE TO YOUR SUBJECT AREA

• Subject-matter experts use data science to enhance performance in their respective industries…

• Engineers use machine learning to optimize energy efficiency in modern building design.

• Clinical data scientists work on the personalization of treatment plans and use healthcare informatics to predict and preempt future health problems in at-risk patients.


• Marketing data scientists use logistic regression to predict and preempt customer churn.

• Data journalists scrape websites for fresh data to discover and report the latest breaking news stories.

• Data scientists in crime analysis use spatial predictive modeling to predict, preempt, and prevent criminal activities.

• Data do-gooders use machine learning to classify and report vital information about disaster-affected communities for real-time decision support in humanitarian response.
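
To make one of the examples above concrete, here is a minimal sketch of logistic regression for churn prediction, trained by plain gradient descent. The single feature (number of support complaints per customer), the toy data, and the function names are all hypothetical; real projects would normally use an established library and far more data.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(churn) = sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def churn_probability(x, w, b):
    """Predicted probability that a customer with feature value x churns."""
    return sigmoid(w * x + b)


# Hypothetical training data: complaints per customer, and whether they churned (1) or not (0).
complaints = [0, 1, 2, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(complaints, churned)
print(churn_probability(0, w, b))
print(churn_probability(7, w, b))
```

Customers whose predicted probability is high can then be targeted with retention offers before they leave, which is the "preempt" step mentioned above.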


DATA SCIENCE IMPLEMENTATION

• To implement data science across an organization, or even just across a department, 3 approaches are available…

• Build an in-house data science team

• Outsource work to external data scientists

• Use a cloud-based solution


BUILDING YOUR OWN IN-HOUSE TEAM

• 3 options…

• Train existing employees: a lower-cost option that turns staff into data-skilled, highly specialized subject-matter experts

• Train existing employees and hire some experts: train employees to do high-level data science tasks, and bring in new hires to handle more advanced data science problem-solving

• Hire experts: hire advanced data scientists or fresh graduates with degrees in data science. (Problem: few people work in data science, and salaries are high)


OUTSOURCING TO DATA SCIENCE CONSULTANTS

• 2 options…

• Outsource the development of a comprehensive data science strategy serving the entire organization: costly, but can yield valuable insights in return

• Outsource piecemeal, individual data science solutions to specific problems: a small portion of work that delivers the benefits without reorganizing the structure and financials of the organization


LEVERAGING CLOUD-BASED PLATFORM SOLUTIONS

• Cloud applications such as IBM’s Watson Analytics (www.ibm.com/analytics/watson-analytics) offer users code-free, automated data services, from data cleaning and statistical modeling to analysis and data visualization.

• They still need in-house staff who are trained and skilled to design, run, and interpret the quantitative results from these platforms


BENEFITS OF DATA SCIENCES ACROSS INDUSTRY SECTORS

• Benefits for corporations, small and medium-sized enterprises (SMEs), and e-commerce businesses: production-cost optimization, sales maximization, marketing ROI increases, staff-productivity optimization, customer-churn reduction, customer lifetime-value increases, inventory requirements and sales predictions, pricing-model optimization, fraud detection, and logistics improvements


BENEFITS OF DATA SCIENCES ACROSS INDUSTRY SECTORS

• Benefits for governments: business-process and staff-productivity optimization, management decision-support enhancements, finance and budget forecasting, expenditure tracking and optimization, and fraud detection


BENEFITS OF DATA SCIENCES ACROSS INDUSTRY SECTORS

• Benefits for academia: resource-allocation improvements, student-performance management improvements, drop-out reductions, business-process optimization, finance and budget forecasting, and recruitment ROI increases


WRAPPING UP YOUR KNOWLEDGE!!!


–Komate AMPHAWAN, komate(at)gmail.com

“Any feedback can help to improve the quality of my teaching”
