Moving from Business Analyst to Data Analyst to Data Scientist…Advancing Your Career and Your Organization
Joanne Carswell
September 2015
• Definitions—BA, DA, DS
• Roles on Project Teams
• Types of Analysis
• Business Analyst Core Skills
• Data Analyst Required Skills
• Data Scientist Required Skills
• Evolving Your Business Analysts
• References
• Six Sigma Green Belt
• Certified Business Analyst Professional (CBAP)
• Certified Scrum Product Owner for Agile (CSPO)
• With Cox Automotive for 6 years
• IT Manager for Data Services
• Over 15 years of experience in the IT SDLC
• Presented at IIBA BBC 2012 & 2013 Conferences, Project World & World Congress for Business Analysts 2013, & BA World 2013, 2014, & 2015
• Loves to travel
• Plays tennis
• Dislikes yard work
• Favorite food is steak
• Participates in 5Ks
• Bike enthusiast
• Completed a 9-class program in data science
• Favorite hobby is 251 finds!
Definitions and Roles
• Business Analyst: serves as a liaison between the business users and developers
• Key activities in SDLC
– Business Analysis Planning & Monitoring
– Elicitation
– Requirements Analysis
– Requirements Management & Communication
• Define business need and business case
• Modeling current state
• Modeling future state
• Gap analysis
• Functional requirements
• Assessing proposed solution
Others?
• Data Analyst: works in the SDLC as an expert on the data
• May also have Business Analyst responsibilities
• Key activities
– Data definitions
– Data mapping
– Data quality
– Data governance
– Data modeling
– Data profiling
• Modeling current state of data
• Modeling future state of data
• Uncovering inconsistencies in data
• Data mapping for source and target
• Aligning to data governance standards
• Documenting data, metrics, and KPIs
• Determining reporting needs
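The data profiling and data quality activities above can be sketched in a few lines of base R. This is only an illustration: the `records` data frame, its column names, and its values are invented, and real profiling would run against the project's actual source tables.

```r
# Made-up records with deliberate problems: missing values, a duplicate key,
# and inconsistent spellings of the same state.
records <- data.frame(
  customer_id = c(101, 102, 102, 104, 105),
  state       = c("GA", "ga", "ga", "Georgia", NA),
  balance     = c(250.0, NA, NA, 80.5, 0),
  stringsAsFactors = FALSE
)

# How many values are missing in each column?
missing_per_column <- colSums(is.na(records))

# Which key values appear more than once?
duplicate_ids <- records$customer_id[duplicated(records$customer_id)]

# Which distinct spellings exist for a field that should be standardized?
state_values <- table(records$state, useNA = "ifany")

print(missing_per_column)
print(duplicate_ids)
print(state_values)
```

Checks like these give the Data Analyst concrete findings to bring to data governance discussions.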
• Data Scientist: investigates large, complex problems
• Employs techniques and theories from mathematics, statistics, and information technology
• Key activities
– Manage dataset(s)
– Analyze data
– Apply statistical techniques for discovery
– Create visualizations to aid in understanding
• Challenging the data and solutions
– What data is missing?
– What data is incomplete?
– What data is not structured well?
• Real time versus the right amount of time
• Are we missing something with this project?
– What is the data telling us?
• What does success with this project look like?
– Measuring success after implementation
• Communication
• Facilitation & Negotiation
• Leadership & Influencing
• Trustworthiness
• Teaching
• Teamwork
BABOK v2: Chapter 8, pages 141-154
• Business Knowledge
• Problem Solving
• Critical Thinking
• Systems Thinking
BABOK v2: Chapter 8, pages 141-154
• Descriptive: Summarize what happened
– Example: 20% of a company’s customers cancel each year
• Predictive: Predict the future based on the analysis of what happened
– Example: Customers who cancel have typically been with the company for less than 2 years and hold products at the lower end of the price range. External variables could also be used as predictors.
• Prescriptive: Recommend a course of action based on prediction and show implications of each decision option
– Example: Target customers deemed high risk with offers to move to other products and with marketing that reminds them of the benefits of the company. Measure how many of these customers still cancel, and compare that with the typical cancellation rate to understand the impact.
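A minimal sketch of how the three types of analysis might look on a tiny dataset. The `customers` data frame is invented for illustration, and the 2-year tenure cutoff is taken from the example above rather than derived from a fitted model.

```r
# Hypothetical customer data, invented purely for illustration.
customers <- data.frame(
  tenure_years = c(0.5, 1.0, 1.5, 3.0, 4.0, 5.0, 0.8, 6.0, 2.5, 1.2),
  cancelled    = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Descriptive: what share of customers cancelled?
churn_rate <- mean(customers$cancelled)

# Toward predictive: compare cancellation rates for short- and long-tenure customers.
customers$short_tenure <- customers$tenure_years < 2
rate_by_tenure <- tapply(customers$cancelled, customers$short_tenure, mean)

# Toward prescriptive: list the at-risk customers who could receive a retention offer
# (short tenure and not yet cancelled).
targets <- customers[customers$short_tenure & !customers$cancelled, ]

print(churn_rate)
print(rate_by_tenure)
print(nrow(targets))
```

A real predictive step would fit and validate a model instead of using a fixed cutoff.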
• Customer retention
• Cross selling
• Marketing
• Fraud detection
• Medical decisions
Core Skills for Business Analysts
• Process Modeling
• Prototyping
• Root Cause Analysis
• State Diagrams
BABOK v2: Chapter 9
[Figure: example state diagram for an account. States: New, Established, Suspended, Closed. Transition labels: >$0 balance; <$0 balance for >10 days; <$0 balance for 30 days; open >60 days with no overdraws.]
Required Skills for Data Analysts
• Data Dictionary
• Database Knowledge
• Data Flow Diagrams
• Data Modeling
BABOK v2: Chapter 9
Required Skills for Data Scientists
• R Programming (or SAS)
– Reading data
– Cleaning data
– Combining data
– Changing column headings
– Exploring data
– Creating reproducible research
• Why R?
– Open source
– Community for R
– Extended packages
• R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
https://cran.r-project.org/bin/windows/base/
Choose the latest version
• RStudio is a set of integrated tools for working with R. It includes a console, an editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
https://www.rstudio.com/products/rstudio/download/
Choose the one under Installers that matches your operating system
• data <- read.csv(file="simple.csv", header=TRUE, sep=",")
• data <- read.table(fileName, sep="\t") ## tab separated
• df <- read.csv.sql("sample.csv", "select id, name from file where age=23") ## requires library(sqldf)
• df <- read.csv("myfile.csv", skip=5) ## skip the first 5 lines
• names(df) <- c("new_name", "another_new_name")
• total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
• final[complete.cases(final),]
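The one-liners above can be strung together into a small end-to-end sketch. The file contents, the column names (`Amt`, `Region`), and the variable names are made up so the example runs on its own; substitute your own files in practice.

```r
# Write two small, made-up CSV files to temporary locations so the example is self-contained.
sales_file   <- tempfile(fileext = ".csv")
regions_file <- tempfile(fileext = ".csv")
writeLines(c("ID,Country,Amt", "1,US,100", "2,CA,", "3,US,250"), sales_file)
writeLines(c("ID,Country,Region", "1,US,East", "2,CA,North", "3,US,West"), regions_file)

# Reading data
sales   <- read.csv(sales_file, header = TRUE)
regions <- read.csv(regions_file, header = TRUE)

# Changing a column heading
names(sales)[names(sales) == "Amt"] <- "amount"

# Combining data on the shared key columns
total <- merge(sales, regions, by = c("ID", "Country"))

# Cleaning data: keep only rows with no missing values
final <- total[complete.cases(total), ]
print(final)
```

Renaming a single column by name, instead of overwriting all of `names(df)`, avoids mislabeling columns if their order changes.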
• mtcars dataset
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (lb/1000)
[, 7] qsec 1/4 mile time
[, 8] vs V/S
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors
A data frame with 32 observations on 11 variables.
Source
Henderson and Velleman (1981), Building multiple regression models interactively.
Biometrics, 37, 391–411.
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel
consumption and 10 aspects of automobile design and performance for 32
automobiles (1973–74 models).
• names(mtcars)
• nrow(mtcars)
• ncol(mtcars)
• head(mtcars)
• tail(mtcars)
• summary(mtcars)
• summary(mtcars$mpg)
• hist(mtcars$mpg, breaks=10, col="blue", xlab="Miles Per Gallon")
• boxplot(mpg ~ am, data=mtcars, outpch=19, col=c("red", "blue"), ylab="miles per gallon", xlab="type of transmission", main="mpg vs transmission")
• boxplot(mpg ~ cyl, data=mtcars, outpch=19, col=c("blue", "green", "yellow"), ylab="miles per gallon", xlab="number of cylinders", main="Mileage by Cylinder")
• library(rgl)
• plot3d(mtcars$wt, mtcars$disp, mtcars$mpg, col="red", size=3)
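Alongside the plots, a few numeric summaries can support the same exploration. The sketch below uses only base R and the built-in mtcars dataset; the variable names are illustrative.

```r
# Average miles per gallon for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# How many cars fall into each cylinder/transmission combination?
cars_per_group <- table(cyl = mtcars$cyl, am = mtcars$am)

# Strength and direction of the linear relationship between weight and mileage
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)

print(mpg_by_cyl)
print(cars_per_group)
print(wt_mpg_cor)
```

Group counts matter here: with only 32 cars, some combinations contain very few observations.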
• R Programming (or SAS)
– Statistical Inference
– Regression Modeling
– Practical Machine Learning
The process of deducing properties about a population: this includes testing hypotheses and deriving estimates. The observed data is assumed to be sampled from a larger population.
• Understanding complexities
– Confounding
– Missing data
– Biases (Population or variable)
Business knowledge is critical in statistical inference
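As one possible illustration of testing a hypothesis and deriving an estimate, the sketch below compares mileage for automatic and manual cars in mtcars with a two-sample t-test. The choice of test and of variables is an assumption made for the example.

```r
# Hypothesis: mean mpg differs between automatic (am = 0) and manual (am = 1) cars.
# t.test() with a formula runs Welch's two-sample t-test by default.
test_result <- t.test(mpg ~ am, data = mtcars)

print(test_result$estimate)  # sample mean mpg in each group
print(test_result$conf.int)  # 95% confidence interval for the difference in means
print(test_result$p.value)   # evidence against "no difference"
```

A small p-value here does not show that the transmission itself causes the difference. Manual cars in this sample also tend to be lighter, which is an example of the confounding mentioned above.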
• Regression analysis
• Linear models
Estimating the relationship between a dependent variable and one or more independent variables
• attach(mtcars)
• plot(wt, mpg, main="Scatterplot Example",
xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
• abline(lm(mpg~wt), col="red") # regression line (y~x)
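Beyond drawing the line, the fitted model can be inspected and used for estimates. A minimal sketch, using the same mpg ~ wt model without attach(); the 3,000 lb example car is hypothetical.

```r
# Fit a linear model: miles per gallon as a function of weight (in 1000 lb)
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope; the slope is the estimated change in mpg per extra 1000 lb
print(coef(fit))

# Share of the variation in mpg explained by weight
print(summary(fit)$r.squared)

# Estimated mpg for a hypothetical car weighing 3,000 lb
new_car <- data.frame(wt = 3)
estimate <- predict(fit, newdata = new_car)
print(estimate)
```

Passing `data = mtcars` to `lm()` avoids the name clashes that `attach()` can cause.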
• Building and applying prediction functions
– Data collection
– Feature creation
– Algorithms
– Evaluation
1. Separate data into testing and training sets
library(caret) # provides createDataPartition
inTrain <- createDataPartition(y = mtcars$mpg,
p = .75, list = FALSE)
training <- mtcars[ inTrain,]
testing <- mtcars[-inTrain,]
2. Run code to determine relationships
3. Create algorithm by analyzing data
4. Test algorithm on testing set
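The four steps can be sketched end to end as follows. To keep the example free of extra packages, it swaps caret's createDataPartition for a simple random split with base R's sample(). The linear model and the predictors (wt, hp) are assumptions chosen for illustration.

```r
set.seed(42)  # make the random split reproducible

# 1. Separate data into training (75%) and testing (25%) sets
n          <- nrow(mtcars)
train_rows <- sample(n, size = floor(0.75 * n))
training   <- mtcars[train_rows, ]
testing    <- mtcars[-train_rows, ]

# 2-3. Fit a prediction function on the training set only
model <- lm(mpg ~ wt + hp, data = training)

# 4. Test on the held-out set: root mean squared error of the predictions
predictions <- predict(model, newdata = testing)
rmse        <- sqrt(mean((testing$mpg - predictions)^2))
print(rmse)
```

Evaluating on rows the model never saw gives a more honest estimate of performance than reusing the training data.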
• Telling a story with data
• Dashboards
• Graphs
• Storyboards
Platfora
Tableau
Datameer
Summary of Skills
Skills Business Analyst Data Analyst Data Scientist
Communication
Problem Solving
Critical Thinking
Business Knowledge
Systems Thinking
Trustworthiness
Teaching
Facilitation & Negotiation
Leadership & Influencing
Teamwork
BABOK v2: Chapter 8, pages 141-154
Skills Business Analyst Data Analyst Data Scientist
Data Dictionary
Data Flow Diagrams
Data Modeling
Metrics & Key Performance Indicators
Process Modeling
Prototyping
Root Cause Analysis
State Diagrams
R Programming (or SAS)
Statistical Modeling
Practical Machine Learning
BABOK v2: Chapter 9, pages 155-221
Evolving Your Business Analysts
• Evaluate how Data Scientists could be used in your organization
• Evaluate your existing Business Analysts and/or Data Analysts
– Do they have a desire to evolve?
– Level of business knowledge
– Level of technical background
• Identify gaps in skills of the group
– Individuals
– Group needs
• Create a plan for evolving
• Individual classes
• Group classes
• Online resources
– Coursera
– Udemy
• College certification programs
– Statistics
– Database management
– Data science
• Wikipedia: The Free Encyclopedia (en.wikipedia.org)
• Coursera.org Data Science Track courses
• IIBA Business Analysis Body of Knowledge (BABOK)
• R Datasets Package
• RStudio website
• Datameer.com for screenshot
• Platfora.com for screenshot
• Tableau.com for screenshot
• Microsoft Office Clipart and Photos Online
“Used with permission from Microsoft”
Name: Joanne Carswell
Email: [email protected]
Connect to me on LinkedIn
I look forward to hearing from you!
Questions?