Post on 27-Jan-2015
Big Data Conference 2013: Analytics and Applications for Federal Big Data
Data Tactics Corp: A Blended Approach to Big Data Analytics
Richard Heimann, Data Scientist at Data Tactics Corporation
Data Tactics Analytics Practice
The Team: Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.
- Graduates from top universities...
- Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences.
- Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time-series, text analysis.
- Going beyond the base (verticals)...
Horizontals & Verticals
Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis
[Figure: verticals] econometrics; spatial econometrics; graph theory algorithms; astrophysical time-series analysis; path planning algorithms; bayesian statistics; constrained optimizations; numerical integration techniques; PCA; bagging/boosting; hierarchical models; IRT; DLISA; latent class analysis; structural equation modeling; mixture models; SVM; maxent; CART; autoregressive models; ICA; factor analysis; Random Forest; dimensional reduction; topic models; sentiment analysis
Hierarchy of Data Scientists
Data Tactics Analytics Practice
Why Analytics [Business]???
Why are analytics important? (Business, Analytics, Practical)
"We need to stop reinventing the cloud and start using it!" (Dave Boyd)
Why are analytics important? (Business, Analytics, Practical)
No Free Lunch (NFL): no algorithm performs better than any other when performance is averaged uniformly over all possible problems of a particular type. Algorithms must be designed for a particular domain or style of problem; there is no such thing as a general-purpose algorithm.
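A toy illustration of the NFL statement above (not a proof of the theorem): two deliberately simple learners are scored on two noiseless problems, and the ranking flips between them. The helpers `fit_line`, `fit_bins` and `rmse`, and both test problems, are hypothetical names invented for this sketch.

```r
# Learner 1: a straight line fitted by least squares.
fit_line <- function(x, y) {
  fit <- lm(y ~ x)
  function(newx) as.numeric(predict(fit, data.frame(x = newx)))
}

# Learner 2: two bin means, split at x = 0.5.
fit_bins <- function(x, y) {
  left  <- mean(y[x <= 0.5])
  right <- mean(y[x > 0.5])
  function(newx) ifelse(newx <= 0.5, left, right)
}

rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

x_train <- seq(0, 1, length.out = 101)
x_test  <- seq(0.005, 0.995, length.out = 100)

# Two problems, each matching the assumptions of one learner.
problems <- list(
  linear = function(x) 2 * x,
  step   = function(x) as.numeric(x > 0.5)
)

# Rows are learners, columns are problems; entries are test RMSE.
results <- sapply(problems, function(f) {
  y_train <- f(x_train)
  c(line = rmse(fit_line(x_train, y_train)(x_test), f(x_test)),
    bins = rmse(fit_bins(x_train, y_train)(x_test), f(x_test)))
})
print(results)
```

The line wins on the linear problem and the bins win on the step problem, so neither learner dominates once both problems count equally.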
Why Analytics [Analytics]???
If this guy doesn’t scale - none of us do.
[Figure panels: Web Scales; Academic Publications Scale; IC Scales. Axis labels: N, t]
Why Analytics [Practical]???
algo to users > algo to data
[Diagram labels: Development; Deployment; Machine; User; Parallel; Distributed; Objective; Subjective; Valid; Nontrivial; Accurate; Useful; Novel; Comprehensible; M/R; MPP; HDFS; GPU; SOA]
Shiny
- Open sourced by RStudio in November 2012
- Not the first to wrap R in the browser, but perhaps the easiest for R developers
- Don’t need to know HTML, CSS and JavaScript to get started
- Reactive programming model
- Web sockets for communication
server.R

library(shiny)

# Define server logic required to generate and plot a random
# distribution
shinyServer(function(input, output) {

  # Expression that generates a plot of the distribution.
  # renderPlot:
  #
  #  1: Is "reactive" and will therefore be automatically
  #     re-executed when inputs change.
  #  2: Its output type is a plot.

  output$distPlot <- renderPlot({

    # generate an rnorm distribution and plot it
    dist <- rnorm(input$obs)
    hist(dist)
  })
})
ui.R

library(shiny)

# Define UI for application that plots random distributions
shinyUI(pageWithSidebar(

  # Application title:
  headerPanel("My Shiny App!"),

  # Sidebar with a slider input for number of observations
  # (min is 1 rather than 0 so hist() always has data to plot):
  sidebarPanel(
    sliderInput("obs",
                "Number of observations:",
                min = 1,
                max = 1000,
                value = 500)
  ),

  # Show a plot of the generated distribution:
  mainPanel(
    plotOutput("distPlot")
  )
))
ui.R layout: headerPanel() across the top, with sidebarPanel() and mainPanel() side by side beneath it.
server.R + ui.R = microscope
adjustable parameters (knobs): 0 < knobs < small k
knobs = lighting, varying objectives, focusing (fine and coarse)
knobs: fine and coarse filtering:
- geography
- time
- variable of interest
- observations of interest
- promote significant (objective) patterns
- change model parameters
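The filtering knobs listed above can be sketched as one plain R function. This is a minimal sketch under stated assumptions: the `obs` table, its column names, and the `apply_knobs` helper are all hypothetical, and in a Shiny app the arguments would come from `input$...` widgets rather than being typed in by hand.

```r
# Hypothetical observations: where, when, and one variable of interest.
obs <- data.frame(
  region = c("north", "north", "south", "south", "east"),
  day    = as.Date(c("2013-01-01", "2013-01-02", "2013-01-02",
                     "2013-01-03", "2013-01-03")),
  speed  = c(30, 45, 50, 20, 60)
)

# Each argument is one knob: geography, time window, variable of
# interest, and a threshold selecting the observations of interest.
apply_knobs <- function(data, regions, from, to, variable, min_value = -Inf) {
  keep <- data$region %in% regions &
    data$day >= from & data$day <= to &
    data[[variable]] >= min_value
  data[keep, c("region", "day", variable), drop = FALSE]
}

view <- apply_knobs(
  obs,
  regions   = c("north", "south"),
  from      = as.Date("2013-01-02"),
  to        = as.Date("2013-01-03"),
  variable  = "speed",
  min_value = 40
)
print(view)
```

Keeping the knobs as function arguments keeps their number small and explicit, which matches the "0 < knobs < small k" constraint.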
BDE + Shiny
Latent Spatial Traffic Patterns
Overlapping Solutions
- Multiple models allow more nuanced learning from data.
- Convergent results serve as cross-validation.
- Points of divergence provide additional insights and allow models to be calibrated further.
- Different models can provide answers to different questions, or answers to the same question for different analysts.
- Multi-method excels for diverse teams with mutable missions.
- smooth + rough = data
- New paradigm where the question, “Are there multiple, overlapping ways to solve this problem?” dominates.
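A small sketch of two overlapping analytics on the same data, assuming simulated observations and an arbitrary divergence threshold of 0.5 (both are illustrative choices, not taken from the deck). It also spells out "smooth + rough = data", reading the rough part as whatever the smoother leaves behind.

```r
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + 0.1 * x + rnorm(length(x), sd = 0.2)  # simulated data

model_a <- lm(y ~ x)      # Analytic A: global linear trend
model_b <- loess(y ~ x)   # Analytic B: local smoother

# smooth + rough = data: the rough part is what the smoother leaves behind.
smooth <- as.numeric(predict(model_b))
rough  <- y - smooth

# Where the two analytics agree they corroborate each other;
# where they diverge, the data deserve a closer look.
gap       <- abs(as.numeric(predict(model_a)) - smooth)
divergent <- which(gap > 0.5)   # 0.5 is an arbitrary threshold
cat(length(divergent), "of", length(y), "points diverge\n")
```

Points flagged in `divergent` are candidates for recalibrating one of the models or for asking a different question of the data.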
Overlapping Solutions
[Venn diagram: Analytic A, Analytic B, Analytic C, with overlaps A + B, A + C, B + C, and A + B + C]
Are there multiple, overlapping ways to solve this problem?
Summary:
# our blended approach
dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +
                    subjective:overlapping.solutions, data = data)
Data Science for Government (DS4G)
About (DS4G):
1: Improve on definitions of analytics.
2: Outline optimal interactions with Data Scientists.
3: Provide a life-cycle for Data Science.
4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis)
Presented by Data Tactics Analytics Team
Location: TBD
Time: 1Q 2014
Duration: ~ 5 hrs.
Cost: FREE
Audience: Government managers and Data Tactics partners with their customers.
http://www.meetup.com/Data-Science-DC/events/146953142/
LUBAP goes wild! 421 attending!
Thank you...
Questions?
Homepage: http://www.data-tactics.com
Blog: http://datatactics.blogspot.com
Twitter: @DataTactics
Or, me (Rich Heimann): rheimann@data-tactics-corp.com
Slideshare: http://www.slideshare.net/DataTactics/presentations