Big Data Conference

Big Data Conference 2013: Analytics and Applications for Federal Big Data. Data Tactics Corp: A Blended Approach to Big Data Analytics. Richard Heimann, Data Scientist at Data Tactics Corporation.


Transcript of Big Data Conference

Page 1: Big Data Conference

Big Data Conference 2013: Analytics and Applications for Federal Big Data

Data Tactics Corp: A Blended Approach to Big Data Analytics

Richard Heimann, Data Scientist at Data Tactics Corporation

Page 2: Big Data Conference

Data Tactics Analytics Practice

The Team: (Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.)

Graduates from top universities...

Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences.

Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time-series, text analysis.

Going beyond the base (verticals)...

Page 3: Big Data Conference

Horizontals & Verticals

Horizontals: Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis

Verticals: econometrics || spatial econometrics || graph theory algorithms || astrophysical time-series analysis || path planning algorithms || bayesian statistics || constrained optimizations || numerical integration techniques || PCA || bagging/boosting || hierarchical models || IRT || DLISA || latent class analysis || structural equation modeling || mixture models || SVM || maxent || CART || autoregressive models || ICA || factor analysis || Random Forest || dimensional reduction || topic models || sentiment analysis

Page 4: Big Data Conference

Hierarchy of Data Scientists

Data Tactics Analytics Practice

Page 5: Big Data Conference

Why Analytics [Business]???

Why are analytics important? (Business, Analytics, Practical)

"We need to stop reinventing the cloud and start using it!" (Dave Boyd)

Page 6: Big Data Conference

Why are analytics important? (Business, Analytics, Practical)

No Free Lunch (NFL): no algorithm performs better than any other when performance is averaged uniformly over all possible problems of a particular type. Algorithms must therefore be designed for a particular domain or style of problem; there is no such thing as a general-purpose algorithm.
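The practical consequence of the NFL statement can be sketched in a few lines of R. This is only an illustration, not a proof of the theorem: the two toy problems, the helper names (`fit_lm`, `fit_nn`, `mse`) and the grid of points are all assumptions made for the example. A linear model and a 1-nearest-neighbour rule are each trained on a linear trend and on a step function, and neither comes out ahead on both.

```r
# Two deterministic toy problems on the same grid (hypothetical data).
x_train <- seq(0, 1, by = 0.1)
x_test  <- seq(0.03, 0.93, by = 0.1)

linear_truth <- function(x) 2 * x + 1               # smooth, global trend
step_truth   <- function(x) ifelse(x < 0.45, 0, 1)  # abrupt local change

# Learner 1: ordinary least squares on a single predictor.
fit_lm <- function(x, y) {
  model <- lm(y ~ x, data = data.frame(x = x, y = y))
  function(x_new) unname(predict(model, newdata = data.frame(x = x_new)))
}

# Learner 2: 1-nearest-neighbour (copy the response of the closest training point).
fit_nn <- function(x, y) {
  function(x_new) sapply(x_new, function(x0) y[which.min(abs(x - x0))])
}

# Mean squared error of a learner on held-out points of a given problem.
mse <- function(truth, learner) {
  predict_fn <- learner(x_train, truth(x_train))
  mean((predict_fn(x_test) - truth(x_test))^2)
}

results <- rbind(
  linear = c(lm = mse(linear_truth, fit_lm), nn = mse(linear_truth, fit_nn)),
  step   = c(lm = mse(step_truth, fit_lm),   nn = mse(step_truth, fit_nn))
)
print(round(results, 4))
```

On the linear problem the regression has the lower error; on the step problem the nearest-neighbour rule does. Which learner is "better" depends on the problem it is pointed at.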

Why Analytics [Analytics]???

Page 7: Big Data Conference

If this guy doesn’t scale - none of us do.

Web Scales

Academic Publications Scale

IC Scales

[Figure: axes N and t]

Why Analytics [Practical]???

Page 8: Big Data Conference

algo to users > algo to data

Development

Deployment

Machine User

Parallel Distributed Objective Subjective

Valid

Nontrivial

Accurate

Useful

Novel

Comprehensible

M/R

MPP

HDFS

GPU

SOA

Page 9: Big Data Conference

Shiny

Open sourced by RStudio in November 2012

Not the first to wrap R in the browser, but perhaps the easiest for R developers

Don't need to know HTML, CSS and JavaScript to get started

Reactive programming model

WebSockets for communication

Page 10: Big Data Conference

server.R

library(shiny)

# Define server logic required to generate and plot a random
# distribution
shinyServer(function(input, output) {

  # Expression that generates a plot of the distribution.
  # renderPlot:
  #
  #  1: Is "reactive" and is therefore automatically
  #     re-executed when inputs change.
  #  2: Its output type is a plot.

  output$distPlot <- renderPlot({

    # generate an rnorm distribution and plot it
    dist <- rnorm(input$obs)
    hist(dist)
  })
})

Page 11: Big Data Conference

ui.R

library(shiny)

# Define UI for application that plots random distributions
shinyUI(pageWithSidebar(

  # Application title:
  headerPanel("My Shiny App!"),

  # Sidebar with a slider input for number of observations:
  sidebarPanel(
    sliderInput("obs",
                "Number of observations:",
                min = 0,
                max = 1000,
                value = 500)
  ),

  # Show a plot of the generated distribution:
  mainPanel(
    plotOutput("distPlot")
  )
))

Page 12: Big Data Conference

ui.R

headerPanel()

sidebarPanel() mainPanel()

Page 13: Big Data Conference

server.R + ui.R = microscope

adjustable parameters (knobs): 0 < knobs < small k

knobs = lighting, varying objectives, focusing (fine and coarse)

knobs: fine and coarse filtering:

geography, time, variable of interest, observations of interest

promote significant (objective) patterns

change model parameters
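The filtering knobs listed above (geography, time, variable of interest) can be sketched as a plain R function that a Shiny server could call with its input values. Everything here is hypothetical and for illustration only: the `obs` data frame, its column names and the `turn_knobs` helper are assumptions, not part of the original deck.

```r
# Hypothetical observations; the column names are assumptions for illustration.
obs <- data.frame(
  region = c("north", "north", "south", "south", "east"),
  day    = as.Date(c("2013-01-05", "2013-02-10", "2013-01-20",
                     "2013-03-01", "2013-02-15")),
  speed  = c(42, 55, 38, 61, 47),
  volume = c(120, 90, 200, 75, 150)
)

# Each argument is one "knob": geography, time window, variable of interest.
turn_knobs <- function(data, region, from, to, variable) {
  keep <- data$region %in% region & data$day >= from & data$day <= to
  data[keep, c("region", "day", variable), drop = FALSE]
}

# Coarse filter: two regions, January only, looking at speed.
view <- turn_knobs(obs,
                   region   = c("north", "south"),
                   from     = as.Date("2013-01-01"),
                   to       = as.Date("2013-01-31"),
                   variable = "speed")
print(view)
```

Keeping the number of arguments small mirrors the "0 < knobs < small k" point: each knob narrows what the analyst sees without requiring a new analytic.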

Page 14: Big Data Conference

BDE + Shiny

Page 15: Big Data Conference

Latent Spatial Traffic Patterns


Overlapping Solutions

Multiple models allow more nuanced learning from data.

Convergent results serve as cross-validation.

Points of divergence provide additional insights and allow models to be calibrated further.

Different models can provide answers to different questions, or answers to the same question for different analysts.

Multi-method analysis is well suited to diverse teams with mutable missions.

smooth + rough = data

A new paradigm in which the question "Are there multiple, overlapping ways to solve this problem?" dominates.
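A minimal sketch of these ideas in base R, using the built-in `cars` data set rather than the traffic data from the talk. The two models (a straight line and a quadratic) and the 90th-percentile cutoff are assumptions chosen only to show convergence, divergence, and the smooth + rough decomposition.

```r
# `cars` ships with base R; the models below are illustrative assumptions.
model_a <- lm(dist ~ speed, data = cars)           # Analytic A: straight line
model_b <- lm(dist ~ poly(speed, 2), data = cars)  # Analytic B: quadratic

# smooth + rough = data: fitted values plus residuals rebuild the observations.
smooth  <- fitted(model_a)
rough   <- residuals(model_a)
rebuilt <- unname(smooth + rough)

# Convergence: how strongly do the two analytics agree overall?
agreement <- cor(fitted(model_a), fitted(model_b))

# Divergence: observations where the two analytics disagree the most.
gap       <- abs(fitted(model_a) - fitted(model_b))
divergent <- cars[gap > quantile(gap, 0.9), ]

print(agreement)
print(divergent)
```

High agreement supports both models; the rows in `divergent` are where an analyst might look next or recalibrate.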

Page 16: Big Data Conference

Overlapping Solutions

Analytic A

Analytic B

Analytic C

A + B + C

B + C

A + C

A + B

Are there multiple, overlapping ways to solve this problem?

Page 17: Big Data Conference

Summary:

# our blended approach
dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +
                    subjective:overlapping.solutions, data = data)
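Read literally as R, the formula above uses the `:` operator, which adds only the interaction between `subjective` and `overlapping.solutions`, with no separate main effect for either. The short sketch below inspects the formula without fitting anything, since the slide supplies no data; the variable name `blend` is an assumption.

```r
# Inspect the terms of the slide's formula; no data is needed for this.
blend <- analytics ~ bigdata + smalldata + objective +
  subjective:overlapping.solutions

term_labels <- attr(terms(blend), "term.labels")
print(term_labels)

# Variables referenced anywhere in the formula, response included.
print(all.vars(blend))
```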

Page 18: Big Data Conference

Overlapping Solutions

Page 19: Big Data Conference

About (DS4G):

1: Improve on definitions of analytics.

2: Outline optimal interactions with Data Scientists.

3: Provide a life-cycle for Data Science.

4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis)

Presented by Data Tactics Analytics Team

Location: TBD

Time: 1Q 2014

Duration: ~ 5 hrs.

Cost: FREE

Audience: Government managers and Data Tactics partners with their customers.

Data Science for Government (DS4G)

Page 20: Big Data Conference

http://www.meetup.com/Data-Science-DC/events/146953142/

LUBAP goes wild!

421 attending!

Page 21: Big Data Conference

Thank you...

Questions?

Homepage: http://www.data-tactics.com

Blog: http://datatactics.blogspot.com

Twitter: @DataTactics

Or, me (Rich Heimann): [email protected]

http://www.slideshare.net/DataTactics/presentations