XDATA · PDF fileOpen Source Alternative •Open source software –“Try before...

XDATA Overview


Page 1

XDATA Overview

Page 2

What is DARPA?

• The Defense Advanced Research Projects Agency (DARPA) was established in 1958 to

─ prevent strategic surprise from negatively impacting U.S. national security and

─ create strategic surprise for U.S. adversaries by maintaining the technological superiority of the U.S. military

• DARPA undertakes projects that are finite in duration but that create lasting revolutionary change.

─ Short duration

─ High impact

• Time-limited program managers

Page 3

Outline

• DARPA Tech Transfer – Government Acquisition

• The XDATA Stack

• XDATA Transition

• DARPA Support -> Community Support

Page 4

How do DARPA programs transition?

• During a DARPA program

– Technology development

– Prototype developed with transition partner

– Technology demonstration

– Declare success

• After DARPA program

– Services start an acquisition “program of record”

– Then…

Page 5

DoD Acquisition

Delivery through this process can take 5 to 20 years

Page 6

Open Source Alternative

• Open source software

– “Try before you buy”

– Prototyping of complex systems from DARPA components

• XDATA Transition Model

– DARPA resources applied, developers embedded at receiving site

– Tools identified, gaining organization's resources applied, DARPA provides limited expertise

– Organization contracts with XDATA performer under its own contract to act as conduit for tool integration

– Tools identified, no DARPA funding or expertise applied

• As a data science research effort, we don’t build systems/applications, but rather modules that enable systems/applications

Page 7

Outline

• DARPA Tech Transfer – Government Acquisition

• The XDATA Stack

• XDATA Transition

• DARPA Support -> Community Support

Page 8

The XDATA Program

• XDATA: 26 developers of big-data components

– Languages: Julia, SciDB, SciPy, Numba, Delite

– Platform: Spark, BayesDB, Blaze, Gunrock

– Math/ML: CVX, igraph, skylark, elemental, SNAP

– Analytics/NLP: MITIE, topic, GraphQuBE

– Viz: Tangelo, D3, Vega, Bokeh, Ozone, Aperture

• Transition investment up front: $8M/year to build customer-facing systems

• All open source, non-GPL (i.e., liberally licensed)

Page 9

[Figure: XDATA software stack, a layered architecture diagram. Labels recoverable from the slide, grouped by layer:]

• Sources: Radar, Speech, Video, Text, Spreadsheets, Metadata, PCAP, IP Networks, Bio, etc.; Batch (size) and Stream (rate); Disk / RAM / GPU

• Data Ingest API: Tika, OODT-Wings Workflow (MDA/JPL), Zephyr (Sotera), OptiWrangler (Stanford O), other ETL

• Transforms: Tracklets, Speech to text, Content extraction, Entity extraction, Topic modelling, PCAP unpacking, Geocoding, other basic transforms

• Platform: HDFS / Hadoop, Map / Reduce, Giraph, Oozie, Hama, RHIPE, Hive, GoFS, Accumulo, BlinkDB, Storm, Spark Streaming, Mesos, Tachyon, Spark, Shark, Bagel, Bigdata DB, MPI, Elemental, BLAS, OpenMPI; Single Machine; HPC (TBD); GPU (gpuGraph)

• Abstractions: Java, HiveQL, Python (Anaconda, Blaze, Numba), Clojure, SNAP, Delite (OptiCVX, OptiGraph, OptiML), g++, CUDA, R, Matlab, Stata, Riposte

• Analytics:

– Graph / Matrix: IBM Skylark (RNLA), Sotera Louvain, USC BSP Optimization, GA Tech NNMF, JHU VNOM, Stanford O SNAP

– Time Series: Sotera ARIMA, GA Tech Time Series, SSCI/MIT/UL BayesDB

– Machine Learning: SSCI/MIT/UL BayesDB, MITLL/BBN NLP, Boeing/UPitt SMILE, MITLL TopicID, DT/UMD/MSR Classification, UCB BlinkDB

– Distance Measures: TBD, e.g., Cosine Sim, PageRank

– GA Tech / Stanford B / Continuum Convex Optimization

– Sotera Correlation Approximation

• Data Analytics API: Lincoln Labs API, HiveQL API, OODT-Wings Workflow (MDA/JPL)

• Ad-Hoc Retrieval API: Mongo, Impala, Shark (UCB), OODT-Wings Workflow (MDA/JPL)

• Visualization: HTML5, d3.js, Vega, Aperture, OWF / Neon, Bokeh, Tangelo, CherryPy; New School defaults and design recommendations; Boeing / U Pitt Genie; Continuum Stencil DSL

• Logging API: Attention tracking / eye tracking / user interface action tracking via keystroke logging and mouse clicking (Draper)

• Interactive Querying: Search / Query / Subselect / Summarize / Link-Brush / Pivot / Zoom / Export / Integrate / Intersect

• Interaction Queries: SQL-like, Graph-like, Geo/Time-like (Oculus / Kitware)

– “Find me everything on this number”

– “Find me a pattern of behavior on this entity”

– “Where will this person be tomorrow”

– “What groups does this group interact with”

• Virtual Environments: Immersive 3D; Physical Devices: (Multi-touch) ICT; Interactive Coding (Wakari, Julia): MITLL / Continuum

• Workflow Management / System Architecture: MDA/JPL, UCB, Sotera, Draper, Data-Tactics

• Quality of Service: latency from click; RAID and redundancy; #GPUs, RAM, SSD, CPU; adjustable tradeoffs: Volume, Time, Accuracy / Uncertainty

• Resource Management/Planning: dependencies; scheduler; performance and load estimates; user interface with workflow planning

• Performer labels whose position in the diagram is not recoverable: UCB, Stanford, Stanford O, GA Tech, IBM, Continuum, JHU, Phronesis, CMU, Purdue, USC, Sotera, MITLL, Data Tactics, Royal Caliper, UC Davis, SYSTAP, Next Century, Oculus, Kitware, Stanford / Kitware

• Other labels: Step i+1; RAM / Cache / GPU

Page 10

Mesos

• Cloud scheduling via Linux containers

• Plays nice with Hadoop/Spark/MPI clusters

Page 11

Spark

• In-memory distributed database/computation

• Interfaces to Hive/HDFS/Mesos

• Apache top-level project
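
As a rough illustration of the partition-then-combine style of in-memory computation this slide refers to, the sketch below counts words over in-memory partitions using only the Python standard library. It is not Spark's API: the names `partition`, `count_words`, and `merge_counts` and the sample lines are hypothetical, and everything runs in a single process rather than across a cluster.

```python
from collections import Counter
from functools import reduce


def partition(records, num_partitions):
    """Split records round-robin into in-memory partitions (hypothetical helper)."""
    parts = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        parts[i % num_partitions].append(record)
    return parts


def count_words(part):
    """Map step: count the words inside a single partition."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts."""
    return left + right


# Sample data, invented for this sketch.
lines = [
    "spark keeps data in memory",
    "mesos schedules spark jobs",
    "data in memory is fast",
]

partitions = partition(lines, 2)
partials = [count_words(p) for p in partitions]   # could run in parallel
totals = reduce(merge_counts, partials, Counter())
```

Each partition is processed independently and only the small partial counts are merged, which is the property that lets this style of computation spread across machines.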

Page 12

Anaconda/Numba

1. Anaconda: Python distribution for large-scale data processing, predictive analytics, and scientific computing

2. Numba: Python compiled to CPU/GPU code with autovectorization
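
A minimal sketch of what compiling Python with Numba can look like, assuming Numba's `njit` decorator is available. The function `sum_of_squares` is a made-up example, and the fallback that replaces `njit` with a no-op when Numba is not installed is an assumption added so the snippet still runs as plain Python. It shows CPU compilation only, not the GPU path mentioned on the slide.

```python
try:
    from numba import njit  # Numba's JIT decorator, used when installed
except ImportError:
    def njit(func):
        """Fallback (assumption for this sketch): run as ordinary Python."""
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n) with an explicit loop.

    Loops like this are slow in the interpreter and are the kind of
    numeric code a JIT compiler can turn into machine code.
    """
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
```

The first call triggers compilation when Numba is present, so timing comparisons should use the second and later calls.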

Page 13

Julia

• High-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users

– Python/Java call interfaces

– Simple distributed and concurrent programming

Simple Code

Parallelization

Page 14

DELITE

OptiGraph, OptiCVX, OptiML

• Compiler framework for parallel domain-specific languages

Page 15

Tangelo

• Python/web-based visualization of big data

• Examples

– http://xdata.kitware.com/examples/

Page 16

Aperture and Aperture Tiles

• Scalable visualization of huge datasets from Hadoop/Spark/Impala
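
Tile-based visualization generally works by pre-aggregating points into map tiles at each zoom level, so the browser only fetches small summaries. The sketch below bins longitude/latitude points into tiles and counts them. It assumes the common Web Mercator tile scheme and is not the Aperture Tiles API; `lonlat_to_tile`, `bin_points`, and the sample coordinates are hypothetical.

```python
import math
from collections import Counter


def lonlat_to_tile(lon, lat, zoom):
    """Map a lon/lat pair to (x, y) tile indices at a zoom level.

    Assumes the Web Mercator tile scheme (2**zoom tiles per axis,
    y counted from the top), valid for latitudes within about +/-85 degrees.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bin_points(points, zoom):
    """Count how many points fall into each tile at the given zoom."""
    counts = Counter()
    for lon, lat in points:
        counts[lonlat_to_tile(lon, lat, zoom)] += 1
    return counts


# Sample coordinates, invented for this sketch.
points = [(-77.04, 38.9), (-74.0, 40.7), (139.7, 35.7)]
tile_counts = bin_points(points, zoom=1)
```

Running the same binning at several zoom levels yields a pyramid of summaries; richer per-tile analytics (histograms, word counts) would replace the simple counter.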

Page 17

Outline

• DARPA Tech Transfer – Government Acquisition

• The XDATA Stack

• XDATA Transition

• DARPA Support -> Community Support

Page 18

XDATA Challenge Datasets

• TWITTER

– # Tweets: 292.7M+; # Unique Users: 7.6M; Total Size: 232 GB

– # Tweets: 1B+; # Unique Users: 94M+; # Geolocated: 31M+; Total Size: 146 GB

• KIVA MICROLOANS

– # Loans: 524,514; # Lenders: 1.1M; # Partners: 238

– # Transactions: 4,069,217; Total $ Loaned: $418M (USD); # Journal Entries: 307,831

• GISR

– VADER: 5 GB; SR Hawk: 60 MB; MAGIC: 1.2 GB; LYNX II: 11 GB

– HYDRA: 44 GB; GMBIT: 2.2 TB; GEOnet: 3.2 TB

• BITCOIN

– # Senders: 5.4M+; # Receivers: 6.3M+; # Bitcoins Transacted: 1.4M+

– # Transactions: 15.8M+; # Edges: 37.4M+

• TRACEROUTE

– # Hourly Summaries: 630 Million+; # Observed Hits: 734 Trillion+; # Bytes Transacted: 31 Exabytes+

• Text/Unstructured Data

– 2.1 Million unique employment ads in Spanish from South America

– Game, Player, Team statistics; 6 independent text types for each game

Page 19

XDATA – Summer Workshop 2014

• Summer 2014 – 8 Weeks

• 30+ teams; average of 45 performers/day onsite

• Phase 1 – Finalized documentation and publication of 75 open source software components and 135 academic papers on the DARPA Open Catalog

• Phase 2 – User testing, benchmarking, and new development

• In 2014, 4 new datasets added

– Cyber (Web Data Commons Hyperlinks - 3.5 billion pages, 128 billion edges)

– Cyber (Distributed Net Scans - 6 TB)

– Unstructured Text (Foreign language employment advertisements - 40 GB with 2.1 million unique jobs)

– Structured, unstructured (NBA statistical, news reports and crowd-sourced - 6 independent text types for each game)

Page 20

XDATA Application to Cyber: Reveal Organizations

• Automated community detection based on structural characteristics

– Processing 7B+ traceroute hops to map the Internet as traveled

– Detects hierarchical groups of IP addresses with high-density connections

[Figure: each dot represents a group of IP addresses; each color shows a detected community, often aligned with political boundaries. Figure label: “Zoom Out!”]

Ability to summarize data at an organizational level and provide multiple levels of abstraction / characterization
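
One common way to score how well a grouping captures "high density connections" is graph modularity, the quantity that Louvain-style community detection optimizes. The sketch below computes modularity for a given partition of an undirected, unweighted graph. It is an illustration under stated assumptions, not the method used on the traceroute data: the function name `modularity`, the toy edge list, and the node labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)   # L_c per community
    degree = defaultdict(int)     # d_c per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph invented for this sketch: two triangles joined by one edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)   # the two triangles as communities
q_one = modularity(edges, one_group)    # everything in a single community
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the signal a community detection algorithm searches for; applying the same idea recursively gives hierarchical groups.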

Page 21

XDATA Application to Social Media: Interactive Tile Apps

Multi-level interactive geo-tiles with embedded analytics

Tile features:

• Word clouds

• Time series

• Rank changes

• Histograms

• Heat maps

• …

Interactive using 1B+ Tweets

UNCLASSIFIED

Distribution Statement A – Approved for Public Release, Distribution Unlimited

Page 22

Outline

• DARPA Tech Transfer – Government Acquisition

• The XDATA Stack

• XDATA Transition

• DARPA Support -> Community Support

Page 23

XDATA Transition + Open Catalog

• http://www.darpa.mil/opencatalog/

Page 24

DARPA’s open source office

• DARPA recognized that just open sourcing isn’t enough

– Need for maintenance, improvement, support

• JPL (an FFRDC) is DARPA’s open source center of excellence

– JPL has been part of the Apache Software Foundation board

– Currently supporting 5 large open source projects

• Mission: build community, continued support for software maintenance

Page 25

www.darpa.mil


http://www.darpa.mil/opencatalog

Doug Gelbach, [email protected]

Mr. Wade Shen, [email protected]