Visualization Lifecycle
Date posted: 18-Oct-2014
Raffael Marty
datainsight San Francisco 2011
Copyright (c) 2011 pixlcloud | collect. visualize. understand.
Art
“Transform a dataset into a captivating story.”
You're on your own:
‣ Assess
‣ Parse
‣ Clean
‣ Visualize
Visualization Tools and Libraries
Audience
[Chart: audience plotted on two axes, Expert vs. Beginner and Fun vs. Boring; label: Technical Overview]
Visualization Process
[Diagram: Data Sources (files, database) -> parsing, feature selection -> Structured Data (Data Store) -> filtering, aggregation, cleansing, together with Contextual Data -> visualization -> Visual Representation; the whole process runs in iterations]
Data Sources
‣ File
  XML, JSON, CSV, TSV
‣ Database
  mysql -u root -p mydatabase < dump.sql
‣ API
  ‣ Factual
  ‣ Freebase
  ‣ Infochimps
  ‣ OpenStreetMap
  curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
Explore Data
‣ What is the data about?
‣ What are the data features/columns?
‣ Is there a common structure in the data?
‣ What are the data types?
Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
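One way to answer these questions for firewall logs like the two above is to pull out the KEY=value pairs and guess a type for each value. This is only a sketch: the helper names `extract_fields` and `guess_type` and the type labels are made up for illustration, and the sample line is copied from the slide.

```python
import re

# Sample line copied from the slide; helper names below are hypothetical.
LINE = (
    "Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
    "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
    "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0"
)


def extract_fields(line):
    """Return the KEY=value pairs found in one log line as a dict."""
    return dict(re.findall(r"(\w+)=(\S*)", line))


def guess_type(value):
    """Very rough type guess, good enough for a first look at the data."""
    if value == "":
        return "empty"
    if re.fullmatch(r"\d+", value):
        return "int"
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return "hex"
    if re.fullmatch(r"\d+\.\d+\.\d+\.\d+", value):
        return "ipv4"
    return "string"


fields = extract_fields(LINE)
for key, value in fields.items():
    # One row per feature: name, guessed type, raw value.
    print(f"{key:8} {guess_type(value):8} {value}")
```

Bare flags such as DF and SYN have no "=" and are ignored by this sketch. A real parser would want to keep them.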
Parsing and Normalization
‣ Parsing
  ‣ extraction of entities / features
  ‣ imposing structure
  ‣ often use regexes
‣ Normalize
  ‣ field normalization
  ‣ term normalization: block, deny, dropped
‣ Generate a common output format for vis-tools (e.g., CSV)
Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
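The three log lines above describe the same blocked connection in three vocabularies (block, Deny, DROPPED). Term normalization could look roughly like the following sketch. The lookup table, the choice of "block" as the canonical term, and the function names are assumptions for illustration. The sample strings are shortened excerpts of the lines above.

```python
import re

# Hypothetical lookup table: vendor-specific term -> one canonical action.
ACTION_TERMS = {
    "block": "block",    # pf
    "deny": "block",     # Cisco PIX
    "dropped": "block",  # iptables
    "pass": "pass",      # pf
}

# Shortened excerpts of the three sample lines from the slide.
SAMPLES = [
    "Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859",
    "Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet",
    "Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0",
]


def normalize_action(term):
    """Map one raw term to its canonical action, or 'unknown'."""
    return ACTION_TERMS.get(term.strip().lower(), "unknown")


def action_of(line):
    """Return the canonical action for the first known term in a log line."""
    for word in re.findall(r"[A-Za-z]+", line):
        action = normalize_action(word)
        if action != "unknown":
            return action
    return "unknown"


actions = [action_of(line) for line in SAMPLES]
print(actions)
```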
Parser
Raw:
Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
Regex / Parser:
(.*) rule ([-\d]+\/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
Normalized (CSV):
Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
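As a sketch of how the regex on this slide turns a raw pf line into a CSV row, the snippet below applies it with Python's `re` module. It assumes the text in parentheses after the rule number ("match") is captured as its own group, so that the result has the same eleven fields as the normalized output on the slide. The function name `to_csv` is hypothetical, and fields are joined naively on the assumption that none of them contains a comma.

```python
import re

# The slide's pattern, with the "(match)" part captured as its own group.
PF_LINE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)

# First raw line from the slide.
RAW = (
    "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)"
)


def to_csv(line):
    """Return the captured fields joined by commas, or None if no match."""
    match = PF_LINE.match(line)
    if match is None:
        return None
    return ",".join(match.groups())


row = to_csv(RAW)
print(row)
```

Lines that do not match return None, which makes it easy to collect them separately and refine the pattern on the next iteration.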
UNIX Tools
‣ grep
  ‣ cat file | grep -v "foo"
‣ awk
  ‣ awk -F, '{printf("%s,%s\n",$2,$1);}'
  ‣ awk -F, -v OFS=, '{print $2,$1}'
‣ sed
  ‣ sed -e 's/fubar/foobar/g' filename
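When the one-liners above outgrow the shell, the same three operations can be scripted. The following is a rough Python equivalent, not a replacement for the tools. The function names are invented for illustration.

```python
import re


def grep_v(lines, pattern):
    """Like `grep -v pattern`: keep only lines that do NOT match."""
    return [line for line in lines if not re.search(pattern, line)]


def swap_first_two(line, sep=","):
    """Like `awk -F, -v OFS=, '{print $2,$1}'`: emit columns 2 and 1."""
    # Pad so that a line with fewer than two columns yields empty fields,
    # as awk does.
    first, second = (line.rstrip("\n").split(sep) + ["", ""])[:2]
    return sep.join([second, first])


def sed_replace(line):
    """Like `sed -e 's/fubar/foobar/g'`: replace every occurrence."""
    return line.replace("fubar", "foobar")


kept = grep_v(["foo bar", "baz"], "foo")
swapped = swap_first_two("a,b,c")
fixed = sed_replace("fubar and fubar")
print(kept, swapped, fixed)
```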
Regular Expression Resources
‣ http://regexlib.com
‣ http://www.regular-expressions.info
‣ http://gskinner.com/RegExr
Data Cleansing
‣ Filter
‣ Normalize (see earlier)
‣ Aggregation
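Filtering and aggregation on normalized CSV can be sketched in a few lines. The rows below are the normalized pf records from the Parser slide. The column names, the address chosen for the filter, and the grouping key are arbitrary choices for illustration.

```python
import csv
import io
from collections import Counter

# Normalized rows from the Parser slide.
CSV_TEXT = """\
Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
"""

# Hypothetical column names; the slide does not name the fields.
COLUMNS = ["time", "rule", "reason", "action", "direction", "interface",
           "src", "sport", "dst", "dport", "details"]

rows = [dict(zip(COLUMNS, r)) for r in csv.reader(io.StringIO(CSV_TEXT))]

# Filter: drop traffic to one destination (address picked arbitrarily).
kept = [r for r in rows if r["dst"] != "192.134.0.49"]

# Aggregate: count the remaining events per (source, destination port).
counts = Counter((r["src"], r["dport"]) for r in kept)
print(counts)
```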
Load CSV into Database
# mysql -u <user> -p
mysql> create database data;
mysql> use data;
mysql> create table set1 (id int, address varchar(20), ...);
mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
Sometimes you can load your data directly into a tool and omit this step.
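For quick experiments without a MySQL server, the same idea can be tried with SQLite from Python's standard library. This is a stand-in, not the MySQL workflow from the slide. The table mirrors the two columns the slide spells out, the ids are made up, and the addresses are taken from the sample logs.

```python
import csv
import io
import sqlite3

# Tiny CSV input: ids are invented, addresses come from the sample logs.
CSV_TEXT = "1,10.1.222.31\n2,10.1.222.202\n"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE set1 (id INTEGER, address VARCHAR(20))")

# Load every CSV row with a parameterized INSERT.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM set1").fetchone()[0]
print(count)
```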
Contextual Data
‣ Either dump it into a DB or query it via API calls to augment your data
‣ IP -> Geo mapping
‣ Information about countries
‣ Port number -> service name
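Augmenting records with contextual data amounts to a lookup per record. The sketch below does the port number -> service name case with a small hand-written table standing in for a real services database. The table, the field names, and `add_service` are assumptions for illustration. The ports come from the sample logs.

```python
# Hand-written stand-in for a services database (IANA service names).
SERVICES = {53: "domain", 80: "http", 135: "epmap"}

# Destination ports seen in the sample logs.
records = [{"dport": "53"}, {"dport": "135"}, {"dport": "9111"}]


def add_service(record):
    """Return a copy of the record with a 'service' field added."""
    enriched = dict(record)
    enriched["service"] = SERVICES.get(int(record["dport"]), "unknown")
    return enriched


enriched = [add_service(r) for r in records]
print(enriched)
```

The same shape works for IP -> Geo mapping: swap the dictionary for a database query or an API call.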
Feature Selection
‣ What are the fields you are interested in?
‣ Compute new fields
  ‣ start time, end time -> duration
  ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
  ‣ Entropy: H(X) = E(I(X))
‣ Dimensionality reduction
  ‣ See Bryan's talk!
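The three computed fields listed above can be sketched with the standard library alone. The function names are hypothetical. The entropy uses I(x) = -log2 p(x) with p estimated from observed frequencies, and the base-2 logarithm is an assumption, since the slide does not fix a base.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%H:%M:%S.%f"):
    """start time, end time -> duration in seconds."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def subnet(ip, prefix_len):
    """Map an address to its enclosing network, e.g. 10.2.4.2 -> 10.0.0.0/8."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """H(X) = E(I(X)) with I(x) = -log2 p(x), p from observed frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(duration_seconds("20:00:38.018152", "20:00:43.874401"))
print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))
print(entropy(["53", "53", "135", "135"]))
```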
Choose Your Poison
Ode to the Pie
A Good Visual
‣ Choose the right graph
‣ Reduce non-data ink
‣ Simultaneous views
‣ Interactivity
Visual Transformations
‣ keep iterating on visual transformations, change
  ‣ color
  ‣ shape
  ‣ features display
‣ add new fields?
‣ add more context?
‣ is the output expressive?
‣ capture output and prettify it for presentation
Data Visualization Tools and Libraries
Tools and Libraries
‣ http://datainsightsf.com/resources/
‣ Choose what's appropriate!
‣ Data Analysis and Visualization LInuX
  ‣ davix.secviz.org
‣ GraphViz
  ‣ graphviz.org
‣ AfterGlow (CSV -> DOT)
  ‣ afterglow.sf.net
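To make the CSV -> DOT step concrete, here is a minimal sketch that turns two-column CSV (source, target) into a GraphViz digraph. It is not AfterGlow's implementation and does not reproduce its options or output. The function name and graph name are invented, and the address pairs come from the Parser slide.

```python
import csv
import io

# Source,target pairs from the Parser slide; the first pair is repeated
# on purpose to show duplicate edges being collapsed.
CSV_TEXT = """\
195.141.69.45,62.2.32.250
195.141.69.45,192.134.0.49
195.141.69.45,194.25.2.133
195.141.69.45,62.2.32.250
"""


def quote(label):
    """Escape a label for use inside a double-quoted DOT identifier."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_dot(csv_text):
    """Emit one directed edge per distinct (source, target) row."""
    edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        edge = (row[0].strip(), row[1].strip())
        if edge not in edges:
            edges.append(edge)
    lines = ["digraph structure {"]
    lines += [f'  "{quote(s)}" -> "{quote(t)}";' for s, t in edges]
    lines.append("}")
    return "\n".join(lines)


dot = csv_to_dot(CSV_TEXT)
print(dot)
```

The printed text can be saved to a file and rendered with GraphViz, for example with `dot -Tpng graph.dot -o graph.png`.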
Libraries
‣ Reporting Libraries
  ‣ HighCharts
  ‣ Flot
  ‣ Google Chart API
  ‣ Open Flash Chart
  ‣ jQuery Sparklines
  ‣ Polymaps
‣ Visualization Libraries
  ‣ TheJIT
  ‣ gRaphaël
  ‣ Protovis
  ‣ ProcessingJS
  ‣ Flare
  ‣ D3
HighCharts
‣ Click-Through
‣ On load
‣ near real-time updates
‣ Zoom
www.highcharts.com
Google Visualization API
‣ JavaScript
‣ Based on DataTables()
‣ Many graphs
‣ Playground
  ‣ http://code.google.com/apis/ajax/playground
http://code.google.com/apis/visualization/interactive_charts.html
ProtoVis
‣ JavaScript-based visualization library
‣ Charting
‣ Treemaps
‣ BoxPlots
‣ Parallel Coordinates
‣ etc.
http://vis.stanford.edu/protovis/
TheJIT
‣ JavaScript InfoVis Toolkit
‣ Interactive
‣ Link Graphs
http://thejit.org/
Processing
‣ Visualization library
‣ Java based
‣ Interactive (event handling)
‣ Number of libraries to
  ‣ draw in OpenGL
  ‣ read XML files
‣ Processing JS
  ‣ JavaScript
  ‣ HTML 5 Canvas
  ‣ WebGL
  ‣ Web IDE
http://processing.org/
http://processingjs.org/
Visualization Tools
‣ Gephi
‣ R
‣ Matlab
‣ Mondrian
‣ PicViz
‣ Treemap 4.1
‣ Google Earth
Gephi
‣ reads: CSV, DOT, etc.
‣ graph analysis algorithms
‣ highly interactive
http://gephi.org
PicViz
http://www.wallinfire.net/picviz/
Treemap 4.1
http://www.cs.umd.edu/hcil/treemap/
Google Earth
‣ KML data format for encoding data
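As an illustration of encoding data in KML, the sketch below builds a minimal document with one Placemark per point using Python's standard XML library. The function name, the placemark name, and the sample coordinates (roughly San Francisco) are made up for illustration. KML writes coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_to_kml(points):
    """Build a KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML order is longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Illustrative point only; not taken from the talk's data.
kml_text = placemarks_to_kml([("sample host", 37.7749, -122.4194)])
print(kml_text)
```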
pixlcloud | collect. visualize. understand.
@raffaelmarty