Primer: Data-Driven Startups



Digital Incubation Centre, Ministry of Transportation and Communications
Doha, Qatar
Heather Leson
March 9, 2016

Data Examples

Cultural: Data about cultural works and artefacts — for example titles and authors — and generally collected and held by galleries, libraries, archives and museums.

Science: Data that is produced as part of scientific research from astronomy to zoology.

Finance: Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds etc).

Statistics: Data produced by statistical offices such as the census and key socioeconomic indicators.

Weather: The many types of information used to understand and predict the weather and climate.

Environment: Information related to the natural environment, such as the presence and level of pollutants and the quality of rivers and seas.

Transport: Data such as timetables, routes, on-time statistics.

Types of Open Data

(Source: okfn.org)

Kasra and QCRI: Connecting Startups & Research

Metis:

Collaborating with CMU to get data working within the privacy/security guidelines

Academic Planning Made Easier.

Mumm:

Connecting with the local Cairo data science community.

Data for food.

Exantium:

Strategy firm connecting open data to government and business. Part of a global network.

Data-Driven Recipes

1. How to:

Technical Training/Business for Data Literacy

2. How to:

Host a Data Expedition

Storyteller
Role: Generate ideas and interesting questions, help define the questions, and assist with the information products and story outputs.

Scout
Role: Scouts hunt down data from across the web. They can be non-technical or technical, depending on how difficult it is to obtain the data (whether it is easily downloadable or needs to be scraped, etc.).

Analyst
Role: Analysts are the ones who crunch the data found by the scouts and test the hypotheses generated by the storytellers.

“Engineers” (Optional)
Role: Create information outputs (with varying degrees of technical skill, from coding to using ‘off the shelf’ tools).

Designers
Role: Beautify the outputs and make sure the story really comes through the data.

3. How to:

Data Clinics to connect entrepreneurs, business and government

Data Discovery

DIY Data:

BQ Magazine’s Faces of Qatar

DIY Data:

QCRI Social Computing

Groundtruth Data Collection

Phones, photos and food consumption for Health Monitoring

You are a Smart City: Create a local map dataset

Data Pipeline

Qatar Data Expedition

What are the questions you seek to answer?
What is the license? Can you reuse/publish the data?
Is the source credible? Is the data credible?
Where did they get their data?
How much time do I have to search?
How am I organizing my research?

Keen to learn more about verification? http://verificationhandbook.com/ (it is in Arabic too!)

Consider

Who is publishing about Qatar...on biodiversity?

United States 7,440 occurrences, 97.77% geo-referenced.

United Kingdom 832 occurrences, 8.29% geo-referenced.

Sweden 620 occurrences, 0.32% geo-referenced.

Netherlands 298 occurrences, 5.03% geo-referenced.

Source: Global Biodiversity Information Facility
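A quick way to read these figures is to turn each percentage back into an approximate record count. The sketch below uses only the four numbers quoted above; because the percentages are rounded, the results are estimates, and the function name is just an illustrative choice.

```python
# Occurrence counts and geo-referenced shares, copied from the figures above (source: GBIF).
publishers = {
    "United States": (7440, 97.77),
    "United Kingdom": (832, 8.29),
    "Sweden": (620, 0.32),
    "Netherlands": (298, 5.03),
}


def georeferenced_estimate(occurrences: int, percent: float) -> int:
    """Estimate how many records carry coordinates, given a rounded percentage."""
    return round(occurrences * percent / 100)


estimates = {
    country: georeferenced_estimate(count, percent)
    for country, (count, percent) in publishers.items()
}

for country, usable in estimates.items():
    total = publishers[country][0]
    print(f"{country}: about {usable} of {total} records can be placed on a map")
```

Seen this way, nearly all of the mappable records in this list come from a single publishing country, which is worth knowing before building a product on top of the data.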

What about data on tourism?

Source: Knoema Data Atlas, which aggregates the World Development Indicators, 2015

$6,616,000,000 USD International Tourism expenditures for travel items

(Time for more boutique travel startups)

Location Data

OpenStreetMap: Free, open Dataset

Get data: http://planet.osm.org/

GADM: Administrative Boundaries

Bing Imagery
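OpenStreetMap extracts are commonly distributed as XML made of node elements with key/value tags. As a minimal sketch of turning such a file into a small local dataset, the example below parses an inline sample with Python's standard library. The sample content (IDs, coordinates, names) is invented for illustration and is not real Doha data, and `tagged_points` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample in the OSM XML layout; a real extract would be read from a file.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="25.2854" lon="51.5310">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="25.2860" lon="51.5330"/>
  <node id="3" lat="25.2900" lon="51.5400">
    <tag k="amenity" v="pharmacy"/>
  </node>
</osm>
"""


def tagged_points(osm_xml: str) -> list:
    """Return one dict per node that has at least one tag."""
    root = ET.fromstring(osm_xml)
    points = []
    for node in root.iter("node"):
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if not tags:
            continue  # untagged nodes are usually just geometry for ways
        points.append(
            {
                "id": int(node.get("id")),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                **tags,
            }
        )
    return points


points = tagged_points(SAMPLE_OSM)
for point in points:
    print(point)
```

For a full planet or country file, a streaming parser or a dedicated OSM library would be a better fit than loading everything into memory.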

Ministry of Development Planning and Statistics

In economic statistics:
Quarterly and annual Gross Domestic Product (GDP), constant and current, by economic activity
Monthly, quarterly and annual Consumer Price Index (CPI), Producer Price Index (PPI), foreign trade statistics (import and export), building permits

In social statistics:
Labor force statistics (through a labor force sample survey)
Marriage, health, birth, fertility, education, disability, mortality statistics (in coordination with other ministries)

In environmental statistics:
Monthly rainfall, monthly and annual average concentrations of air pollutants, capacities of urban wastewater treatment plants

In population statistics:
Population growth rate, population sex ratio

QALM portal (Qatar Information Exchange)

QALM is an ambitious national project, developed by a number of government partners including: The General Secretariat for Development Planning, The Statistics Authority, The Supreme Council of Health, The Supreme Education Council, Supreme Council of Family Affairs, ictQATAR, Ministerial Cabinet and the Permanent Population Committee.

http://www.qalm.gov.qa/

Data is available in multiple formats!

To get data from the Ministry of Development Planning and Statistics, check their website. If you are looking for other data, they are an email away: ICU@mdps.gov.qa

Using Data

Learn how: http://datadrivenjournalism.net/

Expenditure Components of GDP at Current Prices (Mn Qatari Riyal)
Source: Ministry of Development Planning and Statistics

"","","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013",,,,,"2014",,,,
"","","Total","Total","Total","Total","Total","Total","Total","Total","Total","Q1","Q2","Q3","Q4","Total","Q1","Q2","Q3","Q4","Total"
"Gross Domestic product","B.1G",115512.376669,162091.018049205,221610.304141365,290151.574403828,419582.826273579,355986.474251774,455445,618089.239045503,692654.670488044,186654.189573065,177830.420532429,185433.336051801,189857.929208376,739776,193880.888003083,189653.51105388,193080.129441538,194397.657502752,771013.233251822
"Household Final Consumption Expenditure","P.3a",20166,25889.8602243444,36186.326795032,49728.6119489121,64675.8351579253,68622.9919301139,73645.7899114015,79905.6820538706,87682.19979384,24130.4586981125,24802.4947262859,23572.4447936237,26368.9939206421,98874.3921386642,26807.1948166319,27414.3657651239,26424.7106136522,28729.6901996358,109375.961395044
"Government Final Consumption Expenditure","P.3b",15094,23171.9888517611,32616.2047008325,35989.9119915317,42695.8750950427,55652.33697478,63689.0870608494,77007.4825664626,89527.4435418714,24336.9460716118,24384.7648280038,24240.4862291342,25297.5589689309,98259.7560976807,26593.3225341388,26861.3831859924,27030.5661941075,27714.3396569197,108199.611571158
"Gross capital formation","P.5",36399.044558,55609.5389690997,92830.0390858622,133518.050463385,172523.116020611,152947.14534688,142449.123027749,177621.474425169,194347.357152333,49488.7848033409,49657.1609781394,58089.4050290433,60871.3763188034,218106.851763655,53389.3706523124,58868.7621027634,67296.8526337788,77579.6276461965,257731.66028562
"Exports (Goods & Services)-F.O.B","P.6",74122.332111,105496.630004,139210.733559638,174896,257467,182033,283832,442959.8,520182,141152,131890,134332,131751,539125,146457,134748,131592,116481,528682
"Imports (Goods & Services)-F.O.B","P.7",-30269,-48077,-79233,-103981,-117779,-103269,-108171,-159405.2,-199084.33,-52454,-52904,-54801,-54431,-214590,-59366,-58239,-59264,-56107,-232976

*Figures for 2013 & 2014 are preliminary estimates.

Census data extracted...not usable yet..
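Before building on an extracted table, it helps to run a sanity check. For the GDP expenditure export shown earlier, the sketch below tests whether the components add up to the GDP row, using the 2004 and 2014 annual totals copied from that export. The column headers "Item" and "Code" are assumptions added here, since the export leaves those header cells blank, and `component_gap` is a hypothetical helper name.

```python
import csv
import io

# Annual totals for 2004 and 2014, copied from the GDP expenditure export.
# The "Item" and "Code" headers are assumed labels; the export leaves them blank.
CSV_TEXT = '''"Item","Code","2004","2014"
"Gross Domestic product","B.1G",115512.376669,771013.233251822
"Household Final Consumption Expenditure","P.3a",20166,109375.961395044
"Government Final Consumption Expenditure","P.3b",15094,108199.611571158
"Gross capital formation","P.5",36399.044558,257731.66028562
"Exports (Goods & Services)-F.O.B","P.6",74122.332111,528682
"Imports (Goods & Services)-F.O.B","P.7",-30269,-232976
'''

rows = {row["Code"]: row for row in csv.DictReader(io.StringIO(CSV_TEXT))}

COMPONENT_CODES = ("P.3a", "P.3b", "P.5", "P.6", "P.7")


def component_gap(year: str) -> float:
    """GDP minus the sum of its expenditure components (imports are already negative)."""
    gdp = float(rows["B.1G"][year])
    components = sum(float(rows[code][year]) for code in COMPONENT_CODES)
    return gdp - components


for year in ("2004", "2014"):
    print(year, "gap between GDP and its components:", round(component_gap(year), 6))
```

A gap near zero suggests the rows were extracted intact. A large gap usually means a shifted column or a missing row, which is exactly the kind of problem that makes extracted data "not usable yet".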

Qatar Census

(Source: Doha News 2016)

South African Census Data

Open Refine http://openrefine.org/

Sublime Text https://www.sublimetext.com/

There are many tools for software developers and data scientists too.
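For a sense of what cleaning tools do under the hood, here is a minimal sketch of grouping inconsistent spellings of the same value. It is loosely modeled on the "fingerprint" style of key-collision clustering found in tools such as OpenRefine, but it is a simplified stand-in, not that tool's implementation. The sample values and function names are invented for illustration.

```python
import re
from collections import Counter


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, extra spaces, and word order."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def canonicalize(values: list) -> list:
    """Replace each value with the most frequent spelling in its fingerprint group."""
    groups = {}
    for value in values:
        groups.setdefault(fingerprint(value), []).append(value)
    preferred = {
        key: Counter(members).most_common(1)[0][0] for key, members in groups.items()
    }
    return [preferred[fingerprint(value)] for value in values]


# Invented sample: the same two place names typed in several ways.
raw = ["Doha", "Doha", "doha ", "DOHA", "Al Rayyan", "Al Rayyan", "Al  Rayyan", "Rayyan, Al"]
clean = canonicalize(raw)
print(clean)
```

Automated grouping like this can merge values that should stay separate, so it is worth reviewing the groups by hand before overwriting anything.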

Note: you still need the Human API to analyze and make decisions for your business. Of course, if you can afford it, then you can get your business intelligence from KPMG, Gartner, Bloomberg, McKinsey or PwC. Until then…

Some tools to Clean Datasets

Learn more with Lillian and her online courses.

Tools for Charts, Graphs and Infographics

http://tableau.com/

http://infogr.am/

http://piktochart.com/

https://www.canva.com/

More LMGTFY: http://www.creativebloq.com/design-tools/data-visualization-712402

(source: TuktukDesign, Noun Project ccby)

Map tools
Mapbox: http://mapbox.com/
CartoDB: http://academy.cartodb.com/
Leaflet: http://leafletjs.com/
Google: https://www.google.com/mapmaker
ArcGIS: https://www.arcgis.com/features/

Time mapper: http://timemapper.okfnlabs.org/

Also: if you are collecting your own location data, try Field Papers or crowdsource map photos with Mapillary. (They just got 8M funding!)

(source: Mister Pixel, Noun Project, ccby)

QCRI Combining Data Sources: Real-Time Traffic Monitoring

● Collection and classification of traffic-related tweets (script, research tool)

● Continuous real-time querying of Google Traffic API

● Qatar Traffic Profiling & Modeling
  ○ Geo: City, zone, district
  ○ Time: Hourly, daily, weekly, and monthly

● Usage:
  ○ Detection of abnormal behaviors
  ○ Predictions
  ○ Monthly public reports
    ■ Commute status
    ■ Deadpoints
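To make the idea of a time profile and abnormal-behavior detection concrete, here is a minimal sketch. It is not QCRI's method: it builds an hourly profile from past travel times and flags a new reading that sits far from that hour's average, using a simple standard-deviation threshold. The observations, the threshold, and the function names are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# (hour_of_day, travel_time_minutes) for one route; values invented for illustration.
observations = [
    (8, 30), (8, 32), (8, 31), (8, 29),
    (14, 18), (14, 19), (14, 17), (14, 18),
]


def build_profile(obs):
    """Map each hour to the (average, spread) of its observed travel times."""
    by_hour = defaultdict(list)
    for hour, minutes in obs:
        by_hour[hour].append(minutes)
    return {hour: (mean(values), pstdev(values)) for hour, values in by_hour.items()}


def is_abnormal(profile, hour, minutes, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the hourly average."""
    average, spread = profile[hour]
    if spread == 0:
        return minutes != average
    return abs(minutes - average) / spread > threshold


profile = build_profile(observations)
print(is_abnormal(profile, 8, 45))   # a 45-minute trip in the 8:00 hour
print(is_abnormal(profile, 8, 31))   # a typical 8:00 trip
```

The same profile could be keyed by zone or day of week as well as hour, mirroring the Geo and Time dimensions listed above.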

The best way to learn is to find data and make data information products.

Try to recreate the diagrams and track back the data.

Track how other startups use data. Copy. Remix.

Social Entrepreneurship & Social Good

ABC: Always be Charging

How can you have a Data-Driven Career?

What is your Data Plan for your startup?

Can you use Data-Driven Journalism techniques to improve your business?

What kind of data do you need to grow your business?

What type of training do you want/need?

Thank you

@heatherleson @qatarcomputing