Big Data analytics and models
INNOVA CHALLENGE BigDataSpain 7/11
Esteban Moro Alejandro Llorente
www.iic.uam.es
Workshop BBVA – Open Innovation
Analytics & Models
https://www.centrodeinnovacionbbva.com/en/innovachallenge
Maps
Activity
Infrastructures/Places
Analysis
Models
App
Content
Visualization
Analytics and Models
Challenge participant “roadmap”
Data Mining
Development
• Introduction to geo-tagged data
• Access to (open) geo-tagged data
• Example: development of a geolocalized recommender app
Summary
Introduction to geo-tagged data
Introduction to geo-tagged data
Information: person, event, infrastructure.
Geography: GPS coordinates, zone, city
Geospatial Big Data sources: social media, sensors, satellite images, maps, activity (transport)
Geospatial Big Data
With geo-tagged data we can:
• Measure zone/area occupation and activity
• Identify flows of people/money between different areas
• …
With those data we can build applications in:
• Geo-social analysis
• Geomarketing
• Optimal allocation of resources
• Fraud detection
• Event detection
• …
Geo-tagged Big Data applications
Use of pervasive sensors (mobile phones, social media) to model the movement and communication of people in urban areas.
Geo-social analysis
Geolocation study in Madrid
Location: Puerta del Sol. Total check-ins: 2651 (30.5 per day). Unique users in the area: 1231.
[Figure: check-ins by hour of day, by day of week (Monday to Sunday) and over time (April to June 2011), broken down by venue type (arts_entertainment, food, nightlife, shops); top 10 places and top 10 users by number of check-ins.]
Characterization of urban neighborhoods according to their social/commercial use
Geo-social analysis
Use merchant location and/or IP address in online transactions to detect fraud.
Fraud detection
Bars
Shops
Geomarketing
Manage sales risk
Bars
Shops
Identify the best placement for a new shop/branch
Optimize cash holdings in bank branches, minimizing the associated costs.
Optimal resource allocation
Detect unexpected behavior using social/mobile/urban sensors
Event detection
Access to (open) geographical data
Map
Infrastructure/places
Activity
Geographical data
Types of data
Maps, Economic/Demographic data, Activity (Twitter, BBVA API)
Maps :: Google Maps
Google Maps offers a number of services/APIs, each with its own restrictions and protocols. They let you define maps, routes, markers, etc.
Example: get a static map (without authentication).
Base URL: http://maps.google.com/maps/api/staticmap
Parameters:
• center: 40.4153,-3.6875
• size: 640x640
• maptype: mobile
• format: png32
• sensor: true
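A minimal sketch of how the request URL on this slide could be assembled with the Python standard library. The parameter values are the slide's; the `sensor` parameter has since been dropped from the Static Maps API and current versions require an API `key`, so this illustrates the request format rather than a guaranteed working call.

```python
from urllib.parse import urlencode

# Base URL as given on the slide; current Static Maps versions also expect a `key`.
BASE_URL = "http://maps.google.com/maps/api/staticmap"


def static_map_url(lat, lon, size="640x640", maptype="mobile",
                   fmt="png32", sensor=True):
    """Build a Static Maps request URL from the slide's parameters."""
    params = {
        "center": f"{lat},{lon}",   # "latitude,longitude"
        "size": size,               # width x height in pixels
        "maptype": maptype,
        "format": fmt,              # image format of the response
        "sensor": "true" if sensor else "false",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = static_map_url(40.4153, -3.6875)
print(url)
```

`urlencode` escapes the comma in `center` as `%2C`; fetching the URL (for instance with `urllib.request.urlopen`) would return the PNG image.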
Maps :: OpenStreetMap
Open and collaborative project to create and distribute free maps. Different APIs give information about routes, points, maps, etc. A number of mapping projects (applications) with very different purposes are built on top of OSM.
Example: get the route between two locations with MapQuest.
Base URL: http://open.mapquestapi.com/guidance/v1/
Parameters:
• Key: authentication key
• From: latitude and longitude of the origin, in JSON.
• To: latitude and longitude of the destination, in JSON.
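The three parameters above can be packed into a query string as sketched below. This rests on assumptions: the lowercase names `key`, `from` and `to` and the `{"lat": ..., "lng": ...}` JSON shape are guesses at MapQuest's format, and the service path to append to the base URL is not given on the slide, so the function only returns the query string.

```python
import json
from urllib.parse import parse_qs, urlencode

BASE_URL = "http://open.mapquestapi.com/guidance/v1/"  # from the slide


def route_query(key, origin, destination):
    """Encode key/from/to; origin and destination are (lat, lon) tuples.

    The parameter names and the JSON shape are assumptions, not MapQuest's
    documented format.
    """
    params = {
        "key": key,
        "from": json.dumps({"lat": origin[0], "lng": origin[1]}),
        "to": json.dumps({"lat": destination[0], "lng": destination[1]}),
    }
    return urlencode(params)


# Puerta del Sol -> the static-map center from the Google Maps slide.
qs = route_query("YOUR_KEY", (40.4168, -3.7038), (40.4153, -3.6875))

# Round trip: decode the query string back into Python objects.
decoded = {k: (v[0] if k == "key" else json.loads(v[0]))
           for k, v in parse_qs(qs).items()}
print(decoded["from"], decoded["to"])
```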
Geospatial vector data format for geographical information
• Regions, points and paths are defined as points, lines and polygons
• Each of them usually has attributes that describe it
Region codes, names, population, etc.
http://www.naturalearthdata.com/downloads/
pyshp: http://code.google.com/p/pyshp/ maptools: http://cran.r-project.org/web/packages/maptools
Maps :: shapefiles
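Libraries such as pyshp or maptools hide the file layout, but a small standard-library sketch makes the format concrete: every `.shp` file starts with a fixed 100-byte header (file code 9994, file length, version 1000, shape type and bounding box), while the attributes described above live in the companion `.dbf` file. The example parses a synthetic header built in memory, not a real download, and only reads the header, not the geometries.

```python
import struct

# Shape type codes from the ESRI shapefile specification (subset).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte header of an ESRI .shp file."""
    if len(data) < 100:
        raise ValueError("header must be 100 bytes")
    (file_code,) = struct.unpack(">i", data[0:4])             # big-endian
    (length_words,) = struct.unpack(">i", data[24:28])        # length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])   # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    if file_code != 9994 or version != 1000:
        raise ValueError("not a shapefile header")
    return {
        "length_bytes": length_words * 2,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header: a polygon file whose bounding box roughly covers Madrid city.
header = (
    struct.pack(">7i", 9994, 0, 0, 0, 0, 0, 50)   # code, 5 unused ints, length
    + struct.pack("<2i", 1000, 5)                  # version, shape type
    + struct.pack("<8d", -3.89, 40.31, -3.52, 40.64, 0.0, 0.0, 0.0, 0.0)  # bbox, Z/M
)
print(read_shp_header(header))
```

With a real file, `read_shp_header(open("regions.shp", "rb").read(100))` would give the same kind of summary; `regions.shp` is a placeholder name.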
Editing and visualization of shapefiles: http://www.qgis.org
Maps :: shapefiles
CartoCiudad (Ministerio de Fomento): shapefiles for each province at the municipality and postal code levels. They also include data about the urban background.
http://www.cartociudad.es/portal/
Maps :: Spain cartography
Nomecalles (CAM): shapefiles, POIs (museums, theaters, health services), subway (stations), etc.
http://www.madrid.org/nomecalles/DescargaBDTCorte.icm Resolution level: municipalities, districts, postal codes, etc.
Maps :: Madrid cartography
Plan territorial metropolitano de Barcelona – Generalitat de Catalunya Link
Maps :: Barcelona province cartography
Open data gencat Catalonia Cartography Link
Maps :: Barcelona City cartography
Plan territorial metropolitano de Barcelona – Generalitat de Catalunya Link
This website also has data about mobility, economic development, population, etc. at the district level. There is nothing at this level of detail for Madrid. Solution: use other data sources to estimate them (see below).
Maps :: Barcelona city cartography
Demographic/Economic data :: Spain
Demographic data: Instituto Nacional de Estadística (INE). Census by province / municipality / census section. Link
Economic Data: Servicio Público de Empleo Estatal (SEPE). Unemployment by municipality.
Link
Demographic/Economic data :: Madrid
Madrid City
Madrid City Council database: http://www-2.munimadrid.es/CSE6/jsps/menuBancoDatos.jsp Population by districts, neighborhoods, etc.
Madrid Region
Comunidad de Madrid database: http://www.madrid.org/desvan/Inicio.icm?enlace=almudena Population by municipality. Economic data by municipality.
Demographic/Economic data :: Barcelona
Barcelona city
Departament d’Estadística: http://www.bcn.cat/estadistica/castella/ Population by district. Unemployment by district.
Catalonia region
Idescat (Institut d’Estadística de Catalunya): http://www.idescat.cat/es/ Population by municipality. Economic data by municipality.
Google API Console
Other data sources :: Google Points of Interest
INNOVA CHALLENGE BigDataSpain 7/11
Google API Console
Other data sources :: Google Points of Interest
INNOVA CHALLENGE BigDataSpain 7/11
Google API Console
Other data sources :: Google Points of Interest
INNOVA CHALLENGE BigDataSpain 7/11
Points of interest around Puerta del Sol (Madrid)
Service 1: Places Search
Parameters: location: 40.417, -3.703; radius: 1000
Service 2: Places Details
Parameters: reference: place code
Other data sources :: Google Points of Interest
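A sketch of the two-step flow on this slide: one Places Search request around Puerta del Sol, then one Places Details request per result, keyed by its `reference`. The endpoint URLs are assumptions (legacy Places web-service paths), the service needs an API key, and newer API versions identify places by `place_id` rather than `reference`. The search response is a made-up stand-in, so nothing is fetched.

```python
from urllib.parse import urlencode

# Assumed endpoints (legacy Places web service); verify against current docs.
SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"


def places_search_url(lat, lon, radius_m, key):
    """Service 1: points of interest within radius_m metres of (lat, lon)."""
    params = {"location": f"{lat},{lon}", "radius": radius_m, "key": key}
    return f"{SEARCH_URL}?{urlencode(params)}"


def place_details_urls(search_response, key):
    """Service 2: one details request per search result, keyed by `reference`."""
    return [
        f"{DETAILS_URL}?{urlencode({'reference': r['reference'], 'key': key})}"
        for r in search_response.get("results", [])
    ]


search = places_search_url(40.417, -3.703, 1000, "YOUR_KEY")

# Hypothetical search response, standing in for the JSON returned by Service 1.
fake_response = {"results": [{"name": "Place A", "reference": "REF_A"},
                             {"name": "Place B", "reference": "REF_B"}]}
details = place_details_urls(fake_response, "YOUR_KEY")
print(search)
print(details)
```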
GFS: Global Forecast System
OPeNDAP protocol.
Python implementation: pydap. Query format:
SERVER = http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/
DATE = YYYYMMDD
HOUR = HH
VAR = weather metric (tmp2m, ugrd10m, pressfc, …)
LAT = latitude index interval [259:263] (0.5º steps from the South Pole)
LON = longitude index interval [710:714] (0.5º steps from Greenwich)
QUERY = SERVER + "gfs_hd" + DATE + "/gfs_hd_" + HOUR + "z.dods?" + VAR + "[0:0]" + LAT + LON
dataset = open_dods(QUERY)
Other data sources :: Weather forecast
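The query template above can be filled in programmatically. This sketch also converts a latitude/longitude into grid indices under the slide's convention (0.5º steps from the South Pole and eastward from Greenwich), which shows why Madrid falls inside the example intervals; `[0:0]` presumably selects the first forecast time. The date is illustrative and the `gfs_hd` dataset may no longer be served by that NOMADS server, so the pydap `open_dods` call is wrapped in a function that is not run here.

```python
SERVER = "http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/"  # from the slide
STEP = 0.5  # grid resolution in degrees


def grid_index(lat, lon):
    """(lat, lon) -> (row, col) on a 0.5º grid starting at -90º lat and 0º lon."""
    row = round((lat + 90) / STEP)
    col = round((lon % 360) / STEP)  # western longitudes wrap to 180..360
    return row, col


def gfs_query(date, hour, var, lat_range, lon_range):
    """Build SERVER + gfs_hd<DATE>/gfs_hd_<HH>z.dods?VAR[0:0][LAT][LON]."""
    lat = f"[{lat_range[0]}:{lat_range[1]}]"
    lon = f"[{lon_range[0]}:{lon_range[1]}]"
    return f"{SERVER}gfs_hd{date}/gfs_hd_{hour}z.dods?{var}[0:0]{lat}{lon}"


def fetch(query):
    """Download the subset with pydap (requires pydap to be installed)."""
    from pydap.client import open_dods
    return open_dods(query)


row, col = grid_index(40.4168, -3.7038)  # Puerta del Sol
query = gfs_query("20131107", "00", "tmp2m", (259, 263), (710, 714))
print(row, col)
print(query)
```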
Developers webpage http://dev.twitter.com
Activity :: data from Twitter API
Consumer Key
Consumer Secret
Access token
Access token secret
OAuth authentication
REST API: several queries with parameters; the number of requests is limited.
Stream API: only one query (with parameters); requests are not time-limited.
Activity :: data from Twitter API
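The four credentials are combined through OAuth 1.0a, which signs every request with HMAC-SHA1. The sketch below follows the signing steps of RFC 5849 (percent-encode and sort the parameters, build the signature base string, sign with the consumer secret and token secret) using only the standard library. The streaming endpoint URL and all credential values are assumptions or placeholders, and Twitter's v1.1 streaming API has since been retired, so this illustrates the mechanism rather than a working client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value) -> str:
    """RFC 3986 percent-encoding: only unreserved characters stay literal."""
    return quote(str(value), safe="")


def oauth1_signature(method, url, params, consumer_secret, token_secret):
    """HMAC-SHA1 over the OAuth 1.0a signature base string."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_str = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(url), pct(param_str)])
    signing_key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def oauth1_header(method, url, request_params, consumer_key, consumer_secret,
                  access_token, token_secret, nonce, timestamp):
    """Build the value of the `Authorization: OAuth ...` header."""
    oauth = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp),
        "oauth_token": access_token,
        "oauth_version": "1.0",
    }
    # The signature covers both the request parameters and the oauth_* fields.
    oauth["oauth_signature"] = oauth1_signature(
        method, url, {**request_params, **oauth}, consumer_secret, token_secret)
    return "OAuth " + ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(oauth.items()))


# Placeholder credentials; nonce and timestamp are fixed so the output is repeatable.
header = oauth1_header(
    "POST", "https://stream.twitter.com/1.1/statuses/filter.json",
    {"locations": "-4.59,39.90,-3.04,41.17"},
    "CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
    nonce="abc123", timestamp=1383782400)
print(header)
```

In real use the nonce should be random (e.g. `secrets.token_hex()`) and the timestamp the current time; libraries such as requests-oauthlib perform these steps for you.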
INNOVA CHALLENGE BigDataSpain 7/11
Stream API
Example: Geolocalized Tweets in the Madrid region
API service: POST statuses/filter
Parameters: locations: -4.59, 39.90, -3.04, 41.17
Activity :: data from Twitter API
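The `locations` value is a bounding box given as longitude/latitude pairs, south-west corner first. Since the stream may also deliver tweets whose `place` only intersects the box, a client-side check on the exact GeoJSON point is a useful second filter; a minimal sketch follows. The tweet dictionaries are toy examples that mimic the `coordinates` field layout ([longitude, latitude]).

```python
# Bounding box from the slide: west, south, east, north (SW lon/lat, then NE lon/lat).
MADRID_BBOX = (-4.59, 39.90, -3.04, 41.17)


def in_bbox(lon, lat, bbox=MADRID_BBOX):
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def geotagged_in_region(tweets, bbox=MADRID_BBOX):
    """Keep tweets whose exact GeoJSON point lies inside the bounding box."""
    kept = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if point and point.get("type") == "Point":
            lon, lat = point["coordinates"]  # GeoJSON order: [longitude, latitude]
            if in_bbox(lon, lat, bbox):
                kept.append(tweet)
    return kept


# Toy tweets: Puerta del Sol, Barcelona, and one without exact coordinates.
tweets = [
    {"id": 1, "coordinates": {"type": "Point", "coordinates": [-3.7038, 40.4168]}},
    {"id": 2, "coordinates": {"type": "Point", "coordinates": [2.1734, 41.3851]}},
    {"id": 3, "coordinates": None},
]
print([t["id"] for t in geotagged_in_region(tweets)])  # prints [1]
```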
As mentioned above, Madrid has no data about administrative zones below the municipality level. But we can estimate some of them with Twitter.
• Example: population by postal code
1. Round geographical coordinates to the 3rd decimal place (square cells of roughly 100 m × 100 m).
2. For each user, find the most visited postal code and define it as his/her residence. Count the number of residents per postal code.
3. Visualize.
Stream API
Activity :: data from Twitter API
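Steps 1 and 2 can be sketched as below. Two assumptions are explicit: tweets arrive as `(user, lat, lon)` tuples, and a hypothetical lookup `cell_to_zip` maps each rounded cell to a postal code (it could be built beforehand from the postal-code shapefiles listed earlier, e.g. CartoCiudad). The toy mapping and tweets are invented; step 3, the visualization, is left out.

```python
from collections import Counter, defaultdict


def cell(lat, lon, decimals=3):
    """Step 1: snap a point to a grid cell by rounding (0.001º is about 100 m)."""
    return (round(lat, decimals), round(lon, decimals))


def residents_by_postal_code(tweets, cell_to_zip):
    """Step 2: residence = each user's most visited postal code; count residents.

    tweets: iterable of (user, lat, lon); cell_to_zip: hypothetical lookup from
    a rounded cell to its postal code.
    """
    visits = defaultdict(Counter)  # user -> Counter of postal codes
    for user, lat, lon in tweets:
        zipcode = cell_to_zip.get(cell(lat, lon))
        if zipcode is not None:
            visits[user][zipcode] += 1
    residence = {user: c.most_common(1)[0][0] for user, c in visits.items()}
    return Counter(residence.values())


# Toy data: two users, two cells with invented postal codes.
cell_to_zip = {cell(40.4168, -3.7038): "28013", cell(40.4138, -3.6921): "28014"}
tweets = [("ana", 40.4168, -3.7038), ("ana", 40.4168, -3.7038),
          ("ana", 40.4138, -3.6921), ("bob", 40.4138, -3.6921)]
print(residents_by_postal_code(tweets, cell_to_zip))
```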
https://www.centrodeinnovacionbbva.com/signup
Activity :: data from BBVA API
https://developer.bbva.com/panel
Activity :: data from BBVA API
Getting the authentication data:
Example:
APP_ID = "iic_formacion_innovachallenge"
APP_KEY = "0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
str_to_encode = "iic_formacion_innovachallenge:0f1d750a5baea6c7022452d0d2ece01fc5901ad7"
auth = strToBase64(str_to_encode)
Request = HttpRequest(SERVICE, PARAMETERS, header = {'Authorization': auth})
1. With the APP_ID and APP_KEY, generate the authorization code by concatenating both strings with ":" and encoding the result in base64.
2. This authorization code is added to the HTTP request header.
Activity :: data from BBVA API
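The two steps map directly onto Python's `base64` and `urllib.request` modules. In this sketch the credentials are placeholders and the service URL is hypothetical; the request is built but not sent. It follows the slide in putting the bare base64 string in the header, whereas standard HTTP Basic authentication would prefix it with "Basic ", so the exact header format should be checked against the API documentation.

```python
import base64
from urllib.request import Request


def auth_header(app_id: str, app_key: str) -> str:
    """Step 1: base64 encoding of "APP_ID:APP_KEY"."""
    raw = f"{app_id}:{app_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_request(service_url: str, app_id: str, app_key: str) -> Request:
    """Step 2: attach the code as the Authorization header (request not sent)."""
    return Request(service_url, headers={"Authorization": auth_header(app_id, app_key)})


# Placeholder credentials and a hypothetical service URL.
req = build_request("https://api.example.com/customer_zipcodes", "my_app_id", "my_app_key")
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would send it.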
Parameters
Workshop 30th October
Activity :: CUSTOMER_ZIPCODES example
Extracting data
Activity :: CUSTOMER_ZIPCODES example
Building the adjacency list
Activity :: CUSTOMER_ZIPCODES example
Building and plotting the graph
Activity :: CUSTOMER_ZIPCODES example
Economic flows from Puerta del Sol
API service: customer_zipcodes
Parameters: date_min: 201304, date_max: 201304, zipcode: 28013, by: cards, group_by: month
Activity :: CUSTOMER_ZIPCODES example
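A sketch of the "extracting data → adjacency list" steps for the Puerta del Sol query. The response format of `customer_zipcodes` is not shown in these slides, so the code assumes the records have already been flattened into hypothetical `(origin_zipcode, destination_zipcode, n_cards)` tuples, and the numbers are invented. Plotting the resulting graph (e.g. with a graph library such as networkx) is omitted.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Query parameters from the slide: April 2013, destination postal code 28013.
PARAMS = {"date_min": "201304", "date_max": "201304", "zipcode": "28013",
          "by": "cards", "group_by": "month"}


def adjacency_list(records):
    """Weighted directed edges origin -> destination postal code.

    records: hypothetical (origin_zip, destination_zip, n_cards) tuples.
    """
    adj = defaultdict(dict)
    for origin, dest, n_cards in records:
        adj[origin][dest] = adj[origin].get(dest, 0) + n_cards
    return adj


def top_origins(adj, dest, k=3):
    """Origins that send the most cards to `dest`, largest flow first."""
    flows = [(origin, nbrs[dest]) for origin, nbrs in adj.items() if dest in nbrs]
    return sorted(flows, key=lambda f: f[1], reverse=True)[:k]


query = urlencode(PARAMS)
# Invented flows into 28013 (Puerta del Sol), for illustration only.
records = [("28045", "28013", 120), ("28009", "28013", 80),
           ("08001", "28013", 15), ("28045", "28013", 5)]
adj = adjacency_list(records)
print(query)
print(top_origins(adj, "28013"))
```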
Example: development of a geolocalized recommender app.
Objective: recommend to users which areas to visit according to their profile, residence, preferences, etc., using information about what similar users do.
Data used:
1. API Innova Challenge – CARDS_CUBE.
2. API Innova Challenge – CUSTOMER_ZIPCODES.
Recommender systems :: Introduction
Use Twitter data to:
1. Get what people are talking about in city areas.
2. Analyze the user's language on Twitter.
3. Compare the user's language with each area's language and recommend the most similar areas to the user.
Recommender systems :: user language
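The slides do not say how the two languages are compared; one simple choice, assumed here, is a bag-of-words vector per user and per area, compared with cosine similarity. The tweets are toy examples; a real version would at least drop stop words (such as "el" or "en") or use TF-IDF weights.

```python
import math
import re
from collections import Counter


def bag_of_words(texts):
    """Lower-cased word counts over a list of texts (accented letters are kept)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_areas(user_tweets, area_tweets):
    """area_tweets: postal code -> tweets posted there. Most similar area first."""
    user = bag_of_words(user_tweets)
    scores = {area: cosine(user, bag_of_words(tws)) for area, tws in area_tweets.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


area_tweets = {
    "28013": ["tapas en sol", "concierto en sol"],
    "28009": ["paseo por el retiro", "correr en el retiro"],
}
ranking = rank_areas(["me encanta correr por el retiro"], area_tweets)
print(ranking)
```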
Postal code (CP) 28013: Madrid city center
Recommender systems :: user language
Postal code (CP) 28009: Retiro
Recommender systems :: user language
Use CARDS_CUBE service from the BBVA API
Recommender systems :: user demographic profile
• Use CARDS_CUBE service data
• For each merchant category Z (bars, fashion, health, etc.) build a matrix in which each entry is the number of distinct credit cards of a given profile X (gender, age) that were used at merchants of category Z in postal code Y.
Where do people like me go shopping? Which restaurants are visited by people similar to me?
Recommender systems :: user demographic profile
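The matrix just described can be held as nested dictionaries, `category → X → postal code → cards`, and a recommendation for a profile is then the largest entries in its row. This sketch assumes that CARDS_CUBE results have been flattened into hypothetical `(profile, postal_code, category, n_cards)` records that do not overlap, so summing them does not double-count cards. The same code would serve the geographic profile by using the customer's postal code as X. The records are invented.

```python
from collections import defaultdict


def build_matrices(records):
    """records: (row_key, postal_code, category, n_cards) tuples.

    row_key is a demographic profile such as ("M", "36-45") or, for the
    geographic profile, the customer's own postal code.
    """
    matrices = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for row_key, postal_code, category, n_cards in records:
        matrices[category][row_key][postal_code] += n_cards
    return matrices


def recommend(matrices, category, row_key, k=3):
    """Postal codes with the most cards for this row, largest first."""
    row = matrices.get(category, {}).get(row_key, {})
    return sorted(row.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Invented records for illustration only.
records = [
    (("M", "36-45"), "28013", "bars", 50),
    (("M", "36-45"), "28004", "bars", 70),
    (("F", "26-35"), "28013", "bars", 90),
    (("M", "36-45"), "28001", "fashion", 40),
]
matrices = build_matrices(records)
print(recommend(matrices, "bars", ("M", "36-45")))
```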
Example: Male, age 36-45
[Figure panels: Fashion; Bars and restaurants]
Recommender systems :: user demographic profile
Use CUSTOMER_ZIPCODES service in the BBVA API
Recommender systems :: user geographic profile
• Use data from the CUSTOMER_ZIPCODES service
• For each merchant category Z (bars, fashion, health, etc.) we build a matrix in which each entry is the number of distinct credit cards from postal code X that are used at merchants of category Z in postal code Y.
Where do people in my district go shopping? What restaurants are visited by people living in my district?
Recommender systems :: user geographic profile
[Figure panels: Fashion; Bars and restaurants]
Example: postal code 28045
Recommender systems :: user geographic profile
Geographical and demographic recommendation system
Recommender systems :: combination
[Figure panels: Fashion; Bars and restaurants]
Example: Male, age 36-45, living in postal code 28045.
Recommender systems :: combination
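How the demographic and geographic profiles are combined is not specified on the slides. A simple option, assumed here, is to turn each matrix row into shares of cards and blend them with a weight `alpha`; the rows below are invented numbers for a single category.

```python
def shares(row):
    """Turn a row of card counts into fractions that sum to 1."""
    total = sum(row.values())
    return {z: n / total for z, n in row.items()} if total else {}


def combine(demo_row, geo_row, alpha=0.5, k=3):
    """Score each postal code as alpha*demographic + (1-alpha)*geographic share."""
    d, g = shares(demo_row), shares(geo_row)
    scores = {z: alpha * d.get(z, 0.0) + (1 - alpha) * g.get(z, 0.0)
              for z in d.keys() | g.keys()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Invented rows for one category: the "male, 36-45" profile and residents of 28045.
demo_row = {"28013": 30, "28004": 10}
geo_row = {"28004": 60, "28045": 40}
print(combine(demo_row, geo_row))
```

With `alpha=1.0` the ranking reduces to the demographic profile alone and with `alpha=0.0` to the geographic one, so the weight can be tuned between the two.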
From the data to the app
From data to the app
1. The idea.
2. What data do I need to carry out this idea? Which services of the Challenge API do I need? Can I improve it with other information sources?
3. Analysis: distilling the idea and assessing its viability. Extracting the hidden value of analytics and models.
4. How can the user take advantage of this idea?
5. Iterate steps 2, 3 and 4 until the idea and the benefit for the user emerge.
6. Convert the value of the analysis into an application.
Esteban Moro Alejandro Llorente
www.iic.uam.es
@llorentealex @estebanmoro