Post on 23-Jun-2020
Design Aspects of a Smart City Platform
credits go to:
Bin Cheng, Salvatore Longo, Flavio Cirillo, Martin Bauer, Ernoe
Kovacs
Building a Big Data Platform for Smart Cities: Experience and Lessons
from Santander. IEEE BigData, NY 2015.
Tobias Jacobs
NEC Laboratories Europe, Heidelberg, Germany
tobias.jacobs@neclab.eu
Page 2 © NEC Corporation 2015
Background
▌ Smart Cities:
Cities are still growing day by day (60~70 % of the global population is
expected to live in big cities by 2020)
More and more cities are being connected by widely deployed sensors, e.g. Santander
in Spain, Chicago in the US, Songdo in South Korea
▌ Santander City:
Hosts the largest IoT test-bed in Europe
Experimental test-bed for Smart Cities
~1,200 sensor nodes of 9 types (15,000 sensors)
network topology*
*Cited from “SmartSantander: IoT
experimentation over a smart city testbed”,
Journal of Computer Networks, Mar., 2014
Technical Motivation and Challenges
▌ Motivation
Cities are getting equipped with connected sensors
A variety of city applications from different business domains demand:
• outsourcing intensive data storage and processing into a horizontal platform layer
• sharing data processing and analytics
Towards smart cities, we aim to build a flexible and efficient city data
and analytics platform.
▌ Challenges
Big data storage:
• Need to handle the scale and diversity of city data (unstructured data like videos from cameras,
semi-structured data like JSON data from sensors, structured data like Excel tables from external
data sources)
Data processing and analytics:
• Must cover both historical data and live data, with incremental analytics for a constantly growing data set
Application interfaces:
• Must be flexible and generic enough to serve different requirements from applications
Background
▌ NEC Corporation is investing in
commercial Smart City + IoT
Platform offerings
World-wide activities: in Spain
(Santander), Singapore, New
Zealand
▌ NEC Labs Europe (Heidelberg,
Germany) supports this with research
Various EU projects (IoT-A,
FIWARE, SMARTIE, MobiNet,
etc.)
research on overall platform
architectures, but also more
specific topics like sensor data
analytics, network edge
computing, processing of geo-data
etc.
▌ This presentation:
an NEC Labs activity which started
1 year ago
Experimental design &
implementation (system
integration) of Smart City Platform
re-using FIWARE technology
Based on requirements from
Santander City
Close cooperation with NEC Cloud
Competence Center (Spain)
approaches later re-used in
commercial NEC Cloud City
Operation Center
Work presented at IEEE BigData
2015
CiDAP: City Data and Analytics Platform
[Figure: CiDAP architecture, a big data platform for smart cities placed between data sources and applications]
Data sources: connected through IoT-agents and the IoT-broker (NGSI APIs); data arrives as Data/JSON
Big data repository: CouchDB (documents, views, indexes) for semi-structured data (JSON data); HDFS for unstructured data (text/image/video)
Big data processing: internal processing in CouchDB; external processing (batch+stream) in Spark
HDFS & Spark are deployed within the same cluster (name node, compute & data nodes)
CityModel Server: offers the CityModel APIs to applications (e.g. the Dashboard application); other interfaces shown are the CouchDB APIs and NGSI APIs
Platform management portal: used by application developers, the platform operator, and others
Big Data Repository: Multiple layer storage
[Figure: storage layers and the processing that connects them]
Big files (HDFS) -> External processing (online or offline) -> Indexed documents (NoSQL database)
Indexed documents -> Internal processing (online, updated as new documents come) -> Indexed views
Indexed views -> interactive queries over indexed views, notification for applications, aggregated results for applications
Filters are shown on the paths between layers
Unstructured data (file in HDFS)
-> semi-structured data (JSON documents in CouchDB)
-> structured data (indexed views in CouchDB)
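The three-step chain above can be illustrated with a small, self-contained sketch. It is not the CiDAP implementation: plain Python lists and dicts stand in for HDFS and CouchDB, and the field names (`node`, `type`, `value`, `district`) and the helpers `to_documents` and `to_view` are hypothetical.

```python
import json
from collections import defaultdict

# Layer 1: raw lines as they might sit in a big file (HDFS in the slides).
# The records and their field names are invented for illustration.
RAW_FILE_LINES = [
    '{"node": "n1", "type": "temperature", "value": 21.5, "district": "centro"}',
    'corrupted line',
    '{"node": "n2", "type": "noise", "value": 63.0, "district": "centro"}',
    '{"node": "n3", "type": "temperature", "value": 19.5, "district": "puerto"}',
]


def to_documents(lines):
    """Layer 2: keep only lines that parse as JSON objects (a simple filter)."""
    docs = []
    for line in lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop unparseable input instead of storing it
        if isinstance(doc, dict):
            docs.append(doc)
    return docs


def to_view(docs, key_fields):
    """Layer 3: index document values by a composite key, as a view would."""
    view = defaultdict(list)
    for doc in docs:
        key = tuple(doc[name] for name in key_fields)
        view[key].append(doc["value"])
    return dict(view)


documents = to_documents(RAW_FILE_LINES)
view = to_view(documents, key_fields=("district", "type"))
```

Applications would then read from `view` (the structured layer) rather than re-scanning the raw file.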
Big Data Processing: Extendable Architecture
CouchDB:
• saves all raw data and also acts as a message broker for all data processing
(internal+external)
• embedded map-reduce for incremental, real-time, light processing
Spark Cluster (Spark + Spark Streaming):
• intensive heavy processing for stream data and historical data
The architecture is flexible in the sense that the Spark Cluster part is optional at small
scale
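To make "incremental, light processing" concrete, the sketch below simulates the idea of a map-reduce view that is updated as each document arrives. It is a pure-Python illustration, not CouchDB code (CouchDB views are normally defined as map/reduce functions inside the database); `map_doc` and `IncrementalView` are hypothetical names.

```python
from dataclasses import dataclass, field


def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    if "type" in doc and "value" in doc:
        yield doc["type"], doc["value"]


@dataclass
class IncrementalView:
    """Keeps a running (sum, count) per key so a new document is folded in
    without re-reading earlier documents."""
    stats: dict = field(default_factory=dict)

    def update(self, doc):
        # Reduce step, applied incrementally to the newly arrived document.
        for key, value in map_doc(doc):
            total, count = self.stats.get(key, (0.0, 0))
            self.stats[key] = (total + value, count + 1)

    def mean(self, key):
        total, count = self.stats[key]
        return total / count


view = IncrementalView()
for doc in [
    {"type": "temperature", "value": 20.0},
    {"type": "temperature", "value": 22.0},
    {"type": "noise", "value": 60.0},
    {"note": "no type/value fields, so the map step emits nothing"},
]:
    view.update(doc)
```

Heavier jobs (e.g. analytics over the full history) are the kind of work the slides assign to the optional Spark cluster instead.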
Application Interfaces: CityModel APIs
▌ Based on RESTful HTTP API
▌ Support both queries and subscriptions
Query: simple queries and complex queries, e.g. with range, grouping,
temporal and spatial parameters (district, section, month or year
range)
Subscriptions: subscribe to cached data, or to low latency results
directly from devices or the edge nodes
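A rough sketch of what a client of such an API could look like is given below. The slides do not specify URLs, parameter names, or payload formats, so everything here is an assumption for illustration: the base URL, the resource name, the query parameters (`district`, `start`, `end`, `group_by`), and the subscription fields.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://citymodel.example/api"  # hypothetical endpoint


def build_query_url(resource, **params):
    """Build a GET URL; parameters set to None are left out."""
    pairs = sorted((k, v) for k, v in params.items() if v is not None)
    query = urlencode(pairs)
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def build_subscription(resource, callback_url, low_latency=False):
    """Build a subscription request body (hypothetical format).

    mode "cached" would be served from stored data; "low_latency" would be
    forwarded towards devices or edge nodes.
    """
    return {
        "resource": resource,
        "callback": callback_url,
        "mode": "low_latency" if low_latency else "cached",
    }


# A complex query: spatial (district) + temporal (month range) + grouping.
url = build_query_url(
    "temperature", district="centro", start="2014-09", end="2014-12",
    group_by="month", section=None,
)
params = parse_qs(urlsplit(url).query)
sub = build_subscription("temperature", "http://app.example/notify",
                         low_latency=True)
```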
[Figure: subscription handling]
Applications (application A, application B, Dashboard application) send subscriptions (Sub-A, Sub-B, Sub-C) to the CityModel Server and receive notifications
Subscriptions at the usual update interval are served from CouchDB via its REST APIs
Other subscriptions are further forwarded to the physical world via the IoT-broker and IoT-agent
Also shown: a converter component
Deployment and Measurement
▌ Real deployment
Integrated with the Smart
Santander test-bed
Supports a dashboard service via
the CityModel APIs
Collects 20 GB of sensor data per
month
▌ In-lab deployment for test
3 machines connected on the
same local network (1Gb/s)
Purpose: to test the system
performance and identify
bottlenecks via microbenchmarks
Driven by a real dataset from
Santander
Preliminary Results from In-Lab deployment
▌ Throughput: # of queries executed per second (simple query, complex
query)
Slightly affected by on-going updates
A complex query (over all indexed views based on all data) is 10 times slower than a simple query
(over indexed views based on the latest documents)
▌ Number of updates per second:
300 updates/second (with bulk updates, can reach 5,000 documents/second):
upper bound of CiDAP with a single CouchDB instance
Santander Test-bed update workload: ~20 updates/second on average
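The headroom implied by these figures can be worked out directly. The rates are the ones reported on this slide; the 30-day month used for the monthly update count is an assumption.

```python
# Rates reported on the slide.
SINGLE_UPDATES_PER_S = 300      # single-document updates, one CouchDB instance
BULK_DOCS_PER_S = 5_000         # with bulk updates
SANTANDER_UPDATES_PER_S = 20    # average test-bed workload

# How many times the measured upper bound exceeds the real workload.
headroom_single = SINGLE_UPDATES_PER_S / SANTANDER_UPDATES_PER_S   # 15x
headroom_bulk = BULK_DOCS_PER_S / SANTANDER_UPDATES_PER_S          # 250x

# Fraction of single-update capacity used by the average workload (~6.7%).
utilisation_single = SANTANDER_UPDATES_PER_S / SINGLE_UPDATES_PER_S

# Updates accumulated per month, assuming a 30-day month.
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60
updates_per_month = SANTANDER_UPDATES_PER_S * SECONDS_PER_30_DAYS
```

On average, then, a single CouchDB instance has roughly 15x headroom for individual updates; the monthly count (about 52 million documents) is what drives the compaction and view-size issues rather than the update rate itself.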
Experience with Santander
▌ Positive results:
CouchDB is suitable for incremental updating
Flexible architecture:
• small scale, with only CouchDB
• big scale: add Spark Cluster to support external heavy processing
▌ Limits:
A NoSQL database like CouchDB is not efficient when saving all raw sensor data:
• Compaction takes very long as the # of documents increases
• Ad hoc queries are time-consuming when views are big (disk I/O is the limitation)
• To scale up: CouchDB 2.0 supports clustering
▌ What to do next:
Data semantics should be considered in the next step ( -> enhanced with semantics)
• Example: mapping from service type to node type
• E.g. temperature values are not only reported by temperature
sensor nodes.
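The mapping problem can be sketched with a simple lookup table. The node type and service type names below are invented for illustration and are not taken from the Santander test-bed; a semantic model would replace this hand-written dict.

```python
# Hypothetical mapping: which service types each node type reports.
NODE_TYPE_SERVICES = {
    "temperature_node": {"temperature"},
    "noise_node": {"noise", "temperature"},  # also carries a temperature sensor
    "parking_node": {"occupancy"},
}


def node_types_for(service_type):
    """Return all node types that report the given service type."""
    return sorted(
        node_type
        for node_type, services in NODE_TYPE_SERVICES.items()
        if service_type in services
    )


temperature_sources = node_types_for("temperature")
```

A query for temperature that only looked at `temperature_node` would miss the readings reported by the other node type, which is the point of the example above.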
Lessons Learned (1): Edge Computing
▌ To achieve low latency analytics results, some processing must be done on the devices or at the network edge ( -> edge computing)
The deployed sensor nodes update periodically and constantly switch into sleep
mode to save battery time and energy (all updates have more than 10 seconds delay)
This makes it difficult to support applications that require fast real-time data
Allow sensor nodes to actively report/process real-time data when applications subscribe to low
latency data
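One way to picture the last point is a node that stays in its energy-saving periodic mode unless at least one application has subscribed to low latency data. This is a minimal simulation under assumed values, not the deployed firmware: `SensorNode` and both interval values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SensorNode:
    periodic_interval_s: float = 60.0  # assumed duty-cycle period (sleep mode)
    active_interval_s: float = 1.0     # assumed period while actively reporting
    low_latency_subscribers: int = 0

    def subscribe(self):
        """An application asks for low latency data from this node."""
        self.low_latency_subscribers += 1

    def unsubscribe(self):
        self.low_latency_subscribers = max(0, self.low_latency_subscribers - 1)

    @property
    def reporting_interval_s(self):
        # Report actively only while someone needs low latency data;
        # otherwise fall back to the battery-friendly periodic mode.
        if self.low_latency_subscribers > 0:
            return self.active_interval_s
        return self.periodic_interval_s


node = SensorNode()
idle_interval = node.reporting_interval_s
node.subscribe()
active_interval = node.reporting_interval_s
node.unsubscribe()
back_to_idle = node.reporting_interval_s
```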
Lessons Learned (2): Anomaly detection for sensor nodes
▌ Three-month deployment (09.2014 ~ 12.2014)
“abnormal” -> “not reporting latest value due to some reason”
200 nodes were already abnormal before we started our experiment
100 nodes became abnormal during our deployment
[Figure: sensor nodes sorted by last reporting time; y-axis: last reporting time]
Anomaly detection is needed to help applications to filter out noisy data
The analysis was done by
checking the reporting
timestamps.
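A timestamp check of this kind can be sketched as follows. The slides do not state the threshold that was used, so `max_silence` is an assumed parameter, and the node names and timestamps are invented sample data.

```python
from datetime import datetime, timedelta


def find_abnormal_nodes(last_report, now, max_silence):
    """Return nodes whose latest report is older than max_silence,
    sorted by last reporting time (oldest first)."""
    stale = [node for node, ts in last_report.items() if now - ts > max_silence]
    return sorted(stale, key=last_report.get)


# Invented sample data: node id -> timestamp of its latest report.
last_report = {
    "n1": datetime(2014, 11, 30, 23, 50),
    "n2": datetime(2014, 10, 1, 8, 0),
    "n3": datetime(2014, 8, 15, 12, 0),
}
now = datetime(2014, 12, 1, 0, 0)

abnormal = find_abnormal_nodes(last_report, now, max_silence=timedelta(days=1))
```

Applications could use such a list to exclude stale nodes before aggregating values.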
Summary and Future Work
▌ City data and analytics platform (CiDAP): to enable efficient & flexible data analytics across various applications
• Multiple layer storage: file in HDFS -> Document in CouchDB -> View in CouchDB
• Extendable architecture: data processing for a constantly growing dataset, both historical data and live data
• CityModel APIs: spatial-temporal queries; subscriptions to cached data and real-time device data
▌ Experience, lessons learned, and future work
• Reliability of IoT devices must be considered because reported data can be noisy or come from a faulty
sensor node ( -> anomaly detection)
• To achieve low latency analytics results, some processing must be done on the devices or at the network edge ( -> edge computing)
• Data semantics should be considered in the next step ( -> enhanced with semantics)