Study: #Big Data in #Austria
#Big Data in #Austria
Big Data: Challenges and Potentials
Mario Meir-Huber and Martin Köhler
European Data Economy Workshop, Semantics 2015
15.09.2015, Vienna, Austria
Study „#BigData in #Austria“
Project duration: 1.11.2013 – 30.04.2014
Project partners:
• IDC Central Europe GmbH
• AIT Austrian Institute of Technology, Mobility Department
Contact persons:
• Mario Meir-Huber, IDC (Teradata)
• Martin Köhler, AIT
Content:
• State-of-the-Art in Big Data
• Market analysis
• Best practice for Big Data projects
Download (in German):
• FFG „Studies of ICT of the future“: https://www.ffg.at/studien-aus-ikt-der-zukunft
#Big Data in #Austria was funded under the funding programme „ICT of the Future“ of the Austrian Research Promotion Agency (FFG) and the Austrian Ministry for Transport, Innovation and Technology (BMVIT).
Data-intensive science
© IDC. Visit us at IDC.com and follow us on Twitter: @IDC
Visit the project: http://bigdataaustria.wordpress.com
Enormous data archives are at hand
Various data sources
Often available in real-time
Investigating huge data volumes and driving research and industry
Science is moving increasingly from hypothesis-driven to data-driven discoveries
Correlation vs. Causality
Big Data Definition
“Big Data” is a term encompassing the use of techniques to capture, process, analyse and visualize potentially large datasets in a reasonable timeframe not accessible to standard IT technologies. By extension, the platform, tools and software used for this purpose are collectively called “Big Data technologies”.
NESSI White Paper, December 2012
Four characteristics:
• Volume: In recent years the amount of generated data has increased enormously
• Velocity: Analysing more data in shorter time frames
• Variety: Huge diversity of data formats (arbitrary -> relational -> free text)
• Value: Extracting value (knowledge)
Hardware and software technologies for managing and analysing huge amounts of data
Or, simply put: IF DATA IS PART OF THE PROBLEM
Big Data Dimensions
Legal dimension
• Copyright
• Privacy
Social dimension
• User behaviour
• Collaboration
• Social implications
Economic dimension
• Business models
• Benchmarking
• Pricing
Technological dimension
• Scalable data processing
• Signal processing
• Statistics
• Linguistics
• HCI/Visualization
Application dimension
• Electronic archiving
• Decision support
• Industry solutions
Big Data Technology Stack
[Figure: layered technology stack; labels listed under the layer they accompany]
Big Data Utilization
• Domain expertise, Asking the right question
• Reporting & Dashboards, Alerting & Recommendations
• Business Intelligence, Text Analysis and Search
Big Data Analytics
• Data Science, Transform question to algorithm
• Machine Learning, Analysis
• Integration, Query, Performance, Transform, Warehousing
Big Data Platforms
• Hadoop Ecosystem
• Data Ingestion and Processing
• Efficiency, Trust, Workload, Governance
• Tools, Platform, Programming, Parallel
Big Data Management
• Data Centers, Scalable Data Storage
• IaaS, Cloud, Virtualization
• Network, Compute, Storage, DBMS, NoSQL
Cross-cutting
• Management, Security, Privacy, Governance, Data Value
Big Data Management
Technologies for the efficient management of huge amounts of data
• Storage and management of data
• Provisioning and management of the infrastructure
Cloud Resources, (Internal) Data Centers, Storage
Big Data Platforms
Technologies for (massively) parallel execution of data analytics on huge amounts of data
• Provisioning of parallelized and scalable execution systems
• Real-time integration of sensor data
Massively parallel programming
Programming models for data-intensive applications
(e.g. MapReduce)
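As a rough illustration of the MapReduce programming model named above, the following single-process Python sketch counts words in three phases (map, shuffle, reduce). The function names `map_phase`, `shuffle` and `reduce_phase` and the sample documents are assumptions made for this example; a framework such as Hadoop would distribute these phases across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values of one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data in Austria", "big data platforms", "data value"]
    print(word_count(docs))
```

Because each call to `map_phase` and `reduce_phase` only looks at its own input, the calls could in principle run in parallel, which is what makes the model attractive for data-intensive applications.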
High-Level Query languages
Scripting languages and abstraction of low-level data-intensive query languages
Streaming
Real-time processing of (sensor) data (which cannot be stored)
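One way to picture processing data that cannot be stored is a computation that keeps only a small, fixed-size window in memory. The sketch below is an assumption-laden example, not a method from the study: `sliding_mean` and the simulated sensor generator are hypothetical names, and production systems would typically use a dedicated stream-processing engine.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings after each new reading.

    Only `window` values are held in memory; older readings are discarded.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent: Deque[float] = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # the oldest value is evicted by the append below
        recent.append(value)
        total += value
        yield total / len(recent)


def simulated_sensor() -> Iterator[float]:
    """Hypothetical sensor feed; a real source would be unbounded."""
    for value in (1.0, 2.0, 3.0, 4.0):
        yield value


if __name__ == "__main__":
    for mean in sliding_mean(simulated_sensor(), window=2):
        print(mean)
```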
Ad-Hoc queries
Real-time access to huge amounts of data (query optimization: SQL vs. MapReduce)
Google Pregel, Apache Drill
Big Data Analytics
Technologies for extracting information/knowledge from huge amounts of data
• Pattern recognition
• Pattern matching
• ...
Big Data Utilization
Technologies for extracting value
• Strengthening the market situation of an organization
• Technologies for (simplified) utilization of data
Business Intelligence
Provisioning of efficient indicators based on data (Reporting, KPIs, Audit, …)
Knowledge Management
Management and representation of knowledge (Ontologies, Linked Data, Knowledge management systems)
Decision Support
Supporting decision making; incorporates data management, modelling, innovative and interactive user interfaces
Visualization
Interactive visualization of complex information and networks at different levels of abstraction (Visual Analytics)
Traditional versus Data-intensive Approach
Hadoop Approach
• Apply schema on read
• Support range of access patterns to data stored in HDFS: polymorphic access
• Right engine, right job: batch, interactive, real-time, in-memory
• Iterate over structure
• Transform and analyze
Traditional Approach
• Apply schema on write
• Heavily dependent on IT
• Single query engine: SQL
• Determine list of questions
• Design solution
• Collect structured data
• Ask questions from list
• Detect additional questions
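The difference between "schema on write" and "schema on read" can be sketched in a few lines of Python. In this hedged example the raw records are stored untouched as JSON strings, and a schema is applied only when a question is asked; the record contents and the names `RAW_EVENTS`, `read_with_schema`, `temperature_schema` and `humidity_schema` are all assumptions made for illustration, not part of the study.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator

# Raw data is kept as it arrived; no structure was enforced when writing it.
RAW_EVENTS = [
    '{"sensor": "s1", "temp": "21.5", "ts": "2015-09-15T10:00:00"}',
    '{"sensor": "s2", "temp": "19.0"}',
    '{"sensor": "s1", "humidity": "40"}',
]

# A schema maps a field name to the function that converts its raw value.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Parse raw records and apply the schema at read time.

    Records that lack any field named in the schema are skipped.
    """
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


# Two different questions lead to two different schemas over the same raw data.
temperature_schema: Schema = {"sensor": str, "temp": float}
humidity_schema: Schema = {"sensor": str, "humidity": int}

if __name__ == "__main__":
    print(list(read_with_schema(RAW_EVENTS, temperature_schema)))
    print(list(read_with_schema(RAW_EVENTS, humidity_schema)))
```

With schema on write, the third record would have had to fit a table designed in advance; here a new question only requires a new schema, not a new load of the data.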
Technical and scientific challenges
Visual Analytics
• Combine the strengths of human and electronic data processing
Big Data Analytics
• Techniques making use of the complete data set instead of sampling
Real-time analytics, (cross-)stream processing
• Expect real-time or near-real-time responses from the systems
Content Validation
• Validating the vast amount of information in content networks; trust
[Figure: layered architecture]
• Use Cases and Enterprise Services: Scientific Data, Life Sciences, Business Reporting
• Parallel Stream Processing, MapReduce Extensions
• Distributed Storage (IaaS, NoSQL)
• Datacenter, Datacenter, Datacenter
Market analysis
State-of-the-art in methods and tools
• ~50 Big Data toolkits
Analysis of Austrian market participants
• ~60 Austrian and international companies
• Industry analysis
Tertiary education
• Overview of Big Data topics in courses of study
• Research overview
Open data portals and data sets
Global market
IDC expects the global market to grow from 9.8 billion USD in 2012 to 32.4 billion USD in 2017
Yearly growth rate: 27%
Austrian market 2013:
• ~23 million euros
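The yearly growth rate quoted for the global market can be checked from the two endpoint figures. The sketch below assumes that the 27% is a compound annual growth rate over the five years from 2012 to 2017; the function name `cagr` is chosen for this example.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures from the slide, in billion USD.
    global_growth = cagr(9.8, 32.4, 2017 - 2012)
    print(f"{global_growth:.1%}")  # prints 27.0%
```

Under that assumption the computed rate agrees with the 27% stated on the slide.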
Code of practice for big data projects
Support and orientation for the implementation of big data projects
Reference projects
• Medicine
• Mobility
• Earth observation
• Crisis and disaster management
• Trade
Code of practice for big data projects
• Process model
• Maturity model
• Reference architecture
„We will soon have a huge skills shortage for data-related jobs.“
Neelie Kroes (ICT 2013, Nov.7, Vilnius)
„Data Scientist: The Sexiest Job of the 21st Century“
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
Recommendations and implications
„Data is a commodity – competence is the key“
Added Value
Market Leadership
Location attractiveness
Enhance competences
Visibility
Objectives
Competence
Enable data access
Legislation
Provide infrastructure
Current status
Focus, create and provide competences
Secure competences for the long-term
Establish holistic institution
Establish (international) legal certainty
Establish general framework for data markets
Incentives for Open Data
Enhance funding for SMEs
Steps