PowerPoint Presentation · TechEd 2012 Keywords: TechEd 2012 Created Date: 3/4/2014 9:37:35 AM ...

Post on 16-Nov-2020


1080 ~ 2240

Sources: The Economist, Feb ‘10; IDC

By 2016 the New Large Synoptic Survey Telescope in Chile will acquire 140 terabytes in 5 days - more than Sloan acquired in 10 years

In 2000 the Sloan Digital Sky Survey collected more data in its 1st week than was collected in the entire history of Astronomy

The Large Hadron Collider at CERN generates 40 terabytes of data every second

Power Map for Excel is a three-dimensional (3D) data visualization tool for Excel 2013.

http://www.microsoft.com/en-us/powerbi

Big Data in Research

Microsoft Research ATL Europe, Munich

Marcel Tilly, Program Manager

Big Data.

Sources: The Economist, Feb ‘10; DBMS2; Microsoft Corp

Cisco predicts that by 2013 annual internet traffic will reach 667 exabytes

The Twitter community generates over 1 terabyte of tweets every day

Bing ingests > 7 petabytes a month

Talks

• From Text to Entities and from Entities to Insight: a Perspective on Unstructured Big Data

• Querying and Exploring Big Brain Data

• Big Data with Stratosphere

• SCOPE: Parallel Databases Meet MapReduce

• Online Data Processing with S4 and Omid

• Predictable Data Centers

• From Terabytes to Megabytes: Finding the Needle by Shrinking the Haystack

• Incremental, Iterative, and Interactive Computation using Differential Dataflow

• Big Data on Small Machines

• Graphs and Linear Measurements

• Partitioning & Clustering Big Graphs

• Online Team Formation in Social Networks

• Big Data and Enterprise Analytics

• Streaming Verification of Outsourced Computation

• Big Data Analytics: A Happy Marriage of Systems and Theory?

• Fast Algorithms for Perfect Matchings in Regular Bipartite Graphs

• Cuts, Trees, and Electrical Flows

• Neighborhood Sampling for Estimating Local Properties on a Graph Stream

• What Can't We Compute on Data Streams?

• Querying Big, Dynamic, Distributed Data

http://research.microsoft.com/en-US/events/bda2013/default.aspx

Scope

We are witnessing rapid development of research and technology for efficient processing of big data. There is a surge of commercial and open source platforms for big data analytics, including platforms for querying massive datasets, batch processing, real-time analytics, streaming computations, iterative computations, graph data processing, and distributed machine learning.

Database queries

How can we efficiently resolve database queries on massive amounts of input data?

Here the input data may be presented in the form of a distributed data stream.
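One common way to approach a query over a distributed data stream is to let each node keep a small summary of the records it sees and have a coordinator merge those summaries. The sketch below illustrates that idea for a per-key average. The class `PartialAggregate`, the function `query_average`, and the sensor records are all hypothetical names chosen for illustration; they do not come from the talks listed above.

```python
from collections import defaultdict


class PartialAggregate:
    """Per-key running (count, sum) kept by one node over its share of the stream."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, key, value):
        # Constant work per record; the raw record itself is not stored.
        self.count[key] += 1
        self.total[key] += value

    def merge(self, other):
        # Merging is associative and commutative, so summaries from
        # different nodes can be combined in any order.
        for key, c in other.count.items():
            self.count[key] += c
            self.total[key] += other.total[key]
        return self

    def average(self, key):
        return self.total[key] / self.count[key]


def query_average(partials, key):
    """Coordinator: combine the per-node summaries, then answer the query."""
    merged = PartialAggregate()
    for p in partials:
        merged.merge(p)
    return merged.average(key)


# Hypothetical example: two nodes each see part of a stream of (sensor, reading).
node_a, node_b = PartialAggregate(), PartialAggregate()
for key, value in [("s1", 2.0), ("s2", 10.0), ("s1", 4.0)]:
    node_a.update(key, value)
for key, value in [("s1", 6.0), ("s2", 20.0)]:
    node_b.update(key, value)

print(query_average([node_a, node_b], "s1"))  # 4.0
print(query_average([node_a, node_b], "s2"))  # 15.0
```

Only the summaries travel to the coordinator, so the communication cost depends on the number of distinct keys rather than on the length of the stream. Queries that do not decompose this way (for example exact medians) need different techniques.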

Machine learning

How can we efficiently solve large-scale machine learning problems?

Here the input data may be massive, stored in a distributed cluster of machines.
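As one illustration of learning from data spread over a cluster, the sketch below uses data-parallel gradient descent: each machine computes a gradient on its own shard, and only the gradients are combined. This is a minimal single-process simulation under stated assumptions. The functions `local_gradient` and `train`, the one-dimensional linear model, and the toy shards are hypothetical and are not taken from the slides.

```python
def local_gradient(shard, w, b):
    """Sum of squared-error gradients over one machine's shard of (x, y) pairs."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(shard)


def train(shards, steps=2000, lr=0.05):
    """Fit y = w*x + b by averaging per-shard gradients at every step."""
    w = b = 0.0
    for _ in range(steps):
        # In a real cluster these calls would run in parallel, one per machine.
        results = [local_gradient(s, w, b) for s in shards]
        n = sum(r[2] for r in results)
        gw = sum(r[0] for r in results) / n
        gb = sum(r[1] for r in results) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


# Toy data generated from y = 2x + 1, split across two hypothetical machines.
shards = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w, b = train(shards)
print(round(w, 3), round(b, 3))  # close to 2 and 1
```

Each step moves three numbers per machine regardless of shard size, which is why this pattern suits data that is too large to collect in one place.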

Distributed computing

How can we efficiently solve large-scale optimization problems in distributed computing environments? For example, how can we efficiently solve large-scale combinatorial problems such as processing large-scale graphs?
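To make the graph-processing example concrete, the sketch below finds connected components by label propagation in synchronous rounds, in the vertex-centric style used by systems such as Pregel. It is an illustrative single-machine simulation, not a method described in the slides. The function `connected_components` and the sample graph are hypothetical.

```python
def connected_components(vertices, edges):
    """Label every vertex with the smallest vertex id in its component."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    label = {v: v for v in vertices}
    changed = True
    while changed:
        changed = False
        # One round: every vertex reads only its neighbors' labels from the
        # previous round, so vertices could be processed in parallel.
        new_label = {}
        for v in vertices:
            best = min([label[v]] + [label[n] for n in neighbors[v]])
            new_label[v] = best
            if best != label[v]:
                changed = True
        label = new_label
    return label


# Hypothetical graph with three components: {1, 2, 3}, {4, 5}, {6}.
components = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

The number of rounds grows with the diameter of the largest component, which is one reason long, thin graphs are hard for this style of computation.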


RedFIR® is unrivaled worldwide as a tool for analyzing performance in team sports, making it possible to objectively analyze games and assess players against a consistent set of criteria.

http://www.orgs.ttu.edu/debs2013/index.php?goto=cfchallengedetails

“How to Fit when No One Size Fits”, Lim et al., CIDR ’13