SQL Server 2012 and Big Data


Speaker: Karel Coenye

Download SQL Server 2012: http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx

Transcript of SQL Server 2012 and Big Data

SQL SERVER 2012 AND BIG DATA
Hadoop Connectors for SQL Server

TECHNICALLY – WHAT IS HADOOP

• Hadoop consists of two key services:
o Data storage using the Hadoop Distributed File System (HDFS)
o High-performance parallel data processing using a technique called MapReduce

HADOOP IS AN ENTIRE ECOSYSTEM

• HBase as the database
• Hive as the data warehouse
• Pig as the query language
• All built on top of Hadoop and the MapReduce framework

HDFS

• HDFS is designed to scale seamlessly
• That's its strength!
• Scaling horizontally is non-trivial in most cases
• HDFS scales by throwing more hardware at it (a lot of it!)
• HDFS is asynchronous
• This is what links Hadoop to cloud computing

DIFFERENCES

• SQL Server & Windows 2008 R2′s NTFS?• Data is not stored in the traditional table column format.• HDFS supports only forward only parsing• Databases built on HDFS don’t guarantee ACID properties• Taking code to the data• SQL Server scales better vertically

UNSTRUCTURED DATA

• Hadoop doesn't know or care about column names, column data types, column sizes or even the number of columns
• Data is stored in delimited flat files
• You're on your own with respect to data cleansing
• Data input in Hadoop is as simple as loading your data file into HDFS
• It's very close to copying files on an OS (see the sketch below)
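As a concrete illustration, here is a minimal sketch of loading a local delimited file into HDFS through Hadoop's Java FileSystem API. The NameNode URI and file paths are made-up placeholders, not something from the original deck; in practice the address comes from core-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally picked up from core-site.xml.
        conf.set("fs.default.name", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        // Copy a local delimited flat file into HDFS: no schema, no types,
        // conceptually the same as an OS file copy.
        fs.copyFromLocalFile(new Path("/tmp/sales.csv"),
                             new Path("/data/sales.csv"));
        fs.close();
    }
}
```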

NO SQL, NO TABLES, NO COLUMNS – NO DATA?

• Write code to do MapReduce
• You have to write code to get data
• The best way to get data is to write code that calls the MapReduce framework to slice and dice the stored data
• Step 1 is Map and Step 2 is Reduce

MAP (REDUCE)

• Mapping
• Pick your selection of keys from each record (one record per linefeed)
• Tell the framework what your key is and what values that key will hold
• MapReduce will deal with the actual creation of the map
• You control which keys to include and which values to filter out
• You end up with a giant hashtable (see the mapper sketch below)
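To make the map phase concrete, below is a minimal word-count-style mapper sketch against the classic Hadoop Java MapReduce API. The class name and the whitespace tokenising rule are illustrative assumptions, not part of the deck.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The framework calls map() once per record (here: per line of the input file).
public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // We tell the framework what the key is (a word) and what value it
        // holds (a count of 1); the framework builds the actual map for us.
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;  // filter out values we don't want
            word.set(token);
            context.write(word, ONE);
        }
    }
}
```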

(MAP) REDUCE

• Reducing data: once the map phase is complete, the code moves on to the reduce phase
• The reduce phase works on the mapped data and can potentially do all the aggregation and summation activities (see the sketch below)
• Finally you get a blob of the mapped and reduced data
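Continuing the sketch, the matching reducer receives every mapped value grouped under one key and does the aggregation. A small driver class (not shown) would wire the mapper and reducer together and submit the job to the cluster.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The framework calls reduce() once per key, with all mapped values for that key.
public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();  // the aggregation/summation work happens here
        }
        context.write(key, new IntWritable(sum));
    }
}
```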

JAVA… VS. PIG…

• Pig is a querying engine
• Has a 'business-friendly' syntax
• Spits out MapReduce code
• The syntax for Pig is called Pig Latin (don't ask)
• Pig Latin is syntactically very similar to LINQ
• Pig converts the query into MapReduce, sends it off to Hadoop, then retrieves the results (see the sketch below)
• Roughly half the performance of hand-written MapReduce, but about ten times faster to write
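For a flavour of Pig Latin, here is a hedged sketch of the same word count driven through Pig's Java API (PigServer); the statements registered as strings are the Pig Latin, and the paths and aliases are illustrative assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class WordCountPig {
    public static void main(String[] args) throws Exception {
        // Run against the cluster; Pig compiles these statements to MapReduce.
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery("lines = LOAD '/data/input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
        // store() triggers execution and writes the results back to HDFS.
        pig.store("counts", "/data/wordcounts");
        pig.shutdown();
    }
}
```

Compare this with the two Java classes above: the same job in four lines of script is where the "ten times faster to write" claim comes from.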

HBASE

• HBase is a key-value store on top of HDFS
• This is the NoSQL database
• A very thin layer over raw HDFS
• Data is grouped in a table that has rows of data
• Each row can have multiple 'column families'
• Each 'column family' contains multiple columns
• Each column name is the key, and it has its corresponding column value
• Rows don't all need to have the same number of columns (see the sketch below)
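A sketch using the HBase Java client of that era, illustrating rows, column families and per-column keys. The table, family and column names are made up, and the cluster address is assumed to come from hbase-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customers");

        // One row; each cell is (column family, column key, value).
        Put row = new Put(Bytes.toBytes("row-1001"));
        row.add(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Contoso"));
        row.add(Bytes.toBytes("orders"),  Bytes.toBytes("2012-03"), Bytes.toBytes("42"));
        table.put(row);

        // Another row in the same table may carry a different set of columns.
        table.close();
    }
}
```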

HIVE

• Hive is a little closer to RDBMS systems
• It is a DWH system on top of HDFS and HBase
• Performs join operations between HBase tables
• Maintains a meta layer
• Data summation, ad-hoc queries and analysis of large data stores in HDFS
• High-level language: Hive Query Language (HiveQL) looks like SQL, but is restricted
• No UPDATEs or DELETEs are allowed (see the query sketch below)
• Partitioning can be used to update information
o Essentially re-writing a chunk of data
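A hedged sketch of querying Hive from Java over JDBC with HiveQL. The driver class and port shown are the HiveServer defaults of that era; the table and query are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // HiveServer JDBC driver of the era.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive://localhost:10000/default", "", "");

        // HiveQL looks like SQL, but note: no UPDATE or DELETE.
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
            "SELECT region, COUNT(*) FROM sales GROUP BY region");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```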

WINDOWS HADOOP- PROJECT ISOTOPE

• 2 flavours
• Cloud
o Azure CTP
• On-premise
o Integration of the Hadoop File System with Active Directory
o Integration of System Center Operations Manager with Hadoop
o BI integration
• These are not all that interesting in and of themselves, but the data and tools are
o Sqoop – integration with SQL Server
o Flume – access to lots of data

SQOOP

• A framework that facilitates transfer between an RDBMS and HDFS
• Uses MapReduce programs to import and export data
• Imports and exports are performed in parallel with fault tolerance (see the sketch below)
• Source/target files used by Sqoop can be:
o Delimited text files
o Binary SequenceFiles containing serialized record data
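A sketch of driving a parallel import from SQL Server into HDFS through Sqoop's Java entry point. The server name, credentials and table are placeholders; depending on the Sqoop version the class lives under org.apache.sqoop or com.cloudera.sqoop, and the Microsoft JDBC driver jar must be on the classpath. The same arguments work on the sqoop command line.

```java
import org.apache.sqoop.Sqoop;

public class SqlServerImport {
    public static void main(String[] args) {
        // Equivalent to running `sqoop import ...` from the shell.
        String[] importArgs = {
            "import",
            "--connect", "jdbc:sqlserver://dbserver:1433;databaseName=Sales",
            "--username", "hadoop", "--password", "secret",
            "--table", "Orders",
            "--target-dir", "/data/orders",   // delimited text files on HDFS
            "--num-mappers", "4"              // degree of parallel import
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```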

SQL SERVER – HORTONWORKS - HADOOP

• A spin-off from Yahoo
• Bridges the technological gaps between Hadoop and Windows Server
• CTP of the Hadoop-based distribution for Windows Server (somewhere in 2012)
• Will work with Microsoft's business-intelligence tools, including:
o Excel
o PowerPivot
o Power View

HADOOP CONNECTORS

• SQL Server versions:
o Azure
o PDW
o SQL 2012
o SQL 2008 R2

http://www.microsoft.com/download/en/details.aspx?id=27584

WITH SQL SERVER-HADOOP CONNECTOR, YOU CAN: 

• A Sqoop-based connector
• Import:
o Tables in SQL Server to delimited text files on HDFS
o Tables in SQL Server to SequenceFiles on HDFS
o Tables in SQL Server to tables in Hive
o Results of queries executed on SQL Server to delimited text files on HDFS
o Results of queries executed on SQL Server to SequenceFiles on HDFS
o Results of queries executed on SQL Server to tables in Hive
• Export:
o Delimited text files on HDFS to SQL Server
o SequenceFiles on HDFS to SQL Server
o Hive tables to tables in SQL Server
(Sketches of both directions follow below.)
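Hedged sketches of the two directions through the same Sqoop entry point as above; all names remain placeholders. `--hive-import` covers the SQL-Server-to-Hive path (and assumes Hive is installed on the cluster), while the `export` tool pushes HDFS files back into an existing SQL Server table.

```java
import org.apache.sqoop.Sqoop;

public class ConnectorExamples {
    static final String CONNECT = "jdbc:sqlserver://dbserver:1433;databaseName=Sales";

    public static void main(String[] args) {
        // Import a SQL Server table straight into a Hive table.
        Sqoop.runTool(new String[] {
            "import", "--connect", CONNECT,
            "--table", "Orders", "--hive-import"
        });

        // Export delimited text files on HDFS back into a SQL Server table.
        Sqoop.runTool(new String[] {
            "export", "--connect", CONNECT,
            "--table", "OrderSummary", "--export-dir", "/data/summary"
        });
    }
}
```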

SQL SERVER 2012 ALONGSIDE THE ELEPHANT

• Power View utilizes its own class of apps, if you will, that Microsoft is calling "insights"
• SQL Server will extend insights to Hadoop data sets
• Interesting insights can be:
o Brought into a SQL Server environment using the connectors
o Used to drive analysis across it with BI tools

WHY USE HADOOP WITH SQL SERVER

• Don't just think about big data as being large volumes
• Analyze both structured and unstructured datasets
• Think about workload, growth, accessibility and even location
• Can the amount of data stored every day reliably be written to a traditional HDD?
• MapReduce is more complex than T-SQL
• Many companies try to avoid writing Java for queries
• Front ends are immature relative to the tooling available in the relational database world
• It's not going to replace your database, but your database isn't likely to replace Hadoop either

MICROSOFT AND HADOOP

• Broader access to Hadoop for:
o End users
o IT professionals
o Developers
• An enterprise-ready Hadoop distribution with greater security, performance and ease of management
• Breakthrough insights through the use of familiar tools such as Excel, PowerPivot, SQL Server Analysis Services and Reporting Services

ENTERPRISE HADOOP

• Installation wizard (IsotopeClusterDeployment)
• Health-check and monitoring pages
• Interactive JavaScript console

MICROSOFT ENTERPRISE HADOOP

• Machines in the Hadoop cluster must be running Windows Server 2008 or higher
• IPv4 networking enabled on all nodes
o Deployment does not work on an IPv6-only network
• The ability to create a new user account called "Isotope"
o Will be created on all nodes of the cluster
o Used for running Hadoop daemons and running jobs
o Must be able to copy and install the deployment binaries to each machine
• Windows File Sharing services must be enabled on each machine that will be joined to the Hadoop cluster
• .NET Framework 4 installed on all nodes
• A minimum of 10 GB free space on the C drive (a JBOD HDFS configuration is supported)

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.