SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Performance and Scale in SQL Server 2012, Michael Rys, Principal Program Manager, @SQLServerMike

Description: Pragmatic Works SQL Server 2012 webinar presentation

Transcript of SQL Server 2012 Beyond Relational Performance and Scale

Page 1: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Performance and Scale in SQL Server 2012

Michael Rys, Principal Program Manager, @SQLServerMike

Page 2: SQL Server 2012 Beyond Relational Performance and Scale

My favorite Beyond Relational Application

Structured and unstructured Search

Related/“Semantic” Search

Page 3: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Data

Pain Points: building and maintaining applications with relational and non-relational data is hard
• Complex integration
• Duplicated functionality
• Compensation for unavailable services

Goals
• Reduce the cost of managing all data
• Simplify the development of applications over all data
• Provide management and programming services for all data

Page 4: SQL Server 2012 Beyond Relational Performance and Scale

What is the Beyond Relational Mission?

Efficient storage for all data
• Tables, XML, Spatial, Documents, Digital Media, Scientific Records, Factoids…

Rich data processing capabilities for all applications
• Data formats and content natively understood for a rich application and user experience
• Consistent application model and data constructs to ease application development, migration and long-term retention

Rich capabilities and services over all data. Provide rich services, e.g.:
• Query and reason over data and extracted semantics
• Search across the structural impedance of different data formats
• Integrated backup/restore for all data

Page 5: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Story

[Diagram: Structured Data, with Query, T-SQL, B-trees, Files, Manageability, Availability and Programmability]

Page 6: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Story

[Diagram: Structured Data (Query, T-SQL, B-trees, Files, Manageability, Availability, Programmability) alongside Unstructured Data (Search)]

Page 7: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Story

[Diagram: Structured Data (Query and Type Operations, T-SQL/Data Types, B-trees, Files, Manageability, Availability, Programmability), Unstructured Data (Search, Filestream, Win 32) and Semi-structured Data/XML (XML, FTS and Spatial Indices; XQuery and Spatial ops; Spatial, XML, HierarchyID)]

Page 8: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational Story

[Diagram: Structured Data, Semi-structured Data/XML and Unstructured Data on one Platform that provides Efficient Storage for BR Data, Rich Query and Search Services over all Data, and Rich Data Programming Capabilities. Components shown: Query and Type Operations; T-SQL/Data Types; XQuery, Spatial ops; Search, Semantic; Win 32; B-trees; XML, FTS, Spatial Indices; Spatial, XML, HierarchyID; Files, Filestream; Manageability & Availability; Programmability]

Page 9: SQL Server 2012 Beyond Relational Performance and Scale

Beyond Relational in SQL Server 2012

Address important customer requests for capabilities and rich services for Rich Unstructured Data (RUDS):
• Scale up for storage and search
• Easy use of and access to unstructured data from all applications
• Rich insight into unstructured data to make better decisions

Deliver what you asked for to build spatial-aware applications:
• Advanced 2D spatial
• Make spatial pervasive across the platform
• Improve performance and scale

Service Broker Message Broadcast

Page 10: SQL Server 2012 Beyond Relational Performance and Scale

Rich Unstructured Data Performance and Scale

Scale up storage and search to 100M to 500M documents:
• Multiple containers for FILESTREAM scale-up
• Improved scale-up for search

Page 11: SQL Server 2012 Beyond Relational Performance and Scale

Rich Unstructured Data & Services Ecosystem

[Diagram: the rich unstructured data and services ecosystem. Rich services: Fulltext Search, Semantic Similarity Search. Scale-up solutions: a database with multiple containers (Disk1, Disk2, Disk3). Database applications: transactional access to BLOBs in DB FileStreams, with integrated backup/replication/AlwaysOn and integrated administration. Windows apps: files/folders through an SMB share and the FileStream API, with streaming Win32 access to FileStreams/FileTable. Remote BLOB Storage: customer applications using the SQL RBS API with Azure, Centera and SQL FILESTREAM libraries to store BLOBs in Azure, Centera or SQL DB. SQL apps.]

Page 12: SQL Server 2012 Beyond Relational Performance and Scale

FILESTREAM: Storage Attribute on VARBINARY(MAX)

• Works with integrated FTS
• Unstructured data stored directly in the file system (requires NTFS)
• Dual programming model:
  • T-SQL (same as SQL BLOB)
  • Win32 streaming APIs with T-SQL transactional semantics
• Data consistency
• Integrated manageability:
  • Backup/restore
  • Administration
• Size limit is the file system volume size
• SQL Server security stack

[Diagram: the application stores BLOBs in DB + file system]
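To make the dual programming model concrete, here is a minimal T-SQL sketch. It assumes a database that already has a FILESTREAM filegroup and uses a hypothetical table dbo.Documents; the Win32 side (OpenSqlFilestream, or SqlFileStream in .NET) runs in the client and is only indicated in a comment.

```sql
-- Hypothetical table; requires a database with a FILESTREAM filegroup.
-- FILESTREAM tables need a UNIQUE, NOT NULL ROWGUIDCOL column.
CREATE TABLE dbo.Documents
(
    DocId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocName NVARCHAR(260) NOT NULL,
    Content VARBINARY(MAX) FILESTREAM NULL
);
GO

-- T-SQL access: same as any other BLOB
INSERT INTO dbo.Documents (DocName, Content)
VALUES (N'readme.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));

-- Win32 streaming access: obtain the logical path and the transaction
-- context inside a transaction, then hand both to the client-side API.
BEGIN TRANSACTION;

SELECT Content.PathName()                   AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM dbo.Documents
WHERE DocName = N'readme.txt';

-- ... client opens FilePath with OpenSqlFilestream/SqlFileStream and streams data ...

COMMIT TRANSACTION;
```

Because the streaming handle is bound to the transaction context, Win32 reads and writes commit or roll back together with the surrounding T-SQL transaction.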

Page 13: SQL Server 2012 Beyond Relational Performance and Scale

FILETABLE Overview

FileTable: A Table of Files/Directories

• User-created table with a fixed schema
  • Contains FILESTREAM data and file attributes
  • Each row represents a file or a directory
  • System-defined constraints maintain the tree integrity
• File/directory hierarchy view through a Windows share
  • Supports Win32 APIs for file/directory management
  • DB storage is transparent to Win32 applications
  • SMB level of application compatibility
  • Virtual network name (VNN) path support for transparent Win32 application failover

[Diagram: the FILESTREAM share MSSQLSERVER contains database directories (Private Docs for Database1, Office Docs for Database2); these contain FileTable directories (e.g., LogFiles, Documents, Media), each exposing a FileTable folder hierarchy with a user-defined directory structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]
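A minimal sketch of how the share path above could come about. It assumes FILESTREAM is enabled at the instance level with the share name MSSQLSERVER and that the hypothetical database OfficeDocsDB already has a FILESTREAM filegroup; the database, directory and table names are illustrative only.

```sql
-- Expose the database as the "Office Docs" directory under the FILESTREAM share
ALTER DATABASE OfficeDocsDB
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Office Docs');
GO

USE OfficeDocsDB;
GO

-- A FileTable has a fixed, system-defined schema; only the directory name is chosen
CREATE TABLE dbo.Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Documents',
      FILETABLE_COLLATE_FILENAME = database_default);
GO

-- UNC path of the FileTable root, e.g. \\my_machine\MSSQLSERVER\Office Docs\Documents
SELECT FileTableRootPath(N'dbo.Documents') AS RootPath;

-- Files and folders created through Win32/SMB appear as rows
SELECT name,
       is_directory,
       file_stream.GetFileNamespacePath() AS RelativePath
FROM dbo.Documents;
```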

Page 14: SQL Server 2012 Beyond Relational Performance and Scale

Some FileStream/FileTable performance tips
• Reading bigger buffers gives better performance
• Volumes hosting FILESTREAM/FILETABLE data should have 8.3 name generation and LastAccessTime disabled
• FILESTREAM/FILETABLE containers should reside on dedicated volumes
• Have one volume per FILESTREAM/FILETABLE container; this enables space management at the volume level
• “Magic” SMB buffer size = ~60 KB; another “good” value is 480 KB
• ROWGUID unique index for aligned partitioning for FILESTREAM
• Antivirus programs should be configured not to delete infected files but to quarantine them
• If using compressed volumes, use a cluster size of 4 KB

Page 15: SQL Server 2012 Beyond Relational Performance and Scale

FILESTREAM Read Performance (Remote)

[Chart: remote read throughput (Mbps) for file sizes 240 KB, 480 KB, 1 MB, 2 MB, 4 MB and 8 MB, comparing FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, plus the file system Win32 access gain (%)]

Measured with SQL Server 2008

Page 16: SQL Server 2012 Beyond Relational Performance and Scale

FILESTREAM Write Performance (Remote)

[Chart: remote insert throughput (Mbps) for file sizes 240 KB, 480 KB, 1 MB, 2 MB, 4 MB and 8 MB, comparing FILESTREAM Win32 (file system) access, FILESTREAM T-SQL and varbinary, plus the file system Win32 access gain (%)]

Measured with SQL Server 2008

Page 17: SQL Server 2012 Beyond Relational Performance and Scale

Unstructured Data Scale-up: Multiple Containers for FILESTREAM Data

SQL Server 2008 R2: only one storage container per FILESTREAM filegroup
• Limits storage capacity scaling and I/O scaling

SQL Server 2012: support for multiple storage containers per filegroup
• DDL changes to CREATE/ALTER DATABASE statements
• Ability to set MAXSIZE for the containers
• DBCC SHRINKFILE EMPTYFILE support

Scaling flexibility:
• Storage scaling by adding additional storage drives
• I/O scaling with multiple spindles
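The DDL changes listed above can be sketched as follows. Database, filegroup, file names and drive paths are hypothetical; for FILESTREAM containers the parent folder must exist while the last folder in FILENAME must not, since SQL Server creates it.

```sql
-- One FILESTREAM filegroup with two containers on separate volumes
CREATE DATABASE Archive
ON PRIMARY
    (NAME = Archive_data, FILENAME = 'C:\Data\Archive.mdf'),
FILEGROUP ArchiveFS CONTAINS FILESTREAM
    (NAME = ArchiveFS1, FILENAME = 'D:\FS\ArchiveFS1', MAXSIZE = 100GB),
    (NAME = ArchiveFS2, FILENAME = 'E:\FS\ArchiveFS2', MAXSIZE = 100GB)
LOG ON
    (NAME = Archive_log, FILENAME = 'C:\Data\Archive.ldf');
GO

-- Storage scaling: add a container on a new drive later
ALTER DATABASE Archive
ADD FILE (NAME = ArchiveFS3, FILENAME = 'F:\FS\ArchiveFS3', MAXSIZE = UNLIMITED)
TO FILEGROUP ArchiveFS;
GO

-- Migrate data out of a container (e.g., before retiring its drive)
USE Archive;
DBCC SHRINKFILE (ArchiveFS1, EMPTYFILE);
```

Spreading containers across volumes is what provides the I/O scaling across spindles; MAXSIZE caps how much each volume is allowed to absorb.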

Page 18: SQL Server 2012 Beyond Relational Performance and Scale

Unstructured Data: Multiple Containers

Use of multiple spindles for achieving better I/O Scalability

Page 19: SQL Server 2012 Beyond Relational Performance and Scale

RUDS Scale-up: FILESTREAM Performance/Scale

Improved performance of T-SQL and file I/O access:
• Various enhancements to improve read/write throughput; 5-fold increase in read throughput
• Linear scaling with a large number of concurrent threads


Page 20: SQL Server 2012 Beyond Relational Performance and Scale

Full-Text Search Improvements in SQL Server 2012

Improved performance and scale:
• Scale-up to 350M documents
• iFTS query performance 7-10 times faster than in SQL Server 2008
• Worst-case iFTS query response times < 3 sec for the corpus
• On par with or better than the main database search competitors

New functionality:
• Property search
• Customizable NEAR
• New word breakers: updated existing word breakers, added Czech and Greek

Innovation in search: Semantic Similarity Search
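A sketch of the new query features. It assumes a hypothetical table dbo.Docs (integer key DocId, VARBINARY(MAX) column Content with a FileType type column) that already has a full-text index; semantic similarity additionally assumes STATISTICAL_SEMANTICS is enabled on Content and the semantic language statistics database is installed.

```sql
-- Customizable NEAR: both terms within 5 tokens, in the given order
SELECT DocId
FROM dbo.Docs
WHERE CONTAINS(Content, 'NEAR((spatial, index), 5, TRUE)');

-- Property search: register the document Title property ...
CREATE SEARCH PROPERTY LIST DocProps;
ALTER SEARCH PROPERTY LIST DocProps
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 2,
          PROPERTY_DESCRIPTION = 'System.Title');
ALTER FULLTEXT INDEX ON dbo.Docs SET SEARCH PROPERTY LIST DocProps;
GO

-- ... then search only within that property (once the index is repopulated)
SELECT DocId
FROM dbo.Docs
WHERE CONTAINS(PROPERTY(Content, 'Title'), 'Relational');

-- Semantic similarity: documents most similar to document 1
DECLARE @DocId INT = 1;
SELECT TOP (5) s.matched_document_key, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.Docs, Content, @DocId) AS s
ORDER BY s.score DESC;
```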

Page 21: SQL Server 2012 Beyond Relational Performance and Scale

Full-Text Search Performance & Scale Improvements

Architectural improvements:
• Improved internal implementation
• Queries no longer block index updates
• Improved query plans: better plans for common queries
  • Full-text predicate folding
  • Parallel plan execution
• Index and query tested at scale, up to 350 million documents with < ~2 sec response
  • ~3x better throughput without DML and ~9x better with DML
• Scales easily with an increasing number of connections

Page 22: SQL Server 2012 Beyond Relational Performance and Scale

Scale-up: Full-Text Search

Queries over a 350M-document database with random DML running in the background: SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.

[Chart: SQL Server 2012 vs. SQL Server 2005/2008]

Page 23: SQL Server 2012 Beyond Relational Performance and Scale

Scale-up: Full-Text Search

Average query execution time (avgExecTime, ms) under varying numbers of connections (50 to 2000 users) for a customer playback benchmark

[Chart: SQL Server 2012 vs. SQL Server 2005/2008]

Page 24: SQL Server 2012 Beyond Relational Performance and Scale

Performance and Scale for Spatial Applications
• Support for persisted computed spatial columns
• New geodetic SRID for faster calculations
• Improved implementation of operations:
  • Faster spatial index creation for point data (4 to 5 times faster)
  • Faster point data queries
  • Optimized STBuffer, lower memory footprint
  • Faster “secondary” filter step
• Improved default spatial indexing scheme and new hints:
  • AutoGrid
  • Query window grid density hint
• Spatial index compression
• Improved index-aware query plans:
  • Nearest neighbor
  • Optimized spatial query plan for STDistance- and STIntersects-like queries

Page 25: SQL Server 2012 Beyond Relational Performance and Scale

Support Persisted Computed Columns

Convert 2 columns (latitude, longitude) to geography:

ALTER TABLE MyTable
ADD geo AS (geography::Point(lat, lon, 4326)) PERSISTED;
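Once the computed column is persisted, it can carry a spatial index like any stored geography column. A minimal sketch, assuming MyTable already has the geo column from the statement above and a clustered primary key (required for spatial indexes); the index name and the sample point are hypothetical.

```sql
-- Index the persisted computed column
CREATE SPATIAL INDEX sidx_MyTable_geo
    ON dbo.MyTable (geo)
    USING GEOGRAPHY_AUTO_GRID;
GO

-- Rows within 1 km of a point; SRID 4326 measures distance in meters
DECLARE @me geography = geography::Point(47.6, -122.3, 4326);

SELECT COUNT(*) AS rows_within_1km
FROM dbo.MyTable
WHERE geo.STDistance(@me) <= 1000;
```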

Page 26: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Reference ID (SRID)
• Each spatial object has an associated SRID
• The SRID is the “locale” for spatial objects; it determines:
  • Coordinate system
  • Measurements
  • Projection semantics
  • Geoid dimensions
• Only objects with the same SRID can be combined in operations
• SRID for GEOMETRY (default: 0)
  • User-defined, no impact on operational semantics
• SRID for GEOGRAPHY (default: WGS 84, SRID 4326)
  • Impacts operational semantics
  • 390 predefined SRIDs based on the European Petroleum Survey Group (EPSG) list: SELECT * FROM sys.spatial_reference_systems
  • SQL Server 2012: added the Microsoft-specified Unit Sphere SRID 104001 for a spherical globe
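A short sketch of how the SRID changes results. The coordinates are arbitrary sample points; the unit of the unit-sphere distance should be read from the unit_of_measure column rather than assumed.

```sql
-- Compare the WGS 84 and unit-sphere definitions
SELECT spatial_reference_id, unit_of_measure, well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id IN (4326, 104001);

DECLARE @a4326   geography = geography::Point(47.6, -122.3, 4326);
DECLARE @b4326   geography = geography::Point(40.7,  -74.0, 4326);
DECLARE @a104001 geography = geography::Point(47.6, -122.3, 104001);
DECLARE @b104001 geography = geography::Point(40.7,  -74.0, 104001);

SELECT @a4326.STDistance(@b4326)     AS Wgs84Distance,      -- ellipsoidal, in meters
       @a104001.STDistance(@b104001) AS UnitSphereDistance, -- spherical, cheaper to compute
       @a4326.STDistance(@b104001)   AS MixedSridDistance;  -- NULL: SRIDs differ
```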

Page 27: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Indexing Basics

• In general, split predicates in two:
  • Primary filter finds all candidates, possibly with false positives (but never false negatives)
  • Secondary filter removes false positives
• The index provides our primary filter
• The original predicate is our secondary filter
• Some tweaks to this scheme:
  • Sometimes it is possible to skip the secondary filter

[Diagram: objects A to E passing through the primary filter (index lookup) and then the secondary filter (original predicate)]
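The two phases can be observed directly: the geography Filter() method answers from the spatial index alone (primary filter), while STIntersects also applies the exact predicate. A sketch, assuming a hypothetical table dbo.Places(id INT PRIMARY KEY, geo geography) with a spatial index sidx_Places_geo; without a usable index, Filter() behaves like STIntersects.

```sql
DECLARE @window geography = geography::STGeomFromText(
    'POLYGON((-122.4 47.5, -122.2 47.5, -122.2 47.7, -122.4 47.7, -122.4 47.5))', 4326);

-- Primary filter only: may include false positives, never false negatives
SELECT COUNT(*) AS primary_candidates
FROM dbo.Places WITH (INDEX(sidx_Places_geo))
WHERE geo.Filter(@window) = 1;

-- Primary + secondary filter: exact result
SELECT COUNT(*) AS exact_matches
FROM dbo.Places WITH (INDEX(sidx_Places_geo))
WHERE geo.STIntersects(@window) = 1;
```

The difference between the two counts is the number of false positives the secondary filter had to remove.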

Page 28: SQL Server 2012 Beyond Relational Performance and Scale

Spatial index tessellation

Better and more continuous coverage

[Diagram: the same object tessellated with 64, 128 and 256 cells, distinguishing fully contained cells from partially contained cells]

Page 29: SQL Server 2012 Beyond Relational Performance and Scale

Auto Grid Spatial Index

New spatial index tessellations:
• GEOMETRY_AUTO_GRID, GEOGRAPHY_AUTO_GRID
• Uses 8 grid levels instead of the previous 4
• No GRIDS parameter needed (or available)
  • Fixed at HLLLLLLL
• Default number of cells per object:
  • 8 for geometry
  • 12 for geography
• More stable performance for query windows of different sizes and for data with different spatial density
• For default values:
  • Up to 2x faster for longer queries (> 500 ms)
    • More efficient primary filter
    • Fewer rows returned
  • 10 ms slower for very fast queries (< 50 ms)
    • Increased tessellation time, which is constant
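Creating auto grid indexes, as a sketch. Table names and the bounding box are hypothetical; GEOMETRY_AUTO_GRID still needs a BOUNDING_BOX, and neither auto grid accepts a GRIDS option.

```sql
-- Hypothetical: dbo.Parcels(id INT PRIMARY KEY, shape geometry)
CREATE SPATIAL INDEX sidx_Parcels_shape
    ON dbo.Parcels (shape)
    USING GEOMETRY_AUTO_GRID
    WITH (BOUNDING_BOX = (XMIN = 0, YMIN = 0, XMAX = 500000, YMAX = 500000),
          CELLS_PER_OBJECT = 8);   -- the default, shown for clarity

-- Hypothetical: dbo.Places(id INT PRIMARY KEY, geo geography)
CREATE SPATIAL INDEX sidx_Places_geo
    ON dbo.Places (geo)
    USING GEOGRAPHY_AUTO_GRID
    WITH (CELLS_PER_OBJECT = 12);  -- the default, shown for clarity
```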

Page 30: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Index Performance

The new grid gives much more stable performance for query windows of different sizes.
Better grid coverage gives fewer high peaks.

Page 31: SQL Server 2012 Beyond Relational Performance and Scale

DEMO: Indexing and Performance

Page 32: SQL Server 2012 Beyond Relational Performance and Scale

Query window number of cells

[Chart: typical spatial query performance vs. the number of query window cells; the optimal value (theoretical) is somewhere between the two extremes; one labeled cost is the time needed to process false positives]

Default values:
• 512: Geometry AUTO grid
• 768: Geography AUTO grid
• 1024: MANUAL grids

SELECT * FROM MyTable t WITH (SPATIAL_WINDOW_MAX_CELLS = 256)
WHERE t.geom.STIntersects(@window) = 1;

Page 33: SQL Server 2012 Beyond Relational Performance and Scale

Query Window Hinting (SQL Server 2012)

• SELECT * FROM MyTable t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024) WHERE t.geom.STIntersects(@window) = 1
• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
  • A higher value makes the primary filter phase longer but reduces work in the secondary filter phase
  • Set higher for dense spatial data
  • Set lower for sparse spatial data
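One way to apply the rule of thumb is to time the same window query at two settings. A sketch, assuming a hypothetical table dbo.Parcels(id INT PRIMARY KEY, shape geometry) with spatial index sidx_Parcels_shape; the index hint is added because SPATIAL_WINDOW_MAX_CELLS alone does not force the index.

```sql
DECLARE @window geometry = geometry::STGeomFromText(
    'POLYGON((1000 1000, 5000 1000, 5000 5000, 1000 5000, 1000 1000))', 0);

SET STATISTICS TIME ON;

-- Coarser window tessellation: shorter primary filter, more secondary-filter work
SELECT COUNT(*)
FROM dbo.Parcels AS p WITH (INDEX(sidx_Parcels_shape), SPATIAL_WINDOW_MAX_CELLS = 256)
WHERE p.shape.STIntersects(@window) = 1;

-- Finer window tessellation: longer primary filter, fewer false positives
SELECT COUNT(*)
FROM dbo.Parcels AS p WITH (INDEX(sidx_Parcels_shape), SPATIAL_WINDOW_MAX_CELLS = 2048)
WHERE p.shape.STIntersects(@window) = 1;

SET STATISTICS TIME OFF;
```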

Page 34: SQL Server 2012 Beyond Relational Performance and Scale

Query Hinting

demo

Page 35: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Index Compression

CREATE SPATIAL INDEX idxGeog ON MyTable(geo) USING GEOGRAPHY_GRID WITH (DATA_COMPRESSION = PAGE);  -- or ROW

Based on internal tests, with compression:
• Indexes are 40%-50% smaller
• Queries range from 20% faster to 15% slower
• Per-partition compression settings are not supported

Page 36: SQL Server 2012 Beyond Relational Performance and Scale

Additional Query Processing Support

• Index intersection
  • Enables efficient mixing of spatial and non-spatial predicates
• Matching
  • New in SQL Server 2012: nearest neighbor query
  • Distance queries: converted to STIntersects
  • Commutativity: a.STIntersects(b) = b.STIntersects(a)
  • Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
  • Various bounding boxes, granularities
• Outer references as window objects
  • Enables a spatial join to use one index
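A sketch of a spatial join where each outer row serves as the query window for the inner side's index. The tables dbo.Stores(id, pos geography) and dbo.Customers(id, pos geography), with a spatial index on Customers.pos, are hypothetical; whether the optimizer actually picks the index depends on the plan it costs.

```sql
-- Count customers within 5 km of each store; s.pos is the outer reference
-- used as the window, and the distance predicate is index-eligible.
SELECT s.id AS store_id,
       COUNT(*) AS customers_within_5km
FROM dbo.Stores AS s
JOIN dbo.Customers AS c
    ON c.pos.STDistance(s.pos) <= 5000   -- meters for SRID 4326
GROUP BY s.id;
```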

Page 37: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Nearest Neighbor

Main scenario: “Give me the closest 5 Italian restaurants”

Execution plan:
• SQL Server 2008/2008 R2: table scan
• SQL Server 2012: uses the spatial index

Specific query pattern required:

SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian' AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me);

Page 38: SQL Server 2012 Beyond Relational Performance and Scale

Nearest Neighbor Performance in SQL Server 2012

demo

Page 39: SQL Server 2012 Beyond Relational Performance and Scale

Nearest Neighbor Performance

NN query vs. the best current workaround (sort all points within a 10 km radius)

*Average time for the NN query is ~236 ms

Find the closest 50 business points to a specific location (out of 22 million in total)
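The two queries compared above can be sketched as follows, assuming a hypothetical table dbo.Businesses(id INT PRIMARY KEY, pos geography) with a spatial index on pos and an arbitrary sample location.

```sql
DECLARE @me geography = geography::Point(47.6, -122.3, 4326);

-- Workaround: restrict to a 10 km radius, then sort every point in it
SELECT TOP (50) b.id, b.pos.STDistance(@me) AS dist_m
FROM dbo.Businesses AS b
WHERE b.pos.STDistance(@me) <= 10000
ORDER BY dist_m;

-- SQL Server 2012 nearest-neighbor pattern: IS NOT NULL filter and
-- ORDER BY on the STDistance expression itself
SELECT TOP (50) b.id, b.pos.STDistance(@me) AS dist_m
FROM dbo.Businesses AS b
WHERE b.pos.STDistance(@me) IS NOT NULL
ORDER BY b.pos.STDistance(@me);
```

Besides sorting every point inside the radius, the workaround can silently return fewer than 50 rows when the radius is too small; the nearest-neighbor form has no such cutoff.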

Page 40: SQL Server 2012 Beyond Relational Performance and Scale

Spatial Tips on Index Settings

Some best practice recommendations (YMMV):
• Start out with the new default tessellation
• Point data: always use HIGH for all 4 levels; CELLS_PER_OBJECT is not relevant in this case
• Simple, relatively consistent polygons: set all levels to LOW, or use MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
  • High number of CELLS_PER_OBJECT (often 8192 is best)
  • Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings with highly variable sizes: experimentation is needed
• Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
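The recommendations translate into manual grid options roughly like this. Table and index names are hypothetical starting points for experimentation, not tuned values.

```sql
-- Point data: HIGH on all four levels (CELLS_PER_OBJECT does not matter for points)
CREATE SPATIAL INDEX sidx_Places_geo_hhhh
    ON dbo.Places (geo)
    USING GEOGRAPHY_GRID
    WITH (GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH));

-- Complex geography polygons: the HHMM rule of thumb with many cells per object
CREATE SPATIAL INDEX sidx_Regions_geo_hhmm
    ON dbo.Regions (geo)
    USING GEOGRAPHY_GRID
    WITH (GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
          CELLS_PER_OBJECT = 8192);
```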

Page 41: SQL Server 2012 Beyond Relational Performance and Scale

What to do if my spatial query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or 2012
• Check the query plan for use of the index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-spatial predicate
• Run the index support procedure (sp_help_spatial_geometry_index or sp_help_spatial_geography_index):
  • Assess the effectiveness of the primary filter (Primary_Filter_Efficiency)
  • Assess the effectiveness of the internal filter (Internal_Filter_Efficiency)
  • Redefine the index, or define a new one, with better characteristics:
    • More appropriate bounding box for GEOMETRY
    • Better grid densities
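A sketch of running the index support procedure for a geography index. The table, index and sample window are hypothetical; the sample should resemble the windows your real queries use.

```sql
DECLARE @sample geography = geography::STGeomFromText(
    'POLYGON((-122.4 47.5, -122.2 47.5, -122.2 47.7, -122.4 47.7, -122.4 47.5))', 4326);

-- Returns property/value rows, including Primary_Filter_Efficiency
-- and Internal_Filter_Efficiency for the given query sample
EXEC sp_help_spatial_geography_index
    @tabname       = N'dbo.Places',
    @indexname     = N'sidx_Places_geo',
    @verboseoutput = 1,
    @query_sample  = @sample;
```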

Page 42: SQL Server 2012 Beyond Relational Performance and Scale

Related Content

Some Rich Unstructured Data presentations (with further links):
• http://www.slideshare.net/MichaelRys/sql-bits-brruds
• http://www.slideshare.net/MichaelRys/filetable-and-semantic-search-in-sql-server-2012
• http://www.sqlserverlaunch.com/WW/theater?sid=634

Some Spatial presentations (with further links):
• http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial
• http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial-indexing

Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1

Find Us Later At…
• On Twitter: @SQLServerMike, @Spatial_Ed
• Blogs: http://sqlblog.com/blogs/michael_rys, http://blogs.msdn.com/b/edkatibah/