Tame Your Unruly Data… With Constraints!

35
Tame Your Unruly Data… With Constraints! Rob Volk

description

Tame Your Unruly Data… With Constraints!. Rob Volk. Speaker Bio / Contact. SQL Server DBA/Developer since 1998 / 6.5 Started with dBase III & Clipper And of course MS Access Wireless telecom, utility billing, credit cards SQLTeam moderator & contributor - PowerPoint PPT Presentation

Transcript of Tame Your Unruly Data… With Constraints!

Page 1: Tame Your Unruly Data… With Constraints!

Tame Your Unruly Data…With Constraints!

Rob Volk

Page 2: Tame Your Unruly Data… With Constraints!

Speaker Bio / Contact

• SQL Server DBA/Developer since 1998 / 6.5• Started with dBase III & Clipper

• And of course MS Access• Wireless telecom, utility billing, credit cards• SQLTeam moderator & contributor• Blog: http://weblogs.sqlteam.com/robv/• Twitter: @sql_r• Email: [email protected]• I do deranged things in SQL Server, so you

don’t have to

Page 3: Tame Your Unruly Data… With Constraints!

What are Constraints?

• Constraints maintain the integrity of your database• Domain – what kind of data types/attributes• Entity – what kind of entities/objects/rows• Relational – control relations between entities

• Constraints don’t allow good data, they prevent “bad” data• But be mindful of what your constraints allow

• Let me tell you a story... (SalesOrderDetails)

constraint |kənˈstrānt|nouna limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything.

Page 4: Tame Your Unruly Data… With Constraints!

Why Don’t We Use Constraints?

• Developer ignorance• Not necessarily stupid, just didn’t know

• Cross-database integrity not supported• Multi-platform support (or lack thereof)• Data cleansing as a career (e.g. Data Quality Services)• “The app does all the checking, we don’t need them”

• Yeah, right. We’ll get to that later

Page 5: Tame Your Unruly Data… With Constraints!

What aren’t Constraints?

• Identity columns• Application Logic• Triggers• Security• Development environment• Execution restrictions• Server Policies (PBM etc.)• XML Schema Collections (these are a good thing though)

Still important, just not managed using

Constraints

Still important, just not managed using

Constraints

Page 6: Tame Your Unruly Data… With Constraints!

Examples

• February 31• Incomplete phone numbers

• Still invalid if padded or trimmed• Duplicate rows*• Sales to non-existent customer accounts

• Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE?

Page 7: Tame Your Unruly Data… With Constraints!

But really, how bad could it be?

• Ariane Flight 501 (16 bit integer overflow)• Mars Climate Orbiter (metric vs. US)• USS Yorktown (1997, divide by zero)• F-22 (crashes, Intl. Date Line) & F-35 (entire fleet grounded)• Qantas Flight 72 (didn’t crash, nose dived twice, injuries)• Patriot Missile (clock drift, missed target, soldiers died)• Therac-25 (deaths)

• Not data but process, lack of safeguards at every level• More: http://goo.gl/4Akqm

Page 8: Tame Your Unruly Data… With Constraints!

Domain Constraints

• Domain – fancy name for type or attribute (column in SQL)• Data type - what kind of data you’re storing (number, date, etc.)• Nullability - whether it’s required or not• Default• Check - what values are valid and invalid

• Demo (discovery script)

Page 9: Tame Your Unruly Data… With Constraints!

Domain Constraints - Types

• First constraint: Data type• Choose the right type• VARCHAR(max) is a bad sign• Float or real -> Bad!• Money -> Better (but still problematic)• Dates (use proper date/time types)

• Good data type = storage and performance efficiency• But don’t be TOO efficient

NVARCHAR(max) is overkill for a phone #

Ariane 501

Ariane 501

Page 10: Tame Your Unruly Data… With Constraints!

Domain Constraints – NOT NULL

• Limit or eliminate nullable columns• Nulls are not comparable, even to themselves

• Not applicable vs. unknown• Which one does Null represent?

• Nulls cannot be typed properly (or at all)• Read Chris Date• In SQL Server, nulls have storage overhead

Page 11: Tame Your Unruly Data… With Constraints!

Domain Constraints - Default

• Use this default when value is unknown or unsupplied• N/A vs. Unknown can be addressed with defined values• Can be useful in cascading foreign keys

• SET DEFAULT supported in SQL Server 2005+

Page 12: Tame Your Unruly Data… With Constraints!

Domain Constraints - CHECK

• Restrict values further than what data type allows• Dates• Numeric/Money• Character

• Read Alex Kuznetsov

Page 13: Tame Your Unruly Data… With Constraints!

Entity Constraints

• Entity – fancy name for row or record (tuple – even fancier)• Primary Key - value(s) that identify the entity• Unique - value(s) that must be unique• Table-level check - values that are valid or invalid

• One or more domains/attributes/columns• Demo (discovery script)

Page 14: Tame Your Unruly Data… With Constraints!

Domain Constraints - External

• Very often ignored, not all values are valid (special meaning)• Doesn’t matter if external system can’t provide invalid value• Read Joe Celko• Read Karen López (@DataChick)

Data, Measurements and Standards in SQL

Page 15: Tame Your Unruly Data… With Constraints!

Domain Constraints - Internal

• Very often ignored (and rarely documented)• $10,000 utility bills for single (unoccupied) apartment

• Percentage values (tax rate, discount)• See AdventureWorks Sales tables (smallmoney)

• App could handle, but then must also fix if wrong• Date range overlaps (Alex Kuznetsov)

Page 16: Tame Your Unruly Data… With Constraints!

Entity Constraints – Primary Key

• Uniquely identifies the row/record/tuple/entity• Must provide a value (NOT NULL by definition)• Logical construct, NOT physical• Surrogate vs. natural key• Celko: if you don’t have a key, you don’t have a table• Demo (discovery script)

Page 17: Tame Your Unruly Data… With Constraints!

Entity Constraints - Unique

• Candidate keys (natural keys)• Phone number is unique, but not good candidate for PK• Screen Actors Guild (billed/credited name)• SSN, Credit card # (unique but can’t be used for ID)

• Multiple columns are unique• Constraint vs. index (logical vs. physical)

Page 18: Tame Your Unruly Data… With Constraints!

Entity Constraints - CHECK

• Multi-column conditions• Pregnancy status (N/A for males or young girls)

• Unless you're British...http://goo.gl/msSeP• ANSI/ISO supports subqueries• Single table only• e.g. Force date- or time-only values in datetime columns (< SQL

2008)• Demo (discovery script)

Page 19: Tame Your Unruly Data… With Constraints!

Relational Constraints

• Foreign keys• Requires data in one table to reference another table’s data• Referenced table (parent)• Referencing table (child)• Can be one OR MORE columns

• Referential actions on UPDATE/DELETE in referenced• Demo (discovery script)

Page 20: Tame Your Unruly Data… With Constraints!

Demos

• AdventureWorks• Good (Birth and Hire dates)• Could be better (Salesperson commission pct.)• SalesOrderDetail

Page 21: Tame Your Unruly Data… With Constraints!

Performance

• UNIQUE/PRIMARY keys can improve performance• Demo• Can also save space in index structures

• CHECK and Foreign Key• Eliminate data access if WHERE clause violates CHECK constraint

• Must be trusted (next slides)

Page 22: Tame Your Unruly Data… With Constraints!

Trusted Constraints

• ALTER TABLE…CHECK/NOCHECK CONSTRAINT• CHECK WITH CHECK…• Performance improvement

• Demo• Partitioned views

Page 23: Tame Your Unruly Data… With Constraints!

Untrusted Constraints

• Constraint could be enabled but data could be invalid• Performance hit

• Demo• How to detect:

• sys.check_constraints.is_not_trusted = 1• sys.foreign_keys.is_not_trusted = 1

• Primary key and Unique are always trusted

Page 24: Tame Your Unruly Data… With Constraints!

Considerations

• Be wary of “meets our current needs”• Be wary of “will also meet all possible future needs”

• Aaron Bertrand “Bad Habits to Kick” blog• If there is a standard defined for your industry, use it (Celko – ISO)

• Sanity check, make sure you covered everything• If you must deviate from the standard, document why

• Multi-platform support• Doomed• Least-feature support = baseline

• Constraints (if any) go into app logic• Use DBCC CHECKCONSTRAINTS! (thanks to Neil Hambly for reminding me!)

• PASS Data Architecture Virtual Chapter! June 21 2012: http://dataarch.sqlpass.org/

Page 25: Tame Your Unruly Data… With Constraints!

DOMAIN Support

• ANSI/ISO• Combines data type, nullability, default and checks in one object• Not available in SQL Server

• Please vote on Connect: http://goo.gl/HbRo2• PostgreSQL support! Example:

CREATE DOMAIN us_postal_code AS varchar(10) NOT NULL DEFAULT('00000') CHECK( VALUE ~ '^\\d{5}$' OR VALUE ~ '^\\d{5}-\\d{4}$' );

• Can be simulated with user-defined types and rules…

Page 26: Tame Your Unruly Data… With Constraints!

Rules

• Deprecated vendor extension (Celko gripe)• Since 7.0, still supported in SQL 2012• Not part of formal definition of object

• Must reference data value and have condition on it• No “anonymous” rules like with constraints

• Only one rule can bind to an object• But that rule can have multiple conditions• And rule can bind to multiple objects

• Which is too bad, because…

Page 27: Tame Your Unruly Data… With Constraints!

Unusual Constraint Use

• Security• Prevent sa or sysadmin updates/inserts

• Execution• Ad-hoc updates/inserts• Cursors• Connection settings like ANSI_PADDING• Time-of-day• Production vs. Dev/QA/UAT environments (thanks Argenis!)

• See blog post: http://goo.gl/p23NL• Unusual data restrictions

Page 28: Tame Your Unruly Data… With Constraints!

Can you have too many?

• IMHO, no • What about performance?• Layers?

• App logic + stored procedure + triggers + constraints + rules• Don’t assume a single layer can do all the work

• What’s the cost of fixing bad data?• And will it actually get fixed?

• Importing/migrating bad data or design• Cruft from old systems• ETL – transform bad data, but don’t constrain it in destination

Never time to do it right,Always time to do it over

-Ron Soukup, SQL Server PM

Therac-25

Ariane 501

Page 29: Tame Your Unruly Data… With Constraints!

But…

• IF:• Customers don’t care• Boss doesn’t care• Devs/$$$ Contractor say app will cover

• Make them personally fix any bad data• Yes, that includes customers• Have contractor refund fee AND fix bad data at no cost

• If they don’t agree or can’t be convinced:• Add constraints anyway• Unit testing

An ounce of prevention is worth a

metric ton of cure

Page 30: Tame Your Unruly Data… With Constraints!

But…but…

• Now I’m just having fun…• Devs say Application does everything…

• And DB constraints will make sure it works• Then why use SQL Server? Write your own storage engine. Or use

NoSQL. The data will be right…eventually.• In 5 years we’ll rewrite your app in Ocaml, or Scala, or Blub

• Moral: you’re using SQL Server anyway, use what it offers• If app actually works correctly, constraints won’t interfere• Even if you switch RDBMS, USE CONSTRAINTS!

Credit: Andy Leonard & Buck Woody

Page 31: Tame Your Unruly Data… With Constraints!

And if you think I’m strict…

• SQLite testing procedure: http://goo.gl/GfvIH• 1177x as much test code as program code

• Space Shuttle Software Development: http://goo.gl/1k2kt• 11 versions * 400K+ lines each = 17 errors total

• F-35 (JSF) Software coding standard: http://goo.gl/4rPxu• 141 pages on C++ programming guidelines• Which is really ironic…

• Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix)• Extremely limited resources (20 MHz/10 MB RAM)

Page 32: Tame Your Unruly Data… With Constraints!

Conclusion

• Types of constraints• Domain, Entity, Relational• Why you need them

• Process, procedure and environmental• Apps and triggers can’t reliably protect or enforce• New apps/processes must duplicate all logic

• Assume layers/components/processes will fail or change• Protect data and DB integrity with constraints• Constraints document integrity rules

• Use them! And be sure to check on them (DBCC)

Page 33: Tame Your Unruly Data… With Constraints!

References

• Books Online!• MSDN• Books

• Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress)• Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, with Kevin Kline, Scott Klein, Kurt Windisch (Apress)• A Developer’s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley)• Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below)• SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf)• SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date (O’Reilly)

• Blogs/Sites• http://weblogs.sqlteam.com/• http://sqlblog.com/ (lots of great people)• http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato, Joe Sack, Glenn Berry)• http://tsql.solidq.com/ (Itzik Ben-Gan)• http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/• http://www.simple-talk.com/books/sql-books/defensive-database-programming/• http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/28/using-rowversion-to-enforce-business-rules.aspx• http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx• http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx• http://goo.gl/xCwdK (Datachix - Trusted Constraints)

• More Books• Microsoft SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham, Adam Machanic (MSPress)• SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning)• SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning)• Microsoft SQL Server 2008: T-SQL Fundamentals by Itzik Ben-Gan (MSPress)• SQL for Smarties by Joe Celko (Morgan Kauffman)

Page 34: Tame Your Unruly Data… With Constraints!

Coming up…

#SQLBITS

Speaker Title Room

Leonard Lobel Geospatial Data Types in SQL Server 2012 Theatre

Mark Stacey Analysis Services for the DBA Exhibition B

Dennis Lloyd Jr Thwarting Database Defects Suite 3

Dmytro Andriychenko Kerberos for SQL Server and SharePoint the easy way Suite 1

Justin Langford Using Windows Azure to provide SQL Server Disaster Recovery Suite 2

Dave Ballantyne Estimation, Statistics and “Oh My!” Suite 4

Page 35: Tame Your Unruly Data… With Constraints!

Thanks!SQLBits Sponsors, Organizers & Volunteers!