SAS Grid at Statistics Canada BY: Yves DeGuire Statistics Canada June 12, 2014.

SAS Grid at Statistics Canada

BY: Yves DeGuire Statistics Canada

June 12, 2014

Agenda 

• SAS at Statistics Canada
• What is the StatCan SAS Grid?
• Migration and Use Cases
• Lessons Learned
• Looking Forward

Statistics Canada

• Canada’s central statistical agency.
• Mandate to collect, compile, analyse and publish statistical information on the economic, social and general conditions of the country and its citizens.

• Mandate is fulfilled under the authority of the Statistics Act which prohibits the disclosure of identifiable information.

Crunching numbers is our business!

SAS@StatCan Where?

[Diagram: Survey Lifecycle. Stages shown: Collection, Processing, Analysis, Dissemination. Data stores shown: Input Database, Clean Microdata, Output Database.]

SAS@StatCan What?

• Data processing
• Application development
• Query and reporting
• Statistical analysis
• Exploratory data analysis
• “Specialised” computations (time-series, optimization, matrix operations, etc.)

SAS@StatCan How?

• Base SAS
• SAS/ACCESS
• SAS/AF
• SAS/CONNECT
• SAS/ETS
• SAS/GRAPH
• SAS/IML
• SAS/IntrNet
• SAS/OR
• SAS/SHARE
• SAS/STAT
• SAS/TOOLKIT
• Integration Technologies
• Enterprise Guide
• Enterprise Platform
• DI Server
• JMP
• Grid Manager

SAS@StatCan Some Numbers!

• 2,500,000 SAS jobs run every year
• 4,000 PC-SAS installations
• 2,500 active SAS users
• 450 production applications
• 80 Windows servers
• 25 Unix servers
• 20 platforms
• 3 versions of SAS: 9.1.3, 9.2 and 9.3
• 1 grid!

SAS@StatCan More than 2500 Users!

What is the StatCan SAS Grid?

• A complete SAS Platform deployment utilizing the SAS Grid Manager 9.4.

• Available to the entire Agency via a Hosting service.
• Part of the Network Transformation Initiative (NTI)
• 3 objectives:
– Consolidate 100+ SAS servers (Phase 1)
– Migrate processing from workstations to the grid (Phase 2)
– Enable new computing initiatives/possibilities (Phase 1 & 2)

StatCan Grid Milestones

• 2005-2010: Several “home-made” grids developed over the years using Base SAS and SAS/CONNECT

• 2011: first test grid based on Grid Manager
• 2013: enhanced test grid released
• May 2014: production grid released for IBSP (V1)
• Q3 2014: full production grid will be released for general availability (V2)

A Few Impressive Results while Testing the Grid

• Capital stock calculation: 89% improvement on elapsed time (2005)

• Audit module in G-Confid: Over 90% improvement on elapsed time (2009)

• NHS-Tax Linkage project: from 59 hours to 50 minutes using G-Link V3 (2012)

• Simulations with CCHS data: hundreds of simulations run in a few hours compared to days on a workstation. (2013)
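
To compare these results on one scale, a percentage improvement in elapsed time can be converted to a speed-up factor and back. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that "improvement on elapsed time" means the percentage reduction in elapsed time.

```python
def percent_improvement(before_minutes: float, after_minutes: float) -> float:
    """Percentage reduction in elapsed time (assumed meaning of 'improvement')."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the new run is."""
    return before_minutes / after_minutes


def speedup_from_percent(pct: float) -> float:
    """Speed-up factor implied by a percentage reduction in elapsed time."""
    return 1.0 / (1.0 - pct / 100.0)


# NHS-Tax Linkage project: 59 hours down to 50 minutes.
before = 59 * 60  # 3540 minutes
after = 50

print(round(percent_improvement(before, after), 1))  # 98.6
print(round(speedup(before, after), 1))              # 70.8
# Capital stock calculation: 89% improvement.
print(round(speedup_from_percent(89), 1))            # 9.1
```

Under that assumption, the 89% improvement corresponds to roughly a 9x speed-up, and the linkage project to roughly a 71x speed-up.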

Why the StatCan Grid?

• Reduced costs $ $ $
• Process a higher volume of data.
• Process data in less time.
• Scalable
• Secure
• Centrally managed
• Usage metrics

Implementation Highlights (phase 1)

[Architecture diagram. Components and labels shown:]

• Shared File System: clustered, 2-tier storage, 80 TB
• SAS Metadata Server
• Grid Nodes: Node1 to Node16; 16 cores, 256 GB RAM, Intel x86_64
• SAS Mid-Tier
• Web Clients and Services
• SAS Platform Clients

The Transparent Grid

One of the objectives of the grid is to make the user experience as transparent as possible.

Single sign-on

Samba shares

Helpers (Macros, Stored Processes)

SAS Grid Data Tier

• Data Files (must “live” on the CFS)
– Flat files / SAS files
– PC files (Excel spreadsheets, etc.)
– Exposed to Windows via Samba

• Databases:
– SQL Server
– Oracle
– Sybase

Migration Requirements 

The StatCan SAS grid is a “pure” SAS compute service!

Platform clients only such as Enterprise Guide

No host commands available

SAS/ACCESS to PC file formats, with limitations

No direct access to Windows Shares

SAS 9.4 and SAS 9.3M1 supported

Use Cases

• Use Case #1: Ad hoc users
– Users who need to process/analyze data “on-demand”
– Large number of concurrent users

• Use Case #2: Batch Jobs
– SAS Jobs that run unattended.
– A new mainframe!!!

• Use Case #3: Parallel Processing
– Jobs broken into smaller tasks and dispatched to the grid.
– Myth: a SAS program will execute in parallel with no modifications!
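
The parallel-processing use case can be illustrated in general terms: a job has to be restructured explicitly into independent tasks whose partial results are then combined. The Python sketch below is a language-neutral illustration, not the StatCan implementation and not SAS code. A local thread pool stands in for the grid nodes, and all names (`split_into_chunks`, `summarize`, `parallel_mean`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_tasks):
    """Split a list into at most n_tasks nearly equal, contiguous chunks."""
    size, extra = divmod(len(records), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer records than tasks
            chunks.append(records[start:end])
        start = end
    return chunks


def summarize(chunk):
    """Independent task: partial sum and count for one chunk."""
    return sum(chunk), len(chunk)


def parallel_mean(records, n_tasks=4):
    """Dispatch one task per chunk, then combine the partial results."""
    chunks = split_into_chunks(records, n_tasks)
    # The worker pool is a stand-in for grid nodes (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(summarize, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count  # raises ZeroDivisionError for an empty input


if __name__ == "__main__":
    data = list(range(1, 101))
    print(parallel_mean(data, n_tasks=4))  # 50.5
```

The split and combine steps are the modifications that the "myth" refers to: the computation only runs in parallel because it was rewritten as independent tasks.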

Lessons Learned

• A SAS grid project is also an infrastructure project.

• Integrating Linux with a Windows environment poses some challenges.

• Managing user expectations is critical.

• Resistance to change must be managed.

• Start simple and build on success.

• Be proactive: plan/think about your next SAS environment.

Looking Forward

• Phase 1: consolidate 80 servers over the next 2 years.
• Phase 2:
– Introduce a new grid at SSC Data Centre.
– Complete the server consolidation started in Phase 1.
– Migrate workstation processing to the grid.

Are there opportunities to collaborate with other departments?

Thank You!

Yves DeGuire
Section Chief
System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6

(613) 951-1282

[email protected]