SAS Grid at Statistics Canada BY: Yves DeGuire Statistics Canada June 12, 2014.

Post on 11-Jan-2016


SAS Grid at Statistics Canada

BY: Yves DeGuire Statistics Canada

June 12, 2014

Agenda 

• SAS at Statistics Canada
• What is the StatCan SAS Grid?
• Migration and Use Cases
• Lessons Learned
• Looking Forward

Statistics Canada

• Canada’s central statistical agency.
• Mandate to collect, compile, analyse and publish statistical information on the economic, social and general conditions of the country and its citizens.
• Mandate is fulfilled under the authority of the Statistics Act, which prohibits the disclosure of identifiable information.

Crunching numbers is our business!

SAS@StatCan Where?

[Diagram: Survey Lifecycle, showing the stages Collection, Processing, Analysis and Dissemination and the data stores Input Database, Clean Microdata and Output Database.]

SAS@StatCan What?

• Data processing
• Application development
• Query and reporting
• Statistical analysis
• Exploratory data analysis
• “Specialised” computations (time-series, optimization, matrix operations, etc.)

SAS@StatCan How?

• Base SAS
• SAS/ACCESS
• SAS/AF
• SAS/CONNECT
• SAS/ETS
• SAS/GRAPH
• SAS/IML
• SAS/IntrNet
• SAS/OR
• SAS/SHARE
• SAS/STAT
• SAS/TOOLKIT
• Integration Technologies
• Enterprise Guide
• Enterprise Platform
• DI Server
• JMP
• Grid Manager

SAS@StatCan Some Numbers!

• 2,500,000 SAS jobs run every year
• 4,000 PC-SAS installations
• 2,500 active SAS users
• 450 production applications
• 80 Windows servers
• 25 Unix servers
• 20 platforms
• 3 versions of SAS: 9.1.3, 9.2 and 9.3
• 1 grid!

SAS@StatCan More than 2500 Users!


What is the StatCan SAS Grid?

• A complete SAS Platform deployment utilizing the SAS Grid Manager 9.4.

• Available to the entire Agency via a Hosting service.
• Part of the Network Transformation Initiative (NTI)
• 3 objectives:
– Consolidate 100+ SAS servers (Phase 1)
– Migrate processing from workstations to the grid (Phase 2)
– Enable new computing initiatives/possibilities (Phase 1 & 2)

StatCan Grid Milestones

• 2005-2010: Several “home-made” grids developed over the years using Base SAS and SAS/CONNECT
• 2011: first test grid based on Grid Manager
• 2013: enhanced test grid released
• May 2014: production grid released for IBSP (V1)
• Q3 2014: full production grid will be released for general availability (V2)

A Few Impressive Results while Testing the Grid

• Capital stock calculation: 89% improvement on elapsed time (2005)

• Audit module in G-Confid: Over 90% improvement on elapsed time (2009)

• NHS-Tax Linkage project: from 59 hours to 50 minutes using G-Link V3 (2012)

• Simulations with CCHS data: hundreds of simulations run in a few hours compared to days on a workstation. (2013)
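To put these results on a common scale, the short sketch below converts between elapsed times, percentage reduction and speed-up factor. It assumes that "improvement on elapsed time" means the percentage reduction in elapsed time, which the slides do not define; the function names are illustrative only.

```python
def percent_reduction(before, after):
    """Percentage by which elapsed time dropped (same units for both arguments)."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the new run is."""
    return before / after


def speedup_from_reduction(pct):
    """Speed-up factor implied by a percentage reduction in elapsed time."""
    return 1.0 / (1.0 - pct / 100.0)


# NHS-Tax Linkage: 59 hours down to 50 minutes (figures from the slide).
before_min = 59 * 60  # 3540 minutes
after_min = 50

print(round(percent_reduction(before_min, after_min), 1))  # 98.6
print(round(speedup(before_min, after_min), 1))            # 70.8

# Capital stock calculation: 89% improvement, under the assumption above.
print(round(speedup_from_reduction(89), 1))                # 9.1
```

Under that reading, the linkage project ran roughly 70 times faster, while an 89% reduction corresponds to roughly a 9-fold speed-up.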

Why the StatCan Grid?

• Reduced costs $ $ $
• Process higher volumes of data
• Process data in less time
• Scalable
• Secure
• Centrally managed
• Usage metrics

Implementation Highlights (phase 1)

[Architecture diagram; labels recovered from the slide:]

• Shared File System: clustered, 2-tier storage, 80 TB
• SAS Metadata Server
• Grid Nodes: Node1 to Node16; 16 cores; 256 GB RAM; Intel x86_64
• SAS Mid-Tier
• Web Clients and Services
• SAS Platform Clients

The Transparent Grid

One of the objectives of the grid is to make the user experience as transparent as possible.

• Single sign-on
• Samba shares
• Helpers (Macros, Stored Processes)

SAS Grid Data Tier

• Data Files (must “live” on the CFS)
– Flat files / SAS files
– PC files (Excel spreadsheets, etc.)
– Exposed to Windows via Samba
• Databases:
– SQL Server
– ORACLE
– Sybase

Migration Requirements 

The StatCan SAS grid is a “pure” SAS compute service!

Platform clients only such as Enterprise Guide

No host commands available

SAS/ACCESS to PC file formats, with limitations

No direct access to Windows Shares

SAS 9.4 and SAS 9.3M1 supported

Use Cases

• Use Case #1: Ad hoc users
– Users who need to process/analyze data “on-demand”
– Large number of concurrent users
• Use Case #2: Batch Jobs
– SAS jobs that run unattended.
– A new mainframe!!!
• Use Case #3: Parallel Processing
– Jobs broken into smaller tasks and dispatched to the grid.
– Myth: a SAS program will execute in parallel with no modifications!
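The point of Use Case #3 is that a job has to be restructured explicitly before it can run in parallel. The sketch below illustrates that pattern in Python rather than SAS: the work is split into chunks, each chunk is dispatched as an independent task, and the partial results are combined. It is an illustration only, not the StatCan implementation; a local thread pool stands in for the grid nodes, and `split_into_tasks`, `process_chunk` and `run_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(records, n_tasks):
    """Partition records into at most n_tasks contiguous, near-equal chunks."""
    size, extra = divmod(len(records), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few records
            chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical unit of work: sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def run_parallel(records, n_workers=4):
    """Dispatch each chunk as a separate task, then combine the partial results."""
    chunks = split_into_tasks(records, n_workers)
    # The local thread pool is a stand-in for grid nodes (assumption).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


records = list(range(1, 101))
print(run_parallel(records))  # 338350
```

The split and the combine steps are exactly the modifications a serial program does not have, which is why it gains nothing from a grid until it is rewritten along these lines.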

Lessons Learned

• A SAS grid project is also an infrastructure project.

• Integrating Linux with a Windows environment poses some challenges.

• Managing user expectations is critical.

• Resistance to change must be managed.

• Start simple and build on success.

• Be proactive: plan/think about your next SAS environment.

Looking Forward

• Phase 1: consolidate 80 servers over the next 2 years.
• Phase 2:
– Introduce a new grid at SSC Data Centre.
– Complete the server consolidation started in Phase 1.
– Migrate workstation processing to the grid.

Are there opportunities to collaborate with other departments?

Thank You!

Yves DeGuire
Section Chief
System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6

(613) 951-1282

Yves.Deguire@statcan.gc.ca