SAS Grid at Statistics Canada BY: Yves DeGuire Statistics Canada June 12, 2014.
SAS Grid at Statistics Canada
BY: Yves DeGuire Statistics Canada
June 12, 2014
Agenda
• SAS at Statistics Canada
• What is the StatCan SAS Grid?
• Migration and Use Cases
• Lessons Learned
• Looking Forward
Statistics Canada
• Canada’s central statistical agency.
• Mandate to collect, compile, analyse and publish statistical information on the economic, social and general conditions of the country and its citizens.
• Mandate is fulfilled under the authority of the Statistics Act, which prohibits the disclosure of identifiable information.
Crunching numbers is our business!
SAS@StatCan Where?
[Figure: Survey Lifecycle diagram. Stages: Collection, Processing, Analysis, Dissemination. Data stores: Input Database, Clean Microdata, Output Database.]
SAS@StatCan What?
• Data processing
• Application development
• Query and reporting
• Statistical analysis
• Exploratory data analysis
• “Specialised” computations (time-series, optimization, matrix operations, etc.)
SAS@StatCan How?
• Base SAS
• SAS/ACCESS
• SAS/AF
• SAS/CONNECT
• SAS/ETS
• SAS/GRAPH
• SAS/IML
• SAS/IntrNet
• SAS/OR
• SAS/SHARE
• SAS/STAT
• SAS/TOOLKIT
• Integration Technologies
• Enterprise Guide
• Enterprise Platform
• DI Server
• JMP
• Grid Manager
SAS@StatCan Some Numbers!
• 2,500,000 SAS jobs run every year
• 4,000 PC-SAS installations
• 2,500 active SAS users
• 450 production applications
• 80 Windows servers
• 25 Unix servers
• 20 platforms
• 3 versions of SAS: 9.1.3, 9.2 and 9.3
• 1 grid!
SAS@StatCan More than 2500 Users!
What is the StatCan SAS Grid?
• A complete SAS Platform deployment utilizing the SAS Grid Manager 9.4.
• Available to the entire Agency via a Hosting service.
• Part of the Network Transformation Initiative (NTI)
• 3 objectives:
– Consolidate 100+ SAS servers (Phase 1)
– Migrate processing from workstations to the grid (Phase 2)
– Enable new computing initiatives/possibilities (Phase 1 & 2)
StatCan Grid Milestones
• 2005-2010: Several “home-made” grids developed over the years using Base SAS and SAS/CONNECT
• 2011: first test grid based on Grid Manager
• 2013: enhanced test grid released
• May 2014: production grid released for IBSP (V1)
• Q3 2014: full production grid will be released for general availability (V2)
A Few Impressive Results while Testing the Grid
• Capital stock calculation: 89% improvement in elapsed time (2005)
• Audit module in G-Confid: over 90% improvement in elapsed time (2009)
• NHS-Tax Linkage project: from 59 hours to 50 minutes using G-Link V3 (2012)
• Simulations with CCHS data: hundreds of simulations run in a few hours, compared to days on a workstation (2013)
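To put the figures above on a common scale, the short Python sketch below converts between elapsed times, percentage reductions and speedup factors. It assumes that "improvement" on the slide means a reduction in elapsed time; the function names are illustrative and not part of the presentation.

```python
def reduction_pct(before_minutes, after_minutes):
    """Percentage reduction in elapsed time."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


def speedup(before_minutes, after_minutes):
    """How many times faster the new run is."""
    return before_minutes / after_minutes


def speedup_from_reduction(pct):
    """Speedup factor implied by a percentage reduction in elapsed time."""
    return 100.0 / (100.0 - pct)


# NHS-Tax Linkage: 59 hours down to 50 minutes.
nhs_before = 59 * 60  # minutes
nhs_after = 50        # minutes
print(round(speedup(nhs_before, nhs_after), 1))        # 70.8
print(round(reduction_pct(nhs_before, nhs_after), 1))  # 98.6

# Capital stock calculation: an 89% reduction (assumed meaning of "improvement").
print(round(speedup_from_reduction(89), 1))            # 9.1
```

Under that assumption, the linkage project ran roughly 71 times faster, and an 89% reduction corresponds to about a 9-fold speedup.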
Why the StatCan Grid?
• Reduced costs $ $ $
• Process higher volumes of data
• Process data in less time
• Scalable
• Secure
• Centrally managed
• Usage metrics
Implementation Highlights (phase 1)
[Figure: architecture diagram with the following components]
• Shared File System: clustered, 2-tier storage, 80 TB
• SAS Metadata Server
• Grid Nodes: Node1 to Node16, labelled 16 cores, 256 GB RAM, Intel x86_64
• SAS Platform Clients
• Web Clients and Services
• SAS Mid-Tier
The Transparent Grid
One of the objectives of the grid is to make the user experience as transparent as possible.
• Single sign-on
• Samba shares
• Helpers (Macros, Stored Processes)
SAS Grid Data Tier
• Data Files (must “live” on the CFS)
– Flat files / SAS files
– PC files (Excel spreadsheets, etc.)
– Exposed to Windows via Samba
• Databases:
– SQL Server
– Oracle
– Sybase
Migration Requirements
The StatCan SAS grid is a “pure” SAS compute service!
Platform clients only, such as Enterprise Guide
No host commands available
SAS/ACCESS to PC File formats, with limitations
No direct access to Windows Shares
SAS 9.4 and SAS 9.3M1 supported
Use Cases
• Use Case #1: Ad hoc users
– Users who need to process/analyze data “on-demand”
– Large number of concurrent users
• Use Case #2: Batch Jobs
– SAS Jobs that run unattended.
– A new mainframe!!!
• Use Case #3: Parallel Processing
– Jobs broken into smaller tasks and dispatched to the grid.
– Myth: a SAS program will execute in parallel with no modifications!
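The pattern behind Use Case #3 can be sketched in a few lines. The Python example below is only a local analogy and not SAS Grid Manager code: a thread pool stands in for the grid nodes, and `split_into_chunks`, `process_chunk` and `run_in_parallel` are hypothetical names. It shows why the "myth" is a myth: the program has to be restructured into independent tasks plus a step that combines their results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split records into at most n_chunks contiguous, near-equal slices."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are few records
            chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical per-task work: a partial sum of squares."""
    return sum(x * x for x in chunk)


def run_in_parallel(records, n_workers=4):
    """Break the job into tasks, dispatch them, then combine the results."""
    chunks = split_into_chunks(records, n_workers)
    # The pool is a stand-in for grid nodes (assumption for illustration).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


records = list(range(1, 101))
print(run_in_parallel(records))  # 338350
```

The sketch illustrates the structure only (split, dispatch, combine); threads in CPython do not speed up CPU-bound work, whereas on a grid each task would run on its own node.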
Lessons Learned
• A SAS grid project is also an infrastructure project.
• Integrating Linux with a Windows environment poses some challenges.
• Managing user expectations is critical.
• Resistance to change must be managed.
• Start simple and build on success.
• Be proactive: plan/think about your next SAS environment.
Looking Forward
• Phase 1: consolidate 80 servers over the next 2 years.
• Phase 2:
– Introduce a new grid at SSC Data Centre.
– Complete the server consolidation started in Phase 1.
– Migrate workstation processing to the grid.
Are there opportunities to collaborate with other departments?
Thank You!
Yves DeGuire
Section Chief
System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6
(613) 951-1282