Programmability in SPSS 14: A Radical Increase in Power

A Platform for Statistical Applications

Jon K. Peck, Technical Advisor, SPSS Inc.

[email protected]

May 2006

Copyright (c) SPSS Inc, 2006


The Five Big Things

1. External Programming Language (BEGIN PROGRAM)

2. Multiple Datasets

3. XML Workspace and OMS Enhancements

4. Dataset and Variable Attributes

5. Drive SPSS Processor Externally

Working together, they dramatically increase the power of SPSS.

SPSS becomes a platform that enables you to build statistical and data manipulation applications.

GPL provides new programming power for graphics.

Multiple Datasets

Many datasets can be open at once

One is active at a time (set by syntax or UI) using the DATASET ACTIVATE command

Each dataset has its own Data Editor window

Copy, paste, and merge between windows

Write tabular results to a dataset using the Output Management System (OMS) and retrieve them via programmability

No longer necessary to organize jobs linearly

XML Workspace

Store the dictionary and selected results in the workspace

Write results to the workspace as XML with the Output Management System (OMS)

Retrieve selected contents from the workspace via the external programming language

Persists for the entire session

OMS Output: XML or Dataset

Write tabular results to datasets with OMS: the main dataset remains active. Prior to SPSS 14, you had to write to a SAV file, close the active file, and open the SAV file to use the results

Tables can be accessed via the workspace or as datasets

The XML workspace and XPath accessors are very general; they are accessed via programmability functions

Dataset output is more familiar to SPSS users; it is accessed via programmability functions or traditional SPSS syntax, and used with the DATASET ACTIVATE command

Attributes

Extended metadata for files and variables: VARIABLE ATTRIBUTE, DATAFILE ATTRIBUTE

Keep facts and notes about data permanently with the data, e.g., validation rules, source, usage, question text, formula

Two kinds: user defined and SPSS defined

Saved with the data in the SAV file

Can be used in program logic

Programmability

Integrates an external programming language into SPSS syntax

BEGIN PROGRAM … END PROGRAM, plus a set of functions to communicate with SPSS

SPSS has integrated the Python language; an SDK enabling other languages is available. New: VB.NET available soon

External processes can drive the SPSS Processor; VB.NET works only in this mode

SPSS Developer Central has the SDK, the Python Integration Plug-In, and many extension modules

Available for all SPSS 14 platforms

The Python Language

Free, portable, elegant, object oriented, versatile, widely supported, easy to learn, …

Download from Python.org. Version 2.4.1 or later required

Python tutorial

Python user discussion list

The Cheeseshop: third-party modules

Legal Notice

SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.

Programmability Enables…

Generalized jobs by controlling logic based on the variable dictionary, procedure output (XML or datasets), case data (requires SPSS 14.0.1), and the environment

Enhanced data management

Manipulation of output

Computations not built in to SPSS

Use of an intelligent Python IDE driving SPSS (14.0.1): statement completion, syntax checking, and debugging

External control of the SPSS Processor

Programmability Makes Obsolete…

SPSS Macro, except as a shorthand for lists or constants. Learning Python is much easier than learning Macro

SaxBasic, except for autoscripts, but autoscripts become less important

These have not gone away.

The SPSS transformation language continues to be important.

Demonstration

Code and supporting modules can be downloaded from SPSS Developer Central

Examples are on the CD

Initialization for Examples

* SPSS Directions, May 2006.
* In preparation for the examples, specify where SPSS standard data files reside.
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

This program creates a file handle pointing to the SPSS installation directory, where the sample files are installed.

Example 0: Hello, world

* EXAMPLE 0: My first program.
BEGIN PROGRAM.
import spss
print "Hello, world!"
END PROGRAM.

Inside BEGIN PROGRAM, you write Python code.

import spss connects the program to SPSS.

The import is needed once per session.

Output goes to Viewer log items.

The code is executed when END PROGRAM is reached.

Example 1: Run SPSS Command

* Run an SPSS command from a program; create file handle.
BEGIN PROGRAM.
import spss, spssaux
spss.Submit("SHOW ALL.")
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

Submit, in module spss, is called to run one or more SPSS commands within BEGIN PROGRAM.

It is one of many functions (APIs) that interact with SPSS.

GetSPSSInstallDir, in the spssaux module, creates a FILE HANDLE to that directory.

Example 2: Some APIs

* Print useful information in the Viewer and then get help on an API.
BEGIN PROGRAM.
import spss
spss.Submit("GET FILE='SPSSDIR/employee data.sav'.")
varcount = spss.GetVariableCount()
casecount = spss.GetCaseCount()
print "The number of variables is " + str(varcount) + " and the number of cases is " + str(casecount)
# help() prints the documentation itself; no print statement is needed
help(spss.GetVariableCount)
END PROGRAM.

There are APIs in the spss module to get variable dictionary information.

The help function prints short API documentation in the Viewer.

Example 3a: Data-Directed Analysis

* Summarize variables according to measurement level.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# make variable dictionaries by measurement level
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
scaleVars = spssaux.VariableDict(variableLevel=['scale'])
print "Categorical Variables\n"
for var in catVars:
    print var.VariableName, "\t", var.VariableLabel
# summarize variables based on measurement level
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
if scaleVars:
    spss.Submit("DESC " + " ".join(scaleVars.variables))
# create a macro listing scale variables
spss.SetMacroValue("!scaleVars", " ".join(scaleVars.variables))
END PROGRAM.
DESC !scaleVars.

" ".join(['x', 'y', 'z']) produces 'x y z'
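The string-building step in Example 3a can be tried outside SPSS. What follows is a minimal sketch, assuming a hypothetical helper named build_command and made-up variable names; it only assembles the command text that would be handed to spss.Submit and never calls SPSS. It is written to run under a current Python interpreter, so print is used as a function.

```python
def build_command(procedure, variables):
    """Return an SPSS command string, or None when there are no variables.

    build_command is a hypothetical helper for illustration only; it is
    not part of the spss or spssaux modules.
    """
    if not variables:
        return None
    return procedure + " " + " ".join(variables) + "."


# Made-up names standing in for catVars.variables and scaleVars.variables.
cat_vars = ["gender", "jobcat", "minority"]
scale_vars = ["salary", "salbegin", "jobtime"]

freq_cmd = build_command("FREQ", cat_vars)
desc_cmd = build_command("DESC", scale_vars)
empty_cmd = build_command("DESC", [])  # nothing to summarize

print(freq_cmd)
print(desc_cmd)
```

Returning None for an empty list plays the same role as the "if catVars:" test in the example: no command is built when there is nothing to analyze.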

Example 5: Handling Errors

* Handle an error. Use another standard Python module.
BEGIN PROGRAM.
import spss, sys
try:
    spss.Submit("foo.")
except:
    print "That command did not work! ", sys.exc_info()[0]
END PROGRAM.

Errors generate exceptions, which makes it easy to check whether a long syntax job worked.

Hundreds of standard modules, and many others, are available from SPSS and third parties.
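The pattern in Example 5 extends naturally to checking a whole job. The sketch below runs without SPSS: fake_submit is a hypothetical stand-in that raises an exception for an unrecognized command, and both the short command list and the exception type are assumptions chosen for illustration, not the behavior of the real spss.Submit.

```python
import sys

KNOWN_COMMANDS = ("FREQ", "DESC", "SHOW")  # illustrative subset, not SPSS's real list


def fake_submit(command):
    """Hypothetical stand-in for spss.Submit: raise on an unknown command."""
    keyword = command.split()[0].rstrip(".").upper()
    if keyword not in KNOWN_COMMANDS:
        raise ValueError("unknown command: " + command)


def run_job(commands):
    """Run every command; return (ok, failures) so the caller can check the job."""
    failures = []
    for command in commands:
        try:
            fake_submit(command)
        except Exception:
            # sys.exc_info()[0] is the exception class, as printed in Example 5.
            failures.append((command, sys.exc_info()[0].__name__))
    return len(failures) == 0, failures


ok, failures = run_job(["SHOW ALL.", "foo.", "DESC salary."])
print(ok, failures)
```

Collecting the failures rather than stopping at the first one is a design choice for this sketch; a real job might prefer to re-raise, as Example 10 does.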

Example 8: Create Basis Variables

* Create set of dummy variables for a categorical variable and a macro name for them.
BEGIN PROGRAM.
import spss, spssaux, spssaux2
mydict = spssaux.VariableDict()
spssaux2.CreateBasisVariables(mydict["educ"], "EducDummy", macroname="!EducBasis")
spss.Submit("REGRESSION /STATISTICS=COEF /DEP=salary" + " /ENTER=jobtime prevexp !EducBasis.")
END PROGRAM.

Discovers the educ values from the data and generates the appropriate transformation commands.

Creates the macro !EducBasis.

Example 9: Merge Directory Contents

* Automatically add cases from all SAV files in a directory.
BEGIN PROGRAM.
import spss, glob
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    cmd = ["ADD FILES "] + ["/FILE='" + fn + "'" for fn in savlist] + [".", "EXECUTE."]
    spss.Submit(cmd)
    print "Files merged:\n", "\n".join(savlist)
else:
    print "No files found to merge"
END PROGRAM.

The glob module resolves file-system wildcards.

"if savlist" tests whether there are any matching files.
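The wildcard lookup and the command list from Example 9 can be exercised without SPSS. This is a sketch under stated assumptions: it creates a throwaway directory with empty placeholder files (the file names are made up), and build_add_files is a hypothetical helper that only builds the list of syntax lines; nothing is submitted to SPSS.

```python
import glob
import os
import shutil
import tempfile


def build_add_files(savlist):
    """Return the syntax lines built in Example 9, or [] if no files matched."""
    if not savlist:
        return []
    return ["ADD FILES "] + ["/FILE='" + fn + "'" for fn in savlist] + [".", "EXECUTE."]


# Create a temporary directory holding two .sav placeholders and one other file.
tmpdir = tempfile.mkdtemp()
for name in ("a.sav", "b.sav", "notes.txt"):
    open(os.path.join(tmpdir, name), "w").close()

# glob returns matches in arbitrary order, so sort for a stable result.
savlist = sorted(glob.glob(os.path.join(tmpdir, "*.sav")))
cmd = build_add_files(savlist)
shutil.rmtree(tmpdir)  # clean up the placeholder files

for line in cmd:
    print(line)
```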

Example 10: Use Parts of Output - XML

* Run regression; get selected statistics, but do not display the regular Regression output. Use OMS and XPath wrapper functions.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/CARS.SAV")
try:
    handle, failcode = spssaux.CreateXMLOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        visible=False)
    horseCoef = spssaux.GetValuesFromXMLWorkspace(
        handle, "Coefficients", rowCategory="Horsepower",
        colCategory="B", cellAttrib="number")
    print "The effect of horsepower on acceleration is: ", horseCoef
    Rsq = spssaux.GetValuesFromXMLWorkspace(
        handle, "Model Summary", colCategory="R Square", cellAttrib="text")
    print "The R square is: ", Rsq
    spss.DeleteXPathHandle(handle)
except:
    print "*** Regression command failed. No results available."
    raise
END PROGRAM.
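To give a feel for what an XPath lookup by row category, column category, and cell attribute involves, here is a sketch using Python's standard xml.etree.ElementTree module. The XML fragment is a simplified, hypothetical stand-in with made-up numbers; the real OMS output schema is richer, and get_values is an illustrative helper, not the spssaux function.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment with made-up values; not the real OMS schema.
XML = """
<pivotTable text="Coefficients">
  <category text="Horsepower">
    <category text="B"><cell text="-.05" number="-0.05"/></category>
    <category text="Std. Error"><cell text=".01" number="0.01"/></category>
  </category>
  <category text="Weight">
    <category text="B"><cell text=".00" number="0.002"/></category>
  </category>
</pivotTable>
"""


def get_values(root, row_category, col_category, cell_attrib):
    """Return the chosen attribute of every cell under the row/column categories."""
    path = ".//category[@text='%s']/category[@text='%s']/cell" % (
        row_category, col_category)
    return [cell.get(cell_attrib) for cell in root.findall(path)]


root = ET.fromstring(XML.strip())
horse_coef = get_values(root, "Horsepower", "B", "number")
print(horse_coef)
```

The result is a list of strings because a path can match more than one cell; a lookup for a category that is not present simply returns an empty list.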

Example 11: Transformations in Python Syntax

BEGIN PROGRAM.
import spss, spssaux, Transform
spssaux.OpenDataFile('SPSSDIR/employee data.sav')
newvar = Transform.Compute(varname="average_increase",
    varlabel="Salary increase per month of experience if at least a year",
    varmeaslvl="Scale",
    varmissval=[999,998,997],
    varformat="F8.4")
newvar.expression = "(salary-salbegin)/jobtime"
newvar.condition = "jobtime > 12"
newvar.retransformable = True
newvar.generate() # Get exception if compute fails
Transform.timestamp("average_increase")
spss.Submit("DISPLAY DICT /VAR=average_increase.")
spss.Submit("DESC average_increase.")
END PROGRAM.

Example 11A: Repeat Transform

BEGIN PROGRAM.
import spss, Transform
try:
    Transform.retransform("average_increase")
    Transform.timestamp("average_increase")
except:
    print "Could not update average_increase."
else:
    spss.Submit("display dictionary" +
        " /variable=average_increase.")
END PROGRAM.

The transformation is saved using attributes.

Example 12: Controlling the Viewer Using Automation

BEGIN PROGRAM.
import spss, viewer
spss.Submit("DESCRIPTIVES ALL.")
spssapp = viewer.spssapp()
outfile = "c:/temp/myoutput.spo"
try:
    actualName = spssapp.SaveDesignatedOutput(outfile)
except:
    # actualName is not set if the save raised, so report the requested name
    print "Save failed. Name:", outfile
else:
    spssapp.ExportDesignatedOutput("c:/temp/myoutput.doc", format="Word")
    spssapp.CloseDesignatedOutput()
END PROGRAM.

Example 13: A New Procedure - Poisson Regression

BEGIN PROGRAM.
import spss, spssaux
from poisson_regression import *
spssaux.OpenDataFile('SPSSDIR/Tutorial/Sample_Files/autoaccidents.sav')
poisson_regression("accident", covariates=["age"], factors=["gender"])
END PROGRAM.

The Poisson regression module is built from the SPSS CNLR and transformation commands.

Programs can get case data and use other Python modules or code on it.

Example 14: Using Case Data

* Mean salary by education level.
BEGIN PROGRAM.
import spssdata
data = spssdata.Spssdata(indexes=('salary', 'educ'))
Counts = {}; Salaries = {}
for case in data:
    cat = int(case.educ)
    Counts[cat] = Counts.get(cat, 0) + 1
    Salaries[cat] = Salaries.get(cat, 0) + case.salary
print "educ mean salary\n"
for cat in sorted(Counts):
    print " %2d $%6.0f" % (cat, Salaries[cat]/Counts[cat])
del data
END PROGRAM.
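The accumulation logic in Example 14 does not depend on SPSS and can be checked on its own. The sketch below assumes a handful of made-up (salary, educ) pairs in place of the cases that spssdata would supply; only the dictionary bookkeeping is the same.

```python
# Made-up (salary, educ) pairs standing in for cases read through spssdata.
cases = [(30000.0, 12), (42000.0, 16), (36000.0, 12), (58000.0, 16), (27000.0, 8)]

counts = {}
salaries = {}
for salary, educ in cases:
    cat = int(educ)
    # dict.get supplies 0 the first time a category is seen.
    counts[cat] = counts.get(cat, 0) + 1
    salaries[cat] = salaries.get(cat, 0) + salary

# Mean salary per education level.
means = dict((cat, salaries[cat] / counts[cat]) for cat in counts)

for cat in sorted(means):
    print(" %2d $%6.0f" % (cat, means[cat]))
```

Using get with a default avoids a separate test for whether a category has already been seen, which keeps the loop body to two lines per accumulator.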

Example 14a: Output As a Pivot Table

BEGIN PROGRAM.
import viewer
# <accumulate Counts and Salaries as in Example 14>
desViewer = viewer.spssapp().GetDesignatedOutput()
rowcats = []; cells = []
for cat in sorted(Counts):
    rowcats.append(int(cat))
    cells.append(Salaries[cat]/Counts[cat])
ptable = viewer.PivotTable("a Python table",
    tabletitle="Effect of Education on Salary",
    caption="Data from employee data.sav",
    rowdim="Years of Education",
    rowlabels=rowcats,
    collabels=["Mean Salary"],
    cells=cells,
    tablelook="c:/data/goodlook.tlo")
ptable.insert(desViewer)
END PROGRAM.

Exploring OMS Dataset Output

get file='c:/spss14/cars.sav'.
DATASET NAME maindata.
DATASET DECLARE regcoef.
DATASET DECLARE regfit.
OMS /IF SUBTYPE=["coefficients"]
  /DESTINATION FORMAT = sav OUTFILE=regcoef.
OMS /IF SUBTYPE=["Model Summary"]
  /DESTINATION FORMAT = sav OUTFILE=regfit.
REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.
OMSEND.

Use OMS directly to figure out what to retrieve programmatically.

Example 10a: Use Bits of Output - Datasets

BEGIN PROGRAM.
import spss, spssaux, spssdata
try:
    coefhandle, rsqhandle, failcode = spssaux.CreateDatasetOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        subtype=["coefficients", "Model Summary"])
    cursor = spssdata.Spssdata(indexes=["Var2", "B"], dataset=coefhandle)
    for case in cursor:
        if case.Var2.startswith("Horsepower"):
            print "The effect of horsepower on acceleration is: ", case.B
    cursor.close()
    cursor = spssdata.Spssdata(indexes=["RSquare"], dataset=rsqhandle)
    row = cursor.fetchone()
    print "The R Squared is: ", row.RSquare
    cursor.close()
except:
    print "*** Regression command failed. No results available."
    raise
spssdata.Dataset("maindata").activate()
spssdata.Dataset(coefhandle).close()
spssdata.Dataset(rsqhandle).close()
END PROGRAM.

What We Saw

Variable Dictionary access

Procedures selected based on variable properties

Actions based on environment

Automatic construction of transformations

Error handling

Variables that remember their formulas

Management of the SPSS Viewer

New statistical procedure

Access to case data

Externally Controlling SPSS

The SPSS Processor (backend) can be embedded and controlled by Python or other processes

Build applications that use SPSS functionality invisibly

The application supplies the user interface

No SPSS Viewer

Allows use of a Python IDE to build programs: Pythonwin or many others

PythonWin IDE Controlling SPSS

What Are the Programmability Benefits?

Extend SPSS functionality

Write more general and flexible jobs

Handle errors

React to results and metadata

Implement new features

Write simpler, clearer, more efficient code

Greater productivity

Automate repetitive tasks

Build SPSS functionality into other applications

Getting Started

SPSS 14 (14.0.1 for data access and IDE)

Python (visit Python.org): installation, tutorial, many other resources

SPSS® Programming and Data Management, 3rd Edition: A Guide for SPSS® and SAS® Users (new)

SPSS Developer Central: Python Plug-In (14.0.1 version covers 14.0.2), example modules

Dive Into Python (diveintopython.org), book or PDF

Practical Python by Magnus Lie Hetland

Python Cookbook, 2nd ed., by Martelli, Ravenscroft, & Ascher

Python and Plug-In: on the CD in SPSS 15

Recap

Five power features of SPSS 14

Examples of programmability using Python

How to get started: materials and resources

Questions

In Closing

Working together, these new features give you a dramatically more powerful SPSS.

SPSS becomes a platform that enables you to build your own statistical applications.

1. Programmability

2. Multiple datasets

3. XML Workspace and OMS enhancements

4. Attributes

5. External driver application

Contact

Jon Peck can now be reached at:

[email protected]