Tips and Tricks for Producing Easily Maintainable Code in SIR or Using SIR Compiler Directives to...

Tips and Tricks for Producing Easily Maintainable Code in SIR
or
Using SIR Compiler Directives to Produce Data Driven Systems

Frances Williams
Institute for Social and Economic Research
University of Essex


Introduction

• Data driven systems
• SIR Compiler Directives
  – GLOBAL
  – CIF
  – DO REPEAT
• Brief description of the project
• Towards a data driven system
• Hidden time-bombs

Data Driven Systems

• A large system
  – Lots of retrievals / programs
  – Requires substantial modification each year
• Aim to define what needs changing up top
• As little modification to underlying code as possible

Compiler Directives - GLOBAL

• Defined anywhere within SIR environment
  – Keeps value across RETRIEVALs
  – Used in places where variables cannot be used

    GLOBAL RECNAM = AREC
    …
    Process rec <RECNAM>

• Value substituted at compile time
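The compile-time substitution described above can be imitated outside SIR. The following is a minimal Python sketch, not SIR itself: the function name `substitute_globals` and the rule for leaving unknown names untouched are assumptions made for illustration.

```python
import re


def substitute_globals(source: str, globals_: dict[str, str]) -> str:
    """Replace every <NAME> token with the text of the matching global."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: an undefined name is left as written rather than guessed.
        return globals_.get(name, match.group(0))

    return re.sub(r"<([A-Za-z_][A-Za-z0-9_]*)>", lookup, source)


# Mirrors the slide: GLOBAL RECNAM = AREC, then "Process rec <RECNAM>".
expanded = substitute_globals("Process rec <RECNAM>", {"RECNAM": "AREC"})
print(expanded)
```

Because the replacement is purely textual, the same mechanism works for record names, variable name prefixes and whole conditions, which is what makes it usable where an ordinary variable is not.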

Compiler Directives - CIF

• Compile IF
• Determines whether code is to be compiled or not
• Usually used with GLOBALs

    GLOBAL WANTA = 1

    CIF DEF <WANTA>
    . Call FAM.WANTA
    CIF FALSE
    . Write "WANTA not called"
    CIF END

    CIF EQ <WANTA>, 1

Compiler Directives – DO REPEAT

• For repetitive pieces of code
• Expanded by the compiler
• Can have any number of repeat symbols which map to a parameter list
• Parameter lists should be of the same length
• Cannot be nested
• Extremely useful

Compiler Directives – DO REPEAT

    DO REPEAT rtype = AREC BREC CREC /
              v1 = AVAR1 BVAR1 CVAR1 /
              v2 = AVAR2 BVAR2 CVAR2 /
    . process rec rtype
    . Compute v1 = v2
    . end rec
    END REPEAT

Compiler Directives – DO REPEAT

This is expanded by the compiler to:

    Process rec AREC
    . Compute AVAR1 = AVAR2
    End rec
    Process rec BREC
    . Compute BVAR1 = BVAR2
    End rec
    Process rec CREC
    . Compute CVAR1 = CVAR2
    End rec
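The expansion shown above can be reproduced with a short Python sketch. This is an illustration of the idea only, not how the SIR compiler is implemented; `expand_do_repeat` is a hypothetical helper, and matching repeat symbols on whole words is an assumption.

```python
import re


def expand_do_repeat(symbols: dict[str, list[str]], body: list[str]) -> list[str]:
    """Emit one copy of body per position in the parameter lists."""
    lengths = {len(values) for values in symbols.values()}
    if len(lengths) != 1:
        raise ValueError("parameter lists must all be the same length")
    count = lengths.pop()
    # Assumption: a repeat symbol is replaced only where it is a whole word.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, symbols)) + r")\b")
    expanded = []
    for i in range(count):
        for line in body:
            expanded.append(pattern.sub(lambda m: symbols[m.group(1)][i], line))
    return expanded


symbols = {
    "rtype": ["AREC", "BREC", "CREC"],
    "v1": ["AVAR1", "BVAR1", "CVAR1"],
    "v2": ["AVAR2", "BVAR2", "CVAR2"],
}
body = ["process rec rtype", ". compute v1 = v2", "end rec"]
expanded = expand_do_repeat(symbols, body)
print("\n".join(expanded))
```

The length check mirrors the rule that parameter lists should be of the same length.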

Brief Description of Project

• Very Brief
• Large Social Science Survey
  – Interview 15000 people each year
  – Interview the same people each year
  – ~20% questions change
  – 13th year
  – Data added to Survey database
  – Converted to User database

Brief Description of Project (cont)

• Conversion Process
  – Conversion done once a year
      This year's data added
  – Structure flattened
  – Variable names changed
  – Derived variables calculated
  – Imputations done
  – Weightings calculated
  – Output into SPSS, SAS and Stata

Brief Description of Project (cont)

• Code written long ago by researchers, not programmers
• It works but …
  – Needs a lot of modifying each year
  – Difficult to know where things need changing
  – It contains errors
• Tight deadlines
• Aim to create data driven system

Brief Description of Project (cont)

• If it ain't broke, don't fix it
  – Particularly if you don't have much time
  – But if code needs changing anyway …
• Step by Step approach
  – Choose one section to rewrite
  – Rewrite so minimal changes required in future
  – Make sure changes work
  – More can be done next year
• Have rewritten the code for derived variables

Towards a Data Driven System

• Derived Variables
  – Non questionnaire variables calculated from questions asked or other derived variables
  – Same each year – calculated from core variables
  – Some easy to calculate, some difficult
  – Re-write so underlying code never needs changing

Towards a Data Driven System

• Variable naming
  – Wave prefix plus root name
  – Wave prefix
      'A' year 1
      'B' year 2
      'M' year 13
  – Root name invariable - HSROOM
  – At wave 13, MHSROOM
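The naming rule above is simple enough to write down as code. The Python sketch below assumes the prefix is the wave's position in the alphabet, which matches the three examples on the slide; the function names are hypothetical.

```python
import string


def wave_prefix(wave: int) -> str:
    """Return the letter for a wave: 1 -> 'A', 2 -> 'B', 13 -> 'M'."""
    if not 1 <= wave <= 26:
        # A single-letter prefix cannot represent a 27th wave.
        raise ValueError("single-letter wave prefixes only cover waves 1 to 26")
    return string.ascii_uppercase[wave - 1]


def wave_variable(wave: int, root: str) -> str:
    """Join the wave prefix and the invariable root name."""
    return wave_prefix(wave) + root


print(wave_variable(13, "HSROOM"))
```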

Towards a Data Driven System - GLOBALS

• GLOBAL WP = M
• GLOBAL CURYR = 2003
• Globals for values and conditions that might change
• <WP>VOTE
  – Calculated from one of two variables, <WP>VOTE3 or <WP>VOTE5

Towards a Data Driven System - GLOBALS

• Code incorrect!
• GLOBAL MAXVOTE = 17

Towards a Data Driven System - GLOBALS

• Ask respondents about income received from non-earnings
  – Pensions
  – Child benefit
  – Disability payments
  – Income support
  – Rent etc

Towards a Data Driven System - GLOBALS

• Create derived variables
  – State benefit payments
  – Non-state pension payments
  – Rent payments etc
• Payments change over time
• Replace hard coded conditions with GLOBALS

Towards a Data Driven System - GLOBALS

    …
    . ifthen (FICODE eq 1 or (FICODE GE 5 and FICODE LE 42))
    C  Current condition for state benefit payments
    …
    …
    . end if

Towards a Data Driven System - GLOBALS

    GLOBAL FIBCOND = $FICODE eq 1 or (FICODE GE 5 and FICODE LE 42)$

    . ifthen (<FIBCOND>)
    …
    …
    . end if
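The same idea, stating a changeable condition once at the top and referring to it by name everywhere else, can be sketched in Python. This is an analogue rather than SIR: `STATE_BENEFIT_CODES` and `is_state_benefit` are hypothetical names, and treating FICODE as an integer is an assumption.

```python
# Defined once at the top: code 1, plus codes 5 to 42 inclusive,
# mirroring the FIBCOND condition on the slide.
STATE_BENEFIT_CODES = frozenset({1, *range(5, 43)})


def is_state_benefit(ficode: int) -> bool:
    """True when the income code counts as a state benefit payment."""
    return ficode in STATE_BENEFIT_CODES


print([code for code in (1, 2, 5, 42, 43) if is_state_benefit(code)])
```

When the list of benefit codes changes, only the definition at the top is edited; the code that uses it stays as it is.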

Towards a Data Driven System - CIF

• Lots of retrievals which update database
• Want to test code first

    GLOBAL UPDATE = UPDATE,
    GLOBAL DEBUG = 1,
    C
    Retrieval <UPDATE>
    …
    CIF DEF <UPDATE>
    . Put vars <WP>VARA = VARA
    CIF FALSE
    . CIF EQ <DEBUG>, 1
    . Write "<WP>VARA would be updated to ", VARA
    . CIF END
    CIF END
    …
    End retrieval
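The update/debug switch above is a dry-run pattern. A rough Python analogue follows; note that CIF makes the choice when the code is compiled, whereas this sketch decides at run time. The function `put_var` and its parameters are hypothetical.

```python
def put_var(record: dict, name: str, value, *, update: bool, debug: bool, log: list) -> None:
    """Write the value when updating; otherwise optionally report what would change."""
    if update:
        record[name] = value
    elif debug:
        log.append(f"{name} would be updated to {value}")


record = {"MVARA": 1}
log = []

# Test run: nothing is written, but the intended change is reported.
put_var(record, "MVARA", 5, update=False, debug=True, log=log)
print(record, log)

# Real run: the record is updated and nothing is logged.
put_var(record, "MVARA", 5, update=True, debug=True, log=log)
print(record, log)
```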

Towards a Data Driven System - DO REPEAT and GLOBALS

• Determines which job is the last job a respondent has had
  – Current job if there is one, otherwise use latest job last time respondent was interviewed
  – Procedure about 250 lines in length
  – New bits to be added each year throughout the procedure
• Lots of similar pieces of code e.g. highest educational qualifications

Towards a Data Driven System - DO REPEAT and GLOBALS

    GLOBAL RWP = $L K J I H G F E D C B A$
    GLOBAL RVAL = $24 23 22 21 20 19 18 17 16 15 14 13 $
    C
    BEGIN
    . process record XWAVEID with (PID)
    C
    .  do repeat WP = <RWP> /
                 VAL = <RVAL> /
    .   get vars WP!HID WP!PNO
    .   ifthen (WP!IVFIO eq 1)
    .    old rec is WP!INDRESP (WP!HID WP!PNO)
    .     get vars JLID=WP!JLID
    .     ifthen ( JLID ge 0 and JLID le 12 )
    .      compute JLID = VAL
    .     end if
    .    end record
    .    exit begin
    .   end if
    .  end repeat
    . end rec
    END BEGIN

Towards a Data Driven System - DO REPEAT and GLOBALS

• Code never needs changing
• Globals defined once at beginning of code
• Work every year (until year 27!)
• Used as often as required
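The search driven by RWP and RVAL can be followed more easily as an ordinary loop. The Python sketch below imitates the logic of the retrieval under stated assumptions: `IVFIO eq 1` is taken to mean the respondent was interviewed at that wave, and the dictionary layout and the name `latest_job_id` are invented for the example.

```python
RWP = "L K J I H G F E D C B A".split()
RVAL = list(range(24, 12, -1))  # 24 23 ... 13, paired with L K ... A


def latest_job_id(waves: dict):
    """waves maps a wave prefix to (interviewed, jlid) for one respondent."""
    for wp, val in zip(RWP, RVAL):
        interviewed, jlid = waves.get(wp, (False, None))
        if interviewed:
            # As on the slide: an identifier in 0..12 is replaced by the wave's value.
            if 0 <= jlid <= 12:
                jlid = val
            return jlid  # plays the role of "exit begin"
    return None


# Not interviewed at wave L, interviewed at wave K with JLID 3.
print(latest_job_id({"L": (False, None), "K": (True, 3), "J": (True, 20)}))
```

Adding a wave means extending the two lists at the top; the loop itself is untouched.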

Hidden Time Bombs

• Database variables have wave prefix
• Procedure calculates identifier of last job as JLID and then updates DB variable <WP>JLID
• Calculates 'latest job found' as temporary variable LJLID
• Fine until wave 12
• Even more at wave 14
  – N<rootvar> used as counter in many places

Hidden Time Bombs

• Ensure names are different from any DB variable name
  – Current
  – Or future
  – E.g. X_TEMPV if underscores never used for DB names
  – Much easier with SIR XS
• Would not have been detected if run in non-updating mode
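At wave 12 the prefix is L, so the database variable <WP>JLID becomes LJLID, the same name as the temporary variable; at wave 14 the prefix N does the same to counters named N<rootvar>. A check for such clashes can be sketched in Python. The helper `collision_waves`, the counter `NJOBS` and the root name `JOBS` are hypothetical, and the A to Z prefix scheme is assumed.

```python
import string


def collision_waves(temp_names: set, root_names: set) -> dict:
    """Map each temporary name to the wave at which prefix + root equals it."""
    hits = {}
    for wave, prefix in enumerate(string.ascii_uppercase, start=1):
        for root in root_names:
            if prefix + root in temp_names:
                hits[prefix + root] = wave
    return hits


temps = {"LJLID", "NJOBS", "X_TEMPV"}
roots = {"JLID", "JOBS", "HSROOM"}
print(collision_waves(temps, roots))
```

A name such as X_TEMPV never appears in the result because no root name contains an underscore.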

Conclusion

• Make use of SIR compiler directives to produce data driven system
• Rewrite sections so that they will never need rewriting again
• No need to do everything at once
• Name temporary variables carefully