Aldi Kraja October 17, 2008. A. where is the location of my work? [email protected]...

19
Aldi Kraja October 17, 2008

Transcript of Aldi Kraja October 17, 2008. A. where is the location of my work? [email protected]...

Aldi KrajaOctober 17 2008

A where is the location of my work mylogindsgclusterdsgwustledu dsgusermylogin Where do I go from here For example dsgmntdsg200geneticsusersjimimpfhs dsgmntdsg200geneticsdatajimimpfhs Letrsquos assume that the user created a lot of

programs and data (or a few of them)

pgms

data

B After work the results will go in a central directory maybe in the

dsgmntdsg200geneticsdatafhsanalysisimpute Who is in charge of that directory Is that directory related with my

project What are the permissions of that directory Do I need to change permissions of the directory or files to read only Do I need to notify anybody

Examples drwx------ 2 aldi genetics 4096 Feb 4 2008 mail cd dsgmntdsg200genetics [aldiblade6-3-1 genetics]$ ls -al total 16 drwxrwsr-x 4 aldi genetics 160 Sep 15 1138 drwxr-xr-x 4 root root 128 Oct 14 1138 drwxrwsr-x 12 aldi genetics 312 Oct 15 1200 data -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokihead -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokinohead drwxrwsr-x 16 aldi genetics 1824 Oct 16 1426 users

Line command editors emacs xemacs vi less etc

sas amp Data sas viewer (free software) sas jmp

through samba (windows explorer go to network mercury5 login and passwd dsg1 dsg2 dsg3 dsg4 dsg100 dsg200 dsgweb

Batch work through LSF bsub sas pgmsas proc print data=mydata (obs=10) title ldquomy

datardquo run proc contents data=mydata title ldquocontentsrdquo

run

My SAS output will be called ldquoresultsrdquo Do I need to add anything else in the name of the set

Do I need to keep the output all in one directory Does a colleague has the right to change your

data Do I have the right to delete your data Do I have the right to delete your tmp data

Can I open your data in a sas viewer Can I delete your program run

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

A where is the location of my work mylogindsgclusterdsgwustledu dsgusermylogin Where do I go from here For example dsgmntdsg200geneticsusersjimimpfhs dsgmntdsg200geneticsdatajimimpfhs Letrsquos assume that the user created a lot of

programs and data (or a few of them)

pgms

data

B After work the results will go in a central directory maybe in the

dsgmntdsg200geneticsdatafhsanalysisimpute Who is in charge of that directory Is that directory related with my

project What are the permissions of that directory Do I need to change permissions of the directory or files to read only Do I need to notify anybody

Examples drwx------ 2 aldi genetics 4096 Feb 4 2008 mail cd dsgmntdsg200genetics [aldiblade6-3-1 genetics]$ ls -al total 16 drwxrwsr-x 4 aldi genetics 160 Sep 15 1138 drwxr-xr-x 4 root root 128 Oct 14 1138 drwxrwsr-x 12 aldi genetics 312 Oct 15 1200 data -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokihead -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokinohead drwxrwsr-x 16 aldi genetics 1824 Oct 16 1426 users

Line command editors emacs xemacs vi less etc

sas amp Data sas viewer (free software) sas jmp

through samba (windows explorer go to network mercury5 login and passwd dsg1 dsg2 dsg3 dsg4 dsg100 dsg200 dsgweb

Batch work through LSF bsub sas pgmsas proc print data=mydata (obs=10) title ldquomy

datardquo run proc contents data=mydata title ldquocontentsrdquo

run

My SAS output will be called ldquoresultsrdquo Do I need to add anything else in the name of the set

Do I need to keep the output all in one directory Does a colleague has the right to change your

data Do I have the right to delete your data Do I have the right to delete your tmp data

Can I open your data in a sas viewer Can I delete your program run

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

B After work the results will go in a central directory maybe in the

dsgmntdsg200geneticsdatafhsanalysisimpute Who is in charge of that directory Is that directory related with my

project What are the permissions of that directory Do I need to change permissions of the directory or files to read only Do I need to notify anybody

Examples drwx------ 2 aldi genetics 4096 Feb 4 2008 mail cd dsgmntdsg200genetics [aldiblade6-3-1 genetics]$ ls -al total 16 drwxrwsr-x 4 aldi genetics 160 Sep 15 1138 drwxr-xr-x 4 root root 128 Oct 14 1138 drwxrwsr-x 12 aldi genetics 312 Oct 15 1200 data -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokihead -rw-rw-r-- 1 warwick genetics 0 Aug 26 1050 lokinohead drwxrwsr-x 16 aldi genetics 1824 Oct 16 1426 users

Line command editors emacs xemacs vi less etc

sas amp Data sas viewer (free software) sas jmp

through samba (windows explorer go to network mercury5 login and passwd dsg1 dsg2 dsg3 dsg4 dsg100 dsg200 dsgweb

Batch work through LSF bsub sas pgmsas proc print data=mydata (obs=10) title ldquomy

datardquo run proc contents data=mydata title ldquocontentsrdquo

run

My SAS output will be called ldquoresultsrdquo Do I need to add anything else in the name of the set

Do I need to keep the output all in one directory Does a colleague has the right to change your

data Do I have the right to delete your data Do I have the right to delete your tmp data

Can I open your data in a sas viewer Can I delete your program run

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

Line command editors emacs xemacs vi less etc

sas amp Data sas viewer (free software) sas jmp

through samba (windows explorer go to network mercury5 login and passwd dsg1 dsg2 dsg3 dsg4 dsg100 dsg200 dsgweb

Batch work through LSF bsub sas pgmsas proc print data=mydata (obs=10) title ldquomy

datardquo run proc contents data=mydata title ldquocontentsrdquo

run

My SAS output will be called ldquoresultsrdquo Do I need to add anything else in the name of the set

Do I need to keep the output all in one directory Does a colleague has the right to change your

data Do I have the right to delete your data Do I have the right to delete your tmp data

Can I open your data in a sas viewer Can I delete your program run

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

My SAS output will be called ldquoresultsrdquo Do I need to add anything else in the name of the set

Do I need to keep the output all in one directory Does a colleague has the right to change your

data Do I have the right to delete your data Do I have the right to delete your tmp data

Can I open your data in a sas viewer Can I delete your program run

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

Assume I am reading text data with sas but they are very large

SAS viewer does not open the created sas data because they have more than 37000 columns What do you do

JMP can go beyond that limit proc print by keeping a few selected columns compare with the original

If my variables are genotypes of this kind 13 22 33 14 11 what do I need to do

to save space in our servers

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname

M1M2M3M4

markname

M1M2M3M4

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

map1 locdes1

genefreq1

ganon1

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

-------------------------------------- splitsas split data based on a decided number magicNumber= By Aldi Kraja October 3 2008 -------------------------------------- options mprint mlogic symbolgen

macro split(magicNumber=200 chroms=1 chrome=1) do j=ampchroms to ampchrome

libname indampj dsgmntdsg200geneticsdatamydataimputecampj libname outampj dsgmntdsg200geneticsdatamydatasplitcampj data genefreqcampj set indampjgenefreqcampj run data mlgenoampj set indampjmlgenocampj run data _null_ file myscriptsh put cd dsgmntdsg200geneticsdatamydatasplitcampj put find -name sas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampampj run sysexec chmod +x myscriptsh sysexec myscriptsh data _null_ set indampjmapcampj end=eod if eod then call symput(totmtrim(left(_n_))) run let counter=0

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

do m=1 to amptotm by ampmagicNumber let counter=eval(ampcounter +1) let s=ampm let e=eval(amps + ampmagicNumber -1) if ampm = eval((amptotm ampmagicNumber ) ampmagicNumber +1) then do let stringcompampcounter =quote( amps lt= _n_ lt= amptotm ) end else do let stringcompampcounter =quote( amps lt= _n_ lt= ampe ) end put ampampstringcompampcounter end

do k=1 to ampcounter data outampjmapcampjpampk format markname $12 length markname $12 set indampjmapcampj markname=compress(SNP) if ampampstringcompampk then output run

data _null_ set outampjmapcampjpampk end=eod call symput(m||trim(left(_n_))trim(left(markname))) if eod then call symput(ptotmtrim(left(_n_))) run

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

markname chrom pos M1 1 10001 M2 1 10011

M3 1 10015

M4 1 10021 M5 2 20001

mapcampj

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

data outampjmlgenocampjpampk set mlgenocampj (keep=subject do ii=1 to ampptotm ampampmampii end) run

data outampjgenefreqcampjpampk set genefreqcampj do ii=1 to ampptotm if markname =ampampmampii then output end run

end k counter loop

end j chromosome loop

mend split split

markname alele percent

M1 1 100M1 3 900M2 2 250M2 4 750M3 2 010M3 4 999M4 1 010M4 2 999

genefreqcampj

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

Subject M1 M2 M3 M4hellip Mn1 11 24 22 22 hellip 11 2 13 22 24 11 hellip 11

3 33 44 24 22 hellip 11 4 13 24 44 11 hellip 11

hellip

mlgenocampj

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

------------------------------------------------------------ program parallel mixedinterfacesas ------------------------------------------------------------ Purpose run mixed model By Aldi Kraja February 17 2008 ------------------------------------------------------------

macro parallel(v= dsplit= pheno= type= rotation= subject= pedid= fid= mid= sex= dirout= genlabel= markname=)

data _null_ file rmscriptsh put cd ampdirout put find -name mixedsas7bdat | xargs binrm -f put find -name freqsas7bdat | xargs binrm -f put cd dsgmntdsg200geneticsusersmydirsplitcampv run sysexec chmod +x rmscriptsh sysexec rmscriptsh

do y=1 to ampdsplit sysexec rm -f testampysas

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

--------------------------------- let start to program in parallel --------------------------------- data _null_ file testampysas lrecl=36000 put options nofmterr options nomprint nomlogic nosymbolgen put include dsgmntdsg200geneticsusersmydir1macroutilsas put include dsgmntdsg200geneticsusersmydir2mixedsas put libname ph22 dsgmntdsg200geneticsdatamydirpheno put libname trip dsgmntdsg200geneticsdatamydirgeno put libname snpcampv dsgmntdsg200geneticsdatamydirsplitcampv

put ------------- map ------------------ put data cmapampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvmapcampvpampy (where=( chrom=ampv )) put run put sortit(cmapampvpampycmapampvpampy ampmarknamenodupkey)

put ------------- locdes ------------------ put sortit(cmapampvpampyclocdesampvpampyampmarknamenodupkey) put data cmapampvpampy put merge cmapampvpampy (in=in1) clocdesampvpampy (in=in2) put by ampmarkname if in1 and in2 put run put sortit(cmapampvpampy cmapampvpampy ampmarkname)

put ------------- genefreq ------------------ put data genefreqampvpampy put format ampmarkname $12 length ampmarkname $12 put set snpcampvgenefreqcampvpampy put run

put sortit(genefreqampvpampygenefreqampvpampyampmarkname) put data genefreqampvpampy put merge genefreqampvpampy (in=in1) clocdesampvpampy (in=in2 ) cmapampvpampy put by ampmarkname if in1 and in2 run

put ------------- ganon ------------------ put data aganonampvpampy set snpcampvmlgenocampvpampy run

put ------------- triplet ------------------ put data trip set tripphenos (keep=amppedid ampsubject ampfid ampmid ampsex ) run put sortit(triptripampsubject nodupkey )

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

put -------------phenodata------------------ put sortit(ph22fanomissphenodata (keep=ampsubject amppheno )ampsubject) put data phenodata merge phenodata (in=in1) trip (in=in2) by ampsubject put if in1 and in2 put if ampsubject = then do put put ------------------------------------------------------------------- put put This is a note for showing that you have missing ampsubject in the data put put ------------------------------------------------------------------- put put _all_ abort abend put end put run put sortit(phenodataphenodataampsubject ) put sortit(aganonampvpampyaganonampvpampyampsubject ) put data aganonampvpampy merge aganonampvpampy (in=in1) phenodata (in=in2) by ampsubject put if in1 put run put proc datasets lib=work delete phenodata run put ------------- finally set up the macro ------------------ put alleled( put datain=aganonampvpampy ldquo put clocdes=clocdesampvpampy put cmap=cmapampvpampy put dist=position put cgenefreq=genefreqampvpampy put triplet=trip put chrom=chrom put marshvar=cmap put markvar=markname put pedid=amppedid put subject=ampsubject put fid=ampfid put mid=ampmid put sex=ampsex put qualaff= put phenodata=aganonampvpampy put pheno=amppheno put qualtrait= put qualcovars=ampsex put quantcovars= put correctForMean=NO put bymark=500 put programd=MIXED put model=a put time= put genlabel=ampgenlabelampy put barplot=0

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

put whohasmarkers=NO)

run end end of y loop for each split dataset

data _null_ file mixedscriptsh do j=1 to ampdsplit put bsub nohup sas -memsize 1G testampjsas put sleep 5 end put echo Finished the submission for chrom ampv a total of ampdsplit jobs run

X chmod +x mixedscriptsh

sysexec mixedscriptsh

mend parallel parallel(v=22 dsplit=170 pheno=Factor1mleod type=allf1 rotation=varimax subject=subject pedid=pedid fid=dadsubj mid=momsubj sex=sex dirout=dsgmntdsg200geneticsdatamydirresultswcampvamptypeamprot genlabel=f1fhs markname=markname)

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

------------------------------------------------------------ summarizesas Purpose summarize the results of Mixed model ------------------------------------------------------------ global nobs let nobs=0 include dsgmntdsg200geneticsusersmydirmacroutilsas let study=mystudy libname inncbi dsgmntdsg200geneticsdatageneraldirannot macro test(totloop=968 1105 872 816 841 912 717 738 611 693 651 625 521 420 362 357 293 385 186 318 170 170 chroms=1 chrome=22 fact=1 dirout=dsgmntdsg200geneticsdatamydirresultsfinalampstudyfampfact) libname outampstudy ampdirout let count=1 let tloop1=scan(amptotloopampcount) do while(scan(amptotloopampcount) ^= ) let count=eval(ampcount+1) let tloopampcount=scan(amptotloopampcount) end let count=eval(ampcount-1) do k=ampchroms to ampchrome delete what you will append to proc datasets lib=outampstudy delete mixedampstudycampk ampstudycampk run

libname freqampk dsgmntdsg200geneticsdatamydirsplitcampk do f=ampfact to ampfact libname fhsampkfampf dsgmntdsg200geneticsdatamydirresultswcampkallfampfvar

do j=1 to ampamptloopampk isempty(data=fhsampkfampfamixedfampfampstudyampj)

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

if ampnobs =0 then do --------------work on empty set check-------------- data a factor=ampf notempty= empty=ampj output run if ampj=1 then do data outampstudyampstudycampk set a run proc datasets lib=work delete a run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run proc datasets lib=work delete a run end end of loop j is not 1 end ----------------------------- else do the nobs is not 0 ======================= data a factor=ampf notempty=ampj empty= output run -------------------------------------------- capture the frequencies present in the data -------------------------------------------- data b (drop=markname) format markn $12 length markn $12 set fhsampkfampfameanfreqpermarkfampfampstudyampj markn=substr(markname2) markn=compress(markn||XXXXXXXXX) run data b set b (rename=(markn=markname)) run sortit(bb markname model) proc transpose data=b out=c (rename=(COL1=AA COL2=AB COL3=BB)) by markname var freq run -----------work on freq data p set freqampkgenefreqcampkpampj run proc sort data=p by markname percent allele run proc transpose data=p out=q (rename=(COL1=a1 COL2=a2)) by markname var allele run proc transpose data=p out=r (rename=(COL1=MAF COL2=MOA)) by markname var percent run proc sort data=q by markname run proc sort data=r by markname run data s (drop=_NAME_) merge q (in=in1) r (in=in2) by markname if in1 and in2 run sortit(ssmarkname) proc datasets lib=work delete p q r run -------------------------------------------- work on the mixed model output -------------------------------------------- if ampj=1 then do data outampstudyampstudycampk set a run sortit(fhsampkfampfamixedfampfampstudyampjdmarkname)

sortit(fhsampkfampfameanfreqpermarkfampfampstudyampjbmarkname model)

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 and in2 factor=ampf

run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set d run proc datasets lib=work delete a c d s run end loop when j is 1 else do data outampstudyampstudycampk set outampstudyampstudycampk a run

sortit(fhsampkfampfamixedfampfampstudyampjdmarkname) sortit(cc(drop=_NAME_)markname) data d merge c (in=in1) d (in=in2) s by markname if in1 or in2 run data outampstudymixedampstudycampk (drop=model analysis _LABEL_) set outampstudymixedampstudycampk d factor=ampf run proc datasets lib=work delete a c d s run end end of loop j is not 1

end end of condition the nobs is not 0

end end of j loop (splits of the same chromosome) end end of f loop (factors) -------------------------------------------------------------- finally annotate based on the newest version of NCBI dbSNP -------------------------------------------------------------- sortit(inncbincbisnpb128_campkannotampkmarknamenodupkey) sortit(outampstudymixedampstudycampkoutampstudymixedampstudycampkmarkname) data outampstudymixedampstudycampk merge outampstudymixedampstudycampk (in=in1) annotampk (in=in2) by markname if in1 run proc datasets lib=work delete annotampk run end end of k loop (chromosome) mend test test

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

Remove all split data Do NOT remove the original source data gzip unused results (you can gunzip results

at the time you need them again) Multi-threaded systems and parallel

programming are common tools in modern software applications and are used to enhance the scalability and performance of large jobs

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data

The statistical tasks included but were not limited to

1048697 Drafting your analysis objectives 1048697 Creating specifications for analysis datasets 1048697 Creating specifications for tables figures and

listings 1048697 Identifying first look analyses 1048697 Performing data checking 1048697 Parallel programming analysis datasets 1048697 Checking data results and parallel

programming tables figures and listings 1048697 Removal of the un-necessary data