Data Management: Procedures and Principles

Elizabeth Garrett-Mayer, PhD
February 26, 2013


Goals of data collection and management

- Statisticians work with other team members to help establish databases
- Often simple Excel spreadsheets
- Logics:
  - statistician logic ≠ basic scientist logic
  - statistician logic ≠ clinical scientist logic
- Do your best to get involved BEFORE the data is entered!

Best examples are bad examples

[Slide image: a wide spreadsheet of tumor measurements for animals 459, 998 and 999. Column headers are measurement dates (9/23 through 10/20, then 12/6 through 1/3) over a second header row of ages (28.0 through 32.0, then 24.5 through 28.5). Each animal (Animal #, ear tag, genotype, gene #, gland, DOB) occupies a block of rows, with "total" and "no." summary rows mixed in. Individual cell values are not recoverable from the transcript.]

Fixed-ish

ID | type | gene | dob | glandid | date | volume | age
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 2/27/2009 | 410 | 25.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 3/4/2009 | 697 | 26
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 3 | 2/13/2009 | 222 | 23.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 3 | 2/20/2009 | 784 | 24.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 3 | 2/27/2009 | 615 | 25.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 3 | 3/4/2009 | 761 | 26
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/7/2009 | 152 | 22.429
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/13/2009 | 190 | 23.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/20/2009 | 561 | 24.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/27/2009 | 775 | 25.286
1527 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 3/4/2009 | 711 | 26
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 1 | 10-Feb | 171 | 22.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 1 | 2/17/2009 | 243 | 23.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 1 | 2/27/2009 | 376 | 25.286
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 1 | 3/4/2009 | 490 | 26
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 2/3/2009 | 110 | 21.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 2/10/2009 | 233 | 22.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 2/17/2009 | 408 | 23.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 2/27/2009 | 579 | 25.286
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 2 | 3/4/2009 | 982 | 26
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/3/2009 | 160 | 21.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 2/27/2009 | 119 | 25.286
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 7 | 3/4/2009 | 307 | 26
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 8 | 2/3/2009 | 171 | 21.857
1528 | pdef+/+ neu+ | 2 | 9/3/08 | 8 | 2/10/2009 | 437 | 22.857

Principle 1: long format

- In general, grow datasets long, not wide
- Long data can be reshaped to wide if needed
- Each row represents a unit of analysis.
  - Patient? Mouse?
  - Observation on a tumor for a mouse?
- Think of repeated measures data: longitudinal

Wide format

group | id | day5 | day8 | day13 | day16 | day19 | day22 | day25 | day28 | day31
2 | 65931R | 0 | 32.96963 | 107.5748 | 335.5935 | 526.848 | 977.7024 | 1285.401 | 2140.129 | 2783.375
2 | 65932L | 0 | 0 | 0 | 0 | 59.4 | 292.922 | 568.912 | 1039.68 | 1615.884

[The slide's wide table has one such row per mouse: group 1 ids 66051R, 66052L, 66053B, 66054N, 65971R, 65972L, 65973B, 65974N, 65975DL and group 2 ids 65931R, 65932L, 65933B, 65934N, 65935DR, 66011R, 66012L, 66013B. Only the two rows above can be split unambiguously, because they reappear in the long table; the other rows' values run together in the transcript.]

Long format

id | days | group | tsize
1 | 5 | 2 | 0
1 | 8 | 2 | 32.96963
1 | 13 | 2 | 107.5748
1 | 16 | 2 | 335.5935
1 | 19 | 2 | 526.848
1 | 22 | 2 | 977.7024
1 | 25 | 2 | 1285.401
1 | 28 | 2 | 2140.129
1 | 31 | 2 | 2783.375
2 | 5 | 2 | 0
2 | 8 | 2 | 0
2 | 13 | 2 | 0
2 | 16 | 2 | 0
2 | 19 | 2 | 59.4
2 | 22 | 2 | 292.922
2 | 25 | 2 | 568.912
2 | 28 | 2 | 1039.68
2 | 31 | 2 | 1615.884
3 | 5 | 2 | 0
3 | 8 | 2 | 0
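The reshaping that Principle 1 relies on can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tool used for the slides. The function names `wide_to_long` and `long_to_wide` are hypothetical, and the two mice are the group 2 rows (65931R and 65932L) whose values appear in both tables above. Unlike the slide's long table, this sketch keeps the original mouse id rather than renumbering it 1, 2, 3.

```python
# Measurement days used as column suffixes in the wide table (day5 ... day31).
DAYS = [5, 8, 13, 16, 19, 22, 25, 28, 31]

# Tumor sizes for two group 2 mice, taken from the tables above.
sizes_by_mouse = {
    "65931R": [0, 32.96963, 107.5748, 335.5935, 526.848,
               977.7024, 1285.401, 2140.129, 2783.375],
    "65932L": [0, 0, 0, 0, 59.4, 292.922, 568.912, 1039.68, 1615.884],
}

# Wide format: one row per mouse, one column per measurement day.
wide = [
    {"group": 2, "id": mouse, **{f"day{d}": v for d, v in zip(DAYS, sizes)}}
    for mouse, sizes in sizes_by_mouse.items()
]


def wide_to_long(rows, id_cols, prefix="day"):
    """Turn each dayN column into its own row (one row per mouse per day)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                record = {col: row[col] for col in id_cols}
                record["days"] = int(key[len(prefix):])
                record["tsize"] = value
                long_rows.append(record)
    return long_rows


def long_to_wide(long_rows, id_cols, prefix="day"):
    """Collect the rows for each mouse back into a single wide row."""
    by_mouse = {}
    for record in long_rows:
        key = tuple(record[col] for col in id_cols)
        row = by_mouse.setdefault(key, {col: record[col] for col in id_cols})
        row[f"{prefix}{record['days']}"] = record["tsize"]
    return list(by_mouse.values())


long_rows = wide_to_long(wide, id_cols=["group", "id"])
round_trip = long_to_wide(long_rows, id_cols=["group", "id"])
```

Going from long back to wide recovers the original rows, which is why entering data long loses nothing: the wide layout can always be produced later if an analysis needs it.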

Principle 2: numeric codes

StudyID | SEX | AGE | RACE | Surg Path Specimen Used | Final Path Diagnosis | EXP Censor | Primary COD | Primary COD Code | Secondary COD | Secondary COD Code
1 | F | 51 | W | S45-1764, 11/18/1998 | AML | 0 | | | |
2 | M | 60 | W | S76-6965, 03/12/1999 | AML with MLD | 1 | Relapse | 1 | Sepsis | 1
3 | F | 42 | W | S34-13589, 04/28/1999 | RAEB-2* | 1 | Tx Related | 2 | GVHD / Septicemia ( aspergillous | 1 and 3
4 | F | 59 | W | S67-10420, 03/01/1999 | RAEB-2 | 1 | Tx Related | 2 | FTE | 4
5 | M | 32 | W | S23-7186, 03/01/1999 | AML/MDS | 0 | | | |
6 | M | 50 | W | S09-15708, 5/101/99 | CMML | 0 | | | |
7 | F | 63 | W | NO BM PRIOR TO BMT | NA | 0 | | | |
8 | F | 50 | W | S145-20523, 5/1/00 | RAEB-2 | 0 | | | |
9 | M | 53 | W | S87-43149, 09/12/2000 | AML wMLD* | 1 | Relapse | 1 | |
10 | M | 57 | W | S09-38696, 8/1/00 | MPD/MDS-U* | 1 | Relapse | 1 | |
11 | F | 63 | W | S56-47232, 10/03/2000 | AML/MDS | 1 | Tx Related | 2 | Graft Failure/Sepsis | 1 and 4
12 | M | 55 | W | S23-47159, 10/01/2000 | RAEB-1 | 1 | Relapse | 1 | |
13 | F | 52 | W | S12-60174, 12/1/2000 | AML* | 0 | | | |
14 | M | 30 | W | S90-4988, 01/29/2001 | RCMD | 1 | Relapse | 1 | Infection | 1
15 | F | 57 | W | S58-62446, 12/1/2000 | RCMD | 0 | | | |
16 | M | 55 | W | S45-11389, 3/1/2001 | RAEB-1 | 1 | Multi-organ Failure | 3 | Liver and Resp. Failure | 2
17 | M | 65 | B | S378-8738, 02/1/2001 | RCMD | 0 | | | |
18 | F | 63 | W | S854-11103, 03/01/2001 | RAEB-1 | 1 | Relapse | 1 | |
19 | M | 59 | W | S43-26265, 05/21/2001 | MDS-U* | 1 | Tx Related | 2 | FTE/Infection | 1 and 4
20 | M | 61 | W | S90-41961 ,8/1/2001 | RCMD | 0 | | | |
21 | F | 63 | W | S26-50236, 10/01/2001 | RAEB-1 | 0 | | | |
22 | M | 53 | W | S49-60634, 11/01/2001 | RCMD* | 0 | | | |
23 | M | 64 | W | S78-63086, 12/01/2001 | AML wMLD | 1 | Relapse | 1 | MSOF/Fungal Sinusisit | 1 and 2
24 | M | 45 | W | S56-3687, 01/01/2002 | AML from underlying CMML | 1 | Unknown | 3 | |

Were any AEs observed during this period? | AE Name | Grade | Treatment Relation | Event Status
Yes | Abdomen Distention/Ascites | 1 | Not Related | New
Yes | Acne Rash (face, shoulders, chest) | 2 | Probable | New
Yes | Acne Rash (face, shoulders, chest) | 1 | Probable | Ongoing without change
Yes | Acne Rash (head, arms, chest, legs) | 3 | Probable | New
Yes | Acneform Rash Face/Chest | 1 | Probable | Ongoing without change
Yes | Acneform Rash on Face | 1 | Probable | Ongoing without change
Yes | Acneform rash to face and chest | 2 | Probable | New
Yes | Acneform Rash-Face | 1 | Probable | New
Yes | Acneform Rash-Face | 1 | Probable | Ongoing without change
Yes | Acniform Rash | 2 | Definite | New
Yes | Acniform Rash to face | 3 | Definite | Ongoing with change in grade
Yes | acute coronary syndrome | 3 | Unlikely | New
Yes | Acute Renal Failure | | | New
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Possible | New
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Possible | Ongoing without change
Yes | Alkaline Phosphatase | 1 | Unlikely | New
Yes | alkaline phosphatase | 2 | Unlikely | New
Yes | Alkaline Phosphatase | 1 | Possible | New
Yes | alkaline phosphatase | 2 | Unlikely | Ongoing with change in grade
Yes | alkaline phosphatase | 2 | Unlikely | New

AE Type | Grade 0 | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Total
DIARRHEA | 0 | 25 | 12 | 5 | 0 | 0 | 42
FATIGUE | 0 | 17 | 19 | 6 | 0 | 0 | 42
PAIN | 0 | 8 | 22 | 4 | 0 | 0 | 34
RASH | 0 | 9 | 16 | 5 | 0 | 0 | 30
NAUSEA | 0 | 16 | 9 | 2 | 0 | 0 | 27
ANOREXIA | 0 | 10 | 15 | 0 | 0 | 0 | 25
DRY SKIN | 0 | 15 | 9 | 0 | 0 | 0 | 24
WEIGHT LOSS | 0 | 18 | 5 | 0 | 0 | 0 | 23
ALKALINE PHOSPHATASE | 0 | 9 | 7 | 4 | 0 | 0 | 20
VOMITING | 0 | 10 | 9 | 1 | 0 | 0 | 20
HYPERTENSION | 0 | 13 | 4 | 2 | 0 | 0 | 19
BILIRUBIN | 0 | 9 | 8 | 2 | 0 | 0 | 19
AST | 0 | 6 | 5 | 6 | 0 | 0 | 17
PRURITUS | 0 | 12 | 3 | 0 | 0 | 0 | 15
WEAKNESS | 0 | 5 | 8 | 1 | 0 | 0 | 15
THROMBOCYTOPENIA | 0 | 9 | 4 | 1 | 0 | 0 | 14
TASTE CHANGE | 0 | 11 | 3 | 0 | 0 | 0 | 14
ALT | 0 | 9 | 3 | 1 | 0 | 0 | 13
ANEMIA | 0 | 10 | 0 | 2 | 0 | 0 | 12
CHILLS | 1 | 9 | 1 | 1 | 0 | 0 | 12
PROTEINURIA | 0 | 8 | 3 | 1 | 0 | 0 | 12
PLATELETS | 0 | 7 | 5 | 0 | 0 | 0 | 12
FEVER | 1 | 9 | 0 | 1 | 0 | 0 | 11

All 1181 AEs were tabulated AFTER combining categories of AEs.

Principle 3: Be involved in data collection tools

- Quantitative vs. qualitative
- Avoid open-ended options
  - no fill in the blank
  - be comprehensive in options
- Allow "Other" in case you have not considered all options
- Consider "don't know" and other missing codes (e.g., "not applicable") to distinguish true missing from refused or DK.

Principle 3: Be involved in data collection tools

- Basic science, too.
- Provide a template for how the data should be entered. And NOT like this one!

Figure 5F PBMC EOMES/TBET Ratio

Healthy | Vitiligo
0.401389 | 0.239173
0.366845 | 0.311164
0.509524 | 0.3

vitiligo pbmc | EOMES | TBET | EOMES TBET RATIO
T CELLS_0634 TBET.fcs | 9.83 | 41.1 | 0.239173
T CELLS_0640 TBET.fcs | 13.1 | 42.1 | 0.311164
T CELLS_0939 TBET.fcs | 12.9 | 43 | 0.3

healthy pbmc | EOMES | TBET | EOMES TBET RATIO
T CELLS_5079 TBET.fcs | 5.78 | 14.4 | 0.401389
T CELLS_50784 TBET.fcs | 6.86 | 18.7 | 0.366845
T CELLS_50891 TBET.fcs | 10.7 | 21 | 0.509524

Principle 4: consider variance

- If there is no variance across your sample, you cannot learn anything
- Exception is inclusion/exclusion criteria: you should have no variance!
- Example: income
  - When querying income, it is almost always categorical.
  - Depending on your population of interest, which is more appropriate?

Household income: $100K
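Principle 4 can be checked mechanically before any analysis. The sketch below uses only the Python standard library. The respondents, the column names and both sets of income categories are hypothetical, invented for illustration and not taken from the slide. It flags any column in which every record has the same value.

```python
from collections import Counter

# Hypothetical respondents from a high-income population, with household
# income recorded under two made-up category schemes.
records = [
    {"id": 1, "income_coarse": "over 100K", "income_fine": "100K to 150K"},
    {"id": 2, "income_coarse": "over 100K", "income_fine": "150K to 250K"},
    {"id": 3, "income_coarse": "over 100K", "income_fine": "over 250K"},
    {"id": 4, "income_coarse": "over 100K", "income_fine": "100K to 150K"},
]


def tabulate(rows, column):
    """Count how many records fall in each category of one column."""
    return Counter(row[column] for row in rows)


def constant_columns(rows):
    """List the columns in which every record has the same value."""
    return [
        column
        for column in rows[0]
        if len({row[column] for row in rows}) == 1
    ]


no_variance = constant_columns(records)
coarse_counts = tabulate(records, "income_coarse")
fine_counts = tabulate(records, "income_fine")
```

In this made-up sample the coarse scheme puts every respondent in the top category, so it carries no information, while the finer scheme separates them. For a low-income population the situation would reverse, which is why the categories should be chosen with the population of interest in mind.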