Post on 17-Dec-2015

Comparing linked maternity data sets to check data quality in SPSS

Preeti Datta-Nemdharry, Nirupa Dattani and Alison Macfarlane

Background (1)

Birth registration

• By law, live births must be registered within 42 days of birth

• Information recorded from parents is mainly socio-demographic, such as names, address of residence, occupation of parents, marital status and country of birth

Background (2)

NHS Numbers for Babies (NN4B)

• Central Issuing System introduced in 2002 for issuing NHS numbers at birth for babies born in England, Wales and the Isle of Man

• A small set of data is collected, including gestational age for live births, ethnicity of baby and date and time of birth

Background (3)

Maternity Hospital Episode Statistics (HES)

• Data should be collected for all births occurring in England

• Core admitted patient care record for mother plus ‘maternity tail’ with details of delivery and the baby.

• Core birth record for baby plus ‘baby tail(s)’

Background (4)

National Community Child Health Database (NCCHD) and Patient Episode Database for Wales (PEDW)

• Data collected for all births occurring in Wales

• Information collected on maternity similar to HES

Method

• Link data for 2005 and 2006 for England and Wales

• Phase 1 involving linkage of birth registration data to NN4B data

• Phase 2 involving linkage of registration/NN4B data to Maternity HES for England and Child Health/PEDW databases for Wales

Method cont…

Phase 2

• Linkage to maternity HES carried out by Northgate Solutions using an algorithm devised by City University

• Key data items for linkage, e.g. NHS number, date of birth and a unique ID, were compiled by ONS and sent to Northgate Solutions

• Linkage to Child Health and PEDW databases carried out by NHS Wales Informatics Service using the same algorithm

After the linkage was done…

• HES records, linked to registration/NN4B data, had multiple records for the same mother for each episode.

• The duplicates therefore had to be removed, keeping the record with the most information.

• This ensures one-to-one linkage to registration/NN4B data.

Identifying duplicates, triplicates...

GET FILE='C:\Users\trial\Desktop\exampleHES.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.

* Identify duplicate cases after sorting by id and, within id, by epikeys.
DATASET ACTIVATE DataSet1.
* Sort the cases first by id and then by epikeys, both in descending order.
SORT CASES BY id(D) epikeys(D).
* Create a variable called flag with a default value of 1.
COMPUTE flag=1.
* Reset flag to 0 if id is the same as the id in the row before.
IF (id=LAG(id)) flag=0.
EXECUTE.

Result: id and epikeys are sorted in descending order, and flag = 1.00 is allocated to the highest epikey per id.

Creating a file with only one id per row…

* Create wodups, a dataset without duplicates.
* The exampleHES dataset (DataSet1) is the active dataset.
DATASET ACTIVATE DataSet1.
DATASET COPY wodups.
* DATASET COPY leaves DataSet1 active, so switch to the copy before selecting.
DATASET ACTIVATE wodups.
* Select the record with the most information, i.e. the highest epikey.
SELECT IF (flag=1).
EXECUTE.
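One way to confirm that the selection has left a single record per id is to count the records within each id. This is only a minimal sketch, assuming the dataset and variable names used above; n_per_id is a hypothetical variable name that does not appear in the original syntax.

```spss
* Assumes the wodups dataset and the id variable exist as described above.
* n_per_id is a hypothetical name for the number of records sharing an id.
DATASET ACTIVATE wodups.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /n_per_id=N.
* If no duplicates remain, every case has n_per_id equal to 1.
FREQUENCIES VARIABLES=n_per_id.
```

Any value above 1 in the frequency table points to ids that still have more than one record.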

Merge with exampleNN4BREG data

* Merge exampleHES with exampleNN4BREG.
* First sort both datasets by the key variable, e.g. id.

* Main dataset.
DATASET ACTIVATE wodups.
SORT CASES BY id(A).

* Dataset to be merged.
DATASET ACTIVATE NN4BREG.
SORT CASES BY id(A).

* Merging. The command ends only after the BY subcommand.
MATCH FILES FILE=wodups
  /FILE=NN4BREG
  /BY id.
EXECUTE.
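To see how many records were found in both datasets, the merge can also record which file each case came from. The sketch below is a variant of the merge above, not the authors' own syntax; it uses the IN subcommand of MATCH FILES, and inHES and inReg are hypothetical flag names.

```spss
* Assumes wodups and NN4BREG are both sorted by id in ascending order.
* inHES and inReg are hypothetical flag names (1 = case present in that file).
MATCH FILES FILE=wodups /IN=inHES
  /FILE=NN4BREG /IN=inReg
  /BY id.
EXECUTE.
* Cases with 1 on both flags were matched; the other cells are unmatched records.
CROSSTABS /TABLES=inHES BY inReg.
```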

Data quality checks

• Quality of maternity HES assessed on the completeness and consistency of the HES data in relation to birth registration data wherever possible

• NN4B data used to validate maternity HES where the information was not available from registration.

Missing data

* Missing data for string variables, e.g. NHS number.
DATASET ACTIVATE wodups.
MISSING VALUES NHSnoHES (" ").
* FORMAT=NOTABLE gives only the total numbers of valid and missing cases.
FREQUENCIES VARIABLES=NHSnoHES /FORMAT=NOTABLE.

* OR.
* var1 is 1 where the NHS number is blank, so its sum is the number missing.
COMPUTE var1=(LENGTH(RTRIM(NHSnoHES))=0).
EXECUTE.
DESCRIPTIVES VARIABLES=var1
  /STATISTICS=SUM.

* Missing data for dates, after checking formats.
FREQUENCIES VARIABLES=dobHES /FORMAT=NOTABLE.

* Missing data for numeric variables, e.g. birthweight.
FREQUENCIES VARIABLES=birthweightHES /FORMAT=NOTABLE.

* OR.
* noBWT is 1 where birthweight is missing and 0 otherwise.
COMPUTE noBWT=MISSING(birthweightHES).
EXECUTE.

Cross checking dates…

* Cross checking baby's date of birth.
* 1) Formatting dates.
* If one date is a string, e.g. 01/01/2005, convert it to date format.
COMPUTE datevar2=NUMBER(dobReg,ADATE10).
FORMATS datevar2 (ADATE10).
EXECUTE.

* If both are in date format but need to be shown as e.g. yyyy/mm/dd, use SDATE10.
* For the other way around, i.e. mm/dd/yyyy, use ADATE10.
FORMATS dobHES (SDATE10).
EXECUTE.

* 2) Cross checking dates.
* equal is 1 if the dates are the same and 0 if they differ.
* If dobReg was converted from a string above, compare with datevar2 instead.
COMPUTE equal=(dobHES=dobReg).
EXECUTE.
* The frequency table shows how many are equal.
FREQUENCIES VARIABLES=equal.
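Where the dates differ, it can help to see by how much. The following is a hedged sketch rather than part of the original syntax: it assumes both variables are already in date format and uses the DATEDIFF function; daysdiff is a hypothetical variable name.

```spss
* Assumes dobHES and dobReg are both in date format.
* daysdiff is a hypothetical name for the gap between the two dates in days.
COMPUTE daysdiff=DATEDIFF(dobHES,dobReg,"days").
EXECUTE.
* A value of 0 means the dates agree and other values show the size of the gap.
FREQUENCIES VARIABLES=daysdiff.
```

Small gaps of a day or so may suggest recording differences around midnight, whereas large gaps are more likely to indicate a keying error or a false link.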

Birthweight

* Cross checking birthweight between two datasets.
* One way: create another variable which is 1 if equal and 0 if not equal.
DATASET ACTIVATE wodups.
COMPUTE birthweight3=(birthweightHES=birthweightReg).
EXECUTE.
FREQUENCIES VARIABLES=birthweight3.

* OR group birthweight into categories and see how many cases fall into each category.
* Recoding birthweight data for HES.
RECODE birthweightHES (0=0) (9998=0) (MISSING=0) (1 THRU 499=1) (500 THRU 999=2)
  (1000 THRU 1499=3) (1500 THRU 1999=4) (2000 THRU 2499=5) (2500 THRU 2999=6)
  (3000 THRU 3499=7) (3500 THRU 3999=8) (4000 THRU 4499=9) (4500 THRU 4999=10)
  (5000 THRU 5499=11) (5500 THRU HIGHEST=12) INTO BWTgroupHES.
VARIABLE LABELS BWTgroupHES 'BWTgroupHES'.
EXECUTE.

* Recoding birthweight data for registration.
RECODE birthweightReg (0=0) (9998=0) (MISSING=0) (1 THRU 499=1) (500 THRU 999=2)
  (1000 THRU 1499=3) (1500 THRU 1999=4) (2000 THRU 2499=5) (2500 THRU 2999=6)
  (3000 THRU 3499=7) (3500 THRU 3999=8) (4000 THRU 4499=9) (4500 THRU 4999=10)
  (5000 THRU 5499=11) (5500 THRU HIGHEST=12) INTO BWTgroupReg.
VARIABLE LABELS BWTgroupReg 'BWTgroupReg'.
EXECUTE.

* Cross tabulate the grouped variables.
* Add ROW or COLUMN to the CELLS subcommand for row or column percentages.
CROSSTABS
  /TABLES=BWTgroupHES BY BWTgroupReg
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT
  /COUNT ROUND CELL.

Gestational age

* Recoding gestational age data.
RECODE gestNN4B (0=0) (MISSING=0) (1 THRU 21=1) (44 THRU HIGHEST=2)
  (ELSE=COPY) INTO GestGroupNN4B.
VARIABLE LABELS GestGroupNN4B 'GestGroupNN4B'.
EXECUTE.

RECODE gestHES (0=0) (MISSING=0) (1 THRU 21=1) (44 THRU HIGHEST=2)
  (ELSE=COPY) INTO GestGroupHES.
VARIABLE LABELS GestGroupHES 'GestGroupHES'.
EXECUTE.

CROSSTABS
  /TABLES=GestGroupHES BY GestGroupNN4B
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW COLUMN TOTAL
  /COUNT ROUND CELL.
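The cross tabulation can be condensed into a single agreement figure for records where gestational age is stated in both sources. This is a sketch under stated assumptions, not the authors' calculation: it treats code 0 as "not stated", following the recodes above, and gestSame is a hypothetical variable name.

```spss
* Assumes code 0 marks a zero or missing gestational age, as in the recodes above.
* gestSame is a hypothetical name (1 = same value in both sources, 0 = different).
* TEMPORARY limits the selection and the new variable to the next procedure.
TEMPORARY.
SELECT IF (GestGroupHES > 0 AND GestGroupNN4B > 0).
COMPUTE gestSame=(GestGroupHES=GestGroupNN4B).
FREQUENCIES VARIABLES=gestSame.
```

The percentage of cases with gestSame = 1 is then the agreement among records where both values are stated.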

Ethnicity

* Recoding ethnicity.
RECODE ethnicNN4B ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
  ('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
  ('S'=9) ('Z'=10) (MISSING=10) INTO ethnicgroupNN4B.
VARIABLE LABELS ethnicgroupNN4B 'ethnicgroupNN4B'.
EXECUTE.

RECODE ethnicHES ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
  ('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
  ('S'=9) ('Z'=10) (MISSING=10) INTO ethnicgroupHES.
VARIABLE LABELS ethnicgroupHES 'ethnicgroupHES'.
EXECUTE.

* Also label the recoded values with the names of the relevant ethnic groups.
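Labelling the recoded values might look like the sketch below. The label wording is an assumption based on the standard NHS ethnic category codes (A to Z) that the recode appears to use; the slides do not give the group names, so adjust them to the coding frame actually in use.

```spss
* The labels are assumptions based on the NHS ethnic category codes.
* They are not taken from the original slides.
VALUE LABELS ethnicgroupNN4B ethnicgroupHES
  1 'White'
  2 'Indian'
  3 'Pakistani'
  4 'Bangladeshi'
  5 'Black African'
  6 'Black Caribbean'
  7 'Other Black'
  8 'Chinese'
  9 'Mixed and other'
  10 'Not stated or missing'.
* Compare the two sources in the same way as for gestational age.
CROSSTABS
  /TABLES=ethnicgroupHES BY ethnicgroupNN4B
  /CELLS=COUNT ROW.
```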

Results

91% of maternity HES delivery records could be linked to the birth registration/NN4B records

Linked records for singleton births with missing data items in common data fields, 2005

                   NN4B                 Registration         Maternity HES
Data item          Number    Percent    Number    Percent    Number    Percent
Mother's NHS No    164,458   30         NA        NA         16,685    3
Mother's DOB       960       0.2        0         0          0         0
Ethnicity          59,865    11         NA        NA         77,771    14
Gestation          3,829     1          NA        NA         264,877   48
Birthweight        2,721     1          874       0.2        135,144   25
Birth status       615       0.1        0         0          176,455   32
Sex of baby        1,098     0.2        0         0          144,115   26

Comparison of sex for singletons in the linked records, 2005

                      Maternity HES*
Birth registration    Male       Female     Total      Percentage
Male                  204,613    791        205,404    51
Female                2,814      196,524    199,338    49
Total                 207,427    197,315    404,742    100
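As a worked check on the sex comparison above, the overall agreement is the share of linked records in which both sources give the same sex, i.e. the male/male and female/female cells. Calling this figure "agreement" is a reading of the table, not a statistic quoted in the slides.

```latex
\[
\text{agreement} = \frac{204{,}613 + 196{,}524}{404{,}742}
                 = \frac{401{,}137}{404{,}742} \approx 99.1\%
\]
\[
\text{disagreement} = \frac{791 + 2{,}814}{404{,}742}
                    = \frac{3{,}605}{404{,}742} \approx 0.9\%
\]
```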

Concordance in data items between NN4B and maternity HES, 2005 (percentages)

                   Stated    Missing    Concordance where stated
Birthweight*       75        25         99
Gestational age    52        48         89
Ethnicity          81        19         87

* using birth registration rather than NN4B

Conclusion

• A good linkage rate was obtained

• To gain maximum benefit, data quality and completeness need to improve in maternity HES

• SPSS is useful in data quality checks.