
1 Data Linkage for Educational Research Royal Statistical Society March 19th 2007 Andrew Jenkins and Rosalind Levačić Institute of Education, University of London




Data Linkage for Educational Research

Royal Statistical Society March 19th 2007

Andrew Jenkins and Rosalind Levačić

Institute of Education, University of London



Examples of Data Linkage

• (1) Data Linkage with the Longitudinal Survey of Young People in England (LSYPE) and the National Pupil Database (NPD)

• (2) Linking NPD to a survey of student experiences at school for evaluation of Diversity Pathfinders



Structure of presentations

Introduce the datasets used

Outline why data linkage was useful/important

How the datasets were combined

Any practical problems which arose in linking data

Methodological issues in using linked data



Main aim of research project (1)

To use data from first wave of Longitudinal Survey of Young People in England (LSYPE), combined with other datasets, to try to separate out effects of family background and neighbourhood on students’ attainment.



Value of Data Linkage

• Richer and more detailed models: e.g. administrative data may include little about pupil background

• Better control variables: e.g. controlling for family background factors when modelling neighbourhood effects on attainment



Datasets to be combined:

Pupil Level: LSYPE

School Level: Edubase; Annual School Census

Neighbourhood Level: Area variables from 2001 Census; Indices of Deprivation



Longitudinal Survey of Young People in England

• Begins at age 14, in 2004

• Annual Interviews until age 25

• Currently only wave 1 data available

• Includes interviews with young person and parent/adult



LSYPE variables: some examples

Family: siblings, mother’s education, mother’s occupation, single parent household, state benefits/tax credit

Pupil: attitudes to school, homework, future plans, risk factors e.g. in contact with police, truanting etc.

Parent: expectations for child’s education, helping with homework, family joint activities, parent involvement in school



Overview of National Pupil Database (NPD)

• Information on all state school pupils in England

• Includes national test score results

• It is longitudinal

– Pupils can be tracked through Key Stages

• NPD includes Pupil Level Annual School Census (PLASC)

– PLASC provides pupil background data e.g. ethnicity,


• NPD owned by DfES who manage access to the data



Variables and data

National Pupil Database

Pupil level variables: Key Stage 3 scores and Key Stage 2 prior attainment in maths, English, science; gender, SEN

Family variables: FSM eligibility, ethnicity, EAL



Neighbourhood variables

Census area variables                            Indices of Deprivation
Proportion unemployed                            Employment deprivation score
Proportion lone parent households                Income deprivation score
Proportion with level 1 or lower qualification   Skills deprivation score
Proportions from various ethnic groups           Children’s educational deprivation score



Linking pupils and schools

• DfES provided us with linked LSYPE/NPD data

• Linkage to school-level data using LEA and Establishment numbers
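A minimal sketch of this kind of school-level linkage, assuming each pupil record carries an LEA number and an Establishment number and that the school file is keyed on the same pair. The field names (`pupil_id`, `lea`, `estab`, `pct_fsm`) and all values are hypothetical, not taken from the DfES files.

```python
def link_pupils_to_schools(pupils, schools):
    """Left-join pupil records to school records on (LEA, Establishment)."""
    # Index the school file by its composite key for constant-time look-up.
    school_index = {(s["lea"], s["estab"]): s for s in schools}
    linked, unlinked = [], []
    for pupil in pupils:
        school = school_index.get((pupil["lea"], pupil["estab"]))
        if school is None:
            # Keep non-matching pupils so attrition can be reported.
            unlinked.append(pupil)
            continue
        row = dict(pupil)
        # Prefix school fields so they cannot overwrite pupil fields.
        for key, value in school.items():
            if key not in ("lea", "estab"):
                row["school_" + key] = value
        linked.append(row)
    return linked, unlinked


# Made-up records, for illustration only.
pupils = [
    {"pupil_id": 1, "lea": 301, "estab": 4001},
    {"pupil_id": 2, "lea": 301, "estab": 4002},
    {"pupil_id": 3, "lea": 999, "estab": 1234},  # school absent from school file
]
schools = [
    {"lea": 301, "estab": 4001, "pct_fsm": 12.5},
    {"lea": 301, "estab": 4002, "pct_fsm": 30.0},
]
linked, unlinked = link_pupils_to_schools(pupils, schools)
```

Using both numbers as a composite key matters in this sketch because it assumes an Establishment number is only unique within an LEA.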



Combining pupil and neighbourhood data

• National Pupil Database includes pupil postcodes

• Census data and Indices of Deprivation linked to the National Pupil Database using National Statistics Postcode Directory (NSPD, formerly AFPD)

• The NSPD provides a look-up between postcodes and various administrative geography codes
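The postcode route can be sketched as a two-step look-up: postcode to area code, then area code to area-level variables. This is an illustration only. The postcodes, area codes, variable names and values are made up, and the real NSPD carries several geography codes rather than the single `area_code` assumed here.

```python
def normalise_postcode(postcode):
    """Uppercase and strip whitespace so 'zz1 1aa' and 'ZZ11AA' compare equal."""
    return "".join(postcode.split()).upper()


def attach_area_variables(pupils, postcode_to_area, area_variables):
    """Add area-level variables to each pupil via a postcode look-up."""
    lookup = {normalise_postcode(pc): area for pc, area in postcode_to_area.items()}
    merged, unmatched = [], []
    for pupil in pupils:
        area = lookup.get(normalise_postcode(pupil["postcode"]))
        if area is None or area not in area_variables:
            # Postcode missing from the directory, or no data for its area.
            unmatched.append(pupil)
            continue
        merged.append({**pupil, "area_code": area, **area_variables[area]})
    return merged, unmatched


# Hypothetical look-up table and area data.
postcode_to_area = {"ZZ1 1AA": "AREA001", "ZZ2 2BB": "AREA002"}
area_variables = {
    "AREA001": {"prop_unemployed": 0.04, "income_deprivation": 0.12},
    "AREA002": {"prop_unemployed": 0.09, "income_deprivation": 0.31},
}
pupils = [
    {"pupil_id": 1, "postcode": "zz1 1aa"},
    {"pupil_id": 2, "postcode": "ZZ22BB"},
    {"pupil_id": 3, "postcode": "ZZ9 9ZZ"},  # not in the look-up
]
merged, unmatched = attach_area_variables(pupils, postcode_to_area, area_variables)
```

Normalising postcodes before matching is a design choice of the sketch: inconsistent spacing and case are a common reason for otherwise valid postcodes failing to link.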



Some problems in using linked data

• Reductions in sample size

– NPD has approx 0.5 million cases per year

– LSYPE has sample size of around 15,000

• Missing data

• Representativeness of data which does link successfully

• Getting access to linked data



Outcomes of data linkage with LSYPE

                                             N        %
Total sample in LSYPE (Wave 1)          15,770    100.0
Did not merge with neighbourhood data      838      5.3
Did not merge with school data             675      4.3
Remaining cases                         14,257     90.4
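The figures in the table can be checked with a few lines of arithmetic. The sketch assumes the two non-merging groups are counted without overlap, which is what the reported totals imply.

```python
total = 15770              # LSYPE Wave 1 sample
no_neighbourhood = 838     # did not merge with neighbourhood data
no_school = 675            # did not merge with school data

# Cases left after both linkage steps.
remaining = total - no_neighbourhood - no_school


def pct(n):
    """Share of the original sample, to one decimal place."""
    return round(100 * n / total, 1)


for label, n in [
    ("Did not merge with neighbourhood data", no_neighbourhood),
    ("Did not merge with school data", no_school),
    ("Remaining cases", remaining),
]:
    print(f"{label}: {n} ({pct(n)}%)")
```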



Linking a pupil survey to National Pupil Database: Diversity Pathfinders Project

Purpose of survey: to collect data as part of a 3.5 year evaluation of Diversity Pathfinders (2002-2006).

Six Local Authorities were provided with some funding by DfES to promote collaboration between groups of secondary schools, with the purpose of raising standards and promoting diversity through attaining specialist status.

Largely a qualitative study using interviews and some participant observation, supplemented by an analysis of examination performance and a survey of students’ views and experiences ‘before’ and ‘after’ three years.



DP research design

Since DP was ‘pathfinding’ it was not a uniform treatment with controls.

By intention each LA developed its own approach and own way of selecting and grouping schools for collaboration within the DP project.

The research team selected 31 schools as case studies, for which evidence was collected by interviews.

These schools were also the ones selected for a survey.

Each school selected one mixed ability Year 11 form to respond to the survey on-line.



Purpose of the DP student survey

To establish:

• how did students rate aspects of their learning experience?

• did students in 2005/6 rate their learning experiences better than those in 2002/3, especially with regard to increased working with students from other schools?

• did students’ learning experiences differ by school and by student characteristics?

• did more disadvantaged students have an improved learning experience after 3 years of DP?



Use of National Pupil Database

NPD provides data on students’:

• prior attainment (KS2 and KS3)

• gender

• ethnicity

• special educational needs

• eligibility for free school meals

• English as an additional language



Advantages of data linkage

Obtaining data on student characteristics without needing to ask intrusive questions on the survey or extend length of questionnaire;

Did not need to use alternative of asking the school to supply the data – would add to burden of survey to schools and reduce further the response rate.



Mechanics of achieving data linkage between DP survey and NPD

NPD consists of the Pupil Level Annual School Census (PLASC) plus test results.

Each pupil has a Unique Pupil Number (UPN) used by the school when reporting data to DfES.

We needed the schools to give us the UPNs of the students in the form doing the survey. Also DoB in case needed for matching.

UPNs are highly confidential – letter from DfES to schools requesting this.

Problem: getting UPNs out of each school. 28 schools responded in 2002/3: only 16 in 2005/6.



How UPNs were used

Each pupil given a DP project identifier number which was attached to a questionnaire.

At school pupil used id number to download own questionnaire.

NPD uses matching pupil reference number.

We sent UPNs to DfES and they matched with pupil reference number and sent us matched NPD data for these students.



Linking UPN, matching pupil reference number and DP survey pupil identity number

(example: not actual data)

pupilid   UPN             PMR
104184    J330414491063   CCF850CD35D9B8F0DD
104205    B330714491039   CCF850CE37D9BFFEDD
104212    F330414491095   CCF850CD30DCBFF9DD
104245    N330414491094   CCF850CE31D3BEF1DD
104247    F330414561003   CCF850CF31D8BBF8AA
104259    F770414491025   CCF850CF30D3BCF1BB
104279    A330414541014   CCF850CA3CDEB5ECE
104289    A359414491041   CCF850CC33DAB4FBH2
104272    D330514491011
105057    B330505781077   CCF850CA3CD9B5FDD5
105047    G330325791127   CCF850CE34D3J8F1D6
105088    A336405791079   CCF850CA3CDCB4G3B3
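The chain of identifiers in this example (project pupilid, then UPN, then PMR, then the NPD record) can be sketched as follows. The three crosswalk rows come from the illustrative example above, including the pupil with no PMR. The survey responses, the NPD fields and the function name are hypothetical.

```python
# Crosswalk rows from the presentation's illustrative (not actual) example.
crosswalk = [
    {"pupilid": "104184", "upn": "J330414491063", "pmr": "CCF850CD35D9B8F0DD"},
    {"pupilid": "104205", "upn": "B330714491039", "pmr": "CCF850CE37D9BFFEDD"},
    {"pupilid": "104272", "upn": "D330514491011", "pmr": None},  # no PMR matched
]

# Hypothetical survey responses, keyed by the project identifier.
survey = [
    {"pupilid": "104184", "quality_of_teaching": 4},
    {"pupilid": "104205", "quality_of_teaching": 2},
    {"pupilid": "104272", "quality_of_teaching": 5},
]

# Hypothetical NPD extract, keyed by PMR.
npd = {
    "CCF850CD35D9B8F0DD": {"gender": "F", "fsm": False},
    "CCF850CE37D9BFFEDD": {"gender": "M", "fsm": True},
}


def link_survey_to_npd(survey, crosswalk, npd):
    """Join survey responses to NPD records via pupilid -> PMR."""
    pmr_by_pupilid = {row["pupilid"]: row["pmr"] for row in crosswalk}
    linked, unlinked = [], []
    for response in survey:
        pmr = pmr_by_pupilid.get(response["pupilid"])
        npd_record = npd.get(pmr) if pmr is not None else None
        if npd_record is None:
            unlinked.append(response)
        else:
            # The UPN is deliberately left out of the analysis record.
            linked.append({**response, **npd_record})
    return linked, unlinked


linked, unlinked = link_survey_to_npd(survey, crosswalk, npd)
```

In this sketch the linked file carries only the project identifier and the NPD variables, so the confidential UPN stays in the crosswalk and is not needed for analysis.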



Methodological issues

Missing data:

• from schools that do not supply UPNs

• due to non-matching of UPNs and PMRs

• due to missing data in NPD.

This raises questions about how representative the data are.

Inconsistent data between DP survey and NPD: gender in some cases.
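One way to keep track of these issues is to give every surveyed pupil a linkage status and tabulate the results. The sketch below is an illustration only: the field names, status labels and records are hypothetical, and gender is the only consistency check included.

```python
from collections import Counter


def linkage_status(pupil):
    """Classify a surveyed pupil by how far the linkage got."""
    if pupil.get("upn") is None:
        return "no UPN supplied by school"
    if pupil.get("pmr") is None:
        return "UPN not matched to PMR"
    if pupil.get("npd_gender") is None:
        return "data missing in NPD"
    if pupil["npd_gender"] != pupil["survey_gender"]:
        return "linked, gender inconsistent"
    return "linked, consistent"


# Made-up records, one per outcome plus one extra consistent case.
pupils = [
    {"survey_gender": "F", "upn": None, "pmr": None, "npd_gender": None},
    {"survey_gender": "M", "upn": "U1", "pmr": None, "npd_gender": None},
    {"survey_gender": "F", "upn": "U2", "pmr": "P2", "npd_gender": None},
    {"survey_gender": "M", "upn": "U3", "pmr": "P3", "npd_gender": "F"},
    {"survey_gender": "F", "upn": "U4", "pmr": "P4", "npd_gender": "F"},
    {"survey_gender": "M", "upn": "U5", "pmr": "P5", "npd_gender": "M"},
]

status_counts = Counter(linkage_status(p) for p in pupils)
for status, count in status_counts.items():
    print(f"{status}: {count}")
```

Reporting the counts by status makes it easier to compare the linked cases with the full survey sample when judging representativeness.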



DP survey: some results using data linkage

School satisfaction construct    Pupil factors which are significant
Quality of teaching              Girl: negative; KS3 attainment: negative
Quality of school                Indian subcontinent ethnicity: negative; First language is not English: positive
Perceived teacher support        Girl: negative
Negative attitudes to school     Prior attainment: inversely associated
School harassment                Prior attainment: negatively associated; Indian subcontinent: positively associated (only just significant)



Advantages of data linkage for DP evaluation

Able to compare students from two waves of the survey

Able to control for pupil characteristics in analysis of questionnaire responses when comparing years or schools.

Able to address research questions on relationship between pupils’ characteristics and experience of school.