Pergamon Int. J. Educational Development, Vol. 16, No. 2, pp. 125-140, 1996

Copyright © 1996. Elsevier Science Ltd Printed in Great Britain. All rights reserved

0738-0593/96 $15.00 + 0.00

0738-0593(95)00002-X

MAKING SENSE OF LARGE-SCALE EVALUATION DATA: THE CASE OF THE ANDHRA PRADESH PRIMARY EDUCATION PROJECT

BARRY COOPER, COLIN LACEY and HARRY TORRANCE

Education Development Building, University of Sussex, Falmer, East Sussex, BN1 9RG, U.K.

Abstract: This article discusses problems of interpreting and analysing large-scale evaluation data with respect to a particular education project in a developing country context. An account is given of how an heuristic model of project implementation, impact and evaluation was generated. Specific findings related to aspects of the model are presented which illustrate both the efficacy of the model and some of the achievements of the project to date.

INTRODUCTION

The Andhra Pradesh Primary Education Project (APPEP) is attempting to introduce a range of 'active' primary school teaching methods into all the primary schools in the Indian State of Andhra Pradesh. It aims to accomplish this by including every primary school teacher in the state in a rolling programme of in-service training and materials distribution to schools. This will involve a total of approximately 50,000 schools and 170,000 teachers over the period 1989-1996. The Project involves collaboration between the U.K. Overseas Development Administration (ODA), the Government of India (GoI), the Government of Andhra Pradesh (GAP) and the British Council, which is responsible for the field management of the Project in India. It is funded by the ODA at an overall cost of c.£35 million over the 7-year period. This figure also includes money for a school building programme. The cascade programme of in-service training is currently running at around £1.2 million per annum.

The authors are engaged as consultants to the Project's evaluation and in a recent initial paper described the rationale for the design of the evaluation and gave some examples of the sorts of problems that the evaluation has had to overcome (Lacey et al., 1993). The paper drew attention to the ambition of the Project and the wide range of (often very politically powerful) stakeholders and audiences who are interested in the Project's progress. It discussed issues of design and choice of methods and highlighted the problems of gathering valid and reliable data on a large scale in a developing country context. The paper then went on to describe some of the strategies developed for countering these problems, including the use of a wide range of data collection techniques.

There has been growing discussion within India of the range of possible instruments which could be used in evaluative research in the context of planning for the evaluation of the forthcoming District Primary Education Programme (Jangira, 1994; NCERT, 1994; Government of India, n.d.). However, these have not yet been put into practice. Very recently, an evaluation of a school improvement project in Bombay has employed a wide range of data collection techniques (Black et al., 1993). However, the latter project was based in just two atypical schools (fee-paying, large urban schools serving the Moslem minority). To our knowledge, no previous evaluation study in India has attempted to employ a range of techniques over a number of years on the scale of the APPEP evaluation. Thus, a key issue addressed in our earlier paper was the developing relationship between the consultants and the Indian-based Evaluation Cell and the concomitant need for the evaluation design 'to be conceived of as a practical induction into evaluation issues and methods as well as a substantive task which had to be accomplished in its own right' (Lacey et al., 1993, p. 542).

In order to allow the triangulation of data and the development of local expertise the APPEP evaluation is gathering data by the use of eight different questionnaires (see Appendix 1), interview schedules and observation schedules, which are completed by headteachers, teachers and visiting fieldworkers whose time is made available to the evaluation from their other duties as teacher-training lecturers in District Institutes of Education and Training (DIETs). Some of these fieldworkers (DIET lecturers) are also conducting case studies of the programme-in-action in order to gather more qualitative, longitudinal data on the Project's impact on individual schools.

This second paper takes our reporting of the evaluation and the consultancy process a stage further and focuses on the problems encountered in interpreting and making sense of large-scale data in such a context. Problems of rigour in design now overlap with those of rigour in analysis, but also involve questions of judgement when it comes to making some sort of overall interpretative sense of an immense amount of data. This element of judgement was initially quite difficult to communicate and led to the need to locate potentially fragmented findings in a very fully articulated overall explanatory framework. This paper now goes on to describe the explanatory model which was developed and give examples of the sorts of data which could then be used to explore some of the initial impact of the Project.

Meaning in analysis: the development of an heuristic model

The model began to emerge out of our initial concerns to design an evaluation which could identify and analyse the impact of the Project over time, and out of discussions with the Evaluation Cell (comprising several seconded teacher educators and statisticians) about the nature of such an evaluation and the likely pattern of events as implementation proceeded. Thus, for example, we assumed that no training should result in no implementation, no implementation in no impact, etc. (Lacey et al., 1993, p. 546). Continuing discussions of such issues with the Evaluation Cell led us to make much more explicit the sorts of implicit assumptions which experienced researchers share and all too often take completely for granted. It became clear that there was a need to unpack the interpretative process and state very clearly the need for interpreting results rather than simply presenting them as a series of discrete findings. At the same time we were in regular contact with politicians and administrators from GoI, GAP and the U.K. who sometimes presented very stark views on what the Project should be accomplishing and what would be considered as success and failure. These views often changed as the next political imperative crossed their desks (cf. Dyer's account of Operation Blackboard, 1994) but usually boiled down to the assumption that change could be achieved quickly and that the Project's inputs on training, teaching materials, etc. should translate straightforwardly into higher enrolment, less drop-out, improved test scores, etc. Thus, the need to present a more complex model of possible implementation and impact coincided with the need to be as explicit as possible about the process of interpretation and to simplify and make sense of the large number of complicated analyses being produced following the first main survey conducted in late 1991.

Main Survey 1 (MS1) had been designed to enable the effects of the project to be measured by comparing the results of the survey of a stratified sample of trained schools with a comparable sample of untrained schools. It followed that, at every level of reporting, the argument for claiming an effect associated with the project depended on a comparative statement. The prospect of reporting the outcome of the project through the medium of hundreds of comparative statements, some showing strongly 'favourable' outcomes, some showing unexpectedly weak or 'unfavourable' outcomes, was not a realistic one. In addition, it became very clear that each outcome would need to be associated with some explanatory account making judgements about the individual and cumulative importance of the finding. As early drafts of the Main Survey 1 report began to emerge from the Evaluation Cell it became clear that in order to fulfil our brief the model would have to provide a very clear framework within which the data could be interpreted. It would need to be more complex and theoretically secure than the simple cause-effect models pressured on to the project from some outside interest groups but still capable of simplifying the complex analysis that was emerging from the large and rich data base.

In order to accomplish these various tasks our model needed to present the expected progress of the project in a way that would be understood by and gain acceptance from a wide range of potential audiences. Alongside the discussions with the Evaluation Cell already referred to, the information base for the model included the evaluation literature, the professional experience of the team and the theory/assumptions on which the project had been designed. In addition, to be useful, the model would have to be associated with the construction of summary measures and indicators that would bring together complicated ranges of items relevant to particular stages of the model. We felt that if we could accomplish these tasks the model would be a valuable heuristic device which could serve as a source for structured description and a testbed that would either be confirmed as an accurate representation of the processes of implementation or, if not, would be changed as the analysis proceeded. The outcome of our discussions is presented in Fig. 1.

The model arranges the inputs and hoped-for outcomes in a sequence which represents a prediction of the order in which events are likely to happen. For instance, while it is clear that implementation must occur before direct effects like changed classroom practice, it does not follow that improved classroom practice in line with the APPEP principles will occur. Implementation is a necessary but not sufficient condition. This was an important message to get across to some audiences. In addition, by predicting three orders of effect, the model injected a time dimension into the consideration of effects. We hoped that the model would replace ad hoc explanations and rhetoric with a logical framework of possible results and their possible significance in a sequence of events. By making the prediction public we were attempting to focus the discussion so that emerging evidence could be interpreted within a shared framework of understanding of how the Project might develop in practice. By predicting, for example, that if greater pupil enjoyment of school occurred within APPEP schools it would pre-date any second order outcome like less 'absenteeism' or 'parent awareness and satisfaction' we were testing an externally imposed logical sequence which can be set out as in Fig. 2.

The design of the APPEP scheme assumes that if parents become aware of this new level of interest and enthusiasm on the part of their children then they will be more satisfied with the state-provided education and will ensure that their children attend more regularly and do not drop out as soon as they otherwise might. The model as designed would allow us to test this built-in assumption. We already had evidence from some early case study work by DIET lecturers that there might be powerful forces working in the other direction.

Fig. 1. The heuristic model. [Boxes in sequence: Inputs (implementation by the Project: APPEP training, materials, professional support; new buildings) → Direct Effects (implementation in the classroom: improved classroom practice and teacher motivation; more 'activities' and practical learning; better, less crowded classrooms) → First Order Outcomes (better student learning, motivation and enjoyment) → Second Order Outcomes (less absenteeism; broader student performance; parental awareness and satisfaction) → Third Order Outcomes (less drop-out; more enrolment; better student performance).]


Fig. 2. Implementation, enjoyment and parental awareness. [Boxes in sequence: implementation of APPEP methods → student experience of new methods, greater variety, more interest (less boredom), and therefore more enjoyment of school → students take home news of the new methods, and evidence of the new work and new interests, and communicate their enjoyment → parents become aware of the new situation, consider the evidence, and adjust their levels of satisfaction.]

For example, some case study work reported that some parents were perturbed by pupils 'playing' and 'enjoying' themselves at school. These parents interpreted this as not 'proper' schooling. In addition, some teachers were reported as fearing this reaction from some parents. Finally there was increasing evidence that new private English medium schools which had been growing in size and number in towns were beginning to spread to villages. These schools included in their titles and requirements all the superficial trappings of privilege and status, e.g. 'private', 'English medium', 'convent school' (and sometimes combinations of all three). If fairly poor villagers were prepared to pay Rs30 per term for these kinds of 'advantages' and, in addition, put their children into an expensive school uniform there was at least a possibility that APPEP was attempting to cause change by pulling the wrong social levers. The model would enable us to test this by tracing how far the innovation was being implemented and whether these postulated effects were reaching to first-, second- or third-order outcomes.

The construction of the model also revealed some potential shortcomings in the design logic of the APPEP innovation. The model represents this logic as a series of sequential steps. It therefore gives the impression that an improvement in one element could without any complication feed into the educational process and produce better outcomes downstream. An improvement in 'direct effects' would therefore produce better first-order, second-order and third-order outcomes (bearing in mind the necessary but not sufficient condition, outlined earlier). However, a short consideration of possible outcomes reveals potentially complex interactions in which positive effects in some factors could have negative effects on others. For example, a sequence of effects causing an increase in enrolment might well produce larger classes of pupils and overcrowding in classrooms thus making it more difficult for teachers to sustain the innovation. Likewise, an increase in the number of 'marginal'1 pupils attending and staying on at school, and perhaps suffering from more crowding and less effective teaching as a direct result, could easily cause a decrease in average test scores recorded at the end of the year. These complex interactions and potentially negative feedback loops can only be noted at this stage. They represent possible outcomes that have to await further analysis of the data before light can be thrown on them. They are, however, a frequent feature of large-scale reform projects (Lockheed and Verspoor, 1991).

The final warning that emerged from our discussion of the model and the data analysis concerned the nature of the data and the levels of accuracy (margins of error) that we could hope to achieve. It was clear that some of the processes generating the system characteristics that we were attempting to measure were extremely complex and that the measurement task would be very difficult in itself. We were, by this stage, very aware of the kinds of inaccuracy, exaggeration and error that could occur, particularly in respect of data on enrolment, attendance and absence (Kurian, 1983; Chapman and Boothroyd, 1988; Lacey et al., 1993; Chattopadhyay et al., 1994; Jangira, 1994). In addition, we were conscious that some of the effects that the APPEP designers were hoping to see could be relatively small. For example, changes in enrolment of 2 or 3% could easily be masked by the kinds of exaggeration, error and inaccuracy that we had discovered during the data entry, data cleaning and early analysis of Main Survey 1.2 In order to overcome this problem we would rely on two features of the evaluation design.

The first design feature comprises the number of checks and triangulations built into the data set. The eight schedules (see Appendix 1) that were administered in each school collected teacher responses, head teacher responses, pupil and parent responses, classroom observation data, pupil attendance statistics from school records as well as counts by the trained DIET staff, and finally routine test results. This complex data set enabled us to triangulate nearly all important measures and build robust indicators for each stage of the analysis, some of which will be described in a later section.

The second relevant design feature is the longitudinal nature of the design. In Main Survey 1 (MS1) in late 1991 the main source of comparisons and therefore of evidence of project effects would be the differences between the formally3 trained and untrained samples of 224 and 276 schools, respectively. For Main Survey 2 (MS2) in late 1992, these 500 schools would still be included in the survey, along with 133 new untrained schools (see Table 1). For Main Survey 2 therefore, the design would allow a three-fold comparison with one sample untrained (133 schools), one sample trained for less than one year (276 schools) and one sample trained for longer than one year (224 schools4). This second feature of Main Survey 1 and Main Survey 2 taken together would allow us to measure the cumulative effects of the innovation.

This design will in the future give us both 'over-time' comparisons and 'static' comparisons. The 'over-time' comparisons (MS1 sample A to MS2 sample B and MS1 sample B to MS2 sample C) will enable us to derive change measures, while the 3 matched MS2 samples A, B, C will enable us to make further static comparisons. The longitudinal feature of the design should enable us to compensate for the effects of such systematic sources of error as the exaggeration of attendance data. (A Main Survey 3 has also now been conducted and a Main Survey 4 is planned.)

In this article we confine ourselves to the Main Survey 1 data set and it is therefore important to begin our analysis by showing that the comparability aimed for in the two subsamples (trained and untrained) was achieved by examining the eight background measures contained in the headteacher questionnaire (see Table 2). The selection of these measures was influenced by advice given to us by members of the APPEP Evaluation Cell. They are all school-level measures and in some cases are estimates made by the headteacher.5 Nevertheless, they cover a wide range of characteristics and if there were any major differences between the two samples we would expect them to emerge in at least some of these dimensions.

Table 1. Main Survey 1 and Main Survey 2 samples

Main Survey 1 (late 1991)                      Main Survey 2 (late 1992)
(new sample)                                   A. APPEP (untrained)
A. APPEP (untrained)                        →  B. APPEP (trained for less than 1 year)
B. APPEP (trained for less than 1 year)    →  C. APPEP (trained for more than 1 year)

Table 2. Main Survey 1: trained and untrained samples compared

Variable                                       Test      Probability value
Types of school management                     χ²        0.71
School area (location)                         χ²        0.96
Ownership of school building                   χ²        0.82
Type of school building                        χ²        0.77
Literacy of parents 1. Males                   χ²        0.66
Literacy of parents 2. Females                 χ²        0.41
Average annual income of parents               χ²        0.57
Rooms index (classrooms weighted for size)     t-test    0.69
Years of service of teachers                   t-test    0.10

None of the comparisons in Table 2 yield statistically significant differences between the trained and untrained samples. It follows that we can have considerable confidence that any differences that emerge between the trained and untrained samples are a result of the implementation of APPEP and not a result of any initial differences between the samples.
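The baseline-equivalence checks in Table 2 use chi-square tests for the categorical school characteristics and t-tests for the continuous measures. The sketch below illustrates how such checks might be run; the data layout, column names and values are hypothetical, not the Evaluation Cell's actual files.

```python
# Illustrative baseline-equivalence checks in the style of Table 2
# (hypothetical data layout: one row per school; values invented).
import pandas as pd
from scipy.stats import chi2_contingency, ttest_ind

schools = pd.DataFrame({
    "sample":      ["trained", "trained", "untrained", "untrained"],
    "school_area": ["rural", "urban", "rural", "rural"],   # categorical characteristic
    "rooms_index": [2.5, 3.1, 1.8, 2.0],                   # continuous characteristic
})

# Categorical variables (e.g. school area): chi-square on the contingency table
table = pd.crosstab(schools["sample"], schools["school_area"])
chi2, p_cat, dof, _ = chi2_contingency(table)
print(f"school area: chi-square p = {p_cat:.2f}")

# Continuous variables (e.g. rooms index): independent-samples t-test
trained = schools.loc[schools["sample"] == "trained", "rooms_index"]
untrained = schools.loc[schools["sample"] == "untrained", "rooms_index"]
print(f"rooms index: t-test p = {ttest_ind(trained, untrained).pvalue:.2f}")
```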

The remainder of the paper is organised to reflect and illustrate aspects of the heuristic model and our sequential analysis. We first discuss the training element of the initial stage of our model, i.e. project inputs to the primary school system.6

The implementation of APPEP by the delivery system

In a scheme as large and complex as APPEP, it is inevitable that some aspects of the programme are not delivered on time or that some schools receive a training that is not as satisfactory as the norm. It will also be the case that some teachers miss the training sessions through illness or that some schools lose trained teachers subsequently through transfer. In 1990/91 there were a number of external events which took teachers outside of their schools and prevented the training timetable from being realised on time. Courses for the year should have been completed by June 1991. In fact, however, some of the courses were delayed until the Autumn of 1991. Furthermore, two of these external events, the Census and the General Election, caused delays which disrupted the delicate sequencing of events. This resulted in some teachers who had received their initial training having to wait many months before follow-up Teacher Centre (TC) meetings were held. In other cases teachers did not receive the 3-day follow-up course at the planned time. In addition, there were cases where associated classroom materials were not received until long after the initial training. It might seem therefore that 1990/91 is an atypical year and therefore not a good year in which to judge the success of a project. However, it must be remembered that there is no such thing as a typical year and that there will always be events that disrupt ambitious plans. The purpose of the evaluation is to attempt to document what the project has achieved despite the problems that have arisen from external sources. It must also be remembered that the evaluation is attempting to do this through a series of annual Main Surveys.

Training shortfalls

In the 224 APPEP sample schools, there were 928 teachers and of these 721 (77.7%) had undergone APPEP initial in-service training by the time of Main Survey 1. Their training took place between July 1990 and June 1991. Therefore, over 20% of the teachers in APPEP sample schools had not undergone the APPEP training prior to the Autumn of 1991. This unforeseen delay in training resulted in the APPEP trained sample being diluted by untrained teachers. This has had the effect of reducing the number of teachers in 'trained' schools who were able to respond to some questions and/or diminishing any apparent effects of training.7 This should be borne in mind while interpreting the results of the survey reported in later sections.

Teachers' views of the in-service training

Before leaving the implementation of the project by the delivery system it is useful to examine how the teachers who attended the training courses viewed the usefulness of what they experienced. This will provide us with some indication of the effectiveness of an important aspect of the scheme. Teachers' views of the usefulness of their initial training are set out in the second and third columns of Table 3. These initial courses were of 10-18 days in length and delivered either in the District Institutes of Education and Training (DIETs) or at the more local level of a Mandal (an administrative unit of which there are some 1100 in Andhra Pradesh). The longer DIET-based courses were taught by personnel trained mainly at Project Headquarters, while the courses at Mandal level were taught typically by senior primary school teachers and Headteachers previously trained by DIET staff. All courses covered project principles while the longer courses also covered aspects of national (Government of India) policy on primary education.

Table 3 also compares the response of teachers trained in the year prior to Main Survey 1 with those who had been trained mainly in 1990 prior to a pilot evaluation survey of 134 schools which took place in April 1991 (Lacey et al., 1993). The results indicate that while there was a worrying decline in the numbers who found the course 'very useful' the numbers finding the course 'of no use' show only a very small increase. This decline is in line with our expectations of a cascade delivery system in which trainers train trainers who then deliver a relatively standard course a large number of times. The result is confirmed by a similar analysis of the response to the 3-day follow-up course shown in Table 4, though here both the percentages finding the course of no use and finding the course very useful have declined markedly with a corresponding rise in the percentage finding the course of some use.

Table 3. Usefulness of initial in-service training: teachers' views

Usefulness of training   Number of teachers   Percentage   Percentage at the time of the pilot study
Very useful              288                  39.94        63.18
Of some use              410                  56.87        36.82
Of no use                9                    1.25         0.00
Total                    707                  98.06        100.00
Non-response             14                   1.94

Table 4. Teachers' opinions of the helpfulness of the 3-day follow-up courses

Helpfulness of follow-up courses      Number of teachers   Percentage   Percentage at the time of the pilot survey
A lot                                 151                  20.94        56.09
Quite a lot                           477                  66.16        33.04
Not at all                            37                   5.13         10.87
Total                                 665                  92.23        100.00
Non-responses and invalid responses   56                   7.77

These results were useful in demonstrating to administrators and the project management that the existence of programmes of courses and timetables does not necessarily produce a 100% trained and satisfied cohort of teachers, and to suggest that, when we look at the next stage outlined in the model, implementation in the classroom, we might expect further slippage.

Indicators of implementation

Having shown that the majority of teachers claimed to have found the in-service courses of at least some use, we now move on to consider levels of implementation in the classroom, i.e. to consider the direct effects stage of our model. We will utilise two sets of measures of the degree of implementation of APPEP principles within the schools. The six APPEP principles, around which training is organised, are:

• the development of activity-based learning;

• the use of practical work;

• the use of small group work as well as whole class teaching;

• the recognition of individual differences in learning;

• the use of the local environment for teaching materials and as a teaching context;

• the display of children's work and the creation of an interesting classroom envi- ronment.

Our indicators of implementation, derived from both self-report data and observational data, have been designed to record the extent to which these principles have become organisers of classroom practice.

Self-report measures

The self-report measures are based on data collected from some 2000 teachers in both APPEP schools (i.e. formally trained schools, some of which may contain untrained teachers) and non-APPEP schools (i.e. not yet trained). They concern (i) the use of group activities with pupils, (ii) the display of children's work in the classroom, and (iii) the extent to which teachers have been professionally active in meetings with colleagues from other schools. Table 5 shows the results for each of these self-reported indicators of implementation. The figures are means, taken across the groups of schools in each subsample, of the mean teacher scores for each school. In this table, 'group activities' derives from the number per teacher claimed for the week prior to the survey.8 'Display of children's work' is similar, while the 'participation and involvement' scale runs from 1 to 13, being based on responses to a variety of questions concerning professional and pedagogic activity in Teachers' Centres with teachers from other schools.

Table 5. Teachers' self-reported implementation of project principles

Self-report measures of implementation (means)                    Non-APPEP schools   APPEP schools   n (schools)
Teacher centre participation and involvement                      4.77                7.10            429
Reported group activities (mathematics)                           0.69                2.23            449
Reported group activities (language)                              0.78                1.95            449
Reported group activities (environmental studies 1)               0.49                1.74            449
Reported group activities (environmental studies 2)               0.69                1.64            449
Reported group activities in previous week (total)                2.65                7.57            449
Reported display of children's work (mathematics)                 0.37                1.50            449
Reported display of children's work (language)                    0.35                1.39            449
Reported display of children's work (environmental studies 1)     0.26                1.14            449
Reported display of children's work (environmental studies 2)     0.34                1.50            449
Reported display of children's work in previous week (total)      1.32                5.52            449

It can be seen that there is a consistent pattern of differences between APPEP and non-APPEP schools on these measures,9 with teachers in APPEP-trained schools being much more likely to report the behaviours desired by the project designers. Furthermore, all these differences are significant, by t-tests, at better than the 0.001 level.
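Table 5 involves two stages of aggregation: teacher responses are first averaged within each school, and the school means are then compared across the trained and untrained subsamples by t-test. A minimal sketch of that procedure, with invented data and hypothetical column names:

```python
# Two-stage aggregation behind Table 5 (column names and values are invented):
# teacher scores -> school means -> subsample means compared by t-test.
import pandas as pd
from scipy.stats import ttest_ind

teachers = pd.DataFrame({
    "school_id":        [1, 1, 2, 2, 3, 3, 4, 4],
    "appep_trained":    [True, True, True, True, False, False, False, False],
    "group_activities": [8, 6, 9, 7, 2, 3, 1, 4],   # reported for the previous week
})

# Step 1: mean teacher score per school
school_means = (teachers
                .groupby(["school_id", "appep_trained"], as_index=False)
                ["group_activities"].mean())

# Step 2: compare school means across the trained and untrained subsamples
appep = school_means.loc[school_means["appep_trained"], "group_activities"]
non_appep = school_means.loc[~school_means["appep_trained"], "group_activities"]
print(appep.mean(), non_appep.mean(), ttest_ind(appep, non_appep).pvalue)
```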

Observational indicators

We also have access to structured observational data for one lesson from each school. (This should have been the second lesson of two actually observed, part of an attempt to prevent the putting on of a special one-off display for the observer.) Observers, who were DIET lecturers or, in some cases, other DIET personnel, recorded observations on six dimensions at 2 minute intervals. Three of the dimensions concerned teacher behaviour, while three concerned pupil behaviour. The codes for each dimension can be found in Appendix 2. These codes were designed with the Evaluation Cell and trialed in several schools during an exercise with the Consultants, and further trialed in the pilot evaluation survey of April 1991. They were intended to cover both traditional and APPEP-recommended pedagogical behaviours. Usable data was gained from the majority of schools.

Initially this data was used to construct simple bar charts to show the distribution of the observed behaviours across APPEP and non-APPEP schools. An example is shown in Fig. 3 for the case of Teacher Dimension 1 (Teacher Talk). It can be seen that TW (teacher talking to the whole class) was dominant in untrained schools but much less prevalent in formally trained schools where, in particular, TG (teacher talking to group) was much more likely to occur.

Fig. 3. Nature of teacher talk in trained and untrained schools (for codes see Appendix 2). [Bar chart of the percentage of observed intervals under each Teacher Dimension 1 code within APPEP and non-APPEP schools. Key: TW = teacher talks to whole class; TI = teacher talks to individual pupils; TIW = teacher with individual but addresses whole class; TG = teacher talks to group; TGW = teacher with group but addresses whole class; TS = teacher silent.]
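Distributions like those in Fig. 3 can be tallied directly from the codes recorded at 2-minute intervals. A minimal sketch, assuming each observed lesson is stored as a simple list of codes (the example lessons are invented):

```python
# Tally time-sampled observation codes into percentage distributions
# (codes as in Appendix 2; the example lessons are invented).
from collections import Counter

def code_percentages(lessons):
    """Percentage of observed 2-minute intervals falling under each code."""
    counts = Counter(code for lesson in lessons for code in lesson)
    total = sum(counts.values())
    return {code: round(100 * n / total, 1) for code, n in counts.items()}

appep_lessons = [["TW", "TG", "TG", "TI"], ["TG", "TGW", "TW", "TS"]]
non_appep_lessons = [["TW", "TW", "TW", "TI"], ["TW", "TIW", "TW", "TW"]]

print("APPEP:", code_percentages(appep_lessons))
print("non-APPEP:", code_percentages(non_appep_lessons))
```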

An examination of the six distributions showed that there were clear differences in each case between those for APPEP and non-APPEP schools.10 Given this, and our desire to move on to more sophisticated analysis, we designed, in conjunction with the Evaluation Cell, a quantitative index for each of the six dimensions. These were intended to act as indicators of the extent to which APPEP-recommended behaviours occurred in the observed lessons. Their definitions can also be found in Appendix 2. These indices were calculated for each observed lesson, i.e. at the level of the individual teacher observed. The mean values for APPEP and non-APPEP schools were found to be significantly different (see Table 6).

Table 6. The six observation indices (see Appendix 2 for definitions): means for APPEP versus non-APPEP schools in Main Survey 1 (all differences significant, p < 0.001)

                                                                    Non-APPEP schools   APPEP schools
Teacher dimension 1: focus of teacher talk (range: 0-1)             0.12                0.24
Teacher dimension 2: purpose of teacher talk (range: 0-1)           0.20                0.35
Teacher dimension 3: teacher's pedagogic activity (range: 0.09-1)   0.39                0.47
Pupil dimension 1: organisation of the class (range: 0-1)           0.05                0.35
Pupil dimension 2: nature of pupil talk (range: 0-1)                0.06                0.28
Pupil dimension 3: nature of pupils' activity (range: 0-1)          0.14                0.39

These indicators of implementation were also found to be significantly correlated with one another, and were combined into one observation-based index with a range of 0.09 to 6.0. The mean value of this index for APPEP schools was found to be 2.08 and that for non-APPEP schools 1.01 (significant at p < 0.001). The correlations of this combined observation index with the self-report indicators were also found to be statistically significant, giving us some confidence in our overall set of measures of implementation (see Table 7).

Table 7. Correlations between self-reported and observational indicators

Correlations                                                (1)        (2)        (3)        (4)
(1) Observation index                                       1.0000     0.2723**   0.2415**   0.3014**
(2) Self-reported participation in teacher centre
    activities                                              0.2723**   1.0000     0.3442**   0.3792**
(3) Self-reported use of group activities                   0.2415**   0.3442**   1.0000     0.7383**
(4) Self-reported display of children's work                0.3014**   0.3792**   0.7383**   1.0000

Number of cases: 407; 1-tailed significance: * 0.01, ** 0.001.
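Given the ranges reported in Table 6, the combined observation-based index appears to be consistent with a simple sum of the six per-lesson indices (hence the possible range 0.09 to 6.0), and Table 7 is an ordinary correlation matrix over the school-level indicators. The sketch below illustrates both steps under that assumption, with invented values and hypothetical column names:

```python
# Sketch: sum the six per-lesson indices into one observation index, then
# correlate school-level indicators (all values and column names invented).
import pandas as pd

lessons = pd.DataFrame({
    "t1": [0.2, 0.1], "t2": [0.3, 0.2], "t3": [0.5, 0.4],   # teacher dimensions
    "p1": [0.4, 0.0], "p2": [0.3, 0.1], "p3": [0.4, 0.2],   # pupil dimensions
})
lessons["observation_index"] = lessons[["t1", "t2", "t3", "p1", "p2", "p3"]].sum(axis=1)
print(lessons["observation_index"].tolist())   # one value per observed lesson

schools = pd.DataFrame({
    "observation_index": [2.1, 1.0, 1.7, 0.8],
    "tc_participation":  [7.0, 4.5, 6.2, 4.9],
    "group_activities":  [7.6, 2.7, 6.1, 3.0],
    "display_of_work":   [5.5, 1.3, 4.8, 2.0],
})
print(schools.corr())   # Pearson correlation matrix, cf. Table 7
```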

The 'Appep-ness' indicator

From this set of measures, we finally constructed a combined school-level measure of the degree of implementation, labelled 'appep-ness', giving equal weight to self-report and observational data (both first standardised) from each school.11 The mean values of 'appep-ness' for APPEP and non-APPEP schools respectively were 1.0582 and -0.8986 (difference significant at p < 0.0001). Given the loss of schools from this analysis as it developed, mainly because of inadequate observational data in the case of some schools, this final indicator was available for 407 of the 500 schools in Main Survey 1.
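The description of 'appep-ness' suggests a standardise-then-average construction: each component is converted to a z-score and the self-report and observational components are combined with equal weight. A minimal sketch of that kind of composite, with hypothetical column names and invented values, not the exact construction used in the report:

```python
# Illustrative standardise-then-average composite (hypothetical column names).
import pandas as pd

schools = pd.DataFrame({
    "self_report_index": [7.6, 2.7, 5.5, 1.3],
    "observation_index": [2.1, 1.0, 1.8, 0.9],
})

def standardise(series):
    """z-score: subtract the mean, divide by the standard deviation."""
    return (series - series.mean()) / series.std()

# Equal weight to the (standardised) self-report and observational components
schools["appepness"] = 0.5 * (standardise(schools["self_report_index"]) +
                              standardise(schools["observation_index"]))
print(schools)
```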

'Appep-ness' and pupil response: analysing a first-order outcome

An early exploratory use of this measure was in an examination of the extent to which, in APPEP schools,12 pupils' enjoyment of school was related to the degree of implementation of APPEP principles within the classroom. We had interview data for some 800 pupils, i.e. approximately four from each APPEP school. These children had been asked, among many other things, how much they enjoyed attending their school.13 A mean degree of pupil enjoyment was calculated for each APPEP school, and this was then correlated with a number of variables,14 including the 'appep-ness' indicator, for the 181 schools for which the relevant data existed (see Table 8).

[Table 8, correlating 'appep-ness' and other school-level variables with mean pupil enjoyment of attending school, is not recoverable from this transcript.]

It can be seen that 'appep-ness' (our composite measure of the degree of implementation of project principles) and students' enjoyment in attending school are significantly correlated, i.e. pupils appear to be more interested in attending their school the more it has implemented APPEP principles. It is also interesting to note other significant correlations in this table. Enjoyment in attending is negatively correlated with pupil-teacher ratio, i.e. pupils are more interested in attending their school if it is relatively well-supplied with teachers. It is also positively correlated with a measure of the adequacy of the school's general facilities (playground, toilets, light, electricity, water supply, garden). This pattern of correlation suggests that our measures are capturing something real. This pattern of results suggests that at least one element of our heuristic model receives support from Main Survey 1 data. Given the implementation of training (a key Project input), and subsequent implementation of Project principles in the classrooms of trained schools (a direct effect), there appears to be some positive effect on students' enjoyment of schooling (a first-order effect).
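The relationship just described is, at root, a school-level correlation between the 'appep-ness' score and the mean enjoyment rating of the pupils interviewed in each school. A minimal sketch with invented values (scipy's two-tailed p is halved for a one-tailed reading, as used in Table 7):

```python
# School-level correlation between 'appep-ness' and mean pupil enjoyment
# (values invented; pearsonr returns the coefficient and a two-tailed p-value).
from scipy.stats import pearsonr

appepness      = [1.2, 0.4, -0.3, 0.9, -1.1, 0.2]
mean_enjoyment = [2.8, 2.5, 2.1, 2.7, 1.9, 2.3]

r, p_two_tailed = pearsonr(appepness, mean_enjoyment)
print(f"r = {r:.2f}, one-tailed p = {p_two_tailed / 2:.3f}")
```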

Second-order effects: parental awareness of APPEP activities

A total of 856 parents of pupils at APPEP schools were interviewed as part of Main Survey 1. Those who had visited the school during the academic year were asked, among other questions, whether they had noticed any change in teaching methods in the school. Four hundred and two (52%) said that they had, and 371 (48%) that they had not. This further suggests that some real change had occurred in pedagogy. Further support for this conclusion can be found by examining the relationship between a school's 'appep-ness' score and parental reports of change. APPEP schools were classified into high and low 'appep-ness' schools (by taking the mean value for these schools as a cut-off point). Table 9 shows the relationship between this classification and parental reports of pedagogic change. It can be seen that in schools with higher 'appep-ness' scores 60.9% of parents reported change while, in schools with lower 'appep-ness' scores, 45.6% did so. The difference is statistically significant (χ², p < 0.001).

Table 9. Parental reports of pedagogic change by 'appep-ness'

                              In schools with lower       In schools with higher
                              'appep-ness' scores         'appep-ness' scores
Parents reporting change      139 (45.6%)                 213 (60.9%)
Parents reporting no change   166 (54.4%)                 137 (39.1%)
Totals                        305 (100%)                  350 (100%)

Parents were also asked a range of questions about their children's behaviour at home. As an example, we can consider the question which asked parents whether they had noticed their child collecting local materials for use at school (seeds, match sticks, bottle tops, etc.). Such behaviour was reported by 63% of parents. Again, there was a statistically significant relationship with the school's 'appep-ness' score (χ², p < 0.001).
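The significance tests quoted here are chi-square tests on 2 × 2 tables of parental reports by high/low 'appep-ness'. A sketch using the counts from Table 9 (scipy is an assumed tool; any chi-square routine would do):

```python
# Chi-square test on the Table 9 counts: parental reports of pedagogic change
# in lower versus higher 'appep-ness' schools.
from scipy.stats import chi2_contingency

#            lower appep-ness   higher appep-ness
observed = [[139, 213],    # parents reporting change
            [166, 137]]    # parents reporting no change

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.5f}")
```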

A straw in the wind? Continuous absence

Main Survey 1 collected information on enrolment and absenteeism. In particular, information was collected on the number of children who had been 'continuously absent' during the months of March 1991 and October 1991. A simple non-proportional measure of the total number of such pupils, by gender, shows a pattern in favour of APPEP trained schools (see Table 10). At first glance, this may appear to be evidence for a further and very important second-order outcome. However, it stood alone. There was, for example, no relationship between this measure of absence and 'appep-ness'. We therefore regard it as a 'straw in the wind' to receive further examination in Main Survey 2.

Table 10. Continuous absence of pupils by APPEP/non-APPEP school

Sex (continuously absent for month of:)   Non-APPEP schools (mean number of pupils)   APPEP schools (mean number of pupils)   Significance by t-test
Boys (March 1991)                         18.76                                       15.28                                   0.095 (not sig.)
Girls (March 1991)                        17.23                                       13.24                                   0.040 (sig.)
Boys (October 1991)                       16.93                                       12.76                                   0.023 (sig.)
Girls (October 1991)                      15.29                                       11.78                                   0.052 (not sig.)
n (schools)                               239                                         198

Third-order outcomes

Not surprisingly, at this stage, an examination of variables representing third-order outcomes in our model produced no clear pattern of association with the implementation of the project at Main Survey 1. In particular, there were no ascertainable effects on either the enrolment or drop-out of pupils. Neither did the routine examination scores received by pupils from teacher-set and marked tests show any APPEP effect at this stage. However, we shall be pursuing the examination of third-order outcomes in the analysis of subsequent Main Surveys.

Conclusion

This article presents an attempt at modelling the progress of a large-scale innovation. The attempt was governed by dual considerations. On the one hand, we wished to help to clarify the public discussion and debate about APPEP, particularly among Government of India, Government of Andhra Pradesh and Overseas Development Administration officials, and, on the other hand, we needed to produce a model of sufficient sophistication to bring order to a large data set. The current article concerns the second of these aims. However, in achieving this latter goal, the model also provided descriptive/analytical power and presented decision-makers with a useful tool with which to trace the progress of an important project. The analyses presented here are a selection from a much larger set which were able to show that, while some second-order outcomes had been achieved, the project had not yet delivered third-order effects (APPEP, 1993).

Considerable disappointment was felt by decision-makers in GoI, GAP and ODA about the lack of third-order outcomes at this stage. However, the model contextualised this finding as part of a developing process. It therefore became possible to reconstrue the 'failure' to produce third-order effects at Main Survey 1. The project had not had time to establish itself in schools and villages to the point where third-order outcomes became likely. For example, it would be difficult to see how a project based largely on an improved and more varied classroom pedagogy could affect enrolment and dropout within the few months which many schools had had to implement the project principles after training. Absenteeism rates were potentially more responsive to the innovation and the slight evidence of improvement in this indicator was presented as 'a straw in the wind' that might promise more good news later.

At this stage the model catalysed two developments. Firstly, it encouraged a debate within the project's field management team about the adequacy of the provisions within APPEP to produce third-order outcomes. In particular, the question was raised as to whether a classroom-based strategy could affect significantly parental decisions about whether or not to send and/or keep a child at school when much evidence indicated that social and economic factors were more important variables. Secondly, it promoted a discussion within the Evaluation Cell and between the Cell and the Consultants about the adequacy of the design of Main Survey 2 and its questionnaires in respect of the community context within which APPEP schools were functioning. As a result a sample of pupils and parents from non-APPEP schools were brought into the design and more questions concerning the decision to send one's child to school were directed to parents.

It is important to point out that these discussions were able to take place within a positive atmosphere because the project evaluation had provided strong empirical evidence of real achievement in terms of classroom implementation and pupil response. In addition, the model contextualised these achievements in a developing process that would require more time to penetrate the Andhra Pradesh (AP) educational system. Subsequent surveys will be well-placed to provide further evidence on which to base decisions about the direction of the project.

Acknowledgements: We wish to acknowledge the contribution of Project personnel in Hyderabad to our thinking on these issues and particularly the members of the Evaluation Cell. We would also like to acknowledge the support of the British Council which is funding the Evaluation Consultancy and the help and advice given to us by British Council Field Officers and Managers in India, particularly Dr Tony Davison and Dr K. N. Rao. However, the views expressed in the paper are those of the authors, and should not be attributed to either the ODA or the British Council.

NOTES

1. By 'marginal' we mean to refer, for example, to those students from illiterate families who would not previously have attended the school.

2. We do not wish to suggest that the problems with our data set are in any way unusual (see Chapman et al., 1988).

3. It is important to note that the 224 'trained' schools were identified, in advance of actual training, from Project plans. For a variety of reasons, not all of the teachers from these schools had received their training by the time of MS1. Strictly speaking, the group consists of those schools which should have received their training before MS1. This should be borne in mind while reading this paper.

4. In fact, some 20 of the 224 schools had received their training much earlier than the rest as part of a pilot project undertaken several years previously.

5. With respect to literacy, for example, the headteacher is asked to estimate whether most fathers and mothers, taken separately, of students in the school are literate or illiterate.

6. Buildings and their effects will receive analysis as part of MS2, since, at the time of MS1, it was clear that there had not yet been enough time for such effects to have occurred on any scale.

7. A problem in the computer coding of data at the time of MS1 data entry has made it very difficult up to the time of writing to pull out untrained teachers from the 'trained' sample of schools. This problem was resolved prior to MS2, and future analyses will be in terms of both formal (i.e. planned) and actual training.

8. Clearly, the meaning given to 'group activities' will have varied across the teachers. It cannot be assumed that teachers reporting, truthfully, that they have used 'group activities' will have used them in ways that the APPEP central team would recommend.

9. Some trained teachers will have been transferred to non-APPEP schools after training, and vice versa, and some teachers in APPEP schools will not have been trained at the time of Main Survey 1. Hence it is possible that these figures underestimate the effects of APPEP training, though against this should be considered effects due to exaggeration (Lacey et al., 1993).

10. It is obviously possible that, to some extent, the teachers in APPEP schools provided atypical lessons for the observers. On the other hand, on the worst assumption, we at least have evidence that they can deliver elements of APPEP pedagogy. It is also important to note that this observational data, whatever its defects, represents a crucial advance on merely having access to self-report data, with all the associated problems flowing from 'social desirability' effects (Lacey et al., 1993, p. 551).

11. While observational data may be more valid, the self-report data (in schools with more than one teacher) is based on responses from several teachers. Equal weighting seemed a reasonable procedure in the light of these two countervailing tendencies.

12. In MS1, but not subsequent surveys, data was only collected from parents and pupils in APPEP schools.

13. The pupils were interviewed in Telugu, the language of Andhra Pradesh. As a result of translation from and then back into English, there is some ambiguity in the meaning of enjoyment here. It seems likely that it carries a sense of interest in attending school as well as enjoyment per se.

14. Some of the variables in this table are not ideally suited to correlational analysis. The exploratory purpose of the analysis should be borne in mind.

REFERENCES

Andhra Pradesh Primary Education Project (1993) Report of Main Survey 1 on Implementation of the Project in Schools. Directorate of School Education, Hyderabad, Andhra Pradesh.

Black, H., Govinda, R., Kiragu, F. and Devine, M. (1993) School Improvement in the Developing World: an Evaluation of the Aga Khan Foundation Programme. Scottish Council for Research in Education, Edinburgh.

Chapman, D. W. and Boothroyd, R. A. (1988) Threats to data quality in developing country settings. Comparative Education Review 32, 416-429.

Chattopadhyay, R., Chaudhuri, S. and NagiReddy, V. (1994) The Status of Primary Education in Assam: A Project Sponsored by UNICEF Calcutta. Indian Institute of Management, Calcutta.

Dyer, C. (1994) Education and the state: policy implementation in India's federal polity. International Journal of Educational Development 14, 241-253.

Government of India (n.d.) The District Primary Education Programme. Government of India, Ministry of Human Resource Development, New Delhi.

Jangira, N. K. (1994) Learning Achievement of Primary School Children in Reading and Mathematics: A Synthesis Report. National Council of Educational Research and Training, New Delhi.

Kurian, J. (1983) Elementary Education in India: Myth, Reality, Alternative? Vikas Publishing House PVT Ltd, New Delhi.

Lacey, C., Cooper, B. and Torrance, H. (1993) Evaluating the Andhra Pradesh Primary Education Project: problems of design and analysis. British Educational Research Journal 19, 535-554.

Lockheed, M. E. and Verspoor, A. M. with others (1991) Improving Primary Education in Developing Countries. Oxford University Press for the World Bank.

National Council of Educational Research and Training (1994) Research-based Interventions in Primary Education: the DPEP Strategy. National Council of Educational Research and Training, New Delhi.

APPENDIX 1
DETAILS OF THE EIGHT MAIN SURVEY 1 SCHEDULES

Schedule   Data To Be Collected                                                         To Be Completed By
I          School: community background and resources                                   Head teacher of the school
II         Enrolment, absenteeism and drop-out of pupils                                A: Head teacher of the school; B+C: DIET lecturer
III        Exam scores of pupils                                                        Head teacher of the school
IV         Classroom observation                                                        DIET lecturer
V          Teachers' opinions on APPEP training, and self-reported pedagogic practice   Head teachers and teachers in APPEP-trained schools
VI         Teachers' self-reported pedagogic practice                                   Head teachers and teachers in non-APPEP-trained schools
VII        Parents' opinions of the school                                              DIET lecturer
VIII       Pupils' opinions of the school                                               DIET lecturer

Note. These schedules have since been reduced to seven in number via the merging of V and VI.


Pupil Behaviour Dimensions, P1-P3.

P1. Codes for the first pupil dimension: the organisation of pupils for learning

pc   Organised and working as a class
pci  Organised as a class, but working individually
pg   Organised and working in a group
pgi  Organised in groups, but working individually
pp   Organised and working in pairs
pgc  Organised in groups, but working as a class
pi   Organised and working individually

Index for Pupil Dimension 1

This is formed thus, for each lesson observed:

(pg + pp + pgi) / (pc + pci + pg + pgi + pp + pgc + pi)

This is intended to give a proportional measure of APPEP-related behaviours for this dimension. Again it runs from a possible 0 to a possible 1.
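The index formulas in this appendix are proportions of APPEP-related codes among all intervals coded for a lesson on the relevant dimension. A minimal sketch for Pupil Dimension 1, assuming a lesson's record is a mapping from code to interval count:

```python
# Proportional index for Pupil Dimension 1: APPEP-related organisation codes
# (pg, pp, pgi) as a share of all intervals coded on this dimension.
P1_APPEP_CODES = {"pg", "pp", "pgi"}
P1_ALL_CODES = {"pc", "pci", "pg", "pgi", "pp", "pgc", "pi"}

def pupil_dimension_1_index(counts):
    """counts: mapping from P1 code to the number of 2-minute intervals observed."""
    total = sum(counts.get(code, 0) for code in P1_ALL_CODES)
    appep = sum(counts.get(code, 0) for code in P1_APPEP_CODES)
    return appep / total if total else None

# Example lesson: mostly whole-class work with some group work
print(pupil_dimension_1_index({"pc": 10, "pci": 3, "pg": 4, "pp": 1}))  # 5/18, about 0.28
```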

P2. Codes for the second pupil dimension: whether pupils are talking, and type of talk

ps   Pupils silent
pat  Pupil answering teacher
pqt  Pupil questioning teacher (content)
pqto Pupil questioning about organisation
ptp  Pupils talking in pairs
ptg  Pupils talking in groups
ptc  Pupil(s) talking to the whole class
pch  Pupils chattering

Index for Pupil Dimension 2

This is formed thus, for each lesson observed:

(pqt + pqto + ptp + ptg) / (ps + pat + pqt + pqto + ptp + ptg + ptc + pch)

This is intended to give a proportional measure of APPEP-related behaviours for this dimension. Again it runs from a possible 0 to a possible 1.

P3. Codes for the third pupil dimension: nature of pupils' learning activity

pcp  Pupils copying (from blackboard, chart, book or dictation)
pwm  Pupils working with materials
pri  Pupils recording own information
pdp  Pupils drawing pictures
pp   Pupils playing
psr  Pupils singing or reciting
pd   Pupils dancing
pl   Pupils listening
pro  Pupil(s) reading (out)
psp  Pupil solves problems
prc  Pupils repeating in chorus
pco  Pupil(s) calling out to teacher or pupils

Index for Pupil Dimension 3

This is formed thus, for each lesson observed:

(pwm + pri + pdp + pp + psp) / (pcp + pwm + pri + pdp + pp + psr + pd + pl + pro + psp + prc + pco)

This is intended to give a proportional measure of APPEP-related behaviours for this dimension. Again it runs from a possible 0 to a possible 1.