JOURNAL OF TEACHING IN PHYSICAL EDUCATION. 1992, 11, 315-323

Validity of Interval Recording in Measuring Classroom Climates in Physical Education

Michael J. Stewart, University of Nebraska at Omaha
David Destache, Greenbay Public Schools

The purpose of this study was to determine the validity of interval recording utilizing a 5-s whole-interval observe time period and 5-s, 10-s, and 20-s lengths of recording intervals in measuring the classroom climates of management, instruction, and activity in a physical education setting. The various record-interval lengths were always in conjunction with a 5-s observe interval. Subjects in the study were 9 physical education teachers from elementary, junior high, and senior high levels. Activities taught by the subjects included rhythms, gymnastics, ball handling, badminton, tennis, and swimming. Each subject was videotaped for one lesson (M=28.9 min). The videotape bank was used to determine the actual and estimated time subjects spent in each climate. Comparison of the continuous time spent in management, instruction, and activity was made with the 5-s observe, 5-s record; 5-s observe, 10-s record; and 5-s observe, 20-s record interval techniques. Data were analyzed utilizing an ANOVA with repeated measures on the continuous factor. Results indicated no significant difference between continuous recording of management, instruction, and activity climates and any of the three observe-record methods. These results suggest that the observe-record methods were valid estimates of time spent in management, instruction, and activity climates.

Observational instruments and behavioral research in physical education and coaching have employed several methods of recording teaching and coaching behavior over the past several years. One popular method has been interval recording. Interval recording allows the observer to measure the occurrence of behavior within specific intervals (van der Mars, 1989). This method of measuring behavior utilizes intervals of equal or unequal lengths of time spent observing and recording. Therefore, there would be times during the observation session when behavior would not be recorded although it might be occurring.

The popularity of interval recording is reflected in the textbook Analyzing Physical Education and Sport Instruction (Darst, Zakrajsek, & Mancini, 1989), where 19 of the 34 observational instruments listed utilize some form of interval recording. Interval length in these instruments ranges from 3-s observe to 10-s observe, and from 0-s record to 60-s record.

M.J. Stewart is with the School of HPER at the University of Nebraska at Omaha, Omaha, NE 68182-0216. D. Destache is with Greenbay Public Schools, Greenbay, WI 54304.

The variability of the interval length depends on several factors. Van der Mars (1989) has suggested that the criteria used when selecting an appropriate length of interval are the level of expertise the observer has in observing and recording and the complexity of the instrument, that is, the number of behaviors being observed and recorded. A third criterion not addressed by van der Mars is the question of representativeness. In other words, which interval length will produce data that best represent the actual behavior of the session being observed? It is important that the data collected, which represent only a sample of the continuous behavior, accurately depict what is happening in the observed session.

Interval recording, then, is a sample of the behavior that occurs during the session being observed. Parker (1989) suggested that

As with any sampling process, the more samples that are collected and the more evenly those samples are distributed across the total time, the more representative the sample is to what actually transpired during the total length of the observation session. The length of the interval must be consistent and only as long as needed to accurately code the defined behaviors. (p. 199)

Recommendations such as Parker's (1989) can be found throughout the literature, yet little research has been conducted to validate them. Intuitively, one would agree that the more samples one collects, the more representative the data would be of the behavior that actually transpired during an observed session. However, the important research question is how long an interval can be before the data no longer represent what actually transpired during the observation session. This is particularly important to the credibility of any research that utilizes interval recording.

Interval recording has been studied in the past but, for the most part, not in physical education class settings. Thomson, Holmberg, and Baer (1974) conducted one of the first investigations that studied the validity of various patterns of intermittence by using interval recording in a preschool setting. They concluded that in the particular setting they used, it is better to sample a subject briefly, but repetitively, over the time available as opposed to observing (a) for the longest possible unbroken span of time, or (b) intermittently for only half the available time.

In an effort to determine if high, medium, and low response-rate data in constant and nonconstant patterns would yield the same results when collected by time sampling, interval recording, and frequency recording, Repp, Roberts, Slack, Repp, and Berkler (1976) conducted a study with data generated electromechanically. They concluded that time sampling was a poor predictor and that interval recording accurately represented low and medium rates but grossly underestimated high rates of behavior. Further, they concluded that there was little difference in the representativeness of the interval data when an observer observes and records in the same interval and when the observer observes in one interval and records in a succeeding interval. They did caution, however, that in studies where the data are not simulated, this may not be so and should be studied further.

A study conducted by Powell, Martindale, Kulp, Martindale, and Bauman (1977) nearly replicated an earlier study by Powell, Martindale, and Kulp (1975) that manufactured in-seat behavior so that behavior occurred 20%, 50%, and 80% of the time. Their results indicated that with the interval measure, the percent of observations scored overestimated or underestimated the amount of behavior that occurred, as compared to the continuous measure.

Later, Powell and Rockinson (1978), utilizing the information from previous work on scored intervals and response duration (Powell et al., 1975; Powell et al., 1977), conducted a study to determine the correspondence between scored intervals and response frequency. By manipulating the frequency and the average length of response of a behavior, the researchers were able to demonstrate that there are many combinations of behavioral frequency and duration where interval measures cannot produce valid estimates of behavior.

More recently, and in a physical education setting, Silverman and Zotos (1987) conducted a study comparing continuous recording to time sampling as utilized by two versions of the interval instrument Academic Learning Time in Physical Education (ALT-PE). The purpose of their study was to compare the validity of these two widely used instruments against continuous recording. The results indicated that the actual student-engaged time and engaged time as estimated by the time-sampling instrument were significantly lower than the times obtained as estimated with the two ALT-PE instruments. This led them to conclude that widespread use of the ALT-PE instruments should be reexamined. This was the first study to question the use of interval recording as a measurement technique in a physical education classroom setting.

Given the many formal observation instruments in the physical education setting that utilize interval recording and the widespread advocacy of observational systems in general (Anderson, 1980; Siedentop, 1991), it is important that the data, which are often used to make decisions about behavior changes, are valid. Further, the available research seems to be consistent in raising questions as to the validity of the use of interval recording carte blanche. The purpose of this study was to determine the validity of interval recording in measuring the classroom climates of management, instruction, and activity in physical education settings.

Method

Subjects

Nine physical education teachers were subjects for this study. They consented to have their classes videotaped and were told that the purpose of the study was to determine the representativeness of a variety of observational recording techniques. All were experienced teachers from one school district, and they were representative of elementary, junior high, and senior high levels. Each teacher taught one subject matter for the videotape. The variety of subject matter represented in the study included rhythmic activity, gymnastics, ball-handling skills, badminton, tennis, and swimming.

Videotape Procedures

Each subject was videotaped on one occasion. This videotape was used for subsequent analysis comparing the three interval-recording techniques to continuous recording. The video equipment was a Panasonic Omnimovie PV-300. Each subject wore a Cetec Vega Pro Series wireless remote microphone.

Behaviors Used for Analysis

The behaviors that were coded to determine the validity of the three interval-recording techniques were management, instruction, and activity. These behaviors were selected because they are generally used in the physical education literature to describe the gymnasium climate (Siedentop, 1991). For the purpose of this study, management, instruction, and activity were defined as noted by Stewart (1989). Management was defined as

the period of time in a class when, theoretically, the opportunity to learn is not present. During this time, 51% or more of the students are involved in activities that are only indirectly related to the class learning activity. During this time there is no instruction, no demonstration, or no practice. (Stewart, 1989, p. 250)

Instruction was defined as

the period of time in the class when, theoretically, the opportunity for the student to learn is present. Students can receive information either verbally or nonverbally. During this time, 51% or more of the students are not engaged in physical activity. (Stewart, 1989, p. 250)

Activity was defined as

the period of time when 51% or more of the students are involved in actual physical movement in a manner that is consistent with the specific goals of the particular environment. (Stewart, 1989, p. 250)

Coding of Videotapes

After videotaping all subjects, the investigators began recording the data. Actual data collection was performed by one investigator. The other collected data for the purpose of interobserver reliability checks. Videotapes were dubbed with a start and finish signal and were synchronized with a tape recorder, which was used to cue the investigator when to observe and when to record. The classroom climates of management, instruction, and activity were recorded and became the dependent variables for analysis.

First, continuous recording of the three dependent variables was completed on all the videotapes. Each videotape was then analyzed using interval methods of 5-s observe, 5-s record; then 5-s observe, 10-s record; and finally 5-s observe, 20-s record. All videotapes were analyzed using one method of observe-record before proceeding to the next method. Each time a new method was used, the subject order in which the tapes were coded was changed to decrease the possibility of order effect. After training, practice, and achieving a minimum of .90 interobserver reliability on two consecutive practice videotapes for each behavior utilizing the continuous-recording technique, actual data collection began.
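The observe-record cycle described above can be illustrated with a minimal sketch. It assumes the continuous record is available as a list of (climate, seconds) episodes, and it assumes that an observe window containing more than one climate is left uncoded; the study does not state how such windows were handled. The function names and the 60-s excerpt are hypothetical and are not data from the study.

```python
def per_second_timeline(episodes):
    """Expand [(climate, seconds), ...] into one climate label per second."""
    timeline = []
    for climate, seconds in episodes:
        timeline.extend([climate] * seconds)
    return timeline


def whole_interval_codes(timeline, observe_s=5, record_s=5):
    """Code one observe window per observe-record cycle.

    Whole-interval scoring: a climate is coded only if it fills the entire
    observe window. Mixed windows are coded None (an assumption).
    """
    codes = []
    cycle = observe_s + record_s
    for start in range(0, len(timeline) - observe_s + 1, cycle):
        window = timeline[start:start + observe_s]
        codes.append(window[0] if len(set(window)) == 1 else None)
    return codes


# Hypothetical 60-s excerpt of a lesson (illustration only).
episodes = [("management", 12), ("instruction", 18), ("activity", 30)]
timeline = per_second_timeline(episodes)

for record_s in (5, 10, 20):
    print(record_s, whole_interval_codes(timeline, observe_s=5, record_s=record_s))
```

Longer record intervals produce fewer coded windows for the same lesson, which is the sampling trade-off the study examines.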


Interobserver Agreement

To examine the percent of observer agreement, the scored-interval method was utilized (van der Mars, 1989). Interobserver agreement was calculated for each climate for the continuous-recording method and for the three interval methods. The standard for interobserver agreement was .80 (van der Mars, 1989).
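The text does not spell out the scored-interval formula. The sketch below assumes the common definition, in which only intervals where at least one observer scored the climate are counted, and agreement is the share of those intervals in which both observers scored it. The function name and the two code sequences are hypothetical.

```python
def scored_interval_agreement(obs_a, obs_b, climate):
    """Agreement on `climate` over intervals where either observer scored it.

    Returns agreements / (agreements + disagreements), or None when neither
    observer scored the climate in any interval.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must code the same number of intervals")
    agreements = disagreements = 0
    for a, b in zip(obs_a, obs_b):
        if a == climate and b == climate:
            agreements += 1
        elif a == climate or b == climate:
            disagreements += 1
    scored = agreements + disagreements
    return agreements / scored if scored else None


# Hypothetical codes from two observers (M = management, I = instruction, A = activity).
observer_1 = ["M", "M", "I", "A", "A", "A"]
observer_2 = ["M", "I", "I", "A", "A", "M"]

for climate in ("M", "I", "A"):
    print(climate, scored_interval_agreement(observer_1, observer_2, climate))
```

Under this definition, each resulting value would be compared against the .80 standard separately for each climate.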

Data Analysis

Total time spent in each climate for each interval method was estimated by multiplying the number of 5-s observe intervals coded for that climate by 5 s and then by a factor reflecting the observe-record ratio. For example, the 5-s observe, 5-s record method was multiplied by a factor of 2 because a 5:5 ratio meant behavior was being observed only 50% of the time. Likewise, the 5-s observe, 10-s record method was multiplied by 3, and the 5-s observe, 20-s record method was multiplied by 5. These new values were the estimates of time spent in the total lesson. The estimates of time spent in management, instruction, and activity were then analyzed utilizing a one-way ANOVA with repeated measures on the continuous factor to determine if there were differences among the methods of interval recording. An alpha level of .05 was used to determine significance.
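The multiplication factors of 2, 3, and 5 follow from dividing the full cycle length (observe plus record) by the observe length. A minimal sketch of that arithmetic, with hypothetical function names and a hypothetical interval count, might look like this:

```python
OBSERVE_S = 5  # length of each observe interval in seconds, as in the study


def expansion_factor(record_s, observe_s=OBSERVE_S):
    """Cycle length divided by observe length: 2 for 5:5, 3 for 5:10, 5 for 5:20."""
    return (observe_s + record_s) / observe_s


def estimated_seconds(n_coded_intervals, record_s, observe_s=OBSERVE_S):
    """Estimate total time in a climate from the number of intervals coded for it."""
    return n_coded_intervals * observe_s * expansion_factor(record_s, observe_s)


# Hypothetical count: 40 observe intervals coded as activity under each method.
for record_s in (5, 10, 20):
    print(record_s, expansion_factor(record_s), estimated_seconds(40, record_s))
```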

Results

During the collection of data, interobserver agreement was assessed with one check of each of the four measurement techniques. This was performed after the 5th subject had been coded. Results are in Table 1.

The length of the lessons used for the study ranged from 20 min and 50 s to 36 min, with an average length of 28 min and 50 s. The range of episodes for the lessons was 10 to 29, with a mean across all lessons of 20.

As noted in Table 2, continuous mean percent time for management, instruction, and activity across subjects (N=9) was 24.1%, 16.2%, and 59.7%, respectively. The percent of time spent in these climates appears to be typical of the average physical education class as reported by Siedentop (1991). As expected, there were large differences in the percent of time spent in the various climates across subjects. Activity, for example, yielded the greatest variability (SD=13.14), ranging from 39% to 74% of class time. Instruction, however, showed the least variability (SD=6.67), ranging from 9% to 26% of class time. Management ranged from 12% to 42% of class time (SD=9.41).

[Table 1. Interobserver Agreement Measures for the Recording Techniques. Rows: continuous, 5-s, 10-s, and 20-s recording techniques; columns: management, instruction, and activity climates. Note. Figures were derived from using the scored-interval method.]

Table 2

Percentage of Time Spent in Climates

Climates       M       SD
Management     24.1     9.41
Instruction    16.2     6.67
Activity       59.7    13.14

Note. Data were derived from continuous recording.

The variation is most likely due to the varied activities that were represented in the study. Siedentop (1991) suggested that instruction can account for anywhere from 10% to 50% of class time depending on the activity being taught, and due to the wide range of activities being taught in this study, one would expect variability. Variability in climates can also be explained by the time already spent in the unit the data were collected in (Metzler, 1979). As Metzler pointed out, instruction is typically higher at the start of a unit, and activity is typically higher at the conclusion of a unit when students are involved in the activity itself. In the present study, the particular day within the unit was not constant across subjects. In other words, for some subjects, data were collected during the first week of the unit; for others, data were collected during the middle of the unit or the last week of the unit. This might explain the variability in the instructional climate.

To determine if the data collected utilizing the three interval-recording techniques were valid when compared to the continuous measure and if there were differences among the methods of interval recording, a one-way ANOVA with repeated measures on the continuous factor was performed for the climates of management, instruction, and activity. As noted in Table 3, there was no significant difference between data from the continuous-recording measure and data from the interval-recording measures in the climate of management, F(3,24)=.71, p=.56; instruction, F(3,24)=.99, p=.42; or activity, F(3,24)=1.51, p=.24. Therefore, because the data from the interval-recording measures were statistically similar to the data from the continuous measure, any of the three interval-recording methods (5-s record, 10-s record, or 20-s record) would yield valid estimates of behavior.

[Table 3. ANOVA With Repeated Measures on Continuous Recording. Rows: management, instruction, and activity climates.]
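The degrees of freedom in the reported F ratios, 3 and 24, are what a one-way repeated-measures design with four methods (continuous plus three interval methods) and nine subjects produces. The sketch below is a generic textbook computation of that F ratio, not the authors' analysis procedure. The data matrix is hypothetical and is not drawn from the study.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA.

    data: one row per subject, one value per method within each row.
    Returns (F, df_methods, df_error).
    """
    n = len(data)      # number of subjects
    k = len(data[0])   # number of methods
    grand = sum(sum(row) for row in data) / (n * k)
    method_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_methods = n * sum((m - grand) ** 2 for m in method_means)
    ss_subjects = k * sum((s - grand) ** 2 for s in subject_means)
    ss_error = ss_total - ss_methods - ss_subjects

    df_methods = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_methods / df_methods) / (ss_error / df_error)
    return f_value, df_methods, df_error


# Hypothetical minutes in one climate: 9 teachers x 4 methods
# (continuous, 5-s record, 10-s record, 20-s record). Illustration only.
minutes = [
    [7.0, 7.2, 6.8, 7.5],
    [5.1, 5.0, 5.4, 4.6],
    [9.3, 9.0, 9.6, 9.9],
    [3.2, 3.5, 3.0, 3.8],
    [6.4, 6.1, 6.6, 6.0],
    [8.0, 8.3, 7.7, 8.4],
    [4.5, 4.4, 4.9, 4.1],
    [10.2, 10.0, 10.5, 9.6],
    [6.9, 7.1, 6.5, 7.3],
]

f_value, df_methods, df_error = repeated_measures_anova(minutes)
print(f"F({df_methods},{df_error}) = {f_value:.2f}")
```

A p value for the resulting F could then be obtained from the F distribution, for example with `scipy.stats.f.sf(f_value, df_methods, df_error)` if SciPy is available.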

Discussion

As Johnston and Pennypacker (1980, p. 147) have pointed out, "how often periods of observation should occur and how long they should last is a question of accuracy of estimation." It would appear, at least for the sample in this study, that using a 5-s, 10-s, or 20-s interval to record would be acceptable durations for the climates of management, instruction, and activity. This is somewhat inconsistent with the idea that the more samples collected, the more valid the data are in representing what actually transpired during the total observation period (Siedentop, Tousignant, & Parker, 1982).

It is also inconsistent with the findings of Repp et al. (1976). Their study indicated that interval recording accurately represented low and medium rates of behavior but grossly underestimated high rates of behavior. The present study indicated that climates (high rates) were not significantly underestimated or overestimated. However, it is possible that climates cannot be fairly compared to behaviors as defined by Repp et al. (1976). It should be pointed out that the Repp et al. (1976) study utilized data that were generated by electromechanical equipment and were therefore "pseudobehavior" as opposed to "real" behavior, which was utilized in the present study.

When Silverman and Zotos (1987) investigated the validity of a 6-s interval utilized by Siedentop et al. (1982) in the ALT-PE instrument, they found that the interval-recording estimates of engaged time by students were higher than the actual time and concluded that the two ALT-PE instruments may not be valid measures of actual engaged time. The data in the present study suggested that the 5-s, 10-s, and 20-s record intervals were, in fact, valid estimates of data from the continuous-recording method and, therefore, are in conflict with the results of Silverman and Zotos. It should be pointed out, however, that Silverman and Zotos, as well as the ALT-PE instrument, utilized partial-interval scoring for student-engaged time whereas this study utilized whole-interval scoring.

This inconsistency with the Silverman and Zotos (1987) study might also be explained in light of the fact that classroom climate is not as behavior specific as is engaged-student time. A whole repertory of behaviors constitutes the global definition of management, instruction, and activity whereas student-engaged time is much more discrete. Another variable that may account for the differences in the results found in the present study and in the Silverman and Zotos study is the type of activity studied. The Silverman and Zotos study used aerobic dance, badminton, basketball, fencing, karate, and volleyball; this study used rhythmic activity, gymnastics, ball-handling skills, badminton, tennis, and swimming. Siedentop (1991) has stated that the type of activity can affect the representativeness of the data when utilizing interval recording.


Additionally, Silverman and Zotos (1987) used six university physical education classes for analysis; the present study used classes at the elementary and secondary levels. It may be that grade level, the activity, and the number of days that have elapsed in a unit, as well as the number and duration of episodes, are factors that influence the results and therefore should be studied.

The present study determined that the data collected utilizing recording intervals of 5 s, 10 s, and 20 s were valid. However, a limited number of studies suggest that the type of activity, the length of the episode, the behavior being studied, and the time in a particular unit may have an effect upon the validity of data when interval recording is utilized. This initial study did not seek to control for those variables; future studies should be designed to do so. Suffice it to say, the literature in this area is limited, and if interval recording is to continue to be used with confidence, more research is needed to validate its use in specific situations.

References

Anderson, W.G. (1980). Analysis of teaching physical education. St. Louis: Mosby.

Darst, P., Zakrajsek, D., & Mancini, V. (Eds.) (1989). Analyzing physical education and sport instruction. Champaign, IL: Human Kinetics.

Johnston, J.M., & Pennypacker, H.S. (1980). Observing & recording. Strategies and tactics of human behavioral research (pp. 145-170). Hillsdale, NJ: Lawrence Erlbaum.

Metzler, M. (1979). The measurement of academic learning time in physical education. Unpublished doctoral dissertation, The Ohio State University, Columbus.

Parker, M. (1989). Academic learning time-physical education (ALT-PE), 1982 revision. In P. Darst, D. Zakrajsek, & V. Mancini (Eds.), Analyzing physical education and sport instruction (pp. 195-206). Champaign, IL: Human Kinetics.

Powell, J., Martindale, B., & Kulp, S. (1975). An evaluation of time-sampling measures of behavior. Journal of Applied Behavior Analysis, 8, 463-469.

Powell, J., Martindale, B., Kulp, S., Martindale, A., & Bauman, R. (1977). Taking a closer look: Time sampling and measurement error. Journal of Applied Behavior Analysis, 10, 325-332.

Powell, J., & Rockinson, R. (1978). On the inability of interval time sampling to reflect frequency of occurrence data. Journal of Applied Behavior Analysis, 11, 531-532.

Repp, A.C., Roberts, D.M., Slack, D.J., Repp, C.F., & Berkler, M.S. (1976). A comparison of frequency, interval, and time-sampling methods of data collection. Journal of Applied Behavior Analysis, 9, 501-508.

Siedentop, D. (1991). Developing teaching skills in physical education (2nd ed.). Mountain View, CA: Mayfield.

Siedentop, D., Tousignant, M., & Parker, M. (1982). Academic learning time-physical education (rev. ed.). Columbus, OH: The Ohio State University, School of Health, Physical Education, and Recreation.

Silverman, S., & Zotos, C. (1987). Validity of interval and time sampling methods for measuring student engaged time in physical education. Educational and Psychological Measurement, 4, 1005-1012.

Stewart, M.J. (1989). Observational recording record of physical educator's teaching behavior. In P. Darst, D. Zakrajsek, & V. Mancini (Eds.), Analyzing physical education and sport instruction (pp. 249-260). Champaign, IL: Human Kinetics.

Thomson, C., Holmberg, M., & Baer, D.M. (1974). A brief report on a comparison of time-sampling procedures. Journal of Applied Behavior Analysis, 7, 623-626.

van der Mars, H. (1989). Basic recording tactics. In P. Darst, D. Zakrajsek, & V. Mancini (Eds.), Analyzing physical education and sport instruction (pp. 19-51). Champaign, IL: Human Kinetics.

Acknowledgment

We gratefully thank the teachers who participated as subjects in this study, the school district for its cooperation, and the manuscript reviewers for their helpful suggestions.