PES vol. 8 no. 3

Review Articles
Pediatric Exercise Science, 1996, 8, 201-214
© 1996 Human Kinetics Publishers, Inc.

Physical Fitness Testing of Children: A European Perspective

Han C.G. Kemper and Willem Van Mechelen

The purpose of this article is to clarify the scientific basis of physical fitness assessment in children and to review the European efforts to develop a EUROFIT fitness test battery for the youth in the countries of the Council of Europe. The development of EUROFIT is based on the efforts made in the United States in the 1950s and in Europe in the 1980s. Physical fitness measurement is not identical to physiological measurement: The EUROFIT tests are aimed at measuring abilities rather than skills. Correlations between physical fitness tests and physiological laboratory tests show varying results, and such studies therefore need to be continued. Reliability of fitness tests needs to be continually studied. Because physical fitness testing serves multiple purposes, norm- and criterion-referenced scales for EUROFIT have to be developed. Examples of scaling methods are given. Implementation of the EUROFIT fitness tests for educational purposes is urgently needed.

The measurement of physical fitness in children and youth has long been a topic of interest to physical educators, to exercise and health scientists, and lately to private organizations dealing with sport, fitness, and health. Numerous fitness tests have been constructed by physical educators, exercise physiologists, sport physicians, and sport trainers during the last 100 years. These tests are used to evaluate the physical performances of subjects in instructional or medical situations.

There is little in the field of physical education that has stimulated as much emotional debate as the components, interpretation, and value of physical fitness testing with these tests. In 1989, Thomas Rowland (40), the editor of Pediatric Exercise Science, introduced a debate on clarifying the scientific basis of fitness assessment in children. The articles by Pate (38) and Seefeldt and Vogel (45) discussed the pros and cons of large-scale physical fitness testing in youth in the United States. In 1990 Safrit (43) reviewed the validity and reliability of fitness tests for children. One of her major conclusions was that the psychometric standards of fitness tests, compared to those of intelligence, personality, and motor development tests, are unacceptable.

In these articles the authors did not review the European efforts to establish fitness test batteries in several countries (Belgium, The Netherlands) or the combined action of the Council of Europe to construct, evaluate, and promote, on a wide scale, a physical fitness test battery for school-aged children (EUROFIT). On the basis of the work of Fleishman (19), both Simons et al. (46, 47) and Beunen and Claessens (7) performed factor-analytical studies to construct a reliable and valid physical fitness test battery for boys and girls during their teenage years (age 12-18). Kemper et al. (27) adapted these fitness tests for The Netherlands (MOPER fitness tests), Leyten (32) extended this fitness testing to younger children (from age 8 on), and Bovend'eerdt et al. (9) paid attention to the didactic aspects of implementation in the school curriculum.

Han C.G. Kemper and Willem Van Mechelen are with the Institute for Research in Extramural Medicine (EMGO), AGGO Research Group, Medical Faculty, Vrije Universiteit, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands.

Based on the Belgian and Dutch efforts, and at the initiative of the Council of Europe, a European test battery was constructed and introduced (1). The dimensions, factors, and test items of the EUROFIT are summarized in Table 1. It includes nine field tests, five anthropometric measurements, and age and sex as identification data.

Table 1 Dimensions and Factors of Physical Fitness and the EUROFIT Tests

Dimension                     Factor                        EUROFIT test
Cardiorespiratory endurance   Cardiorespiratory endurance   Endurance shuttle run
Strength                      Static strength               Hand grip
Strength                      Explosive strength (power)    Standing broad jump
Muscular endurance            Functional strength           Bent arm hang
Muscular endurance            Trunk strength                Sit-ups
Speed                         Running speed, agility        Shuttle run 10 x 5 m
Speed                         Speed of limb movement        Plate tapping
Flexibility                   Flexibility                   Sit and reach
Balance                       Total body balance            Flamingo balance

Note. Anthropometric measures included height (cm), weight (kg), and body fat (five skinfolds: biceps, triceps, subscapular, suprailiac, and calf). Identification data included age (years, months) and sex. Information in this table taken from Adam et al. (1).

Physical Fitness Measurement Is Not Physiological Measurement

In their well-known and widely accepted Textbook of Work Physiology, P.-O. Astrand and K. Rodahl (5) devote chapter 8 to the evaluation of tests for physical work capacity. The first paragraph is about physical fitness tests, and four specific statements illustrate the opinion of these physiologists. The first statement is as follows:

Since most of the so-called fitness tests, including evaluation of flexibility, skill, strength, etc., are related to special gymnastic or athletic performance, they are really not suitable for an analysis of basic physiological functions. Practice and training in the performance of the actual test may greatly influence the results. (p. 355)

Here Astrand and Rodahl refer to the problem of measuring skills. When a fitness test includes an important skill component, it is certainly true that training and practicing will influence the test result and that a learning effect will occur. Nevertheless, based on Fleishman's work (19) and that of Clarke (11), the concept of ability was developed. Factor-analytic studies were able to identify several general components of physical fitness from a large number of different fitness tests. Each such component was interpreted as an ability. Each athletic performance is more or less defined by one or more physical abilities plus its specific skills. The newly developed fitness tests, such as EUROFIT, are aimed at measuring abilities such as strength, flexibility, and endurance rather than measuring skills. The name of each of these abilities is more a linguistic interpretation of the general component of physical fitness than a definition.

The second significant statement by Astrand and Rodahl (5) is the following:

The fact that there may be significant correlations between the results from complicated batteries applied to a group of individuals, or that the scores are related to certain parameters characteristic of the subjects, does not necessarily mean a direct relationship. Such data may cause confusion rather than solve problems. From a physiological and medical viewpoint, any test battery for the evaluation of physical fitness is rather meaningless unless it is based on sound physiological consideration. (p. 355)

In this passage, the authors point to the well-known phenomenon that a statistical correlation between a laboratory test of any physiological function and a physical fitness test cannot explain a cause-effect relationship. Therefore, "strength" measured by means of a field test probably does not reflect isometric or isokinetic muscle force measured in the laboratory.

The question remains whether the foregoing also implies that an increase over time in the same subject, or a difference between two subjects, in a score on a strength field test is meaningless. Such results may indeed be meaningless when a change or difference in a strength score measured with a field test is interpreted as a change or difference in maximal isometric or isokinetic muscle force. Therefore, the designers and users of physical fitness tests must be careful in their terminology and in translating field test results into physiological functions.

A third statement by Astrand and Rodahl (5) is the following:

It may help the teacher or coach to stimulate the athlete's interest in training. Furthermore, any progress can be evaluated objectively. The selection of such activities and tests should therefore be based on pedagogic and psychological considerations with adaptations to local facilities. If they cannot be justified from these viewpoints, it is better to exclude them from the curriculum altogether. (p. 355)

Here the authors point to the popular way of attributing the results of physical fitness testing to fields other than biomedical sciences: The physiologist often finds psychological or sociological factors that can explain his or her results when they are insignificant or not expected. In addition, the motivational effects caused by testing itself are not primarily desired by physiologists, and they have hardly ever been demonstrated in the literature; research in this area needs to be done.

Finally, the fourth important statement by Astrand and Rodahl (5) is the following:

Too often the tests are incorrectly claimed to serve a physiological purpose. Actually, from a physiological viewpoint, application of a test battery may sometimes be unsuitable, since the performance of the tests usually demands maximal exertion of a subject who may be completely untrained. (p. 355)

Maximal exertion is inherent to physical fitness testing. At the same time, it is a serious problem, not merely from a physiological point of view as stated above but also in terms of reliability. A fitness test result is a good result only when the individual tries to do his or her best. That is why some tests (such as endurance tests) have insufficient reproducibility. So far, physical fitness tests have been applied in young and apparently healthy subjects who also exert themselves maximally in competition sports, in physical education classes, and in their free playing time when competing with peers. A good warm-up, together with demonstration and explanation before actual testing, can prevent injuries. In other tests, such as flexibility, overtraining of the muscle/tendon complex can be avoided by giving more than one trial to the subject, so that he or she can gradually increase the performance.

Short History of Fitness Testing

The development of fitness test batteries started in the United States after the publication by Kraus comparing fitness results of American children with those of European age peers in the 1950s. The relative inability of American youth to meet minimal standards of muscular strength and flexibility led to the founding of the President's Council on Physical Fitness and Sports (PCPFS) in 1956. Together with the American Alliance for Health, Physical Education, Recreation and Dance (AAHPERD) (2), a Youth Fitness Test was published in 1958 (45).

In the mid-1970s, this AAHPERD test was changed from primarily performance-related to more health-related fitness, including cardiovascular fitness, body composition, and muscle force. In the mid-1980s, education-oriented packages were included to help teachers improve their students' fitness and to enhance learning of fitness concepts. This package is called Fitnessgram (14). With the PCPFS battery also in use, the American physical education teacher today is confused by several national fitness tests (25, 49).

In Europe, the development of fitness tests followed the Americans, with a delay of 20 years. Belgium and The Netherlands published their test batteries in the 1960s (9, 21, 32, 46, 47), followed by other countries. A more coordinated effort started in 1978, when, on the initiative of the Committee for the Development of Sport (CDDS) of the Council of Europe, the aims and concepts of a EUROFIT test battery were formulated. Between 1980 and 1982, the evaluation and choice of both motor fitness and endurance fitness tests were carried out, and as a result of this international effort, a provisional EUROFIT handbook was published in 1983 and a final handbook in 1988, in French and English (1).

Table 2 Reference Values for Age and Sex Groups of the EUROFIT Tests in Different European Countries

Country             Age and sex group          Author(s)
Italy               Boys and girls 12-14 yrs   Cilia & Belluca (10)
Spain (Catalonia)   Boys and girls 13-18 yrs   Prat et al. (39)
Belgium             Boys and girls 6-12 yrs    Levefre et al. (31)
Belgium             Boys and girls 13-18 yrs   Levefre et al. (30)
Latvia              Boys and girls 11-17 yrs   Volbekiene (48)
The Netherlands     Boys and girls 12-16 yrs   Van Mechelen et al. (36)

The test items cover physical ability factors such as strength, power, speed, flexibility, balance, and endurance, as well as body composition measures such as height, weight, and skinfold thicknesses (see Table 1).

One of the main objectives is that this standardized test will be used in all European countries in order to develop population-based references for boys and girls of different age groups in each country. Such references for boys and girls now exist in a great number of member states of the Council of Europe (see Table 2). In addition to these references (including a full description of the tests in the language of the country concerned) a translated version of the EUROFIT manual is also available in Hungarian (6).

So, in contrast with the North Americans, we Europeans have succeeded in presenting one fitness battery. But establishing a standardized fitness test battery is only one step. The next step is to give the educational perspective to the professionals. The fitness tests are not educational objectives, but are tools that may be used to attain such objectives (49).

Purposes of Physical Fitness Testing

According to Pate (38), in school-based physical education, fitness testing has been proposed and used for quite different purposes, ranging from program evaluation, student motivation and recognition, and selection of sport talents to promotion of cognitive and affective learning. It is believed that by using a pretest-posttest control group design, the effects of a physical education program can be evaluated. This is, however, only possible in a true experimental design in which all other confounding factors can be controlled and when the expected changes are bigger than the measurement errors inherent to the less reliable field tests (12). Too often, this kind of physical fitness evaluation is generalized over the whole physical education program instead of realizing that fitness is only one of the purposes of the educational process.

Although motivational effects have never been demonstrated in the literature, testing children's fitness without discussing the implications in an educational context will not motivate children and may even demotivate them. In selecting sports talent there are at least three factors that one has to take into account. First, one must realize that simple field tests have limited validity in comparison to sophisticated and time-consuming laboratory tests. Second, even a maximal exercise test with measurement of maximal oxygen uptake (VO2max) in a laboratory can give only a moderate prediction of endurance performance (34). Third, testing performances of children, who are in the middle of their growth and development, results in large interindividual variations that are determined by genetic as well as environmental factors. The consequence of this can be that a child who is strong and accelerated before puberty is relatively weak after puberty, and vice versa.

The above-mentioned factors make the success of these fitness tests for this purpose at least doubtful, apart from the question of whether talent selection has any educational basis. The health-related tests are not being used for this purpose.

It is well known that knowledge of results is a positive factor in the mental learning process in general, but it can also facilitate motor learning in the case of motor fitness. The motivated teacher can introduce each fitness test by presenting its rationale (fit for health or for specific sports performances). After administration, the results can be discussed, and effective physical education programs can be promoted to increase these fitness aspects. Therefore, it is important that the teacher has knowledge about how to do this kind of follow-up. In this respect some of the North American fitness tests are ahead of the EUROFIT in that they already made attempts to include educational materials to improve fitness and health.

However, no research is available that has evaluated the effectiveness of these strategies. The supposed health and performance improvement has to be demonstrated in the near future. As stated earlier, field tests are not very powerful in predicting sports performances. It is difficult to identify children who fail or pass preestablished standards. Determining the effectiveness of these strategies is possible only if (a) standards are available; (b) these standards are normalized for important characteristics such as age, sex, and possibly other anthropometric measures; (c) these standards concern tests that measure important factors, that is, mostly health-related factors (failure on a flexibility test probably has fewer health consequences than failure on a trunk strength or endurance test); and (d) these standards are not based on population distribution data (comparison between individuals), but on absolute criteria, that is, criterion-referenced standards. If meaningful criterion-referenced standards can be established, subjects who fail to reach these standards will be more likely to adopt a lifestyle program to ameliorate their fitness deficiencies.

Considering the above-mentioned purposes, one can conclude that the EUROFIT test battery, as it is described in the final handbook, needs to be extended in order to serve as a useful educational instrument.

Evaluation of Fitness Scores

Norm-Referenced Scales

Criteria for evaluating fitness scores are mostly based on "relative norms," that is, on percentile scores constructed from large cross-sectional databases. The norms are constructed by forming quartiles or quintiles from the population distribution of the fitness scores. In the EUROFIT, each quintile is given a verbal evaluation of low, below average, average, above average, or high (36) (see Figure 1).
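The quintile scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference sample is invented, the function names are hypothetical, and the sketch assumes that a higher score means a better performance (for timed tests such as the 10 x 5 m shuttle run the order of the labels would have to be reversed).

```python
from bisect import bisect_right
from statistics import quantiles

# One verbal label per quintile, from lowest to highest score.
LABELS = ["low", "below average", "average", "above average", "high"]

def quintile_cut_points(reference_scores):
    """Return the four cut points (P20, P40, P60, P80) of a reference sample."""
    return quantiles(reference_scores, n=5, method="inclusive")

def norm_label(score, cut_points):
    """Map a score onto its quintile label (assumes higher = better)."""
    return LABELS[bisect_right(cut_points, score)]

# Hypothetical reference sample: standing broad jump scores in cm.
reference = list(range(100, 201))
cuts = quintile_cut_points(reference)

print(cuts)                    # [120.0, 140.0, 160.0, 180.0]
print(norm_label(105, cuts))   # low
print(norm_label(150, cuts))   # average
print(norm_label(195, cuts))   # high
```

In practice the cut points would be computed separately for each sex and calendar age group.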

Norms to evaluate the individual scores of fitness tests are supposed to be influenced by age and gender differences. Most fitness batteries, therefore, have norms that are differentiated for boys and girls and for calendar age groups. The interpretation of test scores, however, must also take into account aspects other than age and gender, such as maturity level, anthropometrical factors, genetic endowment, skill at the test, motivation, and training effects. For the construction of norms, however, it seems impractical to fractionate into too many categories.

Figure 1 - Hypothetical distribution of a fitness score in a population. A normative interpretation of quintiles is given. (Horizontal axis: fitness score, divided into low, below average, average, above average, and high.)

In 1983, Kemper et al. (28) investigated the influence of three main factors, namely calendar age (CA), body height (BH), and body mass (BM), on a set of eight physical fitness tests, of which a number were the same as or only slightly different from the EUROFIT tests. These tests were conducted in a cross-sectional study of 3,100 girls and 3,400 boys between the ages of 12 and 18. The raw data were analyzed allometrically (3, 4) in the equation of Huxley (23):

Y = A · X^B (1)

In Equation 1, Y represents a fitness performance; X either age, body height, or body mass; A a constant; and B the exponent of X. By way of log-transformation of Equation 1, a linear regression can be calculated:

log Y = log A + B (log X) (2)

In Equation 2, B1 and B2 represent the slopes of linear regression lines, indicating the importance of parameter X on the fitness test result Y (see Figure 2A). The explained variance (R²) of the regression line represents the reliability of the regression line (see Figure 2B).

Figure 2 - Explanation of the equation log Y = log A + B (log X) (28). B1 and B2 are the slopes of the linear regression lines (A) and R² is the reliability of the slopes (B).

If the regression lines explain less than 10% of the total variance (R2), the B values are given no further consideration.
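The log-log regression and the 10% rule can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' analysis: the data are invented so that Y = 0.002 · X^2 holds exactly, and the function names `allometric_fit` and `slope_worth_reporting` are hypothetical.

```python
import math

def allometric_fit(x_values, y_values):
    """Least-squares fit of log Y = log A + B * log X; returns (A, B, R2)."""
    lx = [math.log10(x) for x in x_values]
    ly = [math.log10(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    syy = sum((y - mean_y) ** 2 for y in ly)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                      # slope B of the regression line
    log_a = mean_y - b * mean_x        # intercept log A
    r2 = sxy ** 2 / (sxx * syy)        # explained variance R2
    return 10 ** log_a, b, r2

def slope_worth_reporting(r2, threshold=0.10):
    """B is considered further only if R2 reaches 10% of the total variance."""
    return r2 >= threshold

# Hypothetical data: body height (cm) and a performance that follows Y = 0.002 * X^2.
heights_cm = [140, 150, 160, 170, 180]
performance = [0.002 * h ** 2 for h in heights_cm]

a, b, r2 = allometric_fit(heights_cm, performance)
print(round(a, 4), round(b, 2), round(r2, 3))
print(slope_worth_reporting(r2))
```

With real test data the points scatter around the line, so R² falls below 1 and the 10% threshold decides whether the slope B is tabulated at all.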

In Table 3, the results of the linear relationships of eight fitness tests with calendar age, body height, and body mass are summarized. From the B values in the boys it can be seen that the influence of body height is more important on at least three tests (standing high jump, 10 x 5 m sprint, and 25 x plate tapping) than that of body mass and calendar age. However, the R² is higher with calendar age. In the girls, the influence of calendar age and body mass is only significant in 25 x plate tapping. These results favor the construction of norm-referenced scales for 12- to 18-year-old boys on the basis of calendar age groups. In 12- to 18-year-old girls, there is no need to construct standards on the basis of body height, body mass, and calendar age.

Another disadvantage of relative norms is that they are dependent on the reference population that is used. Let us assume that the 50th percentile is taken as the criterion for failing or "unfit" (<P50), or passing or "fit" (>P50). When the whole population on which the norms are based changes (e.g., when their running speed diminishes), the P50 value will also change, and in the case of diminished running speed, it will become easier for the same subject to meet the P50 standard. This is one of the reasons that the AAHPERD has revised its norms over the years and why it is better to use criterion-referenced scales.
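A small Python sketch makes this dependence on the reference population concrete. The distances are invented for illustration and the function name `passes_p50` is hypothetical; the only point is that the same score can fail against one reference sample and pass against a slower one.

```python
from statistics import median

def passes_p50(score, reference_scores):
    """Norm-referenced rule: 'fit' if the score exceeds the reference median (P50)."""
    return score > median(reference_scores)

# Hypothetical 12-min run distances (m) for the same age group in two surveys.
earlier_reference = [1800, 2000, 2200, 2400, 2600]
later_reference = [1600, 1800, 2000, 2200, 2400]   # whole population 200 m slower

child = 2100  # same performance judged against both reference populations
print(passes_p50(child, earlier_reference))  # False: the median is 2200 m
print(passes_p50(child, later_reference))    # True: the median has dropped to 2000 m
```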

Criterion-Referenced Scales

Setting criterion-referenced standards is, from a strict measurement perspective, almost impossible. However, from an educational perspective, it is preferable to alternative methods such as normative comparisons, individual teachers' judgments, or no interpretation at all (49).

Table 3 Results of Linear Relationships of Eight Fitness Tests With Calendar Age, Body Height, and Body Mass in Boys and Girls

Columns: Calendar age (R², B); Body height (R², B); Body mass (R², B)
Groups: Boys (n = 3,430); Girls (n = 3,137)
Rows (for each group): Arm pull/body mass; Bent arm hang; 10 x leg lifts; Standing high jump; 10 x 5 m sprint; 25 x plate tapping; Sit and reach; 12-min endurance run

Note. R² values of more than 10% explained variance are given consideration, and the respective B values are tabulated.

Cureton and colleagues (14, 16) did considerable work to establish a criterion for endurance fitness. For endurance running, it is known that the energy comes almost exclusively from aerobic sources (26). On the basis of that, Margaria et al. (33) constructed a nomogram to estimate the maximal aerobic power (VO2max) in adult males from endurance runs over different distances and durations. In adults, running with an average speed of 8 km · hr⁻¹ corresponds with a VO2max of 30 ml · min⁻¹ · kg⁻¹ body weight. If one assumes that the ability to run with a speed of 8 km · hr⁻¹ is a minimal requirement in normal daily life and to maintain health, this corresponds with a minimum performance of 1,600 m on a 12-min endurance run (Cooper test) or a 5 km distance covered within a duration of 50 min. For the 20-m endurance shuttle run test (20 MST), the first stage of this test corresponds with a running speed of 8 km · hr⁻¹. Oja and Tuxworth (37) came up with a criterion of 32-35 ml · kg⁻¹ · min⁻¹ for 40- to 64-year-old males based on a threshold for health benefit of daily physical activity with an energy expenditure of 7.5 kcal.
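The conversion from a criterion running speed to a minimum distance on a timed run is simple arithmetic: at 8 km · hr⁻¹, 12 min of running covers 8 × 12/60 = 1.6 km. The sketch below assumes only the 8 km · hr⁻¹ criterion for adults mentioned in the text; the function names and the example distances are hypothetical.

```python
def distance_at_speed_m(speed_km_per_hr, duration_min):
    """Distance in metres covered at a constant speed over a given duration."""
    return speed_km_per_hr * 1000 * duration_min / 60

def meets_criterion(distance_m, duration_min, criterion_speed_km_per_hr=8.0):
    """Criterion-referenced rule: pass if the average speed reaches the criterion."""
    return distance_m >= distance_at_speed_m(criterion_speed_km_per_hr, duration_min)

print(distance_at_speed_m(8.0, 12))   # 1600.0 m on a 12-min run
print(meets_criterion(1500, 12))      # False
print(meets_criterion(1750, 12))      # True
```

Unlike a percentile norm, this pass mark does not move when the reference population becomes slower or faster.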

However, the accuracy of these estimations can only be rough, since there are large interindividual differences in running economy in adults and also between subjects of different age and sex (17, 29, 40, 41, 42). The oxygen uptake is around 5% lower in females compared to males and is 10% higher in adults compared to pubertal children (35). Also, training effects can add to a higher running economy (50). Thus, for youth the nomogram of Margaria (33) is not appropriate because of differences in running economy and percentage VO2max between children and adults. These changes are progressive as a function of age, and there is a different relation of distance run performance to VO2max in boys and girls (15). Therefore, the criterion-referenced standards of Cureton and Warren (16) for field tests of VO2max have to be used.

Other attempts at health-related and criterion-referenced norms were made by Edwards et al. (18) for leg strength. In the Allied Dunbar National Fitness (ADNF) Survey, this was applied by establishing thresholds for muscle strength (20). In the ADNF Survey, functional thresholds for power and flexibility were applied to an adult population aged 16 years and over in England (20). In doing so, it was assumed that a minimal isometric two-legs strength equal to a subject's body weight is necessary to rise from a chair without using the arms. Because power does not correlate perfectly with strength, power measurements seem to be important when force is required rapidly, for example, during take-off in running, jumping, and stair climbing. From correlations with simple strength/power tests, such as a standing broad jump, the distance corresponding to two-legs strength can be determined. However, this kind of information is still lacking. Also, a value of 15.3 force, or 10% of body weight, of handgrip strength is suggested as a criterion-referenced norm needed for independent functioning in daily household activities, such as using tools, opening containers and bottles, and supporting weights. The development of such values for fitness tests is urgently needed for youth.

Reliability and Validity of EUROFIT Tests

A prerequisite for valid tests is their reliability, in terms of both reproducibility and objectivity. Most of the EUROFIT test items have been studied extensively on these aspects. In fact, these reliability criteria were part of the selection procedures (7, 9). Validation of the test items remains difficult. Exercise physiologists claim that fitness tests are only valid if they meet their criteria (5).

Attempts to validate some physical fitness test items such as bent arm hang, leg lifts, arm pull, and standing high jump strength in 12- to 18-year-old boys and girls (24) demonstrate that the strength and muscular endurance factors measured by these field tests cannot be compared with isometric strength testing. In Table 4, it is shown that in both sexes the correlations between field measures and laboratory measures vary between .06 and .82, indicating an explained variance between 3 and 65% of total variance (8).

This result also holds for the relation between endurance runs (12-min run, 20 MST) and VO2max. Correlations with VO2max in children do not exceed .80, indicating that at most about 60% of the variation can be explained by these running tests.
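The step from a correlation coefficient to "explained variance" is a squaring operation. As a hedged illustration, the sketch below squares the coefficients reported in Table 4. The function name is hypothetical. Squaring a Spearman coefficient gives the variance shared between ranks, so the percentages are only a rough guide, and values computed this way may differ somewhat from the rounded figures quoted in the text.

```python
# Spearman coefficients copied from Table 4 as (boys, girls).
TABLE_4 = {
    "10 x leg lifts vs. isometric lift force of the legs": (-0.52, -0.43),
    "Standing high jump vs. isometric extension force": (-0.08, 0.06),
    "Arm pull vs. isometric arm-torsion": (0.82, 0.55),
    "Bent arm hang vs. endurance time of arm flexion": (0.76, 0.45),
}


def explained_variance_pct(r: float) -> float:
    """Percentage of variance shared by two measures with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return 100.0 * r * r


if __name__ == "__main__":
    for test_pair, (boys, girls) in TABLE_4.items():
        print(
            f"{test_pair}: "
            f"boys {explained_variance_pct(boys):.1f}%, "
            f"girls {explained_variance_pct(girls):.1f}%"
        )
```

Because the sign is lost in squaring, a negative coefficient such as the one for leg lifts contributes in the same way as a positive one of equal size.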

Table 4 Spearman Correlation Coefficients Between Physical Fitness Tests and Laboratory Strengths and Muscular Endurance (34 Boys and 17 Girls)

Field test vs. laboratory measure                      Boys    Girls

10 x leg lifts vs. isometric lift force of the legs    -.52*   -.43
Standing high jump vs. isometric extension force       -.08    +.06
Arm pull vs. isometric arm-torsion                     +.82*   +.55*
Bent arm hang vs. endurance time of arm flexion        +.76*   +.45

Higher correlations are mostly caused by measurements in very heterogeneous populations. These high correlations do not necessarily indicate high validity.

In this respect, the results of Cooper (13) are very illustrative. Although he reported a correlation of .92 between his 12-min walk-run test and VO2max (ml · kg⁻¹ · min⁻¹), the scatter plot shows that a performance of 1,800 m could be achieved by subjects with a relatively high VO2max (52 ml · kg⁻¹ · min⁻¹) as well as by subjects with a relatively low VO2max (40 ml · kg⁻¹ · min⁻¹). Van Mechelen et al. (34) demonstrated correlations of the 6-min endurance run and the 20 MST with directly measured VO2max of about .6 in 14-year-old children. For youth, a more representative diagram is given by Cureton et al. (15). A comprehensive review of correlations between distance runs and VO2max is given by Safrit (44).
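The effect of sample heterogeneity on a correlation coefficient can be illustrated with a small simulation. The sketch below uses entirely synthetic data: the linear relation between VO2max and run distance, the noise level, the sample size, and the function names are hypothetical choices made for illustration, not estimates from Cooper (13) or any other study. It compares the correlation in a wide-ranging sample with the correlation in a subgroup restricted to a VO2max of 40 to 52 ml · kg⁻¹ · min⁻¹.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n=2000, seed=1):
    """Synthetic (VO2max, run distance) pairs; all parameters are hypothetical."""
    rng = random.Random(seed)
    # Heterogeneous sample: VO2max spread widely (ml/kg/min).
    vo2 = [rng.uniform(25.0, 65.0) for _ in range(n)]
    # Distance depends on VO2max plus noise from other factors (m).
    dist = [500.0 + 35.0 * v + rng.gauss(0.0, 250.0) for v in vo2]
    return vo2, dist


vo2, dist = simulate()
r_all = pearson_r(vo2, dist)

# Homogeneous subgroup: same subjects, restricted VO2max range.
subgroup = [(v, d) for v, d in zip(vo2, dist) if 40.0 <= v <= 52.0]
r_narrow = pearson_r([v for v, _ in subgroup], [d for _, d in subgroup])

print(f"heterogeneous sample: r = {r_all:.2f} (n = {len(vo2)})")
print(f"restricted subgroup:  r = {r_narrow:.2f} (n = {len(subgroup)})")
```

The underlying relation and the noise are identical in both cases; only the spread of VO2max differs. This is why a high coefficient obtained in a heterogeneous group says little about how well a field test separates individuals within a narrower group, such as children of one age.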

Hence, it is better to avoid any interpretation of field test results if one cannot separate statistical factors from pure physiological factors. In most cases, the result on a field test is more or less the end product of a mixture of different factors. This also holds for the flexibility factor: The sit-and-reach performance is the result of unknown contributions of bones, muscles, tendons, and articulations. It is doubtful whether attempts to correct for one of these contributions, as done by Hoeger et al. (22), can overcome this. And above all, for educational purposes we do not need physiological validation. On the other hand, correlations of .60 are certainly evidence of validity: They show that strength and distance run tests do measure isometric strength and VO2max to some extent.

Conclusions

It can be concluded that the development of the EUROFIT test battery is an important step in creating uniform physical fitness testing in Europe. However, it is only a first step. Availability of a EUROFIT handbook allows people to use these tests and to construct norm-referenced scales. These norms need not be corrected for anthropometric variables such as height and weight. In girls between 12 and 18 years of age, even a normalization for calendar age is not needed.

Reliability of the tests continues to need study. Attempts to establish validity on physical and physiological criteria show varying success; therefore, these studies need to be continued. In the near future, health-related criterion references will also need to be developed. More importantly, the educational implementation of these fitness tests, by means of accompanying educational material such as that developed for the Fitnessgram batteries from the United States, will need to be added.

References

1. Adam, C.V., M. Klissouras, R. Ravazollo, W. Renson, and W. Tuxworth. EUROFIT: European Test of Physical Fitness. Rome: Council of Europe, Committee for the Development of Sport, 1988.

2. American Alliance for Health, Physical Education, Recreation and Dance. Physical Best: Educational Package. Reston, VA: Author, 1989.

3. Asmussen, E., and K.A. Heebøll-Nielsen. A dimensional analysis of physical performance and growth in boys. J. Appl. Physiol. 7:593-603, 1955.

4. Asmussen, E., and K.A. Heebøll-Nielsen. Physical performance and growth in children. J. Appl. Physiol. 8:371-380, 1956.

5. Åstrand, P.-O., and K. Rodahl. Textbook of Work Physiology. New York: McGraw Hill, 1987.

6. Barabas, A. EUROFIT: A Fizikai Fittség Mérésének Európai Tesztje [EUROFIT: European Test of Physical Fitness]. Budapest, Hungary: Minisztérium Testnevelési és Sport Osztály, 1993.

7. Beunen, C., and A. Claessens. Physical fitness evaluatie: de PF-Leuven test batterij. [Physical fitness evaluation, the PF-Leuven test battery]. Gen. Sport 6:224-231, 1987.

8. Bovend'eerdt, J.H.F., M.J.E. Bernink, T. van Hyfte, J.W. Ritmeester, H.C.G. Kemper, and R. Verschuur. De Moper Fitness Test, onderzoek verslag [The Moper Fitness Test, Research Report]. Haarlem, The Netherlands: De Vrieseborch, 1980.

9. Bovend'eerdt, J., H.C.G. Kemper, R. Verschuur, and C. Leyten. Het Moper-Fitness-Test Project: Onderzoek naar de prestatiegeschiktheid van Nederlandse jeugdigen [The Moper Fitness-Test Project: Research of performance of Dutch youth]. Gen. Sport 6:232-237, 1987.

10. Cilia, G., and M. Belluca. Eurofit: Test Europei de Attitudine Fisica [EUROFIT: European Test of Physical Fitness]. Roma: ISEF, 1993.

11. Clarke, H.H. Physical and Motor Tests in the Medford Boys Growth Study. Englewood Cliffs, NJ: Prentice Hall, 1971.

12. Cook, T.H., and D.T. Campbell. Quasi-Experimentation. Chicago: Rand McNally, 1979.

13. Cooper, K.H. A means of assessing maximal oxygen intake: Correlation between field and treadmill testing. JAMA 203:201-204, 1968.

14. Cureton, K.J. Aerobic capacity. In: The Prudential FITNESSGRAM: Technical Reference Manual, H.B. Falls, J.R. Morrow, and H.W. Kohl (Eds.). Dallas, TX: Cooper Institute for Aerobic Research, 1994, pp. 33-55.

15. Cureton, K.J., M.A. Sloniger, J.P. O'Bannon, D.M. Black, and W.P. McCormack. A generalized equation for prediction of VO2 peak from 1-mile run/walk performance. Med. Sci. Sports Exerc. 27:445-451, 1995.

16. Cureton, K.J., and G.L. Warren. Criterion-referenced standards for youth health-related fitness tests: A tutorial. Res. Q. Exerc. Sport 61:7-19, 1990.

17. Daniels, J., Oldridge, F. Nagle, and B. White. Differences and changes in VO2 among young runners of 10-18 years of age. Med. Sci. Sports Exerc. 10:200-203, 1978.

18. Edwards, R.H.T., A. Young, G.P. Hosking, and D.A. Jones. Human skeletal muscle functions: Description of tests and normal values. Clin. Sci. Mol. Med. 52:283-290, 1977.

19. Fleishman, E.A. The Structure and Measurement of Physical Fitness. Englewood Cliffs, NJ: Prentice Hall, 1964.

20. Health Education Authority. Allied Dunbar National Fitness Survey: A Report On Activity Patterns and Fitness Levels. London: The Sport Council and the Health Education Authority, 1992.

21. Hebbelinck, M., and J. Borms. Tests en normschalen [Tests and Norm Scales]. Brussels: Free University, 1969.

22. Hoeger, W.W.K., D.R. Hopkins, S. Button, and T.A. Palmer. Comparing the sit and reach with the modified sit and reach in measuring flexibility in adolescents. Ped. Exerc. Sci. 2:60-69, 1990.

23. Huxley, J.S. Problems of Relative Growth. London: Methuen, 1932.

24. Hyfte, T. van, J.W. Ritmeester, H.C.G. Kemper, and R. Verschuur. Relatie krachttests uit de Moper Fitness Test met isometrische krachtmetingen [Relation of force tests from the Moper Fitness Test with isometric force measurements]. In: De Moper Fitness Test, J.H.F. Bovend'eerdt, H.C.G. Kemper, and R. Verschuur (Eds.). Haarlem: De Vrieseborch, 1982, pp. 190-211.

25. Institute for Aerobics Research. Fitnessgram. Dallas: Author, 1989.

26. Kemper, H.C.G. The 12 and 6 minutes endurance run as field test for cardio-respiratory fitness of schoolchildren. Proc. CDDS Cons Olympia 61:69-78, 1982.

27. Kemper, H.C.G., and R. Verschuur. The Motor Performance Fitness Test: A practical approach in measurement in physical education in the Netherlands. In: Schriftenreihe Bundesinstitut für Sportwissenschaft (Vol. 36), H. Haag (Ed.). Schorndorf, Germany: Hofmann, 1981, pp. 186-191.

28. Kemper, H.C.G., R. Verschuur, P. van Dok, and J.W. Ritmeester. Influence of age, body height and body mass upon the Moper Fitness Test results of 12 to 18 year old boys and girls. In: Proceedings of 1981 ICPFR Tokio, Physical Fitness Research, Baseball Magazine. Tokyo: SHA, 1983.

29. Krahenbuhl, G.S., D.W. Morgan, and R. Pangrazi. Longitudinal changes in distance-running performance of young males. Int. J. Sports Med. 10(2):92-96, 1989.

30. Lefevre, J., G. Beunen, J. Borms, R. Renson, J. Vrijens, A.C. Claessens, and H. Van der Aerschot. EUROFIT: Leidraad bij de testafneming en referentiewaarden [EUROFIT: Guideline for testing and reference values]. Brussel: BLOSO, 1993.

31. Lefevre, J., G. Beunen, J. Borms, R. Renson, J. Vrijens, A.C. Claessens, and H. Van der Aerschot. EUROFIT: Leidraad bij de testafneming en referentiewaarden voor 6- tot en met 12-jarige jongens en meisjes [EUROFIT: Guidelines for testing and reference values of 6- to 12-year-old boys and girls] [Monograph]. Lichamelijke Opvoeding 22:105-127, 1993.

32. Leyten, C. De Moper Fitheids test, onderzoeksverslag 9 t/m 11 jarigen [The Moper Fitness test Research Report 9-11 years old]. Haarlem: De Vrieseborch, 1981.

33. Margaria, R., P. Aghemo, and F. Pinera Limas. A simple relation between performance in running and maximal aerobic power. J. Appl. Physiol. 38:351-352, 1975.

34. Mechelen, W. van, H. Hlobil, and H.C.G. Kemper. Validation of two running tests and estimates of maximal aerobic power in children. Eur. J. Appl. Physiol. 55:503-506, 1986.

35. Mechelen, W. van, H.C.G. Kemper, and J.W.R. Twisk. The development of running economy from 13-27 years of age. Med. Sci. Sports Exerc. 26(Suppl.):S205, 1994. (Abstract)

36. Mechelen, W. van, W.H. van Lier, H. Hlobil, I. Crolla, and H.C.G. Kemper. Handbook With Reference Scales for 12-16-Year-Old Boys and Girls in The Netherlands. Haarlem, The Netherlands: De Vrieseborch, 1991.

37. Oja, P., and B. Tuxworth (Eds.). Eurofit for Adults, Assessment of Health-Related Fitness. Strasbourg: Council of Europe, 1995.

38. Pate, R.R. The case for large scale physical fitness testing in American youth. Ped. Exerc. Sci. 1:290-294, 1989.

39. Prat, J.A., J. Cosamart, N. Balagué, M. Martinez, J.M. Povill, A. Sanchez, D. Silla, S. Santigora, G. Perez, J. Riera, J.M. Vela, and P. Partero. Eurofit: La Batteria EUROFIT a Catalunya [EUROFIT: The Physical Fitness Battery in Spain]. Barcelona: Secretaria General de l'Esport, 1993.

40. Rowland, T.W. Fitness testing in children: Where from here? Ped. Exerc. Sci. 1:289, 1989.

41. Rowland, T.W., J.A. Auchinackie, T.J. Keen, and G.M. Green. Physiologic responses to treadmill running in adult and pre-pubertal males. Int. J. Sports Med. 8:192-197, 1987.

42. Rowland, T.W., and G.M. Green. Physiologic responses to treadmill exercise in females: Adult-child differences. Med. Sci. Sports Exerc. 20:474-478, 1988.

43. Safrit, M.J. The validity and reliability of fitness tests for children: A review. Ped. Exerc. Sci. 2:9-20, 1990.

44. Safrit, M.J., M.G. Costa, L.M. Hooper, P. Patterson, and S.A. Ehlert. The validity generalization of distance run tests. Can. J. Sport Sci. 13:188-196, 1988.

45. Seefeldt, V., and P. Vogel. Physical fitness testing of children: A 30-year history of misguided efforts? Ped. Exerc. Sci. 1:295-302, 1989.

46. Simons, J., G. Beunen, M. Ostyn, R. Renson, P. Swalus, D. van Gerven, and E. Willems. Construction d'une batterie de tests d'aptitude motrice pour garçons de 12 à 19 ans, par la méthode de l'analyse factorielle [Construction of a motor performance test for boys from 12 to 19 years by way of factor analysis]. In: Aspekten van de somatische en motorische ontwikkeling bij jongens, een reader over physical fitness. Leuven, Belgium: Kath. Universiteit Leuven, 1975.

47. Simons, J., M. Ostyn, G. Beunen, G. Ruison, and D. van Gerven. Factor analytic study of the motor ability of Belgian girls age 12-19. In: Biomechanics of Sports and Kinanthropometry, F. Landry and W.A. Orball (Eds.). Miami: Symposia Specialists, 1978, pp. 395-402.

48. Volbekiene, V. EUROFIT: Fizinio Pajėgumo Testai ir Metodika [EUROFIT: Measurements of Physical Fitness]. Vilnius: Lietuvos Kūno Kultūros ir Sporto Departamentas, 1993.

49. Whitehead, J.R., C.L. Pemberton, and C.B. Corbin. Perspectives on the physical fitness testing of children: The case for a realistic educational approach. Ped. Exerc. Sci. 2:111-123, 1990.

50. Wilmore, J.H., and D.L. Costill. Physiology of Sport and Exercise. Champaign, IL: Human Kinetics, 1994.