

The Polish Biometric Society

The 40th International Biometrical Colloquium

and

Second Polish-Portuguese Workshop on Biometry

in honour of Prof. J.T. Mexia

Abstracts

29 August – 2 September 2010

Będlewo/Poznań, Poland


Contents

Katarzyna Ambroży, Iwona Mejza
Balance and Efficiency of some Augmented Split-Block-Plot Design

Ewa Bakinowska
Application of logistic models to the analysis of field experiments

Ewa Bakinowska, Anna Szczepańska
Detection of change point in the experiment with the winter wheat

Bilal Ahmad Bhat
On Gamma distribution with an application using S-Plus Software

Tadeusz Caliński
On the analysis of experiments in nested block designs

Elisabete Carolino, Isabel Barão
Comparison of Acceptance Sampling Plans for non-Gaussian variables with Acceptance
Sampling Plans for Gaussian variables (obtained by Box-Cox transformation)

Carlos A. Coelho, Joao T. Mexia
The Distribution of the Test Statistic for Testing the Equality of two Generalized Variances
in the Non-central Linear Case. An Example of Application of the results on the
Distribution of the Product of Independent Gamma-Ratio Random Variables

Anita Dobek
Different approaches to genetic epistasis

Miguel Fonseca
Linear models, analysis of variance and more

Bożena Gładyszewska, Izabela Kuna–Broniowska, Anna Ciupak
Analysis of Poisson’s ratio variation of tomato fruit peel

Janusz Gołaszewski, Anna Zaręba, Dariusz Załuski, Anna Imiołek,
Aneta Stawiana-Kosiorek, Tomasz Bieńkowski
A procedure for testing crop production technology

M. Ivette Gomes, M. Manuela Neves
Estimation of parameters of extreme events for random censored data

Darek Gozdowski, Wiesław Mądry, Adriana Derejko, Jan Rozbicki
Inference based on a multiple series of two-factor experiments in a split-block design

Dariusz Gozdowski, Stanisław Samborski, Eike Stefan Dobers
Evaluation of methods for the detection of spatial outliers in the yield data of winter wheat


Jolanta Grala-Michalak, Katarzyna Kaźmierczak
Discriminant analysis for Kraft’s classes of trees

Luís M. Grilo, Helena L. Grilo, António de Oliveira
Quantifying the (dis)agreement between two medicine measurements methods

Abdollah Hajivandi
Determining years of life lost (YLL) of leading death causes responsible for reduction
in life expectancy (Boushehr – Iran)

Zofia Hanusz, Joanna Tarasińska
Simulation study on multivariate normality based on Shapiro – Wilk statistics

Anna Imiołek, Janusz Gołaszewski, Dariusz Załuski
Surveys as a source of information on key agrotechnical factors
in winter rye (Secale cereale L.) production

Katarzyna Kaźmierczak, Witold Pazdrowski, Agnieszka Jędraszak,
Marek Szymański, Marcin Nawrot
Crown width of a tree and its relationships with age, height and diameter at breast
height based on common oak (Quercus robur L.)

Andrzej Kornacki, Katarzyna Ostroga
Application of the Akaike criterion to the selection of a normal distribution

Marcin Kozak, Agnieszka Wnuk, Dariusz Gozdowski, Zdzisław Wyszyński
Visualizing bivariate relationships with hexagonally binned data

Katarzyna Marczyńska, Stanisław Mejza
Unreplicated experiments in early stage breeding programs

João Tiago Mexia
Models and inference – the normal case

Amilcar Oliveira, Teresa Oliveira
Using R packages in experimental design

Dariusz Parys
Type I error rates in multiple testing

Wiesław Pilarczyk, Anna Fraś
On the precision of winter rape variety testing trials in Poland

Stanisław Pluta, Agnieszka Masny, Wiesław Mądry, Edward Żurawicz
Fruit crop breeding with using biometrical methods

Paulo C. Rodrigues, Ep Heuvelink, Marco Bink, Leo Marcelis, Fred van Eeuwijk
Crop growth modelling and QTL analysis of multilocation trials


Alicja Szabelska, Michał Siatkowski, Teresa Goszczurna, Joanna Zyprych
Overview of growth models in R

Agnieszka Tomkowiak, Alicja Szabelska, Joanna Zyprych, Zbigniew Broda,
Idzi Siatkowski
Analysis of the genetic diversity of white clover (Trifolium repens L.)
varieties and clones using molecular markers

Joanna Ukalska, Krzysztof Ukalski, Jakub Borkowski
An application of the generalized linear models for an examination of the phenotypic
quality of roe deer

Dorota Weigt, Alicja Szabelska, Joanna Zyprych, Idzi Siatkowski,
Zbigniew Broda
Morphological analysis of inflorescence mutants in alfalfa (Medicago sativa L. s.l.)
with the respect to seed yield traits

Bogna Zawieja, Wiesław Pilarczyk, Bogna Kowalczyk
Comparisons of uniformity decisions based on COYU and Bennett’s methods –
simulated data

Joanna Zyprych, Alicja Szabelska, Idzi Siatkowski
Gene’s selection based on statistical tests


Balance and Efficiency of some Augmented Split-Block-Plot Design

Katarzyna Ambroży, Iwona Mejza

Department of Mathematical and Statistical Methods

Poznan University of Life Sciences

A procedure for constructing an augmented split-block-plot design with control subplot
treatments is presented. In modelling the data, the structure of the experimental
material and a four-step randomization scheme are taken into account. For the analysis
of the resulting randomization model with six strata, the approach typical of
multistratum experiments with orthogonal block structure is adopted. A numerical example
illustrates the construction method, the statistical properties of the final design
and their consequences for the analysis.


Application of logistic models to the analysis of field experiments

Ewa Bakinowska

Department of Mathematical and Statistical Methods
Poznań University of Life Sciences

Breeding new cereal varieties is a costly and lengthy (multi-year) process. Initially,
work focuses on obtaining new genetic material and multiplying it. Next, comparative
trials of the new genotypes are carried out to identify lines that show promise of
yielding new cereal varieties. The comparative trials consist of unreplicated
(single-replicate) experiments with a large number of genotypes. At this stage a strict
selection is made, and about 20-40% of the genotypes are chosen for further experimental
study. Subsequent experiments, with fewer genotypes, are carried out in replicated
experimental designs.

The main trait of the lines considered when selecting them for further study is yield.
Nevertheless, for genotypes yielding at the same level, other, visually assessed traits
play an important role. These include plant height, uniformity, powdery mildew
infection, lodging, brown rust infection and leaf spot.

The aim of the paper is to answer the question of what influence observed traits other
than yield have on the selection of lines for further (replicated) pre-preliminary and
preliminary experiments. The experimental material consisted of spring barley data from
an unreplicated experiment carried out in 2006 at the Plant Breeding Station (Stacja
Hodowli Roślin) „Modzurów”, part of the Szelejewo Group. A logistic model was used
for the analysis.
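To illustrate the kind of logistic model referred to above, here is a minimal sketch. It assumes a single hypothetical predictor (a lodging score on a 1-9 scale) and a binary outcome (line selected for further trials or not). The data, the names `lodging` and `selected`, and the Newton-Raphson fitting routine are illustrative assumptions, not the author's data or software.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and negated Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: lodging score per line and whether the line was selected.
lodging = [1, 2, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
selected = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(lodging, selected)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("odds ratio per unit of lodging score:", round(math.exp(b1), 3))
```

A negative slope here would mean that the odds of selection fall as the lodging score rises. With several visual traits, the same idea extends to a multiple logistic regression fitted with a statistical package.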


Detection of change point in the experiment with the winter wheat

Ewa Bakinowska, Anna Szczepańska


Poznań University of Life Sciences

The aim is to present the estimation of a change point in the growth of biomass.
A nonparametric regression model was applied to analyse the experiment. The change point
was treated as an abrupt change in the response function. To determine the change point,
the theory presented by Paul L. Speckman (1994) was used.

References

Paul L. Speckman (1994). Detection of change-points in the nonparametric regression.


On Gamma distribution with an application using S-Plus Software

Bilal Ahmad Bhat

Division of Sericulture

Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir

Mirgund, (J&K) India

The gamma distribution is a natural extension of the exponential distribution and has
appeared in the literature since the early 1800s. Johnson and Kotz (1970) discuss this
distribution and include 130 references. It is one of the most commonly used statistical
distributions in practice, and a number of generalizations exist in the literature. In
this paper, another approach to deriving the gamma distribution is suggested. Finally,
numerical illustrations using S-Plus software are given for the case of real time data.


On the analysis of experiments in nested block designs

Tadeusz Caliński


Poznań University of Life Sciences, Poland

Nested block designs are quite often used in practice, particularly in agricultural

experimentation. Their statistical properties have been considered in many papers, as

reviewed by Bailey (1999). Of special interest are those nested block designs which satisfy

the general balance property introduced by Nelder (1965) and discussed by several authors,

by Bailey (1994) and by Bogacka and Mejza (1994) in particular. The purpose of the present

paper is to give explicit formulae for analyzing an experiment carried out in a nested block

design having the general balance property. They follow from a randomization-derived mixed

model, decomposed into stratum submodels. Of particular interest is the combined analysis

allowing the information from higher strata to be recovered. The paper is essentially an

extension of some results presented in Caliński and Kageyama (2000).

References

Bailey, R. A. (1994). General balance: Artificial theory or practical relevance?

In: Caliński, T., Kala, R. (Eds.), Proc. Int. Conf. on Linear Statist. Inference

LINSTAT’93 (pp. 171-184). Kluwer Acad. Publ., Dordrecht.

Bailey, R. A. (1999). Choosing designs for nested blocks. Listy Biometryczne – Biometr.

Lett. 36, 85-126.

Bogacka, B. and Mejza, S. (1994). Optimality of generally balanced experimental block

designs. In: Caliński, T., Kala, R. (Eds.), Proc. Int. Conf. on Linear Statist. Inference

LINSTAT’93 (pp. 185-194). Kluwer Acad. Publ., Dordrecht.

Caliński, T. and Kageyama, S. (2000). Block Designs: A Randomization Approach,

Volume I: Analysis. Lecture Notes in Statistics, Volume 150. Springer, New York.

Nelder, J. A. (1965). The analysis of randomized experiments with orthogonal block

structure. Proc. Roy. Soc. Lond. Ser. A 283, 147-178.


Comparison of Acceptance Sampling Plans for non-Gaussian variables with

Acceptance Sampling Plans for Gaussian variables (obtained by Box-Cox

transformation)

Elisabete Carolino¹, Isabel Barão²

¹ESTeSL, IPL, Portugal

²DEIO, FCUL, Portugal

In the quality control of a production process (of goods and services), from a statistical point

of view, focus is either on the process itself with application of Statistical Process Control, or

on its frontiers, with application of Acceptance Sampling (AS) – studied here – and

Experimental Design. AS is used to inspect either the output process – final product – or the

input – initial product. The purpose of AS is to determine a course of action, not to estimate

lot quality. AS prescribes a procedure that, if applied to a series of lots, will give a specified

risk of accepting lots of given quality. In other words, AS yields quality assurance. An AS

plan merely accepts and rejects lots, considering sampling information. AS by variables is

based on the hypothesis that the observed quality characteristics follow a known distribution,

namely the Gaussian distribution (the classical case of AS by variables, treated in classical

standards). This assumption is, however, sometimes unwarranted and leads to wrong decisions.

AS for non-Gaussian, mainly asymmetrical variables, is thus relevant. When we have a non-

Gaussian distribution we can build specific AS plans associated with that distribution. If the

real distribution of data is very asymmetric and/or has heavy tails, but we are able to

adequately model the data and estimate its parameters, which usually is not easy, we can use

those specific AS plans. Alternatively, we can transform the original data to normal

values using a Box-Cox type transformation, which requires no prior modelling of the

data, and then use AS plans for the classical (Gaussian) case.

In this work we will address the problem of determining AS plans by variables for

Exponential distribution, Gamma distribution and Extreme Value distributions. Considering

the same sample, the acceptance sampling plans specific to each non-Gaussian variable will

be compared with acceptance sampling plans for Gaussian variables (after Box-Cox

transformation) in terms of acceptance rate of the lot. The results show advantages in

applying the Box-Cox transformations to normalize the data and then applying the acceptance

sampling plans for Gaussian variables.
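The Box-Cox step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample `data` is hypothetical, the transformation parameter is chosen on a simple grid by maximising the Gaussian profile log-likelihood, and the function names are illustrative. The abstract does not specify how the authors selected the parameter.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Gaussian profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    var = sum((zi - mean) ** 2 for zi in z) / n
    # Variance term plus the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)

def choose_lambda(values, grid):
    """Pick the grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

# Hypothetical right-skewed quality measurements (all positive).
data = [0.8, 1.1, 1.3, 1.9, 2.4, 3.7, 5.2, 9.6]
grid = [k / 10 for k in range(-20, 21)]

best = choose_lambda(data, grid)
transformed = box_cox(data, best)
print("chosen lambda:", best)
```

After the transformation, a classical Gaussian plan is applied to `transformed`. Any specification limit has to be transformed with the same lambda.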


The Distribution of the Test Statistic for Testing the Equality of two

Generalized Variances in the Non-central Linear Case

An Example of Application of the results on the Distribution of the

Product of Independent Gamma-Ratio Random Variables

Carlos A. Coelho, Joao T. Mexia

Mathematics Department - Faculdade de Ciencias e Tecnologia - Universidade Nova de

Lisboa

In this presentation we will study in detail the exact distribution of the likelihood ratio test

statistic for testing the equality of two generalized variances in the non-central linear case and

will also consider in detail some near-exact distributions for this same statistic. The results

obtained are based on the recent book "Product and Ratio of Generalized Gamma-Ratio

Random Variables: Exact and Near-exact Distributions - Applications" by the same authors.

Simulations and numerical studies are used to show the usefulness of the near-exact

distributions in handling such complicated distributions, as well as how closely they

approximate the exact distribution.

Reference

Carlos A. Coelho, Joao T. Mexia (2010). Product and Ratio of Generalized Gamma-Ratio

Random Variables: Exact and Near-exact Distributions - Applications, LAP - Lambert

Academic Publishing AG & Co. KG, Saarbrücken, Germany (ISBN: 978-3-8383-5846-8).


Different approaches to genetic epistasis

Anita Dobek



In recent years much effort has been devoted to identifying genes responsible for
various quantitative traits, especially in medicine and biology. A very important
problem is the detection of genes that individually have only a small influence on the
phenotype. This problem may be addressed by analysing the interaction of such a gene
with another one. Moreover, it is well known that gene-gene as well as gene-environment
interactions play a pivotal role in the development of an organism.

Owing to the importance of this problem, there is a huge literature dealing with genetic
epistasis. However, scientists representing different disciplines use different
definitions and terminology. Consequently, it is difficult to compare the statistical
tools proposed for the identification and estimation of gene-gene interaction effects.
A presentation of some important interpretations may facilitate the proper choice
of a statistical method in this context.


Linear models, analysis of variance and more

Miguel Fonseca

In this talk, I will discuss my work developed jointly with Prof. João Tiago Mexia.
Starting with linear mixed models, it has ranged over estimation, hypothesis testing
and the construction of confidence regions for the parameters of these models. Results
on mixed linear models will be presented, with emphasis on orthogonal models.

References

[1] Fonseca M., Mexia J.T., Zmyślony R. (2008). Inference in normal models with commutative
orthogonal block structure. Acta et Commentationes Universitatis Tartuensis de Mathematica
12, 3–16.

[2] Fonseca M., Mexia J.T., Zmyślony R. (2006). Binary operations on Jordan algebras and
orthogonal normal models. Linear Algebra and Its Applications 417, 75–86.

[3] Fonseca M., Mexia J.T., Zmyślony R. (2003). Exact Distributions for the Generalized F
Statistic. Discussiones Mathematicae – Probability and Statistics 22, 37–51.

[4] Fonseca M., Mexia J.T., Zmyślony R. (2003). Estimating and Testing of Variance
Components: an Application to a Grapevine Experiment. Biometrical Letters 40, 1–7.

[5] Fonseca M., Mexia J.T., Zmyślony R. (2003). Estimators and Tests for Variance
Components in Cross Nested Orthogonal Models. Discussiones Mathematicae – Probability
and Statistics 23, 173–201.


Analysis of Poisson’s ratio variation of tomato fruit peel

Bożena Gładyszewska¹, Izabela Kuna–Broniowska², Anna Ciupak¹

¹Department of Physics

²Department of Applied Mathematics and Informatics

University of Life Sciences of Lublin

The paper presents the results of a study on the effects of storage time and temperature
on the variation of Poisson's ratio of the skin of two greenhouse tomato varieties,
Admiro and Encore. Poisson's ratio is one of the most important parameters determining
the strength of a material. It was found that Poisson's ratio of the peel of Admiro
fruit stored at 13°C, the temperature recommended by the Polish Standard, ranged from
0.7 to 0.8 in the initial period, declined to about 0.6 after 10 days, and remained at
that level until the end of the experiment. By contrast, the variety Encore was
characterized by lower values and lower variability of Poisson's ratio, which ranged
from 0.4 to 0.5 during the storage period. The higher storage temperature of 21°C
reduced the duration of the investigation to 12 days, because after this period it was
no longer possible to separate a sample owing to the condition of the fruit surface.
For both varieties stored at room temperature, Poisson's ratio fluctuated around 0.5
throughout the experiment.


A procedure for testing crop production technology

Janusz Gołaszewski¹, Anna Zaręba¹, Dariusz Załuski¹, Anna Imiołek¹,

Aneta Stawiana-Kosiorek¹, Tomasz Bieńkowski²

¹Department of Plant Breeding and Seed Production

University of Warmia and Mazury in Olsztyn, Poland

²Household Agricultural Production Seed Central Ltd.

Profitable crop production requires quick adaptation of technology to market demands.
A traditional technology therefore has to be modified or developed anew, which means
changing the key agrotechnical factors that determine high yield or a property of the
yield.

For testing a new technology we propose a three-stage approach:
1) detection of the key technology factors on the basis of survey results and
advanced experimental designs (factorial designs, FD, and fractional factorial designs, FFD),
2) implementation of those factors in a series of FFD on-farm experiments, and
3) estimation of the efficiency of the new technology together with an economic analysis
of its profitability.

The approach is illustrated with empirical data obtained from testing green pea
production technology in north-eastern Poland. Based on the estimated main and
interaction effects, three factors and their levels were implemented in three types of
FFDs generated from the 2^3 FD, which were finally distributed across six farms. The
synthesis of the data provided information on the statistical efficiency and economic
profitability of changing each tested agrotechnical factor. It was concluded that the
suggested procedure may be applied in testing other crop production technologies,
allowing for the specifics of a given crop.
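As a rough illustration of how a fractional design can be derived from a 2^3 factorial, the sketch below builds the full design in coded levels and extracts a half fraction. The choice of the defining relation I = ABC and the function names are assumptions made for illustration only. The abstract does not state which fractions the authors used.

```python
from itertools import product

def full_factorial(n_factors):
    """All runs of a two-level design, coded -1 (low) and +1 (high)."""
    return list(product((-1, 1), repeat=n_factors))

def half_fraction(runs, sign=1):
    """Keep the runs whose level product equals `sign` (I = ABC or I = -ABC)."""
    selected = []
    for run in runs:
        level_product = 1
        for level in run:
            level_product *= level
        if level_product == sign:
            selected.append(run)
    return selected

design = full_factorial(3)                    # 8 runs for factors A, B, C
principal = half_fraction(design, sign=1)     # 4 runs with A*B*C = +1
alternate = half_fraction(design, sign=-1)    # the complementary 4 runs

for run in principal:
    print(run)
```

In each half fraction every factor still appears twice at each level. The price is aliasing: with I = ABC, each main effect is confounded with the two-factor interaction of the other two factors.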


Estimation of parameters of extreme events

for random censored data

M. Ivette Gomes¹, M. Manuela Neves²

¹Universidade de Lisboa, Faculdade de Ciências, DEIO e CEAUL

²Universidade Técnica de Lisboa, ISA, e CEAUL

In the area of Statistics of Extremes we deal essentially with the estimation

of parameters of extreme events, such as the probability of exceedance of a high level

or a high quantile, situated at the border of, or even beyond, the range of the available data. The

most common assumption on a set of univariate data is that it is a complete sample, either

independent and identically distributed or weakly dependent and stationary, from an unknown

distribution function F. However, in the analysis of lifetime data, observations are usually

censored. We shall now assume the case of random censorship, where apart from a recent

paper by Einmahl et al. (2008) and another by Gomes and Neves (2010), there is only, as far

as we know, a brief reference to the topic in Reiss and Thomas (1997, Section 6.1) and a

paper by Beirlant et al. (2007). In such a context of random censorship, as in all applications

of extreme value theory, the estimation of the extreme value index (EVI) is of primary

importance. This parameter measures the heaviness of the tail and has been widely studied

in the literature. For heavy tails we mention the classical Hill estimator (Hill, 1975) and the

most recent minimum-variance reduced-bias estimators of the EVI (Caeiro et al., 2005;

Gomes et al., 2007; Gomes et al., 2008) and of extreme quantiles (Gomes and Pestana, 2007).

For general EVI estimation, we mention the moment estimator of Dekkers et al. (1989) and

the “maximum likelihood” estimator (Smith, 1997; Drees et al., 2004). We shall give here

special attention to such estimation, as well as associated high quantile estimation under

random censoring, making use of a recent general EVI estimator, the mixed moment

estimator in Fraga Alves et al. (2009). We shall illustrate the results with simulations and with

the application of the methodology to a set of survival data.
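For readers unfamiliar with the classical Hill estimator mentioned above, here is a minimal sketch for a heavy-tailed sample, based on the k largest observations. The second function shows one adaptation known from the random-censoring literature, dividing by the proportion of uncensored observations among the k largest values. It is included as an assumption for illustration and is not the mixed moment estimator studied by the authors. The sample and function names are hypothetical.

```python
import math

def hill_estimator(sample, k):
    """Hill estimator of a positive EVI from the k largest observations."""
    ordered = sorted(sample)
    n = len(ordered)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = ordered[n - k - 1]      # order statistic X_(n-k)
    top = ordered[n - k:]               # the k largest values
    return sum(math.log(x / threshold) for x in top) / k

def censored_hill(observed, uncensored, k):
    """Hill estimate divided by the share of uncensored values in the top k."""
    pairs = sorted(zip(observed, uncensored))
    p_hat = sum(1 for _, flag in pairs[-k:] if flag) / k
    if p_hat == 0:
        raise ValueError("all of the k largest observations are censored")
    return hill_estimator(observed, k) / p_hat

# Hypothetical lifetimes; the flag is True when the lifetime was fully observed.
times = [1.0, 2.0, 4.0, 8.0, 16.0]
flags = [True, True, True, False, True]

print(hill_estimator(times, 2))
print(censored_hill(times, flags, 2))
```

In practice the estimate is plotted against k, because the choice of k trades bias against variance.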


Inference based on a multiple series of two-factor experiments
in a split-block design

Darek Gozdowski¹, Wiesław Mądry¹, Adriana Derejko¹, Jan Rozbicki²

¹Department of Experimental Design and Bioinformatics
²Department of Agronomy
Warsaw University of Life Sciences (Szkoła Główna Gospodarstwa Wiejskiego)

The aim of the paper is to formulate a combined analysis of variance for data from
a series of two-factor experiments in the Post-Registration Variety Testing system
(Porejestracyjne Doświadczalnictwo Odmianowe, PDO). The focus is on the means for the
main effects of varieties and of crop management methods and on the interaction between
these factors, with the goal of assessing the effect of two crop management methods on
the mean yield of wheat. This assessment is presented for different environments,
averaged over varieties, and for the variety × crop management interaction. Finally,
the effect of crop management is shown for each of the studied varieties, on average
for each of the locations. The experiments are laid out in a split-block design; the
series comprises 25 varieties, 2 crop management methods and 8 years.


Evaluation of methods for the detection of spatial outliers in the yield data

of winter wheat

Dariusz Gozdowski¹, Stanisław Samborski², Eike Stefan Dobers³

¹Department of Experimental Design and Bioinformatics, Warsaw University of Life Sciences

²Department of Agronomy, Warsaw University of Life Sciences

³Faculty of Geoscience and Geography, Georg – August University

Yield maps are a valuable source of spatial data in precision agriculture, but only if
they report crop yields close to the actual yields. Unfortunately, devices used to
monitor crop yields quite often register data that differ significantly from the actual
yield values. The number of such incorrect records (spatial outliers) depends on the
presence of obstacles in the field, harvester stops, etc. The share of spatial outliers
usually ranges from 10% to as much as 50%. Identifying the outliers from raw yield data
and visual assessment of yield maps is difficult and laborious, so statistical methods
that help to detect them are very desirable.

This work evaluates three methods of spatial outlier detection in yield data. The raw
yield data came from a field in northern Poland cropped with winter wheat in 2009. One
method was based on the histogram and two on the spatial autocorrelation coefficient
(Moran’s I). Each method detected a different percentage of outliers, and the
correspondence between the methods was quite weak.

The study showed that the autocorrelation coefficient Moran’s I alone is not an
objective method for detecting spatial outliers in raw yield data. Detection based on
negative values of Moran’s I was not sufficient: many outliers identified earlier by
the histogram method were not detected. It was also observed that not only a negative
value of Moran’s I but also a very high value can indicate an outlier.

The detection of spatial outliers should combine classical methods (e.g. removing very
high and very low grain yield values) with complementary methods based on the
autocorrelation coefficient as a final step in creating reliable yield maps.
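The idea of flagging outliers by a negative local Moran's I can be sketched as below. The example assumes yield values on a small rectangular grid, rook (edge-sharing) neighbours and row-standardised weights. These choices, the data and the function name are illustrative assumptions, since the abstract does not describe the neighbourhood definition used by the authors.

```python
def local_morans_i(grid):
    """Local Moran's I for each cell of a rectangular grid.

    Uses rook neighbours with row-standardised weights:
    I_i = z_i * (mean of neighbouring z_j) / m2, where z = x - mean
    and m2 is the mean squared deviation over all cells.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            lag = sum(v - mean for v in neighbours) / len(neighbours)
            out_row.append((grid[r][c] - mean) * lag / m2)
        result.append(out_row)
    return result

# Hypothetical yields (t/ha): one suspiciously low reading in the centre.
yields = [
    [6.0, 6.0, 6.0],
    [6.0, 1.0, 6.0],
    [6.0, 6.0, 6.0],
]

for row in local_morans_i(yields):
    print([round(v, 3) for v in row])
```

The centre cell receives a clearly negative value because it differs from all of its neighbours. Its edge neighbours are also pulled below zero, which is one reason a threshold on the sign alone can be misleading.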


Discriminant analysis for Kraft’s classes of trees

Jolanta Grala-Michalak¹, Katarzyna Kaźmierczak²

¹Faculty of Mathematics and Computer Science

Adam Mickiewicz University

²Department of Forest Management


The paper presents the results of a discriminant analysis for Kraft’s classes of trees.
Kraft’s classification is based on the position of a tree in the social structure of the
stand and on the development and extent of its crown. Membership of a given social class
reflects the position of a tree in the stand and, through this, its growth potential.
The aim of the analysis was to select the variables that best determine the Kraft’s
class of a tree and to construct discriminant functions that classify the data well
into Kraft’s classes.


Quantifying the (dis)agreement between two medicine

measurements methods

Luís M. Grilo¹, Helena L. Grilo², António de Oliveira³

¹Mathematics Department

Polytechnic Institute of Tomar, Portugal

²Mathematics Department

Polytechnic Institute of Tomar, Portugal

³Medical Expert, Portugal

To analyze the serum levels of folic acid in a blood sample we use two different medical

measurement methods, which usually do not produce exactly the same results. In order to

replace the old method by the new one, without causing problems in clinical interpretation,

we need to assess the agreement of the available data, which in this case presents a complex

variation across the range of the measurement. To do so, we estimate the 95% limits of

agreement, before and after logarithmic transformation, and we also consider an appropriate

use of regression. These two statistical techniques are very useful and

easy for medical researchers to interpret.
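A minimal sketch of 95% limits of agreement, before and after a logarithmic transformation, is given below. It assumes the usual form "mean difference ± 1.96 standard deviations of the differences" for paired readings. The readings and function names are hypothetical, and the regression-based approach mentioned in the abstract is not shown.

```python
import math
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Return (lower limit, bias, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias, bias + spread

def log_limits_of_agreement(method_a, method_b):
    """Limits computed on the log scale and back-transformed to ratios a/b."""
    lower, bias, upper = limits_of_agreement(
        [math.log(a) for a in method_a],
        [math.log(b) for b in method_b],
    )
    return math.exp(lower), math.exp(bias), math.exp(upper)

# Hypothetical paired readings from the old and the new method.
old_method = [10.0, 12.0, 14.0]
new_method = [9.0, 12.0, 13.0]

lower, bias, upper = limits_of_agreement(old_method, new_method)
ratio_lower, ratio_bias, ratio_upper = log_limits_of_agreement(old_method, new_method)
print("difference scale:", lower, bias, upper)
print("ratio scale:", ratio_lower, ratio_bias, ratio_upper)
```

Working on the log scale is a common choice when the spread of the differences grows with the size of the measurement. The back-transformed limits are then read as ratios between the two methods.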


Determining years of life lost (YLL) of leading death causes responsible for

reduction in life expectancy (Boushehr – Iran)

Abdollah Hajivandi

Biostatistics and Epidemiology Department

Isfahan University of Medical Sciences, Isfahan, Iran

Introduction: The causes of death with the highest rates are not always the major
factors responsible for reduced life expectancy. The responsible causes are rather
those with the highest years of life lost (YLL), that is, the life expectancy at the
age of death. This subject is investigated using mortality data for the population of
Boushehr province, located in the south of Iran.

Methodology: YLL for the three leading causes of death in the province are computed
separately for each gender from the mortality data, based on life expectancies, age at
death and the number of deaths due to each cause.

Results: In both gender groups heart diseases are the leading cause of death, but only
among women does this cause contribute the highest YLL. Among men, the number of deaths
from accidents is about half the number due to heart diseases, but the YLL due to
accidents is 2.5 times the YLL due to heart disease. In both sexes, the mean age at
death is lowest for accidents among the three leading causes of death.

Conclusion: Driving accidents have the greatest influence on the reduction in life
expectancy compared with the other leading causes of death in the community, especially
among men, because of the low mean age of people who die in driving accidents. This
survey shows that, for projects promoting life expectancy in the community, the YLL of
a cause of death is more important than the number of deaths it causes.
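The YLL calculation described in the methodology can be sketched as a sum, over ages at death, of the number of deaths multiplied by the remaining life expectancy at that age. All figures below are hypothetical and chosen only to show how a cause with fewer deaths can carry more YLL. The age groups, life-table values and names are assumptions, not the study's data.

```python
def years_of_life_lost(deaths_by_age, life_expectancy_by_age):
    """Sum of deaths at each age times remaining life expectancy at that age."""
    return sum(
        deaths * life_expectancy_by_age[age]
        for age, deaths in deaths_by_age.items()
    )

# Hypothetical remaining life expectancy (years) at three ages at death.
life_expectancy = {25: 50.0, 45: 32.0, 70: 12.0}

# Hypothetical numbers of deaths by age at death for two causes.
heart_disease = {25: 0, 45: 20, 70: 80}    # 100 deaths, mostly at older ages
accidents = {25: 40, 45: 10, 70: 0}        # 50 deaths, mostly at younger ages

yll_heart = years_of_life_lost(heart_disease, life_expectancy)
yll_accidents = years_of_life_lost(accidents, life_expectancy)
print("heart disease:", yll_heart, "accidents:", yll_accidents)
```

In this made-up example accidents cause half as many deaths as heart disease but more years of life lost, because the deaths occur at ages with a long remaining life expectancy.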


Simulation study on multivariate normality

based on Shapiro – Wilk statistics

Zofia Hanusz, Joanna Tarasińska

Department of Applied Mathematics and Computer Science

University of Life Sciences in Lublin

The paper concerns three tests for multivariate normality based on the

Shapiro – Wilk W statistic for the principal components of a covariance matrix. Two of them

were proposed by Srivastava and Hui (1987), the third was introduced by Hanusz

and Tarasińska (2008b). The type I errors of these tests at significance levels 0.1, 0.05

and 0.01 are evaluated both for the sample and residuals in the two data groups. The powers

of the tests under consideration against chosen alternative distributions are also presented

in both the sample and residual cases.


Surveys as a source of information on key agrotechnical factors
in winter rye (Secale cereale L.) production

Anna Imiołek, Janusz Gołaszewski, Dariusz Załuski

Department of Plant Breeding and Seed Production
University of Warmia and Mazury in Olsztyn, Poland

Surveys are a commonly used method of data collection in market and sociological
research. In studies of crop agronomy, surveys are used relatively rarely, even though
a properly constructed questionnaire can provide valuable information about the
technology and about the scope for introducing technological innovation.

The aim of our own survey was to identify the key elements of crop production
technology. The test crop was winter rye (Secale cereale L.) grown for grain. The
survey covered most rye grain producers in north-eastern Poland cultivating an area
larger than 1 ha. The questionnaire contained questions on the general characteristics
of the farm, the technological factors of production, the assessment of (agrotechnical)
energy consumption, and the structure of inputs. The coded data on production factors
served as variables in a general linear model. In the analysis of the results, the
production technology was analysed and the key factors of the cultivation technology
were detected using type III sums of squares in ANOVA and the estimated eta-squared
effect sizes of the factors.
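As a simplified illustration of the eta-squared effect size, the sketch below computes it for a single factor as the between-group sum of squares divided by the total sum of squares. This is a one-way simplification with hypothetical data. The authors' analysis relies on type III sums of squares in a multi-factor general linear model, which is not reproduced here.

```python
def eta_squared(groups):
    """One-way eta-squared: between-group sum of squares over total sum of squares."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total

# Hypothetical grain yields (t/ha) for two levels of one agrotechnical factor.
yields_by_level = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

effect_size = eta_squared(yields_by_level)
print(round(effect_size, 3))
```

Values close to 1 indicate that the factor accounts for most of the variation in the response. Ranking factors by this quantity is one way to single out the key ones.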


Crown width of a tree and its relationships with age, height and diameter at

breast height based on common oak (Quercus robur L.)

Katarzyna Kaźmierczak¹, Witold Pazdrowski², Agnieszka Jędraszak²,

Marek Szymański², Marcin Nawrot²

¹Department of Forest Management

²Department of Forest Utilization

Poznań University of Life Sciences

The paper presents the results of a study on the dependence of crown width on basic tree
measurements. The aim of the analysis was to determine the strength of the relationship
between the crown diameter of a tree and its diameter at breast height, height and age.
Moreover, regression equations were developed for the estimation of crown width. The
experimental material comprised measurement data for 33 oaks (aged from 41 to 148
years). The crown projection area was established for each tree on the basis of
characteristic points of the tree crown, projected using a crown projector in eight
geographical directions. Equation parameters were estimated by the method of least
squares, and the strength of the relationship was established on the basis of the
empirical data. The function most accurately explaining the dependence of crown width
on diameter at breast height was selected. Moreover, the distribution of the
measurement data was compared with the normal distribution and basic statistical
characteristics were established.

The strength of the relationship between crown width and diameter at breast height,
height and age was evaluated using the correlation coefficient for linear dependence,
while the strength of the dependence between crown width and diameter at breast height
was also evaluated by the correlation ratio for curvilinear functions. In view of the
statistically significant dependence between the crown width of oaks and the measured
tree traits, a regression analysis was conducted with the investigated traits (age,
height and diameter at breast height) as independent variables, using forward stepwise
regression. The crown width of a tree may be determined from information on its age,
height and diameter at breast height. Diameter at breast height was the best of the
measurable traits for estimating crown width, as shown by the strength of the
dependence between the two traits and by the ease of measuring diameter at a height
of 1.3 m.

25

Application of the Akaike criterion to the selection of the normal distribution

Andrzej Kornacki1, Katarzyna Ostroga2

1Department of Applied Mathematics and Computer Science

2Department of Agricultural Machinery Science

University of Life Sciences in Lublin

The normal distribution is widely used in the natural and technical sciences as well as in the humanities and social sciences. Many tests of the normality of the distribution of a studied trait have been described in the literature, for both the univariate and the multivariate case (literature). All of them require the use of appropriate tables of critical values.

In this paper we propose using the Akaike information criterion to select between the normal and the log-normal distribution. The criterion originates from information theory. The suggested method was applied to data obtained in an experiment on sugar beet yields. The conclusions drawn from the Akaike criterion were confirmed by the classical Shapiro-Wilk test.
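
The selection rule proposed in this abstract can be sketched in a few lines of Python. The sketch assumes maximum likelihood estimates of the two parameters of each distribution and the usual definition AIC = 2k - 2 log L with k = 2. The sample values are hypothetical and are not the sugar beet data from the experiment.

```python
from math import log, pi


def gaussian_loglik(values):
    """Maximised normal log-likelihood (mean and variance estimated by ML)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return -0.5 * n * (log(2.0 * pi * var) + 1.0)


def aic_normal(data):
    return 2 * 2 - 2.0 * gaussian_loglik(data)


def aic_lognormal(data):
    """Requires strictly positive data; the sum of logs is the Jacobian term."""
    logs = [log(x) for x in data]
    return 2 * 2 - 2.0 * (gaussian_loglik(logs) - sum(logs))


def select_distribution(data):
    """Return the name of the distribution with the smaller AIC."""
    return "normal" if aic_normal(data) <= aic_lognormal(data) else "log-normal"


# Hypothetical, strongly right-skewed sample (illustrative values only).
sample = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
print(select_distribution(sample), aic_normal(sample), aic_lognormal(sample))
```

Because both models have the same number of parameters, the comparison here reduces to comparing the two maximised likelihoods; no table of critical values is needed.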

26

Visualizing bivariate relationships with hexagonally binned data

Marcin Kozak1, Agnieszka Wnuk1, Dariusz Gozdowski1, Zdzisław Wyszyński2

1Department of Experimental Design and Bioinformatics

2Department of Agronomy

Warsaw University of Life Sciences

Scatterplots are used very widely in numerous branches of science. Unfortunately, they fail when the number of points in a plot is very large. Several graphical techniques are available to overcome such problems, and one of them is graphing hexagonally binned data. The algorithm for hexagonal binning is quite complex, but the resulting graph is easy to understand: the color of each hexagonal symbol represents the number of observations that fall in the area covered by the corresponding bin. Unfortunately, this technique has still not gained popularity among researchers, and the aim of our work is to present the usefulness of graphing hexagonally binned data. This will be done for a three-year field experiment with spring barley. The display is compared with regular scatterplots, and it is shown that it provides information about a relationship where regular scatterplots fail, even when very small symbols and jittering are employed.
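
The binning step described in this abstract can be sketched as follows. This is a minimal Python illustration of one common approach (two interleaved rectangular lattices of hexagon centres, each point assigned to the nearer centre), not necessarily the algorithm used by the authors. The function name, the `width` parameter and the sample points are assumptions made for the example.

```python
from collections import Counter
from math import floor, sqrt


def hexbin_counts(points, width=1.0):
    """Count points per regular hexagon; keys are hexagon centres (x, y).

    Centres lie on two rectangular lattices: A at (i*w, j*h) and
    B at ((i+0.5)*w, (j+0.5)*h), with h = w*sqrt(3). Together they form a
    triangular lattice whose nearest-centre cells are regular hexagons.
    """
    height = width * sqrt(3.0)
    counts = Counter()
    for x, y in points:
        # Nearest centre on lattice A.
        ax = round(x / width) * width
        ay = round(y / height) * height
        # Nearest centre on lattice B (shifted by half a cell in x and y).
        bx = (floor(x / width) + 0.5) * width
        by = (floor(y / height) + 0.5) * height
        dist_a = (x - ax) ** 2 + (y - ay) ** 2
        dist_b = (x - bx) ** 2 + (y - by) ** 2
        centre = (ax, ay) if dist_a <= dist_b else (bx, by)
        counts[centre] += 1
    return counts


# Hypothetical points (illustrative values only).
pts = [(0.1, 0.1), (-0.1, 0.05), (0.5, 0.9)]
bins = hexbin_counts(pts, width=1.0)
for centre, count in bins.items():
    print(centre, count)
```

In a plot, each centre would be drawn as a hexagon whose color encodes the count, so overplotting no longer hides dense regions.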

27

Unreplicated experiments in early stage breeding programs

Katarzyna Marczyńska, Stanisław Mejza



In plant breeding trials, during the early stages of the improvement process, it is not possible to use an experimental design that replicates all the treatments, because of the large number of genotypes involved, the small amount of seed and the low availability of resources. Hence, unreplicated designs are used for early generation testing, when hundreds or even thousands of new genotypes need evaluation in the same trial using a limited amount of seed that is enough for one replicate only. To control the real or potential heterogeneity of experimental units, control (check) plots are arranged in the trial. There are many methods of using the information provided by check plots. In this paper the main tool for exploring this information is based on response surface methodology (RSM). First, we try to identify the response surface characterizing the experimental environment. The response surface obtained is then used to adjust the observations for genotypes. Finally, the adjusted data are used for inference concerning the next steps of the breeding program. The theoretical considerations are illustrated with an example dealing with spring barley.
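
The adjustment idea can be illustrated with a deliberately simple Python sketch. It assumes a first-order response surface (a plane in row and column position) fitted to the check plots by least squares, and an adjustment that subtracts the fitted trend relative to the mean of the checks. Both choices are assumptions for illustration; the abstract does not specify the order of the surface or the form of the adjustment, and all plot positions and yields below are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_plane(checks):
    """Least-squares fit of yield = b0 + b1*row + b2*col to the check plots."""
    xs = [(1.0, float(r), float(c)) for r, c, _ in checks]
    ys = [y for _, _, y in checks]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def adjust(plots, checks):
    """Remove the fitted environmental trend from unreplicated genotype plots."""
    b0, b1, b2 = fit_plane(checks)
    check_mean = sum(y for _, _, y in checks) / len(checks)
    return [(r, c, y - ((b0 + b1 * r + b2 * c) - check_mean))
            for r, c, y in plots]


# Hypothetical (row, column, yield) data: check plots and genotype plots.
check_plots = [(0, 0, 10.0), (0, 2, 8.0), (2, 0, 14.0), (2, 2, 12.0)]
genotype_plots = [(1, 1, 13.0), (2, 0, 15.0)]
print(adjust(genotype_plots, check_plots))
```

A higher-order surface would add squared and cross-product terms to the design matrix; the structure of the calculation stays the same.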

28

Models and inference – the normal case

João Tiago Mexia

Faculdade de Ciências e Tecnologia. Universidade Nova de Lisboa, 2825 Monte da

Caparica, Portugal

Normality is a key feature when making inference in many classes of models. In this paper we present and obtain inference for several classes of normal models, namely mixed models, models with orthogonal block structure, and models with commutative orthogonal block structure. The assumption of normality in these families of models leads to estimators, tests and confidence regions with optimal properties. Some operations with these models, such as crossing and nesting, are discussed.

29

Using R packages in experimental design

Amilcar Oliveira1, Teresa Oliveira2

1Center of Statistics and Applications, University of Lisbon

2Department of Sciences and Technology, Universidade Aberta, Portugal

In Experimental Design, besides the crucial role of new methodological advances and new areas of application, there is an urgent need to prepare future researchers by developing the skills required to take advantage of the latest tools. In the twenty-first century, new challenges have emerged from software development, which has allowed great advances in all areas of research, particularly in Statistics and Experimental Design. Until recently, computer programs such as STATISTICA, SPSS or SAS were commonly used in teaching and research. Recently, R has emerged as the program receiving the greatest investment from the statistical community. R is user-friendly, free, open to the community and adaptable to the specific needs of each situation. In this work we present a retrospective of the main functions and packages used in problems involving Experimental Design. We will seek to point out ways of developing topics of interest in this area that have not yet been adequately addressed in R.

References

Atkinson, A.C. and Donev, A.N. (1992). Optimum Experimental Designs. Oxford: Clarendon Press.

Bailey, R.A. (1981). A unified approach to design of experiments. Journal of the Royal Statistical Society, Series A 144, 214-223.

Box, G.E.P., Hunter, W.G. and Hunter, J.S. (2005). Statistics for Experimenters (2nd edition). New York: Wiley.

Fox, J. (2005). The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software 14, Issue 9.

Groemping, U. (2009). Design of Experiments in R. Presentation at UseR! 2009 in Rennes, France.

Lenth, R.V. (1989). Quick and Easy Analysis of Unreplicated Factorials. Technometrics 31, 469-473.

30

Type I error rates in multiple testing

Dariusz Parys

University of Lodz

In multiple hypothesis testing there are many possible definitions of the Type I error rate. In this paper we consider the family-wise error rate (FWER), the generalized family-wise error rate (gFWER), the per-family error rate (PFER), the per-comparison error rate (PCER), the median-based per-family error rate (mPFER) and the quantile number of false positives (QNFP). We treat the Type I error rate as a parameter $\theta_n = \Theta(F_{U_n,R_n})$ of the joint distribution $F_{U_n,R_n}$ of the numbers of Type I errors $U_n$ and rejected hypotheses $R_n$.

We also consider the class of multiple testing procedures that control a given Type I error rate at an acceptable level α, with remarks on the power of these procedures.
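
To make the listed error rates concrete, the Python sketch below computes each of them from the distribution of the number of false positives V. It rests on simplifying assumptions that are not part of the abstract: all m hypotheses are true nulls, the tests are independent, and each is rejected with the same probability, so that V is binomial. The definitions used (gFWER as P(V > k), mPFER as the median of V, QNFP as a chosen quantile of V) follow common usage and are stated here as assumptions.

```python
from math import comb


def binomial_pmf(m, p):
    """Distribution of V for m independent true nulls, each rejected w.p. p."""
    return [comb(m, v) * p ** v * (1.0 - p) ** (m - v) for v in range(m + 1)]


def quantile(pmf, level):
    """Smallest v such that P(V <= v) >= level."""
    cumulative = 0.0
    for v, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= level - 1e-12:
            return v
    return len(pmf) - 1


def type_one_error_rates(pmf, k=1, q=0.95):
    """Error rates expressed as functionals of the distribution of V."""
    m = len(pmf) - 1
    expected = sum(v * prob for v, prob in enumerate(pmf))
    return {
        "FWER": 1.0 - pmf[0],          # P(V >= 1)
        "gFWER": sum(pmf[k + 1:]),     # P(V > k)
        "PFER": expected,              # E[V]
        "PCER": expected / m,          # E[V] / m
        "mPFER": quantile(pmf, 0.5),   # median of V
        "QNFP": quantile(pmf, q),      # q-quantile of V
    }


m_tests, alpha = 20, 0.05
unadjusted = type_one_error_rates(binomial_pmf(m_tests, alpha))
bonferroni = type_one_error_rates(binomial_pmf(m_tests, alpha / m_tests))
print(unadjusted)
print(bonferroni)
```

Comparing the two results shows why the choice of error rate matters: testing each hypothesis at level α controls the PCER at α but not the FWER, whereas the Bonferroni-type threshold α/m keeps the FWER below α in this setting.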

31

On the precision of winter rape variety testing trials in Poland

Wiesław Pilarczyk1,2, Anna Fraś2

1Department of Mathematical and Statistical Methods, Poznań University of Life Sciences

2The Research Centre for Cultivar Testing

Winter rape is an important crop in Poland. Every new variety is therefore carefully examined by The Research Centre for Cultivar Testing before being entered on the National List. The performance of varieties is checked in numerous VCU (value for cultivation and use) trials performed at experimental stations characterized by different soil and climatic conditions. Winter rape is very sensitive to extreme climatic conditions, e.g. frost in winter and periods of drought during the growing season. Such factors influence not only the performance of individual varieties but also the overall precision of trials. Precision is often identified with the average value of the least significant difference expressed as a percentage of the general mean (LSD%). The precision of VCU trials on cereals in Poland has been reported in the papers by Pilarczyk [1987, 2008] and by Pilarczyk and Fraś [2008]. In this research, using a similar methodology, the precision of winter rape trials is investigated using extensive data from trials performed between 2000 and 2008.

References

Pilarczyk W., 1987, Precision of field trials in incomplete block designs for several species, Biuletyn Oceny Odmian 18-19, pp. 161-169.

Pilarczyk W., 2008, Confidence bounds for precision in cereal trials in Poland, Biometricke Metody a Modely v Podohospodarskej Vede, Vyskume a Vyucbe, pp. 139-145.

Pilarczyk W., Fraś A., 2008, Ocena precyzji doświadczeń rejestrowych zbóż w Polsce, Biuletyn IHAR 249, pp. 19-27.

32

Fruit crop breeding using biometrical methods

Stanisław Pluta1, Agnieszka Masny1, Wiesław Mądry2, Edward Żurawicz1

1Research Institute of Pomology and Floriculture

2Warsaw University of Life Sciences

The basic method of fruit crop breeding is the conventional approach, which involves crossing different parents followed by positive selection among the resulting offspring (seedlings) of the F1 generation. However, producing new cultivars in this way takes a long time and requires substantial funding and a lot of work. Applying molecular techniques and biometric methods makes it possible to shorten the breeding cycle and to increase breeding efficiency. Parental genotypes for crossing programs can be selected on the basis of their assessment for phenotypic traits (per se) in the cultivar collection (gene bank). Crossing programs can be more effective when the breeding value of parental genotypes is assessed on the basis of general and specific combining ability effects for important traits. The combining abilities of parents can be assessed with mating designs, including diallel or factorial mating designs. Selected parents are crossed in controlled conditions and in the field. Seedlings are produced in the winter-spring season in a heated greenhouse and then, after hardening in the second half of May, they are planted in the selection fields. The assessment and selection of the breeding material are done in several steps: at the seedling level, on advanced clones and in multi-environment cultivar trials. This takes from 8 to 15 years, depending on the fruit crop. Knowledge of the variation, heritability and correlation of important traits within the gene pool plays a substantial role in the breeding process. Multivariate statistical methods should be used to obtain these population characteristics. The most often used are PCA and cluster analysis. The band pattern of the best advanced breeding clones is obtained using DNA fingerprinting. It permits identification of the genotype and confirmation of its origin. The most valuable breeding clones are submitted for their final evaluation to the registration trials conducted by COBORU. Two kinds of multi-environment trials are carried out in parallel. In the first kind, the new genotypes are tested for DUS. In the second, the VCU is assessed. On the basis of the results of both trials, the best breeding clones are entered in the National Register as original cultivars.

33

Crop growth modelling and QTL analysis of multilocation trials

Paulo C. Rodrigues1,2, Ep Heuvelink3, Marco Bink2, Leo Marcelis3,4, Fred van Eeuwijk2,5

1CMA and Department of Mathematics, Faculty of Sciences and Technology, Nova University of Lisbon

2Biometrics, Wageningen University and Research Centre

3Horticultural Supply Chains, Wageningen University and Research Centre

4Greenhouse Horticulture, Wageningen University and Research Centre

5Centre for Biosystems Genomics, P.O. Box 98, 6700 AB Wageningen, The Netherlands

A different response of genotypes across environments is frequent in multi-location trials and

is known as genotype by environment interaction (GEI). The study and understanding of these

interactions is a major challenge for breeders and agronomic researchers. However, for the

last two decades, molecular markers and mapping techniques have allowed researchers to go

one step further and analyse the whole genome to detect specific genes which influence a

quantitative trait such as yield. These specific locations (loci) are called quantitative trait loci

(QTL), and the "new" challenge for breeders is the analysis of QTL by environment

interaction (QEI).

In this paper we use an adaptation of the LINTUL (light interception and utilization

simulator) crop growth model with 7 physiological parameters, to simulate two-way

phenotypic data tables. To each of these 7 parameters, a number of QTL are assigned in order

to study different genetic architectures. Considering θ the vector of the 7 parameters and f(.) a

nonlinear function, the phenotypic realisations (e.g. yield for genotype i and environment j)

can be written as

$$Phe_{i,j} = f_j(\theta_i), \quad \text{with } \theta_i = \theta(QTL_i) \qquad (1)$$

The objectives of this simulation study are: (i) to determine whether it is possible to generate

realistic GEI and QEI, including crossovers, using a simple crop growth model with 7

physiological parameters without GEI; (ii) to determine whether the QTL for physiological

parameters are found in a QTL analysis of the two-way table of phenotypic data; and (iii) to

explore and compare different genetic architectures underlying yield simulated by crop

growth modelling. GEI and QEI for yield of sweet pepper (Capsicum annuum L.) were

simulated through the crop growth model. In this case study, the QTL assigned to some of the

physiological parameters matched the QTL detected for yield.

34

Overview of growth models in R

Alicja Szabelska1, Michał Siatkowski2, Teresa Goszczurna1, Joanna Zyprych1


2Department of Agricultural Engineering


Growth rate models are a useful tool for modeling natural events that involve changes in a response over time. Given a growth rate model, time can be treated as a continuous rather than a discrete variable. Depending on the type of process modeled, a given model may fit the empirical data better or worse. The models considered differ from each other. However, for specific values of the parameters they can produce growth curves of similar shape. For most processes it is possible to find an appropriate model. In some cases more than one model is suitable.

This poster presents five growth models: exponential, logistic, log-logistic, Gompertz and Weibull. For each model the formula, the graphical representation and tools for estimating the parameters in R are described. Next, the Akaike Information Criterion and the Bayesian Information Criterion are presented as criteria that allow the models to be compared and the best one to be chosen.
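
The five curves and the two criteria can be written down compactly. The sketch below is in Python rather than R, and the parameterizations are common textbook forms chosen for illustration; the poster may use different ones. The criteria assume normally distributed errors and fitted values that come from a least-squares fit obtained elsewhere; the fitting step itself is not shown, and the small data set is hypothetical.

```python
from math import exp, log, pi


def exponential(t, a, r):
    return a * exp(r * t)


def logistic(t, k, r, t0):
    return k / (1.0 + exp(-r * (t - t0)))


def log_logistic(t, k, b, t0):
    # Defined for t > 0.
    return k / (1.0 + (t / t0) ** (-b))


def gompertz(t, k, r, t0):
    return k * exp(-exp(-r * (t - t0)))


def weibull(t, k, b, scale):
    # Defined for t >= 0.
    return k * (1.0 - exp(-((t / scale) ** b)))


def aic_bic(observed, fitted, n_params):
    """AIC and BIC under Gaussian errors, counting the error variance too."""
    n = len(observed)
    rss = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    loglik = -0.5 * n * (log(2.0 * pi * rss / n) + 1.0)
    p = n_params + 1
    return 2.0 * p - 2.0 * loglik, p * log(n) - 2.0 * loglik


# Hypothetical observations and fitted values from two candidate models.
obs = [1.0, 2.0, 3.0, 4.0]
fit_model_a = [1.1, 1.9, 3.1, 3.9]   # assumed fit with 2 curve parameters
fit_model_b = [1.1, 1.9, 3.1, 3.9]   # same fit quality, 3 curve parameters
print(aic_bic(obs, fit_model_a, 2), aic_bic(obs, fit_model_b, 3))
```

When two models fit equally well, both criteria prefer the one with fewer parameters; BIC penalizes each extra parameter by log(n) instead of 2.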

35

Analysis of the genetic diversity of white clover (Trifolium repens L.) cultivars and clones using molecular markers

Agnieszka Tomkowiak1, Alicja Szabelska2, Joanna Zyprych2, Zbigniew Broda1, Idzi Siatkowski2

1Department of Genetics and Plant Breeding

2Department of Mathematical and Statistical Methods

Poznań University of Life Sciences

In recent years a gradual increase in interest in the cultivation of legumes has been observed, resulting from the replacement of animal protein in livestock feeding with feeds from high-protein plants.

The aim of the study was to analyse the genetic diversity of white clover cultivars and clones using the RAPD-PCR technique and to determine the contribution of individual clones to the formation of cultivars on the basis of genetic similarity determined with molecular markers. The plant material consisted of four white clover cultivars and one cultivar propagated generatively through a selection and propagation nursery. To visualise the genetic distances given by the coefficients studied, a method of grouping elements into relatively homogeneous classes was used. The grouping is based on the similarity between elements, expressed by means of the Euclidean metric. Groups of similar cultivars were determined with a hierarchical clustering algorithm, which builds a classification hierarchy for a set of objects, starting from a partition in which each object forms its own cluster and ending with a partition in which all objects belong to a single cluster. The average method was used in the clustering. To compare the results obtained for each coefficient, the Mantel test and Spearman's correlation coefficient were used.

The analyses showed that the selected primers generated polymorphism that made it possible to choose components for crosses in order to test combining ability. Dendrograms based on the Nei and Ochiai coefficients form similarity groups consisting of cultivars together with their clones, and thus group the forms most precisely with respect to origin. The Simple Matching, Hamann, and Rogers and Tanimoto coefficients form groups that often contain cultivars and clones not belonging to a given cultivar, and are therefore not useful for choosing components for crosses.
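
The similarity coefficients compared in this abstract are all functions of the same four counts for a pair of band profiles. The Python sketch below uses the standard definitions, with the assumption that "Nei" refers to the Nei and Li (Dice) coefficient, which is common for RAPD data but is not stated in the abstract. The two band profiles are hypothetical.

```python
from math import sqrt


def match_counts(x, y):
    """Counts for two 0/1 band profiles: a = 1/1, b = 1/0, c = 0/1, d = 0/0."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def similarity_coefficients(x, y):
    """Five similarity coefficients; assumes each profile has at least one band."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return {
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),
        "hamann": ((a + d) - (b + c)) / n,
        "ochiai": a / sqrt((a + b) * (a + c)),
        "nei_li": 2 * a / (2 * a + b + c),
    }


# Hypothetical RAPD band profiles (1 = band present, 0 = band absent).
profile_1 = [1, 1, 0, 1, 0, 0, 1, 0]
profile_2 = [1, 0, 0, 1, 1, 0, 1, 0]
print(similarity_coefficients(profile_1, profile_2))
```

The first three coefficients count joint absences (d) as agreement, while Ochiai and Nei and Li ignore them. This difference is one plausible reason why the two groups of coefficients can produce different dendrograms for dominant markers such as RAPD.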

36

An application of the generalized linear models for an examination of the

phenotypic quality of roe deer

Joanna Ukalska1, Krzysztof Ukalski1, Jakub Borkowski2

1Biometry Division, Department of Econometrics and Statistics, Warsaw University of Life Sciences

2Department of Forest Ecology, Forest Research Institute, Raszyn

The influence of forest environment and climatic factors on roe deer antler asymmetry in two age classes of roe deer males was studied. Two habitats were compared: forest regenerating after a 1992 fire and covered with young stands (low-quality deer habitat), and unburned forest of diversified stand age classes (high-quality deer habitat). The climatic factors were the mean temperature and the total number of days with snow cover in January and February. Data were collected by local hunters from 366 males shot during 1998-2007. We applied four generalized linear models: a Poisson model, a Poisson model adjusted for overdispersion, a negative binomial model and a negative binomial model with the log canonical link function. Goodness-of-fit statistics and residual plots were checked. There was a significant difference in the incidence of roe deer antler asymmetry between age classes in both habitats considered, while weather conditions did not influence antler asymmetry.

37

Morphological analysis of inflorescence mutants in alfalfa (Medicago sativa L. s.l.) with respect to seed yield traits

Dorota Weigt1, Alicja Szabelska2, Joanna Zyprych2, Idzi Siatkowski2, Zbigniew Broda1

1Department of Genetics and Plant Breeding



The research was performed on lucerne plants (Medicago sativa L.) representing three types of inflorescence mutation: a mutant with a long peduncle inflorescence (lp), a mutant with a branched raceme inflorescence (br) and a mutant with a top flowering inflorescence (tf). The Radius variety, which has an inflorescence typical of Medicago species, was used as a control.

The material was analyzed with respect to 5 qualitative features that are the main components of seed yield: number of racemes per shoot, number of flowers per raceme, number of pods per raceme, number of seeds per pod and number of embryos per ovary.

The results of the biometric measurements constituted the starting material for the statistical analysis. The features of the mutants were presented graphically using boxplots. To investigate the dissimilarities in the seed yield traits between the different forms of lucerne, multivariate analysis of variance (MANOVA) was performed. Taking into account the size of the data and the number of forms, the assumption of normality (using the Shapiro-Wilk and Shapiro-Francia tests) and the assumption of equality of variances (using the Bartlett and Levene tests) were verified. In MANOVA four parametric tests were used: the Hotelling-Lawley, Pillai, Wilks and Roy tests. In addition, nonparametric MANOVA was performed using a permutation test. Since the results of each test revealed differences in the seed yield traits, each feature was analyzed separately using parametric and nonparametric ANOVA. Tukey's test was used to investigate significant differences in seed yield between the analyzed forms.

38

Comparisons of uniformity decisions based on COYU and Bennett’s methods – simulated data

Bogna Zawieja1, Wiesław Pilarczyk1,2, Bogna Kowalczyk2

Poznań University of Life Sciences

2The Research Centre for Cultivar Testing

Uniformity decisions concerning new plant varieties are based on both quantitative and qualitative characteristics. Decision rules for qualitative characteristics (usually “qualitative” is equivalent to “visually assessed”) are rather simple. Namely, for every new variety the number of non-typical plants in a sample of fixed size is counted, and if it is larger than the threshold value (established by crop experts), the variety is treated as non-uniform. A more complicated procedure is applied for quantitative characteristics. Decisions are based on comparing the standard deviation of the candidate variety with the average of the standard deviations of so-called reference varieties. A special procedure called COYU (combined over years uniformity) was developed for this purpose by the member states of UPOV (International Union for the Protection of New Varieties of Plants), Talbot (2000). The COYU method is, to some degree, an officially promoted method, but some other methods are still under consideration. One such method uses the Bennett test for coefficients of variation. The details of this new approach are given in the papers by Zawieja and Pilarczyk (2005, 2006, 2007) and by Zawieja, Pilarczyk and Kowalczyk (2009).

Some comparisons of uniformity decisions concerning winter wheat and oilseed rape varieties based on COYU and Bennett’s test are also included in those papers. During the annual session of the Technical Working Party on Automation and Computer Programs (held in Alexandria, Virginia, in June 2009) it was suggested that decisions on the uniformity of varieties be compared using simulated data based on real measurements. In the present paper this problem is therefore reconsidered using real data for oilseed varieties (reference set) and simulated data (candidate varieties).

39

Gene selection based on statistical tests

Joanna Zyprych, Alicja Szabelska, Idzi Siatkowski



Microarray technology allows thousands or even millions of genes to be investigated at the same time. It provides information about the expression profiles of genes. Statistical analysis is widely used in the search for over- and under-expressed genes. There are many statistical tests for verifying the hypotheses of interest. A classic example is the group of tests verifying the equality of means of expression levels. A researcher can often be unsure which test is the most appropriate for a given investigation. This poster is intended to help solve this problem.

Firstly, within the group of tests verifying the equality of means (the Brown-Forsythe test, the F-ANOVA test and the Kruskal-Wallis test), the efficiency of these tests was analysed with respect to the classification of differential genes. Secondly, the analogous analysis was carried out for tests concerning the equality of variances, i.e. the Bartlett test, the Fligner-Killeen test and the Levene test. Thirdly, with the previously selected genes as the training set, the prediction of the chosen sample with the remaining genes was tested using several machine learning methods. As the results of this analysis we present the numbers of misclassified genes. With this approach we can determine the most differential genes. The aim is to compare several statistical tests and to review their usefulness in the selection of genes from microarray experiments.

All the computations were performed on the R platform, version 2.10.0.
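
As an illustration of test-based gene selection, the sketch below computes the classical one-way ANOVA F statistic for each gene and ranks the genes by it. It is written in Python rather than R, covers only one of the tests named in the abstract, and omits p-values and multiple-testing adjustment. The gene names and expression values are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression):
    """Order gene names from the largest to the smallest F statistic."""
    scores = {gene: anova_f(groups) for gene, groups in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression levels: three experimental groups per gene.
expression_data = {
    "gene_A": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]],
    "gene_B": [[4.0, 5.0, 6.0], [5.0, 4.0, 6.0], [6.0, 5.0, 4.0]],
}
print(rank_genes(expression_data))
```

The same ranking scheme could be repeated with another statistic, such as the Kruskal-Wallis or Brown-Forsythe statistic, to compare which genes each test selects.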
