Cluster Analysis Using Quantitative, Qualitative and Molecular


  • 159

    Cluster Analysis Using Quantitative, Qualitative and Molecular Traits for the Study of the Genetic Diversity in Pineapple Genotypes

    C. de Fatima Machado(a), F.V.D. Souza, J.R.S. Cabral, C.A. da Silva Ledo, A.P. de Matos and R. Ritzinger
    Embrapa Mandioca e Fruticultura Tropical, CP 07, Cruz das Almas, Bahia, 44.380-000, Brazil

    Keywords: Ananas spp., multivariate analysis, Gower algorithm, genetic distance, variability

    Abstract

    Cluster analysis using quantitative, qualitative and molecular variables is a useful tool for estimating genetic diversity among genotypes in germplasm collections. The objective of this study was to carry out a simultaneous analysis of quantitative, qualitative and molecular variables, followed by clustering, to study the genetic diversity among pineapple genotypes using the Gower algorithm. Eleven quantitative, five qualitative and forty-three molecular characteristics were evaluated in ninety pineapple genotypes. The cophenetic correlation coefficient of the joint analysis was higher than the coefficients of the individual analyses. Ten groups were observed, which indicates high variability among the genotypes evaluated. The simultaneous analysis of the quantitative, qualitative and molecular variables expressed the genetic diversity among pineapple genotypes more efficiently than the individual analysis of each type of variable.

    INTRODUCTION

    The characterization of germplasm can be accomplished through phenotypic traits (morphological and agronomic descriptors), passport data and molecular markers. Characterizing the accessions is important to determine genetic variability, to identify duplicated accessions and to establish core collections. The availability of multivariate techniques is one of the factors that has driven the increase in studies of genetic diversity among genotypes (Gonçalves et al., 2008, 2009; Mohammadi and Prasanna, 2003; Podani and Schmera, 2006; Sudré et al., 2007).

    Cluster analysis of these traits is done separately for each type of trait, because the genetic distances are calculated according to the trait type. Cruz (2008) reported that dissimilarity measures can be estimated from quantitative traits (Euclidean or Mahalanobis distances), binary traits (Jaccard index, Nei and Li index) and multicategorical traits (Cole-Rodgers et al., 1997). As a result, several discrepancies are observed among the separate analyses, both in the clusters formed and in the inferences about the amount of variability among the accessions of a germplasm bank. A technique that allows the combined analysis of quantitative and qualitative data was proposed by Gower (1971).

    The objective of this study was to characterize pineapple accessions of the Active Germplasm Bank of Embrapa Cassava and Tropical Fruits through the combined analysis of quantitative, qualitative and molecular data.

    MATERIALS AND METHODS

    Ninety pineapple accessions of the Active Germplasm Bank of Embrapa Cassava and Tropical Fruits were evaluated. Each accession was evaluated for 11 quantitative traits (fruit weight, crown weight, fruit length and diameter, crown length, axis diameter, soluble solids and acidity of the pulp, plant height, stalk length and diameter), 5 qualitative traits (leaf color, presence of spines along the borders of the leaves, fruit shape, colors of the fruit and the pulp) and 43 RAPD (Random Amplified Polymorphic DNA) molecular markers.

    a [email protected]

    Proc. 7th International Pineapple Symposium Eds.: H. Abdullah et al. Acta Hort. 902, ISHS 2011

  • 160

    A combined analysis of the quantitative, qualitative and molecular data was carried out to determine the genetic distance, based on the algorithm described by Gower (1971). The accessions were clustered hierarchically by UPGMA, the Unweighted Pair-Group Method with Arithmetic Averages (Sneath and Sokal, 1973). The clustering was validated with the cophenetic correlation coefficient (Sokal and Rohlf, 1962), and its significance was assessed by the Mantel test with 10,000 permutations (Mantel, 1967).
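    As an illustration of how a Gower-type distance can combine the three kinds of variables, the following is a minimal Python sketch, not the authors' R code. The function name, the trait-type labels and the two example accessions are hypothetical. Two choices are assumptions that the text does not specify: the dissimilarity is taken as 1 minus the similarity, and joint absences of a marker band are ignored, as in Gower's treatment of dichotomous characters.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower-type dissimilarity between two accessions (illustrative sketch).

    a, b   : sequences of trait values (None marks a missing value)
    kinds  : per-trait type, one of "quant", "qual" or "binary"
    ranges : per-trait range (max - min over all accessions); used for "quant" only
    """
    total, weight = 0.0, 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # comparison not possible, so this trait gets weight 0
        if kind == "quant":
            # range-scaled absolute difference, turned into a similarity
            s = 1.0 - abs(x - y) / rng if rng else 1.0
        elif kind == "binary":
            if x == 0 and y == 0:
                continue  # assumption: joint absence of a band is not counted
            s = 1.0 if x == y else 0.0
        else:
            # qualitative (multicategorical): simple match / mismatch
            s = 1.0 if x == y else 0.0
        total += s
        weight += 1.0
    if weight == 0:
        raise ValueError("no comparable traits for this pair")
    return 1.0 - total / weight


# Hypothetical accessions: fruit weight (g), leaf color, two marker bands.
kinds = ["quant", "qual", "binary", "binary"]
ranges = [1000.0, None, None, None]
acc_1 = [1200.0, "green", 1, 0]
acc_2 = [900.0, "green", 0, 0]
print(round(gower_distance(acc_1, acc_2, kinds, ranges), 3))  # prints 0.433
```

    In this toy case the three comparable traits contribute similarities of 0.7, 1 and 0, so the distance is 1 - 1.7/3. Applying the function to every pair of accessions would give the full distance matrix used for clustering.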

    The R statistical software (R Development Core Team, 2006) was used to compute the Gower distances (Gower, 1971). The cophenetic correlation coefficient was calculated using the Genes software (Cruz, 2008), and the dendrogram was generated from the distance matrix using the Statistica software (Statsoft, 2005).
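    The clustering and validation steps can be sketched in plain Python as follows. This is a small illustrative implementation under stated assumptions, not the Genes or Statistica procedure: the function names and the 4 x 4 toy matrix are hypothetical, and ties between equally close pairs are broken by whichever pair is found first.

```python
from math import sqrt


def upgma_cophenetic(dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns (merges, coph): merges is a list of (cluster_a, cluster_b, height);
    coph[i][j] is the height at which leaves i and j are first joined.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}  # cluster id -> leaf indices
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    coph = [[0.0] * n for _ in range(n)]
    merges, next_id = [], n
    while len(members) > 1:
        (a, b), h = min(d.items(), key=lambda kv: kv[1])  # closest pair
        for i in members[a]:
            for j in members[b]:
                coph[i][j] = coph[j][i] = h
        na, nb = len(members[a]), len(members[b])
        updated = {}
        for c in members:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # size-weighted mean, i.e. the mean over all leaf pairs
            updated[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(updated)
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, h))
        next_id += 1
    return merges, coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def upper(m):
    """Upper-triangle entries of a square matrix, row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def cophenetic_correlation(dist):
    """Correlation between the original and the cophenetic distances."""
    _, coph = upgma_cophenetic(dist)
    return pearson(upper(dist), upper(coph))


# Hypothetical toy distance matrix for four accessions.
toy = [[0, 1, 4, 5],
       [1, 0, 3, 6],
       [4, 3, 0, 2],
       [5, 6, 2, 0]]
print(round(cophenetic_correlation(toy), 3))  # prints 0.845
```

    A coefficient close to 1 means the dendrogram distorts the original pairwise distances very little.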

    RESULTS AND DISCUSSION

    The dissimilarity dendrogram obtained with the Gower algorithm (Gower, 1971) from the evaluation of 11 quantitative traits, 5 qualitative traits and 43 molecular markers is shown in Figure 1. Using the mean of the dissimilarity matrix (0.15) as the cut-off for the UPGMA clustering resulted in the formation of 10 groups. The accessions belonging to each of the 10 groups are listed in Table 1.
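    How a cut-off height turns a dendrogram into groups can be sketched as below. The text does not describe exactly how the mean dissimilarity was applied, so this assumes a simple horizontal cut: leaves joined at or below the threshold fall in the same group. The function name, the merge list and the threshold are hypothetical toy values; merges use the convention that leaves are numbered 0 to n-1 and the k-th merge creates cluster n+k.

```python
def groups_at_height(n_leaves, merges, height):
    """Cut a dendrogram: leaves joined at or below `height` share a group.

    merges is a list of (cluster_a, cluster_b, merge_height) in merge order.
    Assumes merge heights never decrease, which holds for UPGMA.
    """
    parent = list(range(n_leaves))  # union-find over the leaves

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    rep = {i: i for i in range(n_leaves)}  # cluster id -> one of its leaves
    for k, (a, b, h) in enumerate(merges):
        if h <= height:
            parent[find(rep[a])] = find(rep[b])
        rep[n_leaves + k] = rep[a]
    groups = {}
    for leaf in range(n_leaves):
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(groups.values())


# Hypothetical example: four accessions with pairwise distances
# 1, 4, 5, 3, 6 and 2, whose mean is 3.5.
toy_merges = [(0, 1, 1.0), (2, 3, 2.0), (4, 5, 4.5)]
print(groups_at_height(4, toy_merges, 3.5))  # prints [[0, 1], [2, 3]]
```

    Raising the threshold above the last merge height would return a single group, and lowering it below the first would return every accession on its own.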

    The cophenetic correlation coefficient (CCC) was 0.92** (significant by the Mantel test with 10,000 permutations). The high CCC value indicates a high correlation between the original distance matrix and the cophenetic matrix derived from the clustering; Mohammadi and Prasanna (2003), Podani and Schmera (2006) and Gonçalves et al. (2008) came to the same conclusion.
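    The permutation logic behind a Mantel test can be sketched as follows. This is an illustrative version under assumptions, not the Genes implementation: it uses a one-sided (upper-tail) test and counts the observed value among the permutations. The function names, the seed and the two toy matrices are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def mantel(m1, m2, permutations=10000, seed=0):
    """Mantel test between two symmetric matrices over the same objects.

    Returns (r_obs, p): the matrix correlation and a one-sided p-value
    obtained by permuting the rows and columns of m2 together.
    """
    n = len(m1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [m1[i][j] for i, j in pairs]
    r_obs = pearson(x, [m2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the objects of m2
        y = [m2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical toy matrices: original distances and cophenetic distances.
original = [[0, 1, 4, 5],
            [1, 0, 3, 6],
            [4, 3, 0, 2],
            [5, 6, 2, 0]]
cophenetic = [[0, 1, 4.5, 4.5],
              [1, 0, 4.5, 4.5],
              [4.5, 4.5, 0, 2],
              [4.5, 4.5, 2, 0]]
r, p = mantel(original, cophenetic, permutations=999)
print(round(r, 3), p)
```

    With only four objects there are just 24 distinct relabelings, so the p-value here is coarse; with 90 accessions and 10,000 permutations the test is far more informative.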

    The analysis with the Gower algorithm (Gower, 1971) was efficient in expressing the degree of genetic diversity among the pineapple accessions evaluated, showing that combining quantitative and qualitative data with molecular markers gives a more efficient picture of the genetic diversity. According to Gonçalves et al. (2009), however, the type and number of variables chosen can compromise the efficiency of the combined analysis in quantifying genetic diversity, especially when a large number of molecular marker data are used.

    CONCLUSION

    The pineapple accessions of the Active Germplasm Bank of Embrapa Cassava and Tropical Fruits show high genetic variability based on quantitative, qualitative and molecular traits.

    Literature Cited

    Cole-Rodgers, P., Smith, D.W. and Bosland, P.W. 1997. A novel statistical approach to analyze genetic resource evaluations using Capsicum as an example. Crop Sci. 37:1000-1002.

    Cruz, C.D. 2008. Programa Genes (versão Windows): aplicativo computacional em genética e estatística. Viçosa: UFV.

    Gonçalves, L.S., Rodrigues, R., Amaral, A.T.Jr., Karasawa, M. and Sudré, C.P. 2009. Heirloom tomato gene bank: assessing genetic divergence based on morphological, agronomic and molecular data using a Ward-modified location model. Genet. Mol. Res. 8:364-374.

    Gonçalves, L.S.A., Rodrigues, R. and Amaral Júnior, A.T. 2008. Comparison of multivariate statistical algorithms to cluster tomato heirloom accessions. Genetics and Molecular Research 7:1289-1297.

    Gower, J.C. 1971. A general coefficient of similarity and some of its properties. Biometrics, Arlington 27:857-874.

    Mantel, N. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research, Birmingham 27:209-220.

  • 161

    Mohammadi, S.A. and Prasanna, B.M. 2003. Analysis of genetic diversity in crop plants - salient statistical tools and considerations. Crop Sci. 43:1235-1248.

    Podani, J. and Schmera, D. 2006. On dendrogram-based measures of functional diversity. Oikos 115:179-185.

    R Development Core Team. 2006. A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

    Sneath, P.H. and Sokal, R.R. 1973. Numerical taxonomy: the principles and practice of numerical classification. San Francisco: W.H. Freeman. 573p.

    Sokal, R.R. and Rohlf, F.J. 1962. The comparison of dendrograms by objective methods. Taxon 11:33-40.

    Statsoft, Inc. 2005. Statistica for Windows (data analysis software system), version 7.1. Statsoft, Tulsa, Oklahoma (USA).

    Sudré, C.P., Leonardez, E., Rodrigues, R., Amaral Júnior, A.T. et al. 2007. Genetic resources of vegetable crops: a survey in the Brazilian germplasm collections pictured through papers published in the journals of the Brazilian Society for Horticultural Science. Hortic. Bras. 25:496-503.

    Tables

    Table 1. Groups and accessions within groups formed in the cluster analysis obtained by the algorithm of Gower in the evaluation of 11 quantitative traits, 5 qualitative traits and 43 molecular markers.

    Group  Accessions
    1      FRF-632
    2      Guiana
    3      Perolera
    4      RBR-1, SNG-3
    5      SNG-2 (Quinari)
    6      FRF-678, FRF-762, FRF-1399, Pao de Acucar
    7      LBB-1384, LBB-1374
    8      LBB-1385
    9      Híbrido-3607, Natal Queen, Puerto Rico I-67, Smooth Cayenne, Smooth Cayenne 3, Smooth Cayenne 2
    10     Alto Turi, AUST-2480, BGA-25, Boituva, Champaka F-153, Codazzi, Comum, Fazenda Barreiro,
           Fernando Costa, FFR-1200, FRF-11, FRF-1202, FRF-1208, FRF-1220, FRF-1221, FRF-1226,
           FRF-1358, FRF-1369, FRF-1396, FRF-1403, FRF-150, FRF-154, FRF-156, FRF-160, FRF-162,
           FRF-168, FRF-235, FRF-250, FRF-297, FRF-351, FRF-585, FRF-609, FRF-634, FRF-640,
           FRF-652, FRF-680, FRF-684, FRF-7, FRF-733, FRF-737, FRF-770, FRF-8, FRF-800, FRF-820,
           FRF-845, G-79, G-80, Inerme de Rondônia, Jupi, Jupi 2, Jupi-TO, LBB-1383, LBB-1386,
           LBB-1396, LBB-1413, LBB-1444, LBB-1450, LBB-569, LBB-612, Local de Tefé, MD-2,
           Muito Rústico, Pérola, Primavera, Rondon 2, Rondon 3, Roxo de Tefé, Semi-Selvagem,
           SNG-1, SNG-4, TD-240

  • 162

    Figures

    Fig. 1. Dendrogram of dissimilarity of 90 pineapple accessions obtained through the algorithm of Gower in the evaluation of 11 quantitative traits, 5 qualitative traits and 43 molecular markers.

    [Dendrogram image not reproduced. Axis: Linkage Distance, scale 0.0 to 0.4; leaf labels: accession names.]