Author: DIEGO JARQUIN
JUAN DIEGO HERNANDEZ JARQUIN (2012)
Thesis (Doctorate in Sciences, specializing in Statistics). Colegio de Postgraduados, 2012.
BAYESIAN ANALYSIS OF LINEAR-BILINEAR MODELS. ABSTRACT: The analysis of two-way tables is a statistical tool that arises in many fields of research; for example, in plant breeding one of the main objectives is to assess genotypic adaptability and stability when selecting parents for the next breeding cycle. This process is generally complicated by the presence of Genotype x Environment interaction (GE). Under the classical approach, the interaction is studied with parsimonious models such as AMMI or SREG, and point estimates are obtained by Ordinary Least Squares (OLS), so constructing confidence intervals and designing hypothesis tests is not trivial. This work proposes a Bayesian formulation of linear-bilinear models, which offers the advantage of incorporating prior information; this approach yields shrunken point estimates of the eigenvalues. In addition, once the posterior distribution is obtained, it is possible to compute bivariate highest posterior density (HPD) regions and credible regions for the score parameters; Bayesian hypothesis tests, based on Bayes factors, can also be designed for the number of bilinear terms the model should contain. For the matrices of singular vectors derived from the singular value decomposition of the interaction matrix, the vectorial von Mises-Fisher distribution is proposed as the prior distribution.
This work is organized in three chapters. Chapter 1 proposes the Bayesian AMMI model using noninformative priors; Chapter 2 presents a matrix formulation of the Bayesian AMMI model, which allows prior information about the interaction to be incorporated through a prior matrix of means; Chapter 3 develops a hierarchical Bayesian model whose main advantage is that it incorporates information from a series of experiments.
Bayesian inference; Bilinear interaction terms; Plant breeding; Two-way tables with interaction; von Mises-Fisher distribution; Statistics; Doctorate; SOCIAL SCIENCES
In this study, we evaluated genome-based prediction using 35,403 wheat lines from the Global Wheat Breeding Program of the International Maize and Wheat Improvement Center (CIMMYT). We implemented eight statistical models that included genome-wide molecular marker and pedigree information in two different validation schemes. All models included main effects, and others also considered interactions between the different types of covariates via Hadamard products of similarity structures. The pedigree models always gave better results predicting new lines in observed environments than the genome-based models when only main effects were fitted. However, for all traits, the highest predictive abilities were obtained when interactions between pedigree, markers and environments were included. When new lines were predicted in unobserved environments, the marker main-effects model was the best in almost all trait/year combinations. These results provide strong evidence that the different sources of genetic information (molecular markers and pedigree) are not equally useful at different stages of the breeding pipeline, and can be employed differentially to improve the design of future breeding programs.
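The interaction structure mentioned above can be illustrated with a minimal sketch. It assumes a linear marker-based similarity kernel and an identity similarity among environments; the function names, the toy data and the scaling by the number of markers are illustrative assumptions, not the models fitted in the study.

```python
import numpy as np

def genomic_relationship(markers):
    """Linear similarity kernel from a column-centred marker matrix (lines x markers)."""
    centred = markers - markers.mean(axis=0)
    return centred @ centred.T / markers.shape[1]

def interaction_kernel(K_lines, line_idx, env_idx, n_envs):
    """Hadamard product of line and environment similarities, one row per record."""
    Z_line = np.eye(K_lines.shape[0])[line_idx]  # records x lines incidence matrix
    Z_env = np.eye(n_envs)[env_idx]              # records x environments incidence matrix
    K_g = Z_line @ K_lines @ Z_line.T            # line similarity for each pair of records
    K_e = Z_env @ Z_env.T                        # 1 if two records share an environment, else 0
    return K_g * K_e                             # element-wise (Hadamard) product

# Toy data: 4 lines scored at 50 markers (0/1/2), each observed in 2 environments.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(4, 50)).astype(float)
G = genomic_relationship(markers)
line_idx = np.array([0, 1, 2, 3, 0, 1, 2, 3])
env_idx = np.array([0, 0, 0, 0, 1, 1, 1, 1])
K_ge = interaction_kernel(G, line_idx, env_idx, n_envs=2)
```

With an identity environment kernel, the product is block diagonal: records from different environments get zero interaction covariance, while records within an environment inherit the similarity among lines. A pedigree-based matrix could be substituted for `G` in the same way.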
Statistical analyses of two-way tables with interaction arise in many different fields of research. This study proposes the von Mises-Fisher distribution as a prior on the set of orthogonal matrices in a linear-bilinear model for studying and interpreting interaction in a two-way table. Simulated and empirical plant breeding data were used for illustration; the empirical data consist of a multi-environment trial established in two consecutive years. For the simulated data, vague but proper prior distributions were used, and for the real plant breeding data, observations from the first year were used to elicit a prior for parameters of the model for data of the second year trial. Bivariate Highest Posterior Density (HPD) regions for the posterior scores are shown in the biplots, and the significance of the bilinear terms was tested using the Bayes factor. Results of the plant breeding trials show the usefulness of this general Bayesian approach for breeding trials and for detecting groups of genotypes and environments that cause significant genotype × environment interaction. The present Bayes inference methodology is general and may be extended to other linear-bilinear models by fixing certain parameters equal to zero and relaxing some model constraints.
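For orientation, the classical least-squares version of the linear-bilinear (AMMI) decomposition can be sketched as follows. This is not the Bayesian procedure of the study, which instead places a von Mises-Fisher prior on the orthogonal matrices; the function name and the random toy table are assumptions made for illustration.

```python
import numpy as np

def ammi_decomposition(table, n_terms):
    """Least-squares AMMI fit of a genotype x environment table of means.

    Returns the grand mean, genotype and environment main effects, and the
    first n_terms singular vectors and values of the interaction matrix.
    """
    grand = table.mean()
    geno = table.mean(axis=1, keepdims=True) - grand   # genotype main effects
    env = table.mean(axis=0, keepdims=True) - grand    # environment main effects
    interaction = table - grand - geno - env           # double-centred residual
    U, s, Vt = np.linalg.svd(interaction, full_matrices=False)
    return grand, geno.ravel(), env.ravel(), U[:, :n_terms], s[:n_terms], Vt[:n_terms].T

# Toy table: 6 genotypes evaluated in 4 environments.
rng = np.random.default_rng(1)
table = rng.normal(size=(6, 4))
grand, g, e, U, s, V = ammi_decomposition(table, n_terms=2)

# Fitted values: linear part plus two bilinear (multiplicative) terms.
fitted = grand + g[:, None] + e[None, :] + (U * s) @ V.T
```

The columns of `U` and `V`, scaled by the singular values, are the genotype and environment scores usually drawn in a biplot; in the Bayesian approach these scores have posterior distributions, which is what makes the HPD regions described above possible.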
This study contains spring wheat yield data (1st, 2nd, and 3rd WYCYTs and 1st, 2nd, 3rd and 4th SATYNs) from 136 international environments that were used to evaluate the predictive ability of different models in diverse environments by modeling G×E using the pedigree-derived additive relationship matrix (A matrix).
Genomic prediction studies incorporating genotype × environment (G × E) interaction effects are limited in durum wheat. We tested the genomic-enabled prediction accuracy (PA) of Genomic Best Linear Unbiased Predictor (GBLUP) models (six non-G × E and three G × E models) under three basic cross-validation (CV) schemes, predicting incomplete field trials (CV2), new lines (CV1), and lines in untested environments (CV0), in a durum wheat panel grown under yield potential, drought stress, and heat stress conditions. For CV0, three scenarios were considered: (i) leave one environment out (CV0-Env); (ii) leave one site out (CV0-Site); and (iii) leave 1 yr out (CV0-Year). The reaction norm models with G × E effects showed higher PA than the non-G × E models. Among the CV schemes, CV2 and CV0-Env had higher PA (0.58 each) than the CV1 scheme (0.35). When the average over all models and CV schemes was considered, among the eight traits (grain yield, thousand grain weight, grain number, days to anthesis, days to maturity, plant height, and normalized difference vegetation index at vegetative (NDVIvg) and grain filling (NDVIllg)), plant height had the highest PA (0.68) and moderate values were observed for grain yield (0.34). The results indicated that genomic selection models incorporating G × E interaction show great promise for forward prediction and application in durum wheat breeding to increase genetic gains.
Genotype × environment (G × E) interaction can be studied through multienvironment trials used to select wheat (Triticum aestivum L.) lines. We used spring wheat yield data from 136 international environments to evaluate the predictive ability (PA) of different models in diverse environments by modeling G × E using the pedigree-derived additive relationship matrix (A matrix). These analyses focused on 109 wheat lines from three Wheat Yield Collaboration Yield Trials (WYCYTs) and 168 lines from four Stress Adapted Trait Yield Nurseries (SATYNs) developed by CIMMYT for yield potential conditions and stress conditions, respectively. The main objectives of this study were to use various pedigree-based reaction norm models to predict sites included in each of the three WYCYT nurseries and each of the four SATYN nurseries (individual population) and to predict environments (site-year combinations) when combining the three WYCYT and four SATYN trials (combined population). Results of the PA for the individual- and combined-population analyses indicated that the best predictive model, Model 6 (E + L + A + AE + e), always included G × E, expressed as the interaction between the A matrix and environments. The most predictable sites in WYCYTs were Iran DZ (Dezful) and Pak I (Islamabad), whereas the most predictable sites in SATYNs were India I (Indore), Iran DZ, and Mex CM (Cd. Obregon). Heritability was correlated with PA for individual-population prediction analyses, but not for combined-population prediction analyses. Our results indicate that pedigree-based reaction norm models with G × E can be useful for predicting the performance of lines and for selecting highly predictable key sites (or environments) to reduce phenotyping costs.
In agriculture and plant breeding, multienvironment trials over multiple years are conducted to evaluate and predict genotypic performance under different environmental conditions and to analyze, study, and interpret genotype × environment interaction (G × E). In this study, we propose a hierarchical Bayesian formulation of a linear–bilinear model, where the conditional conjugate prior for the bilinear (multiplicative) G × E term is the matrix von Mises–Fisher (mVMF) distribution (with environments and sites defined as synonymous). A hierarchical normal structure is assumed for linear effects of sites, and priors for precision parameters are assumed to follow gamma distributions. Bivariate highest posterior density (HPD) regions for the posterior multiplicative components of the interaction are shown within the usual biplots. Simulated and real maize (Zea mays L.) breeding multisite data sets were analyzed. Results showed that the proposed model facilitates identifying groups of genotypes and sites that cause G × E across years and within years, since the hierarchical Bayesian structure allows using plant breeding data from different years by borrowing information among them. This model offers the researcher valuable information about G × E patterns not only for each 1-yr period of the breeding trials but also for the general process that originates the response across these periods.
In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of a (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods, like the one presented here, that can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
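The covariance-function idea can be written out as a short sketch. The notation below is assumed for illustration and is not taken from the abstract: x_{ik} is the (centred) genotype of line i at marker k, w_{jq} is the (centred) value of EC q in environment j, and p and q_E are the numbers of markers and ECs.

```latex
% Genetic gradient: linear function of markers with random coefficients
g_i = \sum_{k=1}^{p} x_{ik} b_k, \quad b_k \sim N(0, \sigma_b^2)
\;\Rightarrow\; \mathrm{Cov}(\mathbf{g}) = \mathbf{G}\,\sigma_g^2,
\quad \mathbf{G} = \frac{\mathbf{X}\mathbf{X}^\top}{p}

% Environmental gradient: linear function of ECs with random coefficients
w_j = \sum_{q=1}^{q_E} w_{jq} \gamma_q, \quad \gamma_q \sim N(0, \sigma_\gamma^2)
\;\Rightarrow\; \mathrm{Cov}(\mathbf{w}) = \boldsymbol{\Omega}\,\sigma_w^2,
\quad \boldsymbol{\Omega} = \frac{\mathbf{W}\mathbf{W}^\top}{q_E}

% Interaction between every marker and every EC, modeled through its covariance
\mathrm{Cov}(gw_{ij},\, gw_{i'j'}) = G_{ii'}\,\Omega_{jj'}\,\sigma_{gw}^2
```

Under these assumptions the p × q_E marker-by-EC interaction coefficients never have to be estimated one by one; only the two similarity matrices and a single variance component enter the model, which is what keeps the high-dimensional case tractable.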
Genomic selection (GS), in which lines are selected from genotyping data prior to field phenotyping, has the potential to enhance the rate of genetic gain. Including genotype × environment (G × E) interaction in GS models can improve prediction accuracy and hence aid the selection of lines across target environments. Phenotypic data on 320 chickpea breeding lines for eight traits for three seasons at two locations were recorded. These lines were genotyped using DArTseq (1.6 K SNPs) and Genotyping-by-Sequencing (GBS; 89 K SNPs). Thirteen models were fitted including main effects of environment and lines, markers, and/or naïve and informed interactions to estimate prediction accuracies. Three cross-validation schemes mimicking real scenarios that breeders might encounter in the field were considered to assess the prediction accuracy of the models (CV2: incomplete field trials or sparse testing; CV1: newly developed lines; and CV0: untested environments). Maximum prediction accuracies for different traits and different models were observed with CV2. DArTseq performed better than GBS and the combined genotyping set (DArTseq and GBS) regardless of the cross-validation scheme with most of the main effect marker and interaction models. Improvement of GS models and application of various genotyping platforms are key factors for obtaining accurate predictions, leading to more precise selection of candidates.
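The three cross-validation schemes differ only in which records are hidden from the training set. The sketch below builds the corresponding test masks for a toy balanced trial; the function names, the 20 % masking fraction and the data layout are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np

def cv1_mask(line_idx, test_lines):
    """CV1: hide every record of the chosen lines (newly developed lines)."""
    return np.isin(line_idx, test_lines)

def cv0_mask(env_idx, test_env):
    """CV0: hide every record of one environment (untested environment)."""
    return env_idx == test_env

def cv2_mask(n_records, test_fraction, rng):
    """CV2: hide a random subset of line-environment records (sparse testing)."""
    mask = np.zeros(n_records, dtype=bool)
    n_test = int(round(test_fraction * n_records))
    mask[rng.choice(n_records, size=n_test, replace=False)] = True
    return mask

# Toy trial: 5 lines, each observed in 3 environments (15 records).
line_idx = np.repeat(np.arange(5), 3)
env_idx = np.tile(np.arange(3), 5)
rng = np.random.default_rng(2)

cv1 = cv1_mask(line_idx, test_lines=[0, 3])
cv0 = cv0_mask(env_idx, test_env=2)
cv2 = cv2_mask(line_idx.size, test_fraction=0.2, rng=rng)
```

Records where a mask is `True` form the test set and the rest are used for training. Note that this simple random CV2 mask does not guarantee that every line keeps at least one observed environment; a real analysis would usually enforce that constraint.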
Developing genomic selection (GS) models is an important step in applying GS to accelerate the rate of genetic gain in grain yield in plant breeding. In this study, seven genomic prediction models under two cross-validation (CV) scenarios were tested on 287 advanced elite spring wheat lines phenotyped for grain yield (GY), thousand-grain weight (GW), grain number (GN), and thermal time for flowering (TTF) in 18 international environments (year-location combinations) in major wheat-producing countries in 2010 and 2011. Prediction models with genomic and pedigree information included main effects and interaction with environments. Two random CV schemes were applied to predict a subset of lines that were not observed in any of the 18 environments (CV1), and a subset of lines that were not observed in a set of the environments, but were observed in other environments (CV2). Genomic prediction models, including genotype × environment (G × E) interaction, had the highest average prediction ability under the CV1 scenario for GY (0.31), GN (0.32), GW (0.45), and TTF (0.27). For CV2, the average prediction ability of the model including the interaction terms was generally high for GY (0.38), GN (0.43), GW (0.63), and TTF (0.53). Wheat lines in site-year combinations in Mexico and India had relatively high prediction ability for GY and GW. Results indicated that prediction ability of lines not observed in certain environments could be relatively high for genomic selection when predicting G × E interaction in multi-environment trials.