Author: Jose Crossa

A general Bayesian estimation method of linear-bilinear models applied to plant breeding trials with genotype × environment interaction

Jose Crossa (2012)

Statistical analyses of two-way tables with interaction arise in many different fields of research. This study proposes the von Mises-Fisher distribution as a prior on the set of orthogonal matrices in a linear-bilinear model for studying and interpreting interaction in a two-way table. Simulated and empirical plant breeding data were used for illustration; the empirical data consist of a multi-environment trial established in two consecutive years. For the simulated data, vague but proper prior distributions were used, and for the real plant breeding data, observations from the first year were used to elicit a prior for parameters of the model for data of the second year trial. Bivariate Highest Posterior Density (HPD) regions for the posterior scores are shown in the biplots, and the significance of the bilinear terms was tested using the Bayes factor. Results of the plant breeding trials show the usefulness of this general Bayesian approach for breeding trials and for detecting groups of genotypes and environments that cause significant genotype × environment interaction. The present Bayes inference methodology is general and may be extended to other linear-bilinear models by fixing certain parameters equal to zero and relaxing some model constraints.


AGRICULTURAL SCIENCES AND BIOTECHNOLOGY; Bayesian inference; Bilinear interaction terms; Two-way table with interaction; von Mises-Fisher
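A common member of the linear-bilinear family is the AMMI model, y_ij = mu + tau_i + delta_j + sum_l lambda_l alpha_il gamma_jl + e_ij, whose genotype vectors alpha and environment vectors gamma form orthonormal matrices (the kind of parameter on which the von Mises-Fisher prior is placed). As a minimal, non-Bayesian sketch, the classical least-squares fit below obtains those orthonormal matrices from a singular value decomposition of the double-centred interaction table. It is not the Bayesian estimation method of the paper; the function names and the symmetric biplot scaling are illustrative assumptions.

```python
import numpy as np


def fit_linear_bilinear(Y: np.ndarray, k: int):
    """Least-squares (AMMI-type) fit of
    y_ij = mu + tau_i + delta_j + sum_{l=1..k} lam_l * alpha_il * gamma_jl + e_ij
    to a genotype x environment table of means Y (rows: genotypes, columns: environments)."""
    mu = Y.mean()
    tau = Y.mean(axis=1) - mu      # genotype main effects
    delta = Y.mean(axis=0) - mu    # environment main effects
    # Interaction residuals: double-centred, so every row and column sums to zero
    Z = Y - mu - tau[:, None] - delta[None, :]
    # SVD: columns of U and rows of Vt are orthonormal singular vectors
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    lam = s[:k]                    # singular values (size of each bilinear term)
    alpha = U[:, :k]               # orthonormal genotype vectors (I x k)
    gamma = Vt[:k, :].T            # orthonormal environment vectors (J x k)
    fitted = mu + tau[:, None] + delta[None, :] + (alpha * lam) @ gamma.T
    return mu, tau, delta, lam, alpha, gamma, fitted


def biplot_scores(lam, alpha, gamma):
    """Symmetric scaling: genotype and environment scores each carry sqrt(lam)."""
    root = np.sqrt(lam)
    return alpha * root, gamma * root
```

For an I x J table the interaction has rank at most min(I - 1, J - 1), so keeping that many terms reproduces Y exactly; in practice only the first few terms are retained and plotted in the biplot.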

Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat

Jose Crossa (2017)

Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested whether using aerial measurements of canopy temperature and of green and red normalized difference vegetation index as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on the training and test sets, together with grain yield on the training set, were modeled jointly in multivariate models and compared with univariate models using grain yield on the training set only. Cross-validation accuracies were estimated within and across environments, with and without replication, and with and without correcting for days to heading. We observed that, within environments, with unreplicated secondary trait data and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree models and by 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured by high-throughput phenotyping could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots.
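As a point of reference for the models compared above, the sketch below shows a univariate GBLUP baseline (grain yield on the training set only), with cross-validation accuracy measured as the Pearson correlation between predictions and observed values in the test set. It does not implement the multivariate model with secondary traits. The VanRaden relationship matrix, the known variance ratio `lam`, the use of the training mean as intercept, and all function names are assumptions made for illustration.

```python
import numpy as np


def vanraden_g(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix (VanRaden method 1) from an n x m matrix of 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies estimated from the sample
    W = M - 2.0 * p                          # centre each marker column
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, lam):
    """Univariate GBLUP predictions for the test lines.

    lam = sigma2_e / sigma2_g is taken as known (no REML step), and the
    intercept is approximated by the training-set mean instead of its GLS estimate."""
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]           # relationships among training lines
    G_vt = G[np.ix_(test, train)]            # test-by-training relationships
    weights = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ weights


def prediction_accuracy(pred, observed):
    """Pearson correlation between predicted and observed values in the test set."""
    return float(np.corrcoef(pred, observed)[0, 1])
```

A pedigree-based version would follow the same pattern with the numerator relationship matrix in place of G.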



META: A suite of SAS programs to analyze multienvironment breeding trials

Jose Crossa (2013)

Multi-environment trials (METs) enable the evaluation of the same genotypes in a variety of environments and management conditions. We present here META (Multi Environment Trial Analysis), a suite of 31 SAS programs that analyze METs with complete or incomplete block designs, with or without adjustment by a covariate. The entire program is run through a graphical user interface. The program can produce boxplots or histograms for all traits, as well as univariate statistics. It also calculates Best Linear Unbiased Estimators (BLUEs) and Best Linear Unbiased Predictors (BLUPs) for the main response variable and BLUEs for all other traits. For all traits, it uses PROC MIXED to calculate variance components by Restricted Maximum Likelihood (REML), Least Significant Differences (LSD), the Coefficient of Variation (CV), and broad-sense heritability. The program can analyze each location separately, combine the analysis by management conditions, or combine all locations. The flexibility and ease of use of this program make it a valuable tool for the analysis of METs in breeding and agronomy. META can be used by researchers who know only a few basic principles of SAS.
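For readers outside SAS, the entry-mean broad-sense heritability and CV reported by programs of this kind can be sketched from REML variance components with textbook formulas: H^2 = var_g / (var_g + var_ge/e + var_e/(e r)) for e environments and r replicates, and CV = 100 sqrt(var_e) / mean. These are standard definitions assumed here; the exact formulas implemented in META may differ, and the function names are hypothetical.

```python
import math


def broad_sense_h2(var_g, var_ge, var_e, n_env, n_rep):
    """Entry-mean broad-sense heritability across n_env environments with n_rep replicates.
    For a single location, pass n_env=1 and var_ge=0."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


def coefficient_of_variation(var_e, grand_mean):
    """CV (%) = 100 * residual standard deviation / trial mean."""
    return 100.0 * math.sqrt(var_e) / grand_mean
```

For example, `broad_sense_h2(1.0, 0.5, 2.0, n_env=5, n_rep=2)` returns about 0.769, and `coefficient_of_variation(4.0, 50.0)` returns 4.0.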



Expectation and variance of the estimator of the maximized selection response of linear selection indices with normal distribution

Jose Crossa (2020)

Key message: The expectation and variance of the estimator of the maximized index selection response allow breeders to construct confidence intervals and to complete the analysis of a selection process.

The maximized selection response and the correlation of the linear selection index (LSI) with the net genetic merit are the main criteria for comparing the efficiency of any LSI. The estimator of the maximized selection response is the square root of the variance of the estimated LSI values multiplied by the selection intensity. The expectation and variance of this estimator allow the breeder to construct confidence intervals and determine the appropriate sample size to complete the analysis of a selection process. Assuming that the estimated LSI values are normally distributed, we obtained those two parameters as follows. First, using the Fourier transform, we found the distribution of the variance of the estimated LSI values, which was a Gamma distribution; therefore, the expectation and variance of this distribution were the expectation and variance of the variance of the estimated LSI values. Second, with these results, we obtained the expectation and the variance of the estimator of the selection response using the Delta method. We validated the theoretical results in the phenotypic selection context using real and simulated datasets. With the simulated dataset, we compared the LSI efficiency when the genotypic covariance matrix is known versus when this matrix is estimated; the differences were not significant. We concluded that our results are valid for any LSI with normal distribution and that the method described in this work is useful for finding the expectation and variance of the estimator of any LSI response in the phenotypic or genomic selection context.
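The two steps described above can be sketched numerically, assuming the estimated LSI values are n independent normal values and that their variance S2 uses the divisor n - 1. Under that assumption S2 is Gamma distributed with E[S2] = sigma2 and Var[S2] = 2 sigma2^2/(n - 1), and a first-order Delta method for R = k sqrt(S2) gives E[R] approximately k sigma and Var[R] approximately k^2 sigma2 / (2(n - 1)). The paper's own expressions may be parameterized differently; the function names and the normal-approximation confidence interval are illustrative assumptions.

```python
import math
from statistics import NormalDist


def selection_intensity(p: float) -> float:
    """Standardized selection differential k for truncation selection of a proportion p."""
    z = NormalDist().inv_cdf(1.0 - p)
    return NormalDist().pdf(z) / p


def response_moments(sigma2_I: float, n: int, k: float):
    """First-order (Delta-method) expectation and variance of R_hat = k * sqrt(S2).

    S2 is the sample variance (divisor n - 1) of n normal LSI values with variance
    sigma2_I, so S2 ~ Gamma(shape=(n - 1)/2, scale=2*sigma2_I/(n - 1)),
    E[S2] = sigma2_I and Var[S2] = 2*sigma2_I**2/(n - 1)."""
    e_s2 = sigma2_I
    var_s2 = 2.0 * sigma2_I ** 2 / (n - 1)
    slope = k / (2.0 * math.sqrt(e_s2))   # derivative of k*sqrt(x), evaluated at E[S2]
    return k * math.sqrt(e_s2), slope ** 2 * var_s2


def response_ci(s2: float, n: int, k: float, level: float = 0.95):
    """Approximate normal-theory CI for R, plugging the observed S2 in for sigma2_I."""
    r_hat, var_r = response_moments(s2, n, k)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(var_r)
    return r_hat - half, r_hat + half
```

For example, `selection_intensity(0.05)` is about 2.063, and with k = 2.0, sigma2 = 4.0 and n = 101, `response_moments` gives an expectation of 4.0 and a variance of about 0.08.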



Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat

Jose Crossa (2012)

In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear in marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (non-linear in the markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression models. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
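The contrast between models that are linear in marker effects and RKHS regression can be sketched with point estimates. Below, ridge regression on centred markers plays the role of the linear baseline, and kernel ridge regression with a Gaussian kernel gives the posterior mean of an RKHS model given the intercept and the variance ratio `lam`. The study fitted these models in a Bayesian framework; the kernel form, the scaling of distances by the number of markers, the fixed values of `h` and `lam`, and the function names are assumptions for illustration.

```python
import numpy as np


def gaussian_kernel(X1: np.ndarray, X2: np.ndarray, h: float) -> np.ndarray:
    """K_ij = exp(-h * d2_ij / m): squared Euclidean marker distance scaled by marker count m."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-h * d2 / X1.shape[1])


def rkhs_predict(X_train, y_train, X_new, h=1.0, lam=1.0):
    """Kernel ridge predictions: posterior mean of f in y = mu + f + e with
    f ~ N(0, sigma2_f K), given mu (set to the training mean) and lam = sigma2_e / sigma2_f."""
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, h)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + gaussian_kernel(X_new, X_train, h) @ alpha


def ridge_predict(X_train, y_train, X_new, lam=1.0):
    """Linear baseline: ridge regression on centred markers (one effect per marker)."""
    mean_x = X_train.mean(axis=0)
    Xc = X_train - mean_x
    mu = y_train.mean()
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_train.shape[1]), Xc.T @ (y_train - mu))
    return mu + (X_new - mean_x) @ b
```

In practice `h` and `lam` would typically be tuned by cross-validation or given priors, as in the Bayesian fits the study compared.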