Author: Gustavo de los Campos

Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R

Paulino Pérez-Rodríguez Gustavo de los Campos Jose Crossa (2010)

The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyperparameters, are also addressed.
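As a rough illustration of two ideas named in the abstract above (ridge-type shrinkage of marker effects and evaluation of predictive ability through cross-validation), the following is a minimal sketch in Python/NumPy. It is not the BLR package or its interface: the function names `ridge_fit` and `cv_correlation`, the simulated data, and the fixed penalty `lam` (standing in for the ratio of residual variance to marker-effect variance, which a Bayesian Ridge Regression would estimate) are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate of marker effects for a fixed penalty `lam`.

    With `lam` equal to residual variance / marker-effect variance, this is
    the conditional posterior mean of the effects under a Gaussian prior
    (hypothetical helper, not part of BLR).
    """
    centre = X.mean(axis=0)
    mu = y.mean()
    Xc = X - centre
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu))
    return mu, centre, beta

def cv_correlation(X, y, lam, n_folds=5, seed=0):
    """Correlation between observed and out-of-fold predicted phenotypes."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # balanced random fold labels
    y_hat = np.empty(len(y), dtype=float)
    for k in range(n_folds):
        test = folds == k
        mu, centre, beta = ridge_fit(X[~test], y[~test], lam)
        y_hat[test] = mu + (X[test] - centre) @ beta
    return np.corrcoef(y, y_hat)[0, 1]

# Simulated data (assumption): 200 lines, 500 markers coded 0/1/2.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta_true = rng.normal(0.0, 0.1, size=p)
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

acc = cv_correlation(X, y, lam=100.0)
print(f"5-fold cross-validated correlation: {acc:.2f}")
```

Repeating the call over a grid of `lam` values gives a simple, non-Bayesian way to see how sensitive predictive ability is to the amount of shrinkage.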

Article

CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

A reaction norm model for genomic selection using high-dimensional genomic and environmental data

DIEGO JARQUIN Jose Crossa Paulino Pérez-Rodríguez Mario Calus Juan Burgueño Gustavo de los Campos (2013)

In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of a (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods, like the one presented here, that can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
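The covariance-function idea in the abstract above can be sketched in a few lines of Python/NumPy. The sketch assumes one common construction: marker-based and EC-based similarity matrices are built as linear kernels, and the covariance of the interaction term is taken as their element-wise (Hadamard) product after expanding both to the record level. The dimensions, the simulated inputs and the helper `linear_kernel` are hypothetical and do not reproduce the Arvalis data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_env, p, q = 6, 4, 50, 10   # toy dimensions (assumption)

X = rng.binomial(2, 0.4, size=(n_lines, p)).astype(float)  # marker genotypes
W = rng.normal(size=(n_env, q))                             # environmental covariates

def linear_kernel(M):
    """Similarity matrix from centred, scaled columns: M M' / (number of columns)."""
    Mc = M - M.mean(axis=0)
    sd = Mc.std(axis=0)
    Mc = Mc[:, sd > 0] / sd[sd > 0]   # drop constant columns before scaling
    return Mc @ Mc.T / Mc.shape[1]

G = linear_kernel(X)   # lines x lines, from markers
O = linear_kernel(W)   # environments x environments, from ECs

# One record per line-environment combination.
line_id = np.repeat(np.arange(n_lines), n_env)
env_id = np.tile(np.arange(n_env), n_lines)

K_g = G[np.ix_(line_id, line_id)]   # record-level genetic covariance
K_e = O[np.ix_(env_id, env_id)]     # record-level environmental covariance
K_ge = K_g * K_e                    # interaction covariance (Hadamard product)

print(K_ge.shape)
```

The point of this construction is that `K_ge` has one row per record, so its size does not grow with the number of marker-by-EC combinations (here 50 × 10), which is what makes high-dimensional inputs tractable.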

Article

Prediction Accuracy Covariance Function Covariance Structure Prediction Problem Multiplicative Operator CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Single-step genomic and pedigree genotype x environment interaction models for predicting wheat lines in international environments

Paulino Pérez-Rodríguez Jose Crossa Jessica Rutkoski Ravi Singh Gustavo de los Campos Juan Burgueño Susanne Dreisigacker (2017)

Genomic prediction models have been commonly used in plant breeding but only in reduced datasets comprising a few hundred genotyped individuals. However, pedigree information for an entire breeding population is frequently available, as are historical data on the performance of a large number of selection candidates. The single-step method extends the genomic relationship information from genotyped individuals to pedigree information from a larger number of phenotyped individuals in order to combine relationship information on all members of the breeding population. Furthermore, genomic prediction models that incorporate genotype × environment interactions (G × E) have produced substantial increases in prediction accuracy compared with single-environment genomic prediction models. Our main objective was to show how to use single-step genomic and pedigree models to assess the prediction accuracy of 58,798 CIMMYT wheat (Triticum aestivum L.) lines evaluated in several simulated environments in Ciudad Obregon, Mexico, and to predict the grain yield performance of some of them in several sites in South Asia (India, Pakistan, and Bangladesh) using a reaction norm model that incorporated G × E. Another objective was to describe the statistical and computational challenges encountered when developing the pedigree and single-step models in such large datasets. Results indicate that the genomic prediction accuracy achieved by models using pedigree only, markers only, or both pedigree and markers to predict various environments in India, Pakistan, and Bangladesh is higher (0.25–0.38) than prediction accuracy of models that use only phenotypic prediction (0.20) or do not include the G × E term.

Article

Genomics Breeding methods Genetic improvement Wheats CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Increased prediction accuracy in wheat breeding trials using a marker x environment interaction Genomic Selection model

Jesse Poland Jean-Luc Jannink Gustavo de los Campos Jose Crossa Ravi Singh Susanne Dreisigacker (2015)

Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates of selection. Originally, these models were developed without considering genotype × environment interaction (G × E). Several authors have proposed extensions of the single-environment GS model that accommodate G × E using either covariance functions or environmental covariates. In this study, we model G × E using a marker × environment interaction (M × E) GS model; the approach is conceptually simple and can be implemented with existing GS software. We discuss how the model can be implemented by using an explicit regression of phenotypes on markers or using covariance structures (a genomic best linear unbiased prediction-type model). We used the M × E model to analyze three CIMMYT wheat data sets (W1, W2, and W3), where more than 1000 lines were genotyped using genotyping-by-sequencing and evaluated at CIMMYT’s research station in Ciudad Obregon, Mexico, under simulated environmental conditions that covered different irrigation levels, sowing dates and planting systems. We compared the M × E model with a stratified (i.e., within-environment) analysis and with a standard (across-environment) GS model that assumes that effects are constant across environments (i.e., ignoring G × E). The prediction accuracy of the M × E model was substantially greater than that of an across-environment analysis that ignores G × E. Depending on the prediction problem, the M × E model had either similar or greater levels of prediction accuracy than the stratified analyses. The M × E model decomposes marker effects and genomic values into components that are stable across environments (main effects) and others that are environment-specific (interactions). Therefore, in principle, the interaction model could shed light on which variants have effects that are stable across environments and which ones are responsible for G × E.
The data set and the scripts required to reproduce the analysis are publicly available as Supporting Information.
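The two implementations mentioned in the abstract above (explicit regression on markers, and covariance structures) can be illustrated with a small Python/NumPy sketch. It assumes that each environment has its own block of marker genotypes, that main effects are shared across environments, and that interaction effects are specific to each environment. The toy dimensions and variable names are hypothetical, and this is not the Supporting Information script.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 30                       # number of markers (assumption)
n_per_env = [5, 4, 6]        # lines evaluated in each of three environments
X_env = [rng.binomial(1, 0.5, size=(n, p)).astype(float) for n in n_per_env]

# Explicit-regression form: y = X_main b0 + X_int [b1; b2; b3] + e
X_main = np.vstack(X_env)                       # main effects, shared by all environments
n_total = X_main.shape[0]
X_int = np.zeros((n_total, p * len(X_env)))     # environment-specific effects
row = 0
for j, Xj in enumerate(X_env):
    X_int[row:row + Xj.shape[0], j * p:(j + 1) * p] = Xj
    row += Xj.shape[0]

# Covariance (GBLUP-type) form of the same two terms.
G_main = X_main @ X_main.T / p   # covariance of main genomic values
G_int = X_int @ X_int.T / p      # block-diagonal: zero between environments

print(G_main.shape, G_int.shape)
```

In this sketch the interaction blocks share a single scaling; splitting `G_int` into one matrix per environment would allow a separate variance component for each environment.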

Article

Genomics CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Correction to: Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture

Osval Antonio Montesinos-Lopez Gustavo de los Campos Jose Crossa Juan Burgueño Francisco Javier Luna Vázquez (2018)

Unfortunately, in the original version [1] of this article, a funder note was missed out in the acknowledgement. The corrected acknowledgement is given below: Acknowledgements The authors thank all the field and lab assistants of CIMMYT’s Global Wheat Breeding Program who collected and processed the agronomic and breeding field data as well as the image data. The data used in this study was collected under projects supported by Bill and Melinda Gates Foundation and USAID

Article

Agriculture Bayesian theory Statistical analysis Hyperspectral Data Functional Regression Analyses Bayesian Functional Regression Functional Data Bayesian Ridge Regression DATA ANALYSIS REGRESSION ANALYSIS STATISTICAL METHODS BAYESIAN THEORY CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Regularized selection indices for breeding value prediction using hyper-spectral image data

Marco Lopez-Cruz Jose Crossa Susanne Dreisigacker Suchismita Mondal Ravi Singh Gustavo de los Campos (2020)

High-throughput phenotyping (HTP) technologies can produce data on thousands of phenotypes per unit being monitored. These data can be used to breed for economically and environmentally relevant traits (e.g., drought tolerance); however, incorporating high-dimensional phenotypes in genetic analyses and in breeding schemes poses important statistical and computational challenges. To address this problem, we developed regularized selection indices; the methodology integrates techniques commonly used in high-dimensional phenotypic regressions (including penalization and rank-reduction approaches) into the selection index (SI) framework. Using extensive data from CIMMYT's (International Maize and Wheat Improvement Center) wheat breeding program we show that regularized SIs derived from hyper-spectral data offer consistently higher accuracy for grain yield than those achieved by standard SIs, and by vegetation indices commonly used to predict agronomic traits. Regularized SIs offer an effective approach to leverage HTP data that is routinely generated in agriculture; the methodology can also be used to conduct genetic studies using high-dimensional phenotypes that are often collected in humans and model organisms including body images and whole-genome gene expression profiles.
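To make the penalization idea in the abstract above concrete, the sketch below contrasts a standard selection index with a ridge-penalized one in Python/NumPy. It assumes the textbook form of the index weights, the solution of P b = g, where P is the phenotypic covariance matrix of the measured phenotypes and g is their genetic covariance with the target trait, and it assumes that the penalty is added to the diagonal of P. The function name, the simulated data and the penalty value are hypothetical; the abstract does not specify the exact estimator, so this is only one plausible reading.

```python
import numpy as np

def regularized_si_weights(P_x, g_xy, lam):
    """Index weights solving (P_x + lam * I) b = g_xy.

    lam = 0 gives the standard selection index; lam > 0 shrinks the weights
    (hypothetical helper written for this example).
    """
    return np.linalg.solve(P_x + lam * np.eye(P_x.shape[0]), g_xy)

# Simulated data (assumption): 300 units, 20 measured phenotypes.
rng = np.random.default_rng(0)
n, p = 300, 20
u = rng.normal(size=n)                                  # genetic value of the target trait
loadings = rng.normal(size=p)
X = np.outer(u, loadings) + rng.normal(size=(n, p))     # measured phenotypes

P_x = np.cov(X, rowvar=False)
# In this toy example the genetic covariance can be computed directly because
# u is simulated; in practice it would have to be estimated from a genetic model.
g_xy = np.array([np.cov(X[:, j], u)[0, 1] for j in range(p)])

b_std = regularized_si_weights(P_x, g_xy, lam=0.0)
b_reg = regularized_si_weights(P_x, g_xy, lam=5.0)
index = (X - X.mean(axis=0)) @ b_reg                    # index value for each unit

print(np.linalg.norm(b_std), np.linalg.norm(b_reg))
```

With thousands of correlated phenotypes, P becomes ill-conditioned or singular, which is where a penalty (or a rank reduction) is expected to matter most.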

Article

CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA QUANTITATIVE TRAIT LOCI STATISTICAL METHODS PHENOTYPES

Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture

Osval Antonio Montesinos-Lopez Gustavo de los Campos Jose Crossa Juan Burgueño Francisco Javier Luna Vázquez (2018)

Modern agriculture uses hyperspectral cameras with hundreds of reflectance data at discrete narrow bands measured in several environments. Recently, Montesinos-López et al. (Plant Methods 13(4):1–23, 2017a. https://doi.org/10.1186/s13007-016-0154-2; Plant Methods 13(62):1–29, 2017b. https://doi.org/10.1186/s13007-017-0212-4) proposed using functional regression analysis (as functional data analyses) to help reduce the dimensionality of the bands and thus decrease the computational cost. The purpose of this paper is to discuss the advantages and disadvantages that functional regression analysis offers when analyzing hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis.

Article

Agriculture Bayesian theory Phenotyping Hyperspectral Data Functional Regression Analyses Bayesian Functional Regression Functional Data Bayesian Ridge Regression DATA ANALYSIS REGRESSION ANALYSIS STATISTICAL METHODS BAYESIAN THEORY CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Use of hyperspectral image data outperforms vegetation indices in prediction of maize yield

Samuel Trachsel Lorena González Pérez Juan Burgueño Jose Crossa Gustavo de los Campos (2017)

Hyperspectral cameras can provide reflectance data at hundreds of wavelengths. This information can be used to derive vegetation indices (VIs) that are correlated with agronomic and physiological traits. However, the data generated by hyperspectral cameras are richer than what can be summarized in a VI. Therefore, in this study, we examined whether prediction equations using hyperspectral image data can lead to better predictive performance for grain yield than what can be achieved using VIs. For hyperspectral prediction equations, we considered three estimation methods: ordinary least squares, partial least squares (a dimension reduction method), and a Bayesian shrinkage and variable selection procedure. We also examined the benefits of combining reflectance data collected at different time points. Data were generated by CIMMYT in 11 maize (Zea mays L.) yield trials conducted in 2014 under heat and drought stress. Our results indicate that using data from 62 bands leads to higher prediction accuracy than what can be achieved using individual VIs. Overall, the shrinkage and variable selection method was the best-performing one. Among the models using data from a single time point, the one using reflectance collected at 28 d after flowering gave the highest prediction accuracy. Combining image data collected at multiple time points led to an increase in prediction accuracy compared with using single-time-point data.

Article

Forecasting Image analysis reflectance Maize CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Bayesian Genomic Prediction with Genotype x Environment Interaction Kernel Models

Jaime Cuevas Osval Antonio Montesinos-Lopez Juan Burgueño Paulino Pérez-Rodríguez Gustavo de los Campos (2017)

The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability to single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u.
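The covariance structure described for the genetic effects u can be sketched in Python/NumPy as the Kronecker product of a between-environment covariance matrix and a genomic kernel. The sketch assumes records sorted by environment, with the same lines in every environment. The environment matrix `U_E`, the bandwidth `h`, and the scaling of distances by their median are illustrative assumptions rather than values or choices taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, p = 8, 40                       # toy dimensions (assumption)
X = rng.normal(size=(n_lines, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # centred and scaled markers

# Linear kernel (GBLUP-type).
G_lin = X @ X.T / p

# Gaussian kernel on squared Euclidean distances between lines.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
h = 1.0                                               # bandwidth (assumption)
K_gauss = np.exp(-h * sq_dist / np.median(sq_dist[sq_dist > 0]))

# Hypothetical genetic covariance between three environments.
U_E = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Covariance of u: block (j, k) equals U_E[j, k] times the kernel.
K_u_lin = np.kron(U_E, G_lin)
K_u_gk = np.kron(U_E, K_gauss)

print(K_u_lin.shape, K_u_gk.shape)
```

Each off-diagonal block lets information from a line observed in one environment contribute to its prediction in another, in proportion to the corresponding entry of `U_E`.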

Article

Bayesian theory Genotype environment interaction CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Bayesian functional regression as an alternative statistical analysis of high‑throughput phenotyping data of modern agriculture

Osval Antonio Montesinos-Lopez Gustavo de los Campos Jose Crossa Juan Burgueño Francisco Javier Luna Vázquez (2018)

Modern agriculture uses hyperspectral cameras with hundreds of reflectance data at discrete narrow bands measured in several environments. Recently, Montesinos-López et al. (Plant Methods 13(4):1–23, 2017a. https://doi.org/10.1186/s13007-016-0154-2; Plant Methods 13(62):1–29, 2017b. https://doi.org/10.1186/s13007-017-0212-4) proposed using functional regression analysis (as functional data analyses) to help reduce the dimensionality of the bands and thus decrease the computational cost. The purpose of this paper is to discuss the advantages and disadvantages that functional regression analysis offers when analyzing hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis. Results: We used seven model-methods, one with the conventional model (M1), three methods using the B-splines model (M2, M4, and M6) and three methods using the Fourier basis model (M3, M5, and M7). The data set we used comprises 976 wheat lines under irrigated environments with 250 wavelengths. Under a Bayesian Ridge Regression (BRR), we compared the prediction accuracy of the model-methods proposed under different numbers of basis functions, and compared the implementation time (in seconds) of the seven proposed model-methods for different numbers of basis functions. Our results as well as previously analyzed data (Montesinos-López et al. 2017a, 2017b) support that around 23 basis functions are enough. Concerning the degree of the polynomial in the context of B-splines, degree 3 approximates most of the curves very well. Two satisfactory types of basis are the Fourier basis for periodic curves and the B-splines model for non-periodic curves. Under nine different basis, the seven method-models showed similar prediction accuracy. Regarding implementation time, results show that the lower the number of basis functions, the lower the implementation time required. Methods M2, M3, M6 and M7 were around 3.4 times faster than methods M1, M4 and M5. Conclusions: In this study, we promote the use of functional regression modeling for analyzing high-throughput phenotypic data and indicate the advantages and disadvantages of its implementation. In addition, many key elements that are needed to understand and implement this statistical technique appropriately are provided using a real data set. We provide details for implementing Bayesian functional regression using the developed genomic functional regression (GFR) package. In summary, we believe this paper is a good guide for breeders and scientists interested in using functional regression models for implementing prediction models when their data are curves. Keywords: Hyperspectral data, Functional regression analysis, Bayesian functional regression, Functional data, Bayesian Ridge Regression.
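The dimension reduction described above can be sketched in Python/NumPy: each reflectance curve is projected onto a small set of basis functions, and the regression is then fitted on the resulting scores instead of on all 250 bands. The sketch uses a Fourier basis with 23 functions, simulated curves, and a plain ridge fit as a stand-in for Bayesian Ridge Regression. None of this is the GFR package, and `fourier_basis`, the simulated data and the penalty `lam` are assumptions made for the example.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Columns 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), ... evaluated at t in [0, 1]."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n_lines, n_bands, n_basis = 120, 250, 23
t = np.linspace(0.0, 1.0, n_bands)       # wavelengths rescaled to [0, 1]
dt = t[1] - t[0]

# Simulated smooth reflectance curves plus measurement noise (assumption).
coef = rng.normal(size=(n_lines, 5))
X = coef @ fourier_basis(t, 5).T + 0.05 * rng.normal(size=(n_lines, n_bands))

# Simulated response: y_i = integral of x_i(t) * beta(t) dt + error.
beta_t = np.sin(2 * np.pi * t)
y = X @ beta_t * dt + 0.1 * rng.normal(size=n_lines)

# Functional regression: beta(t) = Phi(t) b, so y is regressed on Z = X Phi dt.
Phi = fourier_basis(t, n_basis)          # 250 x 23
Z = X @ Phi * dt                         # 120 x 23 scores instead of 120 x 250 bands
lam = 1e-3                               # ridge penalty (assumption)
b = np.linalg.solve(Z.T @ Z + lam * np.eye(n_basis), Z.T @ y)

beta_hat = Phi @ b                       # estimated coefficient function on the band grid
fit_corr = np.corrcoef(y, Z @ b)[0, 1]   # in-sample fit, for illustration only
print(Z.shape, round(fit_corr, 2))
```

A B-spline basis would replace `fourier_basis` for non-periodic curves; the rest of the sketch would be unchanged, because only the matrix `Phi` depends on the basis type.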

Article

Phenotypes Economic activities Statistical methods Regression analysis Hyperspectral Data Functional Regression Analyses Bayesian Functional Regression Functional Data Bayesian Ridge Regression DATA ANALYSIS REGRESSION ANALYSIS STATISTICAL METHODS BAYESIAN THEORY CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA