Author: Paulino Pérez-Rodríguez

Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R

Paulino Pérez-Rodríguez Gustavo de los Campos Jose Crossa (2010)

The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyperparameters, are also addressed.
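The cross-validation idea mentioned in the abstract can be illustrated with a minimal sketch. It does not use the BLR package or its interface. It assumes a fixed variance ratio `lam` (residual variance over marker-effect variance), in which case the posterior mean of the marker effects under a Gaussian prior coincides with the ridge regression solution. The function names, the simulated data, and the value of `lam` are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate of marker effects for a fixed variance ratio lam.

    Uses the dual form, which solves an n x n system and is cheaper
    when the number of markers (p) exceeds the number of lines (n).
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # center markers on the training set
    y_mean = y.mean()
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - y_mean)
    beta = Xc.T @ alpha                  # marker effects
    intercept = y_mean - x_mean @ beta   # unpenalized intercept
    return intercept, beta

def cv_correlation(X, y, lam, k=5, seed=0):
    """Correlation between observed and k-fold cross-validated predictions."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k  # balanced random fold labels
    y_hat = np.empty(len(y))
    for f in range(k):
        test = folds == f
        intercept, beta = ridge_fit(X[~test], y[~test], lam)
        y_hat[test] = intercept + X[test] @ beta
    return np.corrcoef(y, y_hat)[0, 1]

# Simulated toy data (hypothetical): 100 lines, 500 biallelic markers.
rng = np.random.default_rng(1)
n, p = 100, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
beta_true = rng.normal(0.0, 0.1, p)
y = X @ beta_true + rng.normal(0.0, 1.0, n)

r = cv_correlation(X, y, lam=100.0)
print(f"5-fold CV correlation: {r:.3f}")
```

In a fully Bayesian fit the variance components would be sampled rather than fixed, so `lam` here stands in for a hyperparameter choice of the kind the abstract refers to.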



A Bayesian genomic regression model with skew normal random errors

Paulino Pérez-Rodríguez Sergio Pérez-Elizalde Jose Crossa (2018)

Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. In the case of quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, it is known that the selection process leads to skewed distributions. There is vast statistical literature on skewed distributions, but the skew normal distribution is of particular interest in this research. This distribution includes a third parameter that drives the skewness, so that it generalizes the normal distribution. We propose an extension of the Bayesian whole-genome regression to skew normal distribution data in the context of GS applications, where usually the number of predictors vastly exceeds the sample size. However, it can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows the implementation of standard Markov Chain Monte Carlo (MCMC) techniques to efficiently fit the proposed model. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained by the Bayesian Ridge Regression model. Results indicate that the proposed model has a better fit and is as good as the conventional Bayesian Ridge Regression model for prediction, based on the DIC criterion and cross-validation, respectively. A computing program coded in the R statistical package and C programming language to fit the proposed model is available as supplementary material.
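The stochastic representation mentioned in the abstract can be sketched as follows. This is the standard representation of a standard skew normal variable with shape parameter `alpha` as a weighted sum of a half-normal and an independent normal variable. Whether the article uses exactly this parameterization is an assumption, and the function name `rskewnorm` is hypothetical. Conditional on the latent half-normal variable the result is normally distributed, which is the property that makes standard Gibbs-type MCMC steps available.

```python
import numpy as np

def rskewnorm(n, alpha, rng):
    """Draw n standard skew normal variates with shape parameter alpha.

    Representation: z = delta * |u0| + sqrt(1 - delta^2) * u1,
    with u0, u1 independent N(0, 1) and delta = alpha / sqrt(1 + alpha^2).
    """
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0 = np.abs(rng.standard_normal(n))   # latent half-normal variable
    u1 = rng.standard_normal(n)           # independent normal variable
    return delta * u0 + np.sqrt(1.0 - delta**2) * u1

rng = np.random.default_rng(42)
alpha = 5.0
z = rskewnorm(200_000, alpha, rng)

# Compare sample moments with the known theoretical moments.
delta = alpha / np.sqrt(1.0 + alpha**2)
mean_theory = delta * np.sqrt(2.0 / np.pi)
var_theory = 1.0 - 2.0 * delta**2 / np.pi
print("mean:", z.mean(), "theory:", mean_theory)
print("var: ", z.var(), "theory:", var_theory)
```

Setting `alpha = 0` gives `delta = 0`, so the draws reduce to ordinary standard normal variates, which is the sense in which the skew normal generalizes the normal distribution.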


Genomics Bayesian theory Genomic Selection Data Augmentation Asymmetric Distributions GBLUP Ridge Regression GenPred Shared Data Resources BAYESIAN THEORY REGRESSION ANALYSIS STATISTICAL METHODS CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Replication Data for: Multi-trait Bayesian decision for parental selection

Jose Crossa Fernando Henrique Toledo Paulino Pérez-Rodríguez (2020)

The files included in this study contain the data used with three promising multivariate loss functions, Kullback-Leibler (KL), the Energy Score, and the Multivariate Asymmetric Loss Function (MALF), to select the best-performing parents for the next breeding cycle in two extensive real wheat data sets.



A Bayesian decision theory approach for genomic selection

Bartolo de Jesús Villar-Hernández Sergio Pérez-Elizalde Jose Crossa Paulino Pérez-Rodríguez Juan Burgueño (2018)

Plant and animal breeders are interested in selecting the best individuals from a candidate set for the next breeding cycle. In this paper, we propose a formal method under the Bayesian decision theory framework to tackle the selection problem based on genomic selection (GS) in single- and multi-trait settings. We proposed and tested three univariate loss functions (Kullback-Leibler, KL; Continuous Ranked Probability Score, CRPS; Linear-Linear loss, LinLin) and their corresponding multivariate generalizations (Kullback-Leibler, KL; Energy Score, EnergyS; and the Multivariate Asymmetric Loss Function, MALF). We derived and expressed all the loss functions in terms of heritability and tested them on a real wheat dataset for one cycle of selection and in a simulated selection program. The performance of each univariate loss function was compared with the standard method of selection (Std) that does not use loss functions. We compared the performance in terms of the selection response and the decrease in the population's genetic variance during recurrent breeding cycles. Results suggest that it is possible to obtain better performance in a long-term breeding program using the single-trait scheme by selecting 30% of the best individuals in each cycle but not by selecting 10% of the best individuals. For the multi-trait approach, results show that the population mean for all traits under consideration had positive gains, even though two of the traits were negatively correlated. The corresponding population variances were not statistically different among the loss functions during the 10th selection cycle. Loss functions should therefore be a useful criterion when selecting candidates for the next breeding cycle.


Bayesian theory Genomics Plant breeding Decision Theory Loss Function Scenarios GenPred Shared Data Resources BAYESIAN THEORY GENOMICS SELECTION SIMULATION CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Supplemental data for hybrid wheat prediction using genomic, pedigree and environmental covariables interaction models

Bhoja Basnet Jose Crossa Paulino Pérez-Rodríguez Ravi Singh Fatima Camarillo-Castillo (2018)

Genomic prediction of hybrids unobserved in field evaluations is crucial. In this study, we used genomic G×E models for hybrid prediction, where similarity between lines was assessed by pedigree and molecular markers, and similarity between environments was accounted for by environmental covariables.



A reaction norm model for genomic selection using high-dimensional genomic and environmental data

Diego Jarquin Jose Crossa Paulino Pérez-Rodríguez Mario Calus Juan Burgueño Gustavo de los Campos (2013)

In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of a (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods, like the one presented here, that can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
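The covariance-function idea can be sketched with a small numerical example. It assumes, as an illustration rather than a statement of the article's exact specification, that the genetic covariance is a marker-based cross-product matrix, the environmental covariance is a cross-product of environmental covariates, and the interaction covariance is their element-wise (Hadamard) product. All dimensions, variable names, and simulated values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_envs, p, q = 6, 4, 50, 8   # toy sizes (hypothetical)

# Centered marker matrix (lines x markers) and covariate matrix (envs x covariates).
X = rng.binomial(2, 0.4, size=(n_lines, p)).astype(float)
X = X - X.mean(axis=0)
W = rng.standard_normal((n_envs, q))
W = W - W.mean(axis=0)

G = X @ X.T / p          # genomic similarity between lines
Omega = W @ W.T / q      # similarity between environments

# One record per line-by-environment combination.
line_id = np.repeat(np.arange(n_lines), n_envs)
env_id = np.tile(np.arange(n_envs), n_lines)

K_g = G[np.ix_(line_id, line_id)]      # genetic covariance among records
K_e = Omega[np.ix_(env_id, env_id)]    # environmental covariance among records
K_ge = K_g * K_e                       # Hadamard product: interaction covariance

# Check against the explicit approach: build all p * q marker-by-covariate
# interaction columns and take their cross-product.
F = np.einsum("ij,ik->ijk", X[line_id], W[env_id]).reshape(len(line_id), p * q)
K_explicit = F @ F.T / (p * q)
print("max abs difference:", np.abs(K_ge - K_explicit).max())
```

The check at the end shows why the covariance function is attractive: the Hadamard product reproduces the covariance implied by all p × q explicit interaction terms without ever forming them, which matters when both the marker set and the covariate set are large.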


Prediction Accuracy Covariance Function Covariance Structure Prediction Problem Multiplicative Operator CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA

Single-step genomic and pedigree genotype × environment interaction models for predicting wheat lines in international environments

Paulino Pérez-Rodríguez Jose Crossa Jessica Rutkoski Ravi Singh Gustavo de los Campos Juan Burgueño Susanne Dreisigacker (2017)

Genomic prediction models have been commonly used in plant breeding but only in reduced datasets comprising a few hundred genotyped individuals. However, pedigree information for an entire breeding population is frequently available, as are historical data on the performance of a large number of selection candidates. The single-step method extends the genomic relationship information from genotyped individuals to pedigree information from a larger number of phenotyped individuals in order to combine relationship information on all members of the breeding population. Furthermore, genomic prediction models that incorporate genotype × environment interactions (G × E) have produced substantial increases in prediction accuracy compared with single-environment genomic prediction models. Our main objective was to show how to use single-step genomic and pedigree models to assess the prediction accuracy of 58,798 CIMMYT wheat (Triticum aestivum L.) lines evaluated in several simulated environments in Ciudad Obregon, Mexico, and to predict the grain yield performance of some of them in several sites in South Asia (India, Pakistan, and Bangladesh) using a reaction norm model that incorporated G × E. Another objective was to describe the statistical and computational challenges encountered when developing the pedigree and single-step models in such large datasets. Results indicate that the genomic prediction accuracy achieved by models using pedigree only, markers only, or both pedigree and markers to predict various environments in India, Pakistan, and Bangladesh is higher (0.25–0.38) than prediction accuracy of models that use only phenotypic prediction (0.20) or do not include the G × E term.


Genomics Breeding methods Genetic improvement Wheats CIENCIAS AGROPECUARIAS Y BIOTECNOLOGÍA