Author: Osval Antonio Montesinos-Lopez

A singular value decomposition Bayesian multiple-trait and multiple-environment genomic model

Osval Antonio Montesinos-Lopez, Jose Crossa (2018)

In this paper, we propose a two-stage analysis for multiple-trait data: in the first stage, we perform a singular value decomposition (SVD) on the matrix of trait responses, and in the second stage, we perform a multiple-trait analysis on the transformed responses. We use simulated as well as wheat and maize data sets.
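The two stages can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the transformed responses are obtained by rotating the centered trait matrix with its right singular vectors, and the names `svd_transform` and `back_transform`, the toy data, and the use of NumPy are all hypothetical. The second-stage model is left as a placeholder.

```python
import numpy as np

def svd_transform(Y):
    """Stage 1: center the n x t trait matrix and rotate it with its SVD."""
    means = Y.mean(axis=0)
    U, d, Vt = np.linalg.svd(Y - means, full_matrices=False)
    Y_star = U * d  # same as (Y - means) @ Vt.T; columns are mutually orthogonal
    return Y_star, Vt, means

def back_transform(Y_star_pred, Vt, means):
    """Map values on the transformed scale back to the original trait scale."""
    return Y_star_pred @ Vt + means

rng = np.random.default_rng(0)
# Hypothetical toy data: 50 lines measured for 3 correlated traits.
mix = np.array([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.5, 0.5, 0.7]])
Y = rng.normal(size=(50, 3)) @ mix.T

Y_star, Vt, means = svd_transform(Y)

# Stage 2 (placeholder): a genomic model would be fitted to the columns of
# Y_star here. The identity is used only to show the round trip.
Y_star_pred = Y_star
Y_back = back_transform(Y_star_pred, Vt, means)
```

Because the columns of `Y_star` are orthogonal, each one can in principle be analyzed separately before the predictions are mapped back to the original traits.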

Dataset

Agricultural Sciences and Biotechnology

Deep learning genomic-enabled prediction of plant traits

Osval Antonio Montesinos-Lopez, Jose Crossa (2018)

Machine learning (ML) is a field of computer science that uses statistical techniques to give computer systems the ability to "learn" (i.e., progressively improve performance on a specific task) from data, without being explicitly programmed to do so. ML is closely related to (and often overlaps with) computational statistics, which also focuses on making predictions through the use of computers. In general, ML explores algorithms that learn from current data and make predictions on new data by building a model from sample inputs. The fields of statistics and ML share a common root and will continue to come closer together in the future. In this paper we explore the novel deep learning (DL) methodology in the context of genomic selection. DL models with a densely connected network architecture were compared with one of the most often used genome-enabled prediction models, genomic best linear unbiased prediction (GBLUP). We used nine published real genomic data sets to compare the models and obtain a “meta picture” of the performance of DL models with a densely connected network architecture.

Dataset

Agricultural Sciences and Biotechnology

Replication Data for: A guide for generalized kernel regression methods for genomic-enabled prediction

Osval Antonio Montesinos-Lopez, Jose Crossa (2020)

The data contained in these datasets can be used to implement Bayesian generalized kernel regression methods for genome-enabled prediction in the statistical software R. The accompanying paper describes the building process of 7 kernel methods, including linear, polynomial, sigmoid, Gaussian, Arc-cosine 1, and Arc-cosine L.
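For orientation, the kernels named above can be written in a few lines. The sketch below uses textbook forms of each kernel in plain Python rather than the R code that accompanies the datasets. The function names, default hyperparameters, the division by the number of markers in the linear kernel, and the toy marker matrix are all assumptions made for illustration. The arc-cosine kernel with `layers > 1` applies the one-layer formula recursively, and it assumes nonzero input vectors.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    # Scaling by the number of markers is an assumption of this sketch.
    return dot(x, z) / len(x)

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2):
    return (gamma * dot(x, z) + coef0) ** degree

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)

def gaussian_kernel(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def _arc_cosine_step(kxx, kzz, kxz):
    # One layer of the arc-cosine recursion, from the previous layer's values.
    norm = math.sqrt(kxx * kzz)
    cos_t = max(-1.0, min(1.0, kxz / norm))  # clamp against rounding error
    theta = math.acos(cos_t)
    return norm * (math.sin(theta) + (math.pi - theta) * cos_t) / math.pi

def arc_cosine_kernel(x, z, layers=1):
    kxx, kzz, kxz = dot(x, x), dot(z, z), dot(x, z)
    for _ in range(layers):
        # The diagonal terms kxx and kzz are unchanged by each layer
        # (theta = 0 there), so only the cross term needs updating.
        kxz = _arc_cosine_step(kxx, kzz, kxz)
    return kxz

def kernel_matrix(X, kernel, **params):
    """Evaluate a kernel for every pair of rows of X."""
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]

# Hypothetical marker codes for three lines and four markers.
X = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 2]]
K = kernel_matrix(X, gaussian_kernel, gamma=0.25)
```

A matrix such as `K` would then take the place of the genomic relationship matrix in a kernel regression model.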

Dataset

Agricultural Sciences and Biotechnology

Replication Data for: A multivariate Poisson deep learning model for genomic prediction of count data

Osval Antonio Montesinos-Lopez, Pawan Singh, Jose Crossa (2020)

Genomic selection (GS) is an important method used in plant and animal breeding. The experimental data provided in this study contain count data. These datasets were used to support research on efficient methodologies for multivariate count data outcomes, including a multivariate Poisson deep neural network (MPDN) model, a conventional multivariate generalized Poisson regression model, and a univariate Poisson deep learning model. The results of the analyses are presented in a corresponding publication.

Dataset

Agricultural Sciences and Biotechnology

Using an incomplete block design to allocate lines to environments improves sparse genome-based prediction in plant breeding

Osval Antonio Montesinos-Lopez, ABELARDO MONTESINOS LOPEZ, RICARDO ACOSTA DIAZ, Rajeev Varshney, Jose Crossa, ALISON BENTLEY (2022)

Genomic selection (GS) is a predictive methodology that trains statistical machine-learning models with a reference population that is used to perform genome-enabled predictions of new lines. In plant breeding, it has the potential to increase the speed and reduce the cost of selection. However, to optimize resources, sparse testing methods have been proposed. A common approach is to guarantee a proportion of nonoverlapping and overlapping lines allocated randomly in locations, that is, lines appearing in some locations but not in all. In this study we propose using the principle of incomplete block designs (IBD) to allocate lines to locations in such a way that not all lines are observed in all locations. We compare this allocation with a random allocation of lines to locations, guaranteeing that the lines are allocated to the same number of locations as under the IBD. We implemented this benchmarking on several crop data sets under the Bayesian genomic best linear unbiased predictor (GBLUP) model, finding that allocation under the principle of IBD outperformed random allocation by between 1.4% and 26.5% across locations, traits, and data sets in terms of mean square error. Although a wide range of performance improvements was observed, our results provide evidence that using IBD for the allocation of lines to locations can help improve predictive performance compared with random allocation. This has the potential to be applied to large-scale plant breeding programs.
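The contrast between the two allocation schemes can be illustrated with a toy example. The sketch below is not the design used in the article: a simple cyclic allocation stands in for a formal incomplete block design, and the function names, the numbers of lines and locations, and the seed are hypothetical. It only shows how both schemes place every line in the same number of locations while differing in how evenly the locations are filled. It does not fit a GBLUP model or compute mean square error.

```python
import random
from collections import Counter

def cyclic_allocation(n_lines, n_locs, reps):
    """Assign line i to `reps` consecutive locations, wrapping around.

    Assumes reps <= n_locs so that no location is repeated for a line.
    """
    return {i: [(i + j) % n_locs for j in range(reps)] for i in range(n_lines)}

def random_allocation(n_lines, n_locs, reps, seed=0):
    """Assign each line to `reps` distinct locations chosen at random."""
    rng = random.Random(seed)
    return {i: sorted(rng.sample(range(n_locs), reps)) for i in range(n_lines)}

def location_sizes(allocation, n_locs):
    """Count how many lines end up in each location."""
    counts = Counter(loc for locs in allocation.values() for loc in locs)
    return [counts[loc] for loc in range(n_locs)]

design = cyclic_allocation(n_lines=12, n_locs=4, reps=2)
rand = random_allocation(n_lines=12, n_locs=4, reps=2)

print(location_sizes(design, 4))  # [6, 6, 6, 6]: every location gets 6 lines
print(location_sizes(rand, 4))    # sizes sum to 24 but need not be equal
```

In a benchmarking study, each allocation would define the training observations, and the lines missing from each location would be predicted and scored.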

Article

Agricultural Sciences and Biotechnology; Bayes Theorem; Genome; Inflammatory Bowel Diseases; Models, Genetic; Plant Breeding

Replication Data for: A comparison between three machine learning methods for multivariate genomic prediction using the Sparse Kernels Methods (SKM) library

Osval Antonio Montesinos-Lopez, Pedro César Santana Mancilla, Jose Crossa (2022)

Genomic selection (GS) provides a new way for plant breeders to select the best genotypes. It draws upon historical phenotypic and genotypic information to train a statistical machine learning model, which is then used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Many statistical machine learning methods have been proposed for this task, but multi-trait (MT) genomic prediction models are preferred because they take advantage of correlated traits to improve prediction accuracy. This study contains six datasets that were used to compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), the MT partial least squares (PLS), and the MT Random Forest (RF). The data come from groundnuts, rice, and wheat. The accompanying article describes the results of the analysis.

Dataset

Agricultural Sciences and Biotechnology