Author: Jose Crossa

Deep kernel and deep learning for genomic-based prediction

Jose Crossa, Paulino Pérez-Rodríguez, Juan Burgueño, Ravi Singh, Philomin Juliana, Osval Antonio Montesinos-Lopez, Jaime Cuevas (2019)

Deep learning (DL) is a promising method in the context of genomic prediction for selecting individuals early in time without measuring their phenotypes. In this paper, we compare the genome-based prediction performance of the DL method, the deep kernel (arc-cosine kernel, AK) method, the Gaussian kernel (GK) method and the conventional kernel method (Genomic Best Linear Unbiased Predictor, GBLUP, GB). We used two real wheat data sets to benchmark these methods. We found that the GK and deep kernel AK methods outperformed the DL and the conventional GB methods. The gain in prediction performance of AK and GK was not very large, but they have the advantage that no tuning parameters are required. Furthermore, although AK and GK had similar genomic-based performance, deep kernel AK is easier to implement than GK. For this reason, our results suggest that AK is an alternative to DL models with the advantage that no tuning process is required.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Use of Remote Sensing for Genome-Wide Association Studies and Genomic Prediction

Alexander Loladze, Francelino Rodrigues, Cesar Petroli, Felix San Vicente Garcia, Bruno Gerard, Osval Antonio Montesinos-Lopez, Jose Crossa, Johannes Martini (2023)

Disease resistance improvement efforts in plant breeding can help to reduce the negative impact of biotic stresses on crop production. Disease resistance can be assessed through a labor-intensive process in which specially trained staff assign visual scores (VS) of susceptibility (or resistance). Remote sensing (RS) tools can also be used to measure traits, such as vegetation indices, that reflect plant responses to diseases. This dataset contains phenotypic and genotypic data from a two-year evaluation trial of three newly developed biparental populations of maize doubled haploid (DH) lines. Data from VS and RS methods for assessing common rust resistance were used in genome-wide association study (GWAS) and genomic prediction (GP) analyses. A report on the comparison of the results of these analyses is provided in the accompanying article.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Genome-based prediction of multiple wheat quality traits in multiple years

Maria Itria Ibba, Jose Crossa, Osval Antonio Montesinos-Lopez, Philomin Juliana, Carlos Guzman, Susanne Dreisigacker, Jesse Poland (2020)

Genomic prediction could greatly increase the efficiency of selecting for wheat quality traits by reducing the cost and time required for this analysis. This study contains data used to evaluate the prediction performance for 13 wheat quality traits under two multi-trait models [Bayesian multi-trait multi-environment (BMTME) and multi-trait ridge regression (MTR)]. Separate files are provided for each year of data. An additional supplemental data file provides R code for running the analyses as well as a table describing the average Pearson's correlation (APC) and mean arctangent absolute percentage error (MAAPE) for the testing sets for each dataset and trait.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Joint use of genome, pedigree and their interaction with environment for predicting the performance of wheat lines in new environments

Osval Antonio Montesinos-Lopez, Philomin Juliana, Ravi Singh, Jesse Poland, Paulino Pérez-Rodríguez, Jose Crossa, Diego Jarquin (2019)

In this study, we evaluated genome-based prediction using 35,403 wheat lines from the Global Wheat Breeding Program of the International Maize and Wheat Improvement Center (CIMMYT). We implemented eight statistical models that included genome-wide molecular marker and pedigree information in two different validation schemes. All models included main effects, and some also considered interactions between the different types of covariates via Hadamard products of similarity structures. When only main effects were fitted, the pedigree models always gave better results than the genome-based models for predicting new lines in observed environments. However, for all traits, the highest predictive abilities were obtained when interactions between pedigree, markers and environments were included. When new lines were predicted in unobserved environments, the marker main-effects model was the best in almost all trait/year combinations. These results provide strong evidence that the different sources of genetic information (molecular markers and pedigree) are not equally useful at different stages of the breeding pipelines, and can be employed differentially to improve the design of future breeding programs.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Multimodal Deep Learning Methods Enhance Genomic Prediction of Wheat Breeding

Carolina Rivera-Amado, Francisco Pinto, Francisco Javier Pinera-Chavez, David González-Diéguez, Paulino Pérez-Rodríguez, Huihui Li, Osval Antonio Montesinos-Lopez, Jose Crossa (2023)

In plant breeding research, several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes. To increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype × environment interaction (GE), deep learning (DL) neural networks have been developed. These analyses can potentially include phenomics data obtained through imaging. The two datasets included in this study contain phenomic, phenotypic, and genotypic data for a set of wheat materials. They have been used to compare a novel DL method with conventional GP models. The results of these analyses are reported in the accompanying journal article.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Optimizing sparse testing for genomic prediction of plant breeding crops

Osval Antonio Montesinos-Lopez, Carolina Saint Pierre, Brandon Alejandro Mosqueda González, Alison Bentley, Yoseph Beyene, Manje Gowda, Leonardo Abdiel Crespo Herrera, Jose Crossa (2022)

In plant breeding, sparse testing methods have been suggested to improve the efficiency of the genomic selection methodology. The data provided in this dataset were used to evaluate four methods for allocating lines to environments for sparse testing in multi-environment trials. The analysis was conducted using a multi-trait and uni-trait framework. The accompanying article describes the results of the evaluation as well as a cost-benefit analysis to identify the benefits that can be obtained using sparse testing methods.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication data for: Increased ranking change in wheat breeding under climate change

Wei Xiong, Matthew Paul Reynolds, Jose Crossa, Urs Schulthess, Kai Sonder, Carlo Montes, Nicoletta Addimando, Ravi Singh, Karim Ammar, Bruno Gerard, Thomas Payne (2022)

A standard quantitative genetic model was applied to four spring wheat trial data sets to examine how genotype-environment interactions have changed over the past decades. Climatic factors explain more than 70% of the year-to-year variability in cross interactions for yield.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY