Author: Pedro César Santana Mancilla

Replication Data for: A comparison between three machine learning methods for multivariate genomic prediction using the Sparse Kernels Methods (SKM) library

Osval Antonio Montesinos-Lopez; Pedro César Santana Mancilla; Jose Crossa (2022)

Genomic selection (GS) provides a new way for plant breeders to select the best genotypes. It draws upon historical phenotypic and genotypic information to train a statistical machine learning model, which is then used to predict phenotypic (or breeding) values of new lines for which only genotypic information is available. Many statistical machine learning methods have been proposed for this task, but multi-trait (MT) genomic prediction models are preferred because they take advantage of correlated traits to improve prediction accuracy. This study contains six datasets that were used to compare the prediction performance of three MT methods: the MT genomic best linear unbiased predictor (GBLUP), MT partial least squares (PLS), and MT random forest (RF). The data come from groundnuts, rice, and wheat. The accompanying article describes the results of the analysis.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY