Author: Philomin Juliana

Prediction of multiple-trait and multiple-environment genomic data using recommender systems

Osval Antonio Montesinos-Lopez, Jose Crossa, Ravi Singh, Suchismita Mondal, Philomin Juliana (2017)

In genomic-enabled prediction, improving the accuracy of predicting lines in environments is difficult because the available information is generally sparse and correlations between traits are usually low. In current genomic selection, researchers have a large amount of information and appropriate statistical models to process it, but computing efficiency remains limited. Although statistical models are usually mathematically elegant, they are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge multivariate normal distributions. For these reasons, this study explores two recommender systems in the context of multiple traits and multiple environments: a) item-based collaborative filtering (IBCF; method M1) and b) the matrix factorization algorithm (method M2). The IBCF and matrix factorization methods were compared with two conventional methods on simulated and real data. Results on both the simulated and real data sets show that the IBCF technique (method M1) was slightly better in prediction accuracy than the two conventional methods and the matrix factorization method when the correlation was moderately high. The IBCF technique is attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.
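
To illustrate the general idea behind item-based collaborative filtering, here is a minimal sketch. It is not the code used in the study. It assumes a small table with lines as rows and items (environment-trait combinations) as columns, values already standardized per column, and None marking an unobserved cell. The function names cosine and ibcf_predict are hypothetical, and cosine similarity with a similarity-weighted average is one common choice among several.

```python
import math


def cosine(a, b):
    """Cosine similarity between two item columns, using only the
    rows where both items were observed (assumption: None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    den = math.sqrt(sum(x * x for x, _ in pairs)) * math.sqrt(
        sum(y * y for _, y in pairs)
    )
    return num / den if den else 0.0


def ibcf_predict(table, row, col):
    """Predict table[row][col] as a similarity-weighted average of the
    other items observed for the same line. Returns None when no other
    item is available for that line."""
    target = [r[col] for r in table]
    num = den = 0.0
    for j, value in enumerate(table[row]):
        if j == col or value is None:
            continue
        sim = cosine(target, [r[j] for r in table])
        num += sim * value
        den += abs(sim)
    return num / den if den else None


# Hypothetical standardized values: 3 lines x 3 environment-trait items.
table = [
    [1.0, 0.9, -0.5],
    [-1.0, -1.1, 0.4],
    [0.5, None, -0.2],
]
prediction = ibcf_predict(table, row=2, col=1)
print(prediction)
```

Because the prediction for a cell only needs similarities between columns, no large multivariate distribution has to be sampled, which is the source of the computational appeal mentioned in the abstract.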

Dataset

Agricultural Sciences and Biotechnology

Genotypic data for the South Asian panel with 184 lines

Xinyao He, Philomin Juliana, Gyanendra Singh, Aakash Chawade, Arun Joshi, Ravi Singh, Pawan Singh (2021)

Genotypic data for the South Asian panel of 184 lines, intended for multiple-disease resistance analysis.

Dataset

Agricultural Sciences and Biotechnology

Replication Data for: Allocation of wheat lines in sparse testing for genome-based multi-environment prediction

Leonardo Abdiel Crespo Herrera, Ravi Singh, Suchismita Mondal, Philomin Juliana, Diego Jarquin, Jose Crossa (2021)

Sparse testing can be used in plant breeding and genome-based prediction. In sparse testing, not all lines are sown in all environments. The phenotypic and genotypic data files in this dataset were used to analyze three general cases of the composition of the sparse testing allocation design for wheat breeding.

Dataset

Agricultural Sciences and Biotechnology

New deep learning genomic prediction model for multi-traits with mixed binary, ordinal, and continuous phenotypes

Osval Antonio Montesinos-Lopez, Francisco Javier Martin Vallejo, Jose Crossa, Philomin Juliana, Ravi Singh (2018)

The seven data sets contain wheat data from the CIMMYT Global Wheat Breeding Program. They cover different traits, such as days to heading, days to maturity, grain yield, grain color, and different types of leaf and stripe rust. The trials were run in different environments.

Dataset

Agricultural Sciences and Biotechnology