Author: Suchismita Mondal

Aerial High-Throughput Phenotyping Enabling Indirect Selection for Grain Yield at the Early-generation Seed-limited Stages in Breeding Program - data for publication

Suchismita Mondal Jose Crossa Ravi Singh Jesse Poland (2020)

The files contain pedigree information on the lines used in the study; trait data for grain yield, spectral traits, and other agronomic traits; and genotypic data.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Genomic selection models based on integration of GWAS loci and epistatic interactions

Deepmala Sehgal Suchismita Mondal Ravi Singh Susanne Dreisigacker (2020)

The potential to integrate consistent associations identified from genome-wide association studies (GWAS) as fixed variables in genomic prediction (GP) models to improve prediction accuracy for complex traits (for example, grain yield) has not been investigated comprehensively in wheat. Here, we untangled the genetic architecture of grain yield (GY) and yield stability using haplotype-based GWAS and an epistatic scan of the genome. We then integrated robust and stable associations (and interacting loci) as fixed effects in GP models to investigate the importance of these associations in improving prediction accuracies for these traits. We concluded that incorporating GWAS results into GP models is notably useful for GY when GWAS identifies significant and robust genomic regions.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: High-resolution spectral information enables phenotyping of leaf epicuticular wax in wheat

Fatima Camarillo-Castillo Suchismita Mondal Matthew Paul Reynolds (2020)

Current limitations on phenotyping epicuticular wax (EW) restrict the integration of this secondary trait into wheat breeding pipelines. In this study, we evaluated high-resolution spectral information as a proxy estimate of the trait and developed an efficient indirect method for selecting genotypes with high EW density.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Fifty years of semi-dwarf spring wheat breeding at CIMMYT: Grain yield progress in optimum, drought and heat stress environments

Suchismita Mondal Somak Dutta Leonardo Abdiel Crespo Herrera Julio Huerta-Espino Hans-Joachim Braun Ravi Singh (2019)

This dataset provides supplementary files related to fifty years of semi-dwarf spring wheat breeding at CIMMYT: Grain yield progress in optimum, drought and heat stress environments.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Prediction of multiple-trait and multiple-environment genomic data using recommender systems

Osval Antonio Montesinos-Lopez Jose Crossa Ravi Singh Suchismita Mondal Philomin Juliana (2017)

In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, while researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although statistical models are usually mathematically elegant, they are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: a) item-based collaborative filtering (IBCF; method M1) and b) the matrix factorization algorithm (method M2) in the context of multiple traits and multiple environments. The IBCF and matrix factorization methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique (method M1) was slightly better in terms of prediction accuracy than the two conventional methods and the matrix factorization method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Allocation of wheat lines in sparse testing for genome-based multi-environment prediction

Leonardo Abdiel Crespo Herrera Ravi Singh Suchismita Mondal Philomin Juliana Diego Jarquin Jose Crossa (2021)

Sparse testing can be used in plant breeding and genome-based prediction. In sparse testing not all of the lines are sown in all environments. The phenotypic and genotypic data files provided in this dataset were used to execute an analysis of three general cases of the composition of the sparse testing allocation design for wheat breeding.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Sparse kernel models provide optimization of training set design for genome-based prediction in multi-year wheat breeding data

Marco Lopez-Cruz Susanne Dreisigacker Leonardo Abdiel Crespo Herrera Alison Bentley Ravi Singh Suchismita Mondal Paulino Pérez-Rodríguez Jose Crossa (2021)

When genomic selection (GS) is used in breeding schemes, data from multiple generations can provide opportunities to increase sample size and thus the likelihood of extracting useful information from the training data. The Sparse Selection Index (SSI) is a method for optimizing training data selection. The data files provided with this study include a large multigeneration wheat dataset of grain yield for 68,836 lines generated across eight cycles (years) as well as genotypic data that were analyzed to test this method. The results of the analysis are published in the corresponding journal article.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY