Author: Hugo Jair Escalante
Hugo Jair Escalante Balderas (2010)
This document describes the methods we proposed for image annotation and retrieval
that are based on the semantic cohesion among multimodal terms. The semantic cohesion is
the degree of association among the terms that compose a document according to their meaning
in a certain context. Hence, the proposed techniques aim to exploit the relationships
among terms that come from different modalities but occur in common documents, in order
to improve the performance of current techniques for image annotation and retrieval.
On the one hand, we propose an energy-based model for automatic image annotation that
attempts to maximize an estimate of the semantic cohesion among labels assigned to adjacent
regions in segmented images. The proposed method incorporates visual information extracted
from the images as well as estimates of association among labels. Visual information is
incorporated by means of the outputs of supervised classification techniques; whereas the
association among labels, which is estimated through co-occurrence statistics, is incorporated
directly into the model. Experimental results in several collections give evidence of the
validity of our approach; our results outperform those obtained by related works on the
same image collections. Furthermore, the proposed model is very general, which facilitates
its application to heterogeneous collections; it is also highly efficient and can be extended in several ways.
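For concreteness, the label-association statistics mentioned above can be estimated from the label sets of training images. The sketch below is a minimal illustration of co-occurrence counting; the toy labels and the particular normalization are assumptions for illustration, not the exact estimator used in the thesis.

```python
from collections import Counter
from itertools import combinations

def label_cooccurrence(annotated_images):
    """Estimate the association between label pairs from how often they
    co-occur in the label sets of training images."""
    pair_counts = Counter()
    for labels in annotated_images:
        for a, b in combinations(sorted(set(labels)), 2):
            pair_counts[(a, b)] += 1
    total = len(annotated_images)
    # Empirical probability that a pair of labels appears together.
    return {pair: c / total for pair, c in pair_counts.items()}

# Toy training set: label sets of three segmented images.
train = [{"sky", "water", "boat"}, {"sky", "grass"}, {"sky", "water"}]
assoc = label_cooccurrence(train)
# "sky" and "water" co-occur in 2 of the 3 images.
```

Estimates of this kind can then be plugged into an annotation model as the association term between labels of adjacent regions.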
On the other hand, we propose methods based on the semantic cohesion among labels
and text to represent documents for the task of multimedia image retrieval. Specifically, we
propose two indexing techniques that take advantage of distributional term representations.
Under our approach the content of images is modeled through occurrence and co-occurrence
statistics among multimodal terms derived from images. In this way, we attempt to represent
each image by patterns that reflect the cohesion of the multimodal terms that occur in it. We
also study standard methods for combining information from labels and text. Experimental
results show that standard techniques are very effective; nevertheless, they were
significantly outperformed by the representations based on semantic cohesion. Our results
motivate further research in several aspects that we would like to explore as future work.
During our research, the need arose for a data set that would allow us to evaluate our
methods on both annotation and retrieval.
Prototype generation (PG) methods aim to find a subset of instances, derived from a large training data set, such that classification performance (commonly, with a 1NN classifier) when using the prototypes is equal to or better than that obtained with the original training set. Several PG methods have been proposed so far; most of them take a small subset of training instances as initial prototypes and modify them in an attempt to maximize classification performance on the whole training set. Although some of these methods have obtained acceptable results, training instances may be under-exploited, because most of the time they are only used to guide the search process. This paper introduces a PG method based on genetic programming in which many training samples are combined through arithmetic operators to build highly effective prototypes. The genetic program aims to generate prototypes that maximize an estimate of the generalization performance of a 1NN classifier. Experimental results are reported on benchmark data used to assess PG methods. Several aspects of the genetic program are evaluated and compared to many alternative PG methods. The empirical assessment shows the effectiveness of the proposed approach, which outperforms most state-of-the-art PG techniques on both small and large data sets. Better results were obtained for data sets with numeric attributes only, although the performance of the proposed technique on mixed data was very competitive as well.
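As a rough illustration of the idea, the sketch below replaces the full genetic program with a much simpler random search: each candidate prototype is an arithmetic combination (here, the mean) of a few same-class training samples, and candidate sets are ranked by 1NN accuracy. It is a toy stand-in for the method described above, not its implementation; the data and parameter values are invented.

```python
import numpy as np

def one_nn_accuracy(protos, proto_y, X, y):
    # 1NN: each sample is assigned the label of its nearest prototype.
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return float((proto_y[d.argmin(axis=1)] == y).mean())

def search_prototypes(X, y, trials=200, combine=3, seed=0):
    """Toy stand-in for the genetic program: each candidate prototype is an
    arithmetic combination (here, the mean) of `combine` random same-class
    samples; the candidate set with the best 1NN accuracy is kept."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    best, best_acc = None, -1.0
    for _ in range(trials):
        protos = np.array(
            [X[y == c][rng.choice((y == c).sum(), combine)].mean(0)
             for c in classes])
        acc = one_nn_accuracy(protos, classes, X, y)
        if acc > best_acc:
            best, best_acc = protos, acc
    return best, best_acc

# Toy two-class problem: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
protos, acc = search_prototypes(X, y)
```

The actual genetic program evolves richer arithmetic expressions over many samples and scores candidates with an estimate of generalization performance rather than training accuracy.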
This paper introduces two novel strategies for representing multimodal images, with application to multimedia image retrieval. We consider images that are composed of both text and labels: while text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in it). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We study standard information fusion techniques for combining both sources of information; however, although the performance of such techniques is highly competitive, they cannot effectively capture the content of images. Therefore, we propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. These representations are based on the distributional term representations widely used in computational linguistics. Under the considered representations, the content of an image is modeled by a distribution of co-occurrences over terms, or of occurrences over other images, in such a way that the representation can be considered an expansion of the multimodal terms in the image. We report experimental results using the SAIAPR TC12 benchmark on two sets of topics used in ImageCLEF competitions, with both manually and automatically generated labels. Experimental results show that the proposed representations significantly outperform both standard multimodal techniques and unimodal methods. Results on manually assigned labels provide an upper bound on the retrieval performance that can be obtained, whereas results with automatically generated labels are encouraging. The novel representations are able to capture the content of multimodal images more effectively.
We emphasize that although we have applied our representations to multimedia image retrieval the same formulation can be adopted for modeling other multimodal documents (e.g., videos).
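A minimal sketch of a co-occurrence-based distributional term representation in the spirit described above: a document is expanded into the co-occurrence distributions of its terms, so that mass is placed on terms that are semantically cohesive with it. The toy vocabulary, documents, and normalization choice are illustrative assumptions.

```python
import numpy as np

def cooccurrence_matrix(docs, vocab):
    """Term-by-term co-occurrence counts over a document collection."""
    idx = {t: i for i, t in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        terms = [t for t in set(doc) if t in idx]
        for a in terms:
            for b in terms:
                if a != b:
                    C[idx[a], idx[b]] += 1
    return C

def dtr_representation(doc, docs, vocab):
    """Represent a document as the normalized sum of the co-occurrence
    distributions of its terms: an expansion of the document's terms by
    terms that frequently appear with them."""
    idx = {t: i for i, t in enumerate(vocab)}
    C = cooccurrence_matrix(docs, vocab)
    v = np.zeros(len(vocab))
    for t in set(doc):
        if t in idx and C[idx[t]].sum() > 0:
            v += C[idx[t]] / C[idx[t]].sum()
    return v / v.sum() if v.sum() > 0 else v

# Toy multimodal "documents": visual labels mixed with caption words.
vocab = ["sky", "water", "boat", "beach", "city"]
docs = [["sky", "water", "boat"], ["sky", "water", "beach"], ["city", "sky"]]
rep = dtr_representation(["boat"], docs, vocab)
```

Note how a document containing only "boat" receives mass on "sky" and "water", the terms it cohesively co-occurs with; this is the expansion effect the representations above rely on.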
Multimedia image retrieval, Image annotation, Distributional term representations, Semantic cohesion modeling
This paper proposes the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) for classification tasks. FMS is defined as follows: given a pool of preprocessing methods, feature selection methods and learning algorithms, select the combination of these that obtains the lowest classification error for a given data set; the task also includes the selection of hyperparameters for the considered methods. This problem generates a vast search space to be explored, well suited for stochastic optimization techniques. FMS can be applied to any classification domain, as it does not require domain knowledge. Different model types and a variety of algorithms can be considered under this formulation. Furthermore, competitive yet simple models can be obtained with FMS. We adopt PSO for the search because of its proven performance on different problems and because of its simplicity, since neither expensive computations nor complicated operations are needed. Interestingly, the way the search is guided allows PSO to avoid overfitting to some extent. Experimental results on benchmark data sets give evidence that the proposed approach is very effective, despite its simplicity. Furthermore, results obtained in the framework of a model selection challenge show the competitiveness of the models selected with PSO, compared to models selected with other techniques that focus on a single algorithm and that use domain knowledge.
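The following is a minimal, generic PSO sketch. In full model selection the particle position would encode the choice of methods and their hyperparameters, and the fitness would be a cross-validation error; here a stand-in quadratic error surface plays that role, and all parameter values are illustrative.

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=100, seed=0,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Minimal particle swarm optimizer (minimization)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # personal bests
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()            # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + attraction to personal and global bests.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Stand-in "error surface" with its minimum (0) at (1, -2); a real FMS run
# would instead evaluate the cross-validation error of the decoded model.
err = lambda p: float((p[0] - 1) ** 2 + (p[1] + 2) ** 2)
best, best_err = pso(err, dim=2)
```

The same loop applies unchanged to mixed discrete/continuous encodings once the position-to-model decoding and the fitness function are defined.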
This paper introduces an energy-based model (EBM) for region labeling that takes advantage of both the context and the semantics present in segmented images. The proposed method refines the output of multiclass classification methods based on the one-vs-all (OVA) formulation. Intuitively, the EBM maximizes the semantic cohesion among labels assigned to neighboring regions; that is, it seeks a tradeoff between label-association information and the predictions of the base classifier. Additionally, we study the suitability of OVA classification for the region labeling task. We report experimental results for our methods on 12 heterogeneous data sets that have been used to evaluate different tasks besides region labeling. On the one hand, our results reveal that the OVA approach offers an important potential for improvement in labeling performance that can be exploited by refinement techniques similar to ours. On the other hand, experimental results show that our EBM improves the labeling provided by the base classifier. The EBM is highly efficient and can be applied without modification to different data sets. The heterogeneity of the considered databases shows the generality of our approach and its robustness across scenarios. Our results are superior to those of other techniques that have been tested on the same collections. Furthermore, results on image retrieval show that the labels generated with our EBM can be helpful for annotation-based image retrieval.
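A toy sketch of the tradeoff just described: an energy that combines the base classifier's confidence with label-association information on adjacent regions, minimized exhaustively (feasible only at this scale; a real model would use approximate inference). The scores, association values, and tradeoff weight below are invented for illustration.

```python
from itertools import product

def energy(assignment, scores, assoc, edges, lam=0.5):
    """Energy of a labeling (lower is better): negative classifier
    confidence plus a penalty for weakly associated adjacent labels."""
    unary = -sum(scores[r][l] for r, l in enumerate(assignment))
    pairwise = -sum(assoc.get(frozenset((assignment[i], assignment[j])), 0.0)
                    for i, j in edges)
    return unary + lam * pairwise

def best_labeling(scores, assoc, edges, labels, lam=0.5):
    # Exhaustive minimization over all joint labelings (toy-sized only).
    return min(product(labels, repeat=len(scores)),
               key=lambda a: energy(a, scores, assoc, edges, lam))

# Two adjacent regions; the classifier slightly prefers "boat" for the
# second, but the strong "sky"/"water" association flips the decision.
labels = ["sky", "water", "boat"]
scores = [{"sky": 0.9, "water": 0.05, "boat": 0.05},
          {"sky": 0.1, "water": 0.42, "boat": 0.48}]
assoc = {frozenset(("sky", "water")): 0.8, frozenset(("sky", "boat")): 0.1}
labeling = best_labeling(scores, assoc, edges=[(0, 1)], labels=labels)
```

With the association term disabled (lam=0) the labeling reverts to the base classifier's per-region argmax, which shows exactly what the refinement contributes.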
Region labeling, Energy-based modeling, Random forest, Image annotation, Object recognition
This article describes the application of particle swarm model selection (PSMS) to the problem of automatic image annotation (AIA). PSMS can be considered a black-box tool for selecting effective classifiers in binary classification problems. We approach AIA as a multi-class classification problem, adopting a one-vs-all (OVA) strategy. OVA turns a multi-class problem into a series of binary classification problems, each of which decides whether a region belongs to a particular class or not. We use PSMS to select the models that compose the OVA classifier, and we propose a new technique for making multi-class decisions from the selected classifiers. In this way, effective classifiers can be obtained in acceptable times; specific methods for preprocessing, feature selection and classification are selected for each class; and, most importantly, very good annotation performance can be obtained. We present experimental results on six data sets that give evidence of the validity of our approach; to the best of our knowledge, the results reported herein are the best obtained so far on the data sets we consider. It is important to emphasize that, although the application domain we consider is AIA, nothing prevents us from applying the methods described in this article to any other multi-class classification problem.
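A schematic of the OVA formulation: one binary "class vs. rest" scorer per class, combined with the standard argmax decision (the article's own multi-class decision technique is a refinement of this baseline). The toy centroid-based scorers below stand in for the full per-class pipelines PSMS would select; they are illustrative assumptions.

```python
import numpy as np

def ova_scores(X, centroids):
    """Per-class scores from K binary 'class vs. rest' scorers. Here each
    scorer is a toy negative distance to a class centroid; in PSMS each one
    would be a full preprocessing + feature-selection + classifier pipeline
    selected independently for its class."""
    return np.array([[-np.linalg.norm(x - c) for c in centroids] for x in X])

def ova_decision(scores):
    # Standard multi-class decision: the most confident binary scorer wins.
    return scores.argmax(axis=1)

# Three classes with well-separated centroids, three test points.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.array([[0.2, 0.1], [4.8, 5.3], [0.3, 4.9]])
pred = ova_decision(ova_scores(X, centroids))
```

Because each binary scorer is selected independently, the scores of different classes need not be calibrated against each other, which is precisely why a better decision rule than plain argmax can pay off.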
Classification, Particle swarm optimization, Particle swarm model selection, Machine learning, Image annotation, Object recognition
We present methods for image annotation and retrieval based on semantic cohesion among terms. On the one hand, we propose a region labeling technique that assigns to each image the set of labels that maximizes an estimate of the semantic cohesion among candidate labels associated with regions in segmented images. On the other hand, we propose document representation techniques based on the semantic cohesion among the multimodal terms that compose images. We report experimental results that show the effectiveness of the proposed techniques. Additionally, we describe an extension of a benchmark collection for the evaluation of the proposed techniques.
Automatic image annotation, Region labeling, Multimedia image retrieval, Ground truth data creation
This paper describes a novel approach to learning term-weighting schemes (TWSs) in the context of text classification. In text mining, a TWS determines the way in which documents are represented in a vector space model before a classifier is applied. Whereas acceptable performance has been obtained with standard TWSs (e.g., Boolean and term-frequency schemes), the definition of TWSs has traditionally been an art. Further, it is still difficult to determine the best TWS for a particular problem, and it is not yet clear whether schemes better than those currently available can be generated by combining known TWSs. In this article we propose a genetic program that aims at learning effective TWSs that can improve on the performance of current schemes in text classification. The genetic program learns how to combine a set of basic units to give rise to discriminative TWSs. We report an extensive experimental study comprising data sets from thematic and non-thematic text classification, as well as from image classification. Our study shows the validity of the proposed method; in fact, we show that TWSs learned with the genetic program outperform traditional schemes and other TWSs proposed in recent work. Further, we show that TWSs learned in a specific domain can be effectively used for other tasks.
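A sketch of the search space involved: basic units such as term frequency and inverse document frequency, combined by arithmetic operators into candidate weighting schemes, as the genetic program would evolve them. The toy corpus and the two candidate schemes shown are illustrative, not the schemes the paper learns.

```python
import math
from collections import Counter

def basic_units(docs):
    """Basic building blocks a learned TWS can combine: per-document term
    frequency (tf) and a global inverse document frequency (idf)."""
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    tf = [Counter(d) for d in docs]
    return tf, idf

def weight(doc_tf, idf, scheme):
    """Apply a candidate TWS (an arithmetic combination of the basic
    units, as a genetic program would evolve it) to one document."""
    return {t: scheme(doc_tf[t], idf[t]) for t in doc_tf}

docs = [["cat", "cat", "dog"], ["dog", "bird"], ["bird", "bird", "bird"]]
tf, idf = basic_units(docs)

# Two candidate schemes the search could compare: plain tf, and tf * idf.
w_tf = weight(tf[0], idf, lambda tf_, idf_: tf_)
w_tfidf = weight(tf[0], idf, lambda tf_, idf_: tf_ * idf_)
```

The genetic program's job is to explore expressions over such units (sums, products, logarithms, and so on) and keep those whose induced document representation yields the best classification performance.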
Hugo Jair Escalante Balderas, Manuel Montes y Gómez, Jesús Antonio González Bernal, María del Pilar Gómez Gil, Leopoldo Altamirano Robles, Carlos Alberto Reyes García, Carolina Reta Castro, Alejandro Rosales Pérez (2012)
Objective: Acute leukemia is a malignant disease that affects a large proportion of the world population. Different types and subtypes of acute leukemia require different treatments; in order to assign the correct treatment, a physician must identify the leukemia type or subtype. Advanced and precise methods are available for identifying leukemia types, but they are very expensive and not available in most hospitals in developing countries. Thus, alternative methods have been proposed. An option explored in this paper is based on the morphological properties of bone marrow images: features are extracted from medical images, and standard machine learning techniques are used to build leukemia type classifiers. Methods and materials: This paper studies the use of ensemble particle swarm model selection (EPSMS), an automated tool for the selection of classification models, in the context of acute leukemia classification. EPSMS is the application of particle swarm optimization to the exploration of the search space of ensembles that can be formed from the heterogeneous classification models in a machine learning toolbox. EPSMS does not require prior domain knowledge and is able to select highly accurate classification models without user intervention. Furthermore, specific models can be used for different classification tasks. Results: We report experimental results for acute leukemia classification with real data and show that EPSMS outperformed the best results obtained with manually designed classifiers on the same data. The highest performance using EPSMS was 97.68% for two-type classification problems and 94.21% for problems with more than two types. To the best of our knowledge, these are the best results reported for this data set. Compared with previous studies, these improvements were consistent across different type/subtype classification tasks, different features extracted from the images, and different feature extraction regions.
The performance improvements were statistically significant. We improved previous results by an average of 6%, with improvements of more than 20% in some settings. In addition to the performance improvements, we demonstrated that no manual effort was required during acute leukemia type/subtype classification.
Conclusions: Morphological classification of acute leukemia using EPSMS provides an alternative to expensive diagnostic methods in developing countries. EPSMS is a highly effective method for the automated construction of ensemble classifiers for acute leukemia classification, requiring no significant user intervention. EPSMS could also be used to address other medical classification tasks.
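To illustrate the kind of model EPSMS constructs, here is a minimal weighted-vote fusion of heterogeneous classifier outputs. In EPSMS, both the ensemble members and the fusion would be selected by particle swarm optimization; the member scores and weights below are toy values chosen for illustration only.

```python
import numpy as np

def ensemble_predict(member_scores, weights):
    """Weighted-vote fusion of heterogeneous classifier outputs.

    member_scores: array of shape (n_members, n_samples, n_classes) holding
    each member's class-probability estimates; weights: one weight per member.
    """
    fused = np.tensordot(weights, member_scores, axes=1)  # weighted sum
    return fused.argmax(axis=1)

# Three toy ensemble members that disagree on the second sample.
scores = np.array([
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.7, 0.3], [0.2, 0.8]],
])
pred = ensemble_predict(scores, weights=np.array([0.5, 0.2, 0.3]))
```

The search space EPSMS explores is exactly the space of such configurations: which members to include, how each member is built, and how their outputs are combined.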
Ensemble learning, Swarm optimization, Full model selection, Morphological classification, Analysis of bone marrow cell images, Acute leukemia classification
Hugo Jair Escalante Balderas, Carlos Arturo Hernández Gracidas, Jesús Antonio González Bernal, Aurelio López López, Manuel Montes y Gómez, Eduardo Francisco Morales Manzanares, Luis Enrique Sucar Succar, Luis Villaseñor Pineda (2009)
Automatic image annotation (AIA), a highly popular topic in information retrieval research, has experienced significant progress within the last decade. Yet the lack of a standardized evaluation platform tailored to the needs of AIA has hindered the effective evaluation of its methods, especially for region-based AIA. Therefore, in this paper we introduce the segmented and annotated IAPR TC-12 benchmark: an extended resource for the evaluation of AIA methods, as well as for the analysis of their impact on multimedia information retrieval. We describe the methodology adopted for the manual segmentation and annotation of images, and present statistics for the extended collection. The extended collection is publicly available and can be used to evaluate a variety of tasks in addition to image annotation. We also propose a soft measure for the evaluation of annotation performance, and identify future research areas in which this extended test collection is likely to make a contribution.