
Particle Swarm Model Selection


This paper proposes the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) for classification tasks. FMS is defined as follows: given a pool of preprocessing methods, feature selection methods, and learning algorithms, select the combination of these that obtains the lowest classification error for a given data set; the task also includes the selection of hyperparameters for the considered methods. This problem generates a vast search space, which makes it well suited to stochastic optimization techniques. FMS can be applied to any classification domain because it does not require domain knowledge. Different model types and a variety of algorithms can be considered under this formulation. Furthermore, competitive yet simple models can be obtained with FMS. We adopt PSO for the search because of its proven performance on different problems and because of its simplicity: it needs neither expensive computations nor complicated operations. Interestingly, the way the search is guided allows PSO to avoid overfitting to some extent. Experimental results on benchmark data sets give evidence that the proposed approach is very effective, despite its simplicity. Furthermore, results obtained in the framework of a model selection challenge show the competitiveness of the models selected with PSO, compared to models selected with other techniques that focus on a single algorithm and that use domain knowledge.
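As an illustration of the search procedure named in the abstract, the following is a minimal sketch of a basic global-best PSO. It is not the authors' implementation: the function name `pso_minimize`, the parameter values (inertia, `c1`, `c2`), and the toy objective are assumptions chosen for illustration. In full model selection, the objective would be the estimated classification error of a candidate model (for example, a cross-validation error), and each particle would encode a choice of methods and hyperparameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def toy_error(x):
    """Hypothetical stand-in for a model's validation error."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, err = pso_minimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, err)
```

The only operations involved are additions, multiplications, and comparisons, which is consistent with the abstract's remark that PSO needs no complicated operations.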


Full model selection; Machine learning challenge; Particle swarm optimization; Experimentation; Cross validation; PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES; MATHEMATICS; COMPUTER SCIENCE

Global clustering algorithms for mixed data (Algoritmos de agrupamiento global para datos mezclados)


The clustering problem arises in many practical applications in areas such as Pattern Recognition, Machine Learning, Data Mining, and Digital Image Processing. The k-means algorithm is one of the most frequently used algorithms for solving the clustering problem because of its simplicity, but it has notable drawbacks: i) it works only with numeric data, and ii) it depends heavily on the initial conditions.

On the other hand, in soft sciences such as Medicine, Geology, Sociology, and Marketing, objects are commonly described by both numeric and non-numeric features (mixed data).

In this context, we propose two clustering algorithms based on the k-means algorithm. Both algorithms work with mixed data and do not depend on the initial conditions. The proposed algorithms are tested on data sets obtained from a public repository and compared against other clustering algorithms.
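The abstract does not state which dissimilarity the proposed algorithms use for mixed data. As a hedged illustration of the general idea, the sketch below uses a k-prototypes-style measure (squared Euclidean distance on numeric features plus a weighted count of mismatches on categorical features), which is a swapped-in technique and not the thesis's method. The names `mixed_distance` and `nearest_center`, the weight `gamma`, and the sample records are all hypothetical.

```python
def mixed_distance(a, b, numeric_idx, categorical_idx, gamma=1.0):
    """Dissimilarity between two objects with numeric and categorical features."""
    # Numeric part: squared Euclidean distance.
    numeric_part = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    # Categorical part: number of features whose values differ.
    categorical_part = sum(1 for i in categorical_idx if a[i] != b[i])
    return numeric_part + gamma * categorical_part


def nearest_center(obj, centers, numeric_idx, categorical_idx, gamma=1.0):
    """Index of the center closest to `obj` (the k-means assignment step)."""
    return min(
        range(len(centers)),
        key=lambda k: mixed_distance(obj, centers[k],
                                     numeric_idx, categorical_idx, gamma),
    )


# Hypothetical records: (temperature, blood pressure, sex, residence).
patient = (36.5, 120.0, "female", "urban")
centers = [
    (37.0, 118.0, "female", "rural"),
    (39.5, 140.0, "male", "urban"),
]
d0 = mixed_distance(patient, centers[0], [0, 1], [2, 3])
assigned = nearest_center(patient, centers, [0, 1], [2, 3])
print(d0, assigned)
```

In practice the numeric features would usually be scaled first, since otherwise a feature with a large range dominates the categorical mismatches.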


Master thesis


Mexican Sign Language alphanumerical gesture recognition using 3D Haar-like features


Mexican Sign Language (LSM) is the language of the Mexican deaf community. It consists of a series of gestural signs articulated with the hands and accompanied by facial expressions. The lack of automated systems for translating LSM signs makes the integration of hearing-impaired people into society more difficult. This work presents a new method for recognizing LSM alphanumerical signs based on 3D Haar-like features extracted from depth images captured by the Microsoft Kinect sensor. The features are processed with a boosting algorithm. To evaluate the performance of our method, we recognized a set of signs for letters and numbers and compared the results with those obtained using traditional 2D Haar-like features. Our system recognizes static LSM signs with a higher accuracy rate than the one obtained with the widely used 2D features.
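The abstract does not describe how the 3D Haar-like features are computed, so the sketch below shows only the traditional 2D baseline it mentions: a two-rectangle Haar-like feature evaluated on a depth map through an integral image (summed-area table). The function names and the tiny depth map are hypothetical, and the paper's 3D extension is not reproduced here.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (width must be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Hypothetical 3x4 depth map: near surface on the left, far on the right.
depth = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 2, 6, 6],
]
ii = integral_image(depth)
feature = two_rect_feature(ii, 0, 0, 3, 4)
print(feature)
```

A boosting algorithm would then treat many such features, at different positions and scales, as weak classifiers.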


ENGINEERING AND TECHNOLOGY; Boosting; Gesture recognition; Sign language; Machine learning; 3D Haar-like features

Using machine learning for extracting information from natural disaster news reports


Disasters caused by natural phenomena have been present throughout human history; nevertheless, their consequences keep growing. This tendency will not be reversed in the coming years; on the contrary, natural phenomena are expected to increase in number and intensity due to global warming. Because of this situation, it is of great interest to have sufficient data on natural disasters, since these data are necessary to analyze their impact and to establish links between their occurrence and their effects. In response to this need, in this paper we describe a system based on Machine Learning methods that improves the acquisition of natural disaster data. The system automatically populates a natural disaster database by extracting information from online news reports. In particular, it extracts information about five types of natural disasters: hurricanes, earthquakes, forest fires, floods, and droughts. Experimental results on a collection of Spanish news reports show the effectiveness of the proposed system both for detecting relevant documents about natural disasters (reaching an F-measure of 98%) and for extracting relevant facts to be inserted into a given database (reaching an F-measure of 76%).
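For readers unfamiliar with the metric reported above, the F-measure combines precision and recall into a single score. The sketch below computes the general F-beta form; with `beta=1` it is the harmonic mean of the two. The abstract does not report the underlying precision and recall, so the input values here are hypothetical and serve only to illustrate the formula.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, not taken from the paper.
balanced = f_measure(0.80, 0.80)  # equal precision and recall
skewed = f_measure(0.95, 0.50)    # high precision, low recall
print(balanced, skewed)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a system cannot reach a high F-measure by maximizing precision or recall alone.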


Machine Learning; Information Extraction; Text Categorization; Natural Disasters; Databases; PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES; MATHEMATICS; COMPUTER SCIENCE

Maize kernel abortion recognition and classification using binary classification machine learning algorithms and deep convolutional neural networks

Walter Mupangwa Isaiah Nyagumbo Mainassara Zaman-Allah (2020)

Maize kernel traits such as kernel length, kernel width, and kernel number determine the total kernel weight and, consequently, maize yield. Therefore, the measurement of kernel traits is important for maize breeding and the evaluation of maize yield. A few methods allow the extraction of ear and kernel features through image processing. We evaluated the potential of deep convolutional neural networks and binary machine learning (ML) algorithms (logistic regression (LR), support vector machine (SVM), AdaBoost (ADB), classification tree (CART), and k-nearest neighbors (kNN)) for accurate maize kernel abortion detection and classification. The algorithms were trained using 75% of 66 total images, and the remaining 25% was used to test their performance. The confusion matrix, classification accuracy, and precision were the main metrics for evaluating the performance of the algorithms. The SVM and LR algorithms were highly accurate and precise (100%) under all abortion statuses, while the remaining algorithms had a performance greater than 95%. Deep convolutional neural networks were further evaluated using different activation and optimization techniques. The best performance (100% accuracy) was reached using the rectified linear unit (ReLU) activation function and the Adam optimization technique. Maize ears with abortion were accurately detected by all tested algorithms with minimum training and testing time compared to ears without abortion. The findings suggest that deep convolutional neural networks, supplemented with the binary machine learning algorithms, can be used to detect maize ear abortion status in maize breeding programs. By using a convolutional neural network (CNN) method, more data (big data) can be collected and processed for hundreds of maize ears, accelerating the phenotyping process.
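The evaluation protocol described above can be sketched in a few lines. Note that 75% of 66 images is 49.5, so the split has to be rounded; the abstract does not say how, and rounding down to 49 training images is an assumption made here. The labels and predictions below are hypothetical and are not the study's data.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Split sizes: rounding down is an assumption, not stated in the abstract.
n_images = 66
n_train = int(0.75 * n_images)
n_test = n_images - n_train

# Hypothetical test labels and predictions.
y_true = ["aborted", "aborted", "normal", "normal", "aborted", "normal"]
y_pred = ["aborted", "normal", "normal", "aborted", "aborted", "normal"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred, "aborted")
print(n_train, n_test, accuracy(tp, fp, fn, tn), precision(tp, fp))
```

With a test set of roughly 17 images, each misclassification changes accuracy by about six percentage points, which is worth keeping in mind when reading scores of 95% to 100%.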



Mexican experience in Spanish question answering


Nowadays, due to the great advances in communication and storage media, there is more information available than ever before. This information can satisfy almost every information need; nevertheless, without appropriate management facilities, all of it is practically useless. This fact has motivated the emergence of several text processing applications that help in accessing large document collections. Currently, there are three main approaches for this purpose: information retrieval, information extraction, and question answering. Question answering (QA) systems aim to identify the exact answer to a question within a given document collection. This paper presents a survey of the Mexican experience in Spanish QA. In particular, it presents an overview of the participation of the Language Technologies Laboratory of INAOE (LabTL) in the Spanish QA evaluation task at CLEF from 2004 to 2007. Through this participation, the LabTL has mainly explored two different approaches to QA: a language-independent approach based on statistical methods, and a language-dependent approach supported by sophisticated linguistic analysis of texts. It is important to point out that, as a result of this work, the LabTL has become one of the leading research groups in Spanish QA.


Question Answering; Passage Retrieval; Answer Extraction; Machine Learning; PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES; MATHEMATICS; COMPUTER SCIENCE

Multi-class particle swarm model selection for automatic image annotation

Hugo Jair Escalante Balderas Manuel Montes y Gómez Luis Enrique Sucar Succar (2012)

This article describes the application of particle swarm model selection (PSMS) to the problem of automatic image annotation (AIA). PSMS can be considered a black-box tool for the selection of effective classifiers in binary classification problems. We approach the AIA problem as one of multi-class classification, using a one-vs-all (OVA) strategy. OVA turns a multi-class problem into a series of binary classification problems, each of which decides whether a region belongs to a particular class or not. We use PSMS to select the models that compose the OVA classifier and propose a new technique for making multi-class decisions from the selected classifiers. In this way, effective classifiers can be obtained in acceptable times; specific methods for preprocessing, feature selection, and classification are selected for each class; and, most importantly, very good annotation performance can be obtained. We present experimental results on six data sets that give evidence of the validity of our approach; to the best of our knowledge, the results reported herein are the best obtained so far on the data sets we consider. It is important to emphasize that, although the application domain we consider is AIA, nothing prevents the methods described in this article from being applied to any other multi-class classification problem.
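The one-vs-all decomposition can be sketched as follows. The article proposes its own technique for combining the binary classifiers, which the abstract does not describe; the sketch uses the simplest common rule instead, picking the class whose binary classifier returns the highest score. The scoring functions, feature names, and labels are hypothetical stand-ins for the per-class models that PSMS would select.

```python
def ova_scores(scorers, x):
    """Score `x` with every per-class binary scorer."""
    return {label: scorer(x) for label, scorer in scorers.items()}


def ova_predict(scorers, x):
    """Assign the class whose binary scorer is most confident (plain argmax)."""
    scores = ova_scores(scorers, x)
    return max(scores, key=scores.get)


# Hypothetical per-class scorers; each answers "does this region belong
# to my class?" with a real-valued confidence.
scorers = {
    "sky": lambda r: r["blue"] - r["green"],
    "grass": lambda r: r["green"] - r["blue"],
    "sand": lambda r: 0.5 * r["red"] - 0.2,
}

region = {"red": 0.2, "green": 0.7, "blue": 0.3}
label = ova_predict(scorers, region)
print(label)
```

Because each class has its own scorer, each one can use different preprocessing, features, and classifier, which matches the per-class selection described in the abstract. A plain argmax assumes the scores are comparable across classes, which is one reason a dedicated decision technique may be needed.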


Classification; Particle swarm optimization; Particle swarm model selection; Machine learning; Image annotation; Object recognition; PHYSICAL-MATHEMATICAL SCIENCES AND EARTH SCIENCES; MATHEMATICS; COMPUTER SCIENCE

Segmentation of multispectral satellite images based on seeded region growing and instance-based learning


To balance the fulfillment of human needs with the protection of the environment, detailed and accurate information about natural resources is necessary. Such information can be obtained through thematic maps, a product of remote sensing. In remote sensing, the generation of accurate thematic maps presents many research challenges, one of which is image segmentation.

In this thesis, a novel segmentation algorithm based on seeded region growing and instance-based learning is proposed. The algorithm includes a novel automatic seed generation approach that uses histogram analysis; a new weighted instance-based learning algorithm (WIBK), which obtains one or more weights per feature per class; a novel region growing algorithm (SRG-WIBK) that uses WIBK as its decision criterion; and a novel region-merging scheme based on ownership tables, which allows regions to be merged according to user needs. The WIBK algorithm was experimentally evaluated on several databases from the UCI repository and compared against instance-based and non-instance-based learning algorithms, showing very competitive performance. The SRG-WIBK algorithm was tested on multispectral synthetic images and compared against the algorithms implemented in the ERDAS software, showing comparable results.
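To make the region growing step concrete, here is a minimal sketch of seeded region growing on a single-band image. It is not SRG-WIBK: the thesis uses the WIBK classifier as the decision criterion, whereas this sketch substitutes a simple intensity tolerance relative to the seed pixel. The function name, the 4-neighbour connectivity, and the sample image are assumptions.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed` over 4-connected pixels similar to the seed."""
    h, w = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < h and 0 <= nx < w
            if inside and (ny, nx) not in region:
                # Stand-in decision criterion: intensity close to the seed's.
                if abs(image[ny][nx] - seed_value) <= tolerance:
                    region.add((ny, nx))
                    queue.append((ny, nx))
    return region


# Hypothetical 3x3 image with a dark area in the top-left corner.
image = [
    [10, 11, 50],
    [12, 10, 52],
    [49, 51, 50],
]
region = region_grow(image, (0, 0), tolerance=3)
print(sorted(region))
```

Replacing the tolerance test with a learned classifier over all spectral bands is the point at which an instance-based method such as WIBK would enter.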


Master thesis


Using Wittgenstein’s family resemblance principle to learn exemplars


The introduction of the notion of family resemblance represented a major shift in Wittgenstein’s thoughts on the meaning of words, moving away from a belief that words were well defined to a view that words denoted less well defined categories of meaning. This paper presents the use of the notion of family resemblance in the area of machine learning as an example of the benefits that can accrue from adopting the kind of paradigm shift taken by Wittgenstein. The paper presents a model capable of learning exemplars using the principle of family resemblance and adopting Bayesian networks as a representation of exemplars. An empirical evaluation on three data sets shows promising results, which suggest that previous assumptions about the way we categorise need to be reopened.