Author: Farid García Lamont
Classification methods usually perform poorly when applied to imbalanced data sets. To overcome this problem, several algorithms have been proposed in the last decade. Most of them generate synthetic instances to balance the data sets, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as “noisy” or “secure”, where the former are instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After applying our method to data sets, the area under the ROC curve (AUC) of classifiers improves dramatically. We compare our method with other state-of-the-art methods using more than 10 data sets.
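As a rough illustration of the two phases, the sketch below tags minority instances as “noisy” or “secure” from their nearest neighbors and then creates synthetic instances around one of them. The abstract does not say how difficult instances are detected or how synthetic instances are generated, so the function names, the value of `k`, and the SMOTE-style interpolation are all assumptions made for illustration.

```python
import math
import random


def nearest(points, index, candidates, k):
    """Return the k candidate indices closest to points[index] (Euclidean)."""
    ranked = sorted(candidates, key=lambda j: math.dist(points[index], points[j]))
    return ranked[:k]


def tag_minority(points, labels, minority, k=3):
    """Tag each minority instance 'noisy' if most of its k nearest
    neighbors belong to the opposite class, otherwise 'secure'."""
    tags = {}
    for i, label in enumerate(labels):
        if label != minority:
            continue
        others = [j for j in range(len(points)) if j != i]
        neighbors = nearest(points, i, others, k)
        opposite = sum(1 for j in neighbors if labels[j] != minority)
        tags[i] = "noisy" if opposite > k / 2 else "secure"
    return tags


def synthesize(points, labels, minority, index, count, k=3, seed=0):
    """Assumed generation step: interpolate between one minority instance
    and randomly chosen members of its k nearest minority neighbors."""
    rng = random.Random(seed)
    same = [j for j, l in enumerate(labels) if l == minority and j != index]
    pool = nearest(points, index, same, k)
    base = points[index]
    created = []
    for _ in range(count):
        other = points[rng.choice(pool)]
        t = rng.random()  # position along the segment from base to other
        created.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return created


# Toy data: class 1 is the minority; the last point sits inside class 0.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
tags = tag_minority(points, labels, minority=1)
new_points = synthesize(points, labels, minority=1, index=4, count=2)
```

In this toy data the minority point surrounded by majority points is the only one tagged “noisy”; the other three minority points are “secure”.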
This work presents a proposal for image segmentation by color features using self-organizing maps.
Most works addressing the segmentation of color images use clustering-based methods; the drawback of such methods is that they require a priori knowledge of the number of clusters, so the number of clusters is set depending on the nature of the scene so as not to lose its color features. Other works that employ different unsupervised learning-based methods use the colors of the given image, but the classifier must be retrained whenever a new image is given. Humans have the natural capability to (1) recognize colors using their previous knowledge, that is, they do not need to learn to identify colors every time they observe a new image, and (2) recognize regions or objects within a scene by their chromaticity features. Hence, in this paper we propose to emulate human color perception for color image segmentation. We train a three-layered self-organizing map with chromaticity samples so that the neural network is able to segment color images by their chromaticity features. When training is finished, we use the same neural network to process several images, without training it again and without specifying, to some extent, the number of colors the image has. The hue component of colors is extracted by mapping the input image from the RGB space to the HSV space. We test our proposal using the Berkeley segmentation database and quantitatively compare our results with related works; according to this comparison, we claim that our approach is competitive.
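The hue extraction step can be sketched with Python's standard `colorsys` module. This is only a minimal illustration of the RGB-to-HSV mapping; the helper names `hue_degrees` and `hue_map` and the nested-list image layout are assumptions, not the paper's implementation.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB color in degrees, in the range [0, 360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


def hue_map(image):
    """Apply hue_degrees to every pixel of a nested-list RGB image."""
    return [[hue_degrees(*pixel) for pixel in row] for row in image]


# A 2x2 toy image: bright red, dark red, green, blue.
image = [[(255, 0, 0), (128, 0, 0)],
         [(0, 255, 0), (0, 0, 255)]]
hues = hue_map(image)
```

Bright red and dark red map to the same hue, which is why a network trained on hue samples can be less sensitive to intensity changes.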
In this paper we present a comparison of three color characterization methods applied to fruit recognition; two of them are taken from related works and the third is the authors’ proposal. In all three, color is represented in the RGB space. The related works characterize colors using their intensity data, but employing the intensity data of colors in the RGB space may lead to imprecise color models because, in this space, two colors with the same chromaticity but different intensities are represented as different colors. Hence, we introduce a method that characterizes the color of objects by extracting the chromaticity of colors, so that intensity does not significantly influence the color extraction. The color characterizations of the two related methods and our proposal are implemented and tested to extract the color features of different fruit classes. The color features are concatenated with shape characteristics, obtained using Fourier descriptors, Hu moments and four basic geometric features, to form a feature vector. A feed-forward neural network is employed as the classifier; the performance of each method is evaluated using an image database with 12 fruit classes.
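To make the intensity issue concrete, the sketch below uses normalized rgb coordinates as one common way to express chromaticity. The abstract does not state which chromaticity representation the authors use, so this choice and the `chromaticity` helper are assumptions.

```python
def chromaticity(r, g, b):
    """Normalized rgb coordinates: each channel divided by the channel sum."""
    total = r + g + b
    if total == 0:
        # Black has no defined chromaticity; returning the neutral point
        # is a convention chosen for this sketch.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)


# Same chromaticity, different intensity: distinct points in RGB space,
# but the same point after normalization.
bright = chromaticity(200, 100, 50)
dark = chromaticity(100, 50, 25)
```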
The number of clusters into which the color vectors are grouped using fuzzy c-means is computed.
Fuzzy C-means (FCM) is one of the techniques most often employed for color image segmentation; its drawback is that the number of clusters into which the data (the pixels’ colors) are grouped must be defined a priori. In this paper we present an approach to compute the number of clusters automatically. A competitive neural network (CNN) and a self-organizing map (SOM) are trained with chromaticity samples of different colors; the neural networks process each pixel of the image to be segmented, and the activation occurrences of each neuron are collected in a histogram. The number of clusters is set by counting the most activated neurons and is then adjusted by comparing the similarity of colors. We show successful segmentation results obtained on images of the Berkeley segmentation database by training the CNN and SOM only once, using only chromaticity data.
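The histogram step can be sketched as follows, assuming each trained neuron is summarized by a single hue prototype and that a neuron counts as "most activated" when it wins at least a given fraction of the pixels. The prototypes, the `min_fraction` threshold, and the function names are hypothetical; the abstract specifies neither the activation criterion nor the later color-similarity adjustment, which is omitted here.

```python
from collections import Counter


def circular_distance(a, b):
    """Distance between two hue angles in degrees, wrapping at 360."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def winning_neuron(hue, prototypes):
    """Index of the prototype closest to the given hue."""
    return min(range(len(prototypes)),
               key=lambda i: circular_distance(hue, prototypes[i]))


def estimate_cluster_count(hues, prototypes, min_fraction=0.05):
    """Count neurons that win at least min_fraction of all pixels."""
    histogram = Counter(winning_neuron(h, prototypes) for h in hues)
    threshold = min_fraction * len(hues)
    return sum(1 for count in histogram.values() if count >= threshold)


# Hypothetical prototypes (one hue per neuron) and pixel hues of an image.
prototypes = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
pixel_hues = [2.0] * 60 + [118.0] * 38 + [241.0] * 2
clusters = estimate_cluster_count(pixel_hues, prototypes)
```

Here two neurons dominate the histogram, so the estimated number of clusters passed to fuzzy c-means would be 2; the third neuron wins too few pixels to count.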
This paper focuses on representing magnetic resonance images of different parts of the human body, such as knees, spinal column, arms and elbows, using ontologies. First, the resonance images are mapped into a multimedia database. Then, descriptors of the stored images are extracted automatically with the SIFT pattern recognition algorithm in order to recover useful data for the user; ontologies are used as an artificial intelligence tool, which reduces the generation of useless data. Why do we think this is an interesting task? Because, if users require information about a topic, have an illness or need to undergo magnetic resonance imaging, this tool will show them images and text that convey a better understanding and help them reach useful conclusions. Artificial intelligence techniques are used, such as machine learning, knowledge representation, and pattern recognition. The ontological relations introduced here are based on the common representation of language, using definition dictionaries, Roget’s thesaurus, synonym dictionaries, and other resources. The system generates its output in the OM ontological language. This language represents a structure to which our system adds the data scanned by the SIFT algorithm. The tests were carried out in Spanish; however, thanks to the portability of our system, the method can be extended to any language.
UAEM Project 3454CHT/2013
Univariate decision trees are classifiers currently used in many data mining applications. This type of classifier discovers partitions in the input space via hyperplanes that are orthogonal to the attribute axes, producing a model that human experts can understand. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to the axes. In this paper we introduce the Fisher’s Tree, a classifier that takes advantage of the dimensionality reduction of Fisher’s linear discriminant and uses the decomposition strategy of decision trees to produce an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data recursively. The Fisher’s decision tree induces oblique trees whose accuracy, size, number of leaves and training time are competitive with those of other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
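The idea of an artificial attribute can be illustrated for a single node with two classes in two dimensions. The sketch computes the standard Fisher direction w = Sw^-1 (m1 - m0) and projects the data onto it; placing the split at the midpoint of the projected class means is an assumption, since the abstract does not say how the split point is chosen, and the recursion over child nodes is omitted.

```python
def mean(vectors):
    """Component-wise mean of a list of 2-D points."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(2)]


def scatter(vectors, center):
    """2x2 scatter matrix of the points around the given center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        diff = [v[0] - center[0], v[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += diff[i] * diff[j]
    return s


def fisher_direction(class0, class1):
    """Fisher's linear discriminant direction w = Sw^-1 (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    return [(d * dx - b * dy) / det, (a * dy - c * dx) / det]


def project(w, v):
    """Artificial attribute: the projection of v onto w."""
    return w[0] * v[0] + w[1] * v[1]


# Toy classes separated by the oblique boundary y = x.
class0 = [(1, 0), (2, 1), (3, 2), (2, 0.5)]
class1 = [(0, 1), (1, 2), (2, 3), (0.5, 2)]
w = fisher_direction(class0, class1)
threshold = (project(w, mean(class0)) + project(w, mean(class1))) / 2
```

A univariate tree would need several axis-parallel splits to separate these classes, whereas a single threshold on the projected attribute separates them.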
The Support Vector Machine (SVM) has important properties, such as a strong mathematical background and better generalization capability than other classification methods. On the other hand, the major drawback of SVM lies in its training phase, which is computationally expensive and highly dependent on the size of the input data set. In this study, a new algorithm to speed up SVM training is presented; the method selects a small, representative subset of the data set to reduce the training time. The novel method uses an induction tree to reduce the training data set for SVM, producing a very fast and high-accuracy algorithm. According to the results, the proposed algorithm achieves similar accuracy to current SVM implementations in less time.
UAEM Project 3771/2014/CI
Over the past few years, it has been shown that the generalization power of Support Vector Machines (SVM) falls dramatically on imbalanced data sets. In this paper, we propose a new method to improve the accuracy of SVM on imbalanced data sets. To this end, we first use undersampling and SVM to obtain the initial support vectors (SVs) and a sketch of the hyperplane. These support vectors help to generate new artificial instances, which form the initial population of a genetic algorithm. The genetic algorithm improves the population of artificial instances from one generation to the next and eliminates instances that produce noise in the hyperplane. Finally, the generated and evolved data are added to the original data set to reduce the imbalance and improve the generalization ability of the SVM on skewed data sets.
Contrast enhancement of RGB color images
Histogram equalization (HE) is a technique developed for contrast enhancement of grayscale images. For RGB (Red, Green, Blue) color images, HE is usually applied to each color channel separately; because the color channels are correlated, this modifies the chromaticity of colors. To overcome this problem, the colors of the image are mapped to a different color space in which chromaticity and intensity are decoupled, and HE is then applied to the intensity channel. Mapping colors between color spaces may involve a heavy computational load because the mathematical operations are not linear. In this paper we present a proposal for contrast enhancement of RGB color images in which HE is applied to the intensities of the color vectors, without mapping the colors to a different color space. We show that the images obtained with our proposal are very similar to the images processed in the HSV (Hue, Saturation, Value) and L*a*b* color spaces.
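One possible reading of "HE applied to the intensities of the color vectors" is sketched below: the Euclidean norm of each RGB vector is treated as its intensity, the norms are equalized, and each vector is rescaled so that its direction (and hence its chromaticity) is unchanged. The choice of the norm as intensity, the cap on the gain, and the function name are assumptions; the abstract does not give these details.

```python
import math

CHANNEL_MAX = 255.0


def equalize_color_vectors(pixels, levels=256):
    """Histogram-equalize the norms of RGB vectors, keeping their direction.

    pixels: list of (r, g, b) tuples with channels in 0..255.
    Returns a list of float (r, g, b) tuples.
    """
    if not pixels:
        return []
    top = levels - 1
    max_norm = math.sqrt(3.0) * CHANNEL_MAX  # norm of pure white
    norms = [math.sqrt(r * r + g * g + b * b) for r, g, b in pixels]
    bins = [min(top, int(n / max_norm * top + 0.5)) for n in norms]

    # Cumulative histogram of the quantized intensities.
    hist = [0] * levels
    for level in bins:
        hist[level] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:  # a single intensity level: nothing to equalize
        return [tuple(float(c) for c in p) for p in pixels]

    result = []
    for (r, g, b), norm, level in zip(pixels, norms, bins):
        if norm == 0:
            result.append((0.0, 0.0, 0.0))
            continue
        target = (cdf[level] - cdf_min) / (total - cdf_min) * max_norm
        # Cap the gain so that no channel leaves the 0..255 range.
        gain = min(target / norm, CHANNEL_MAX / max(r, g, b))
        result.append((r * gain, g * gain, b * gain))
    return result


# Four low-contrast pixels that share one chromaticity.
dim = [(10, 20, 30), (20, 40, 60), (30, 60, 90), (40, 80, 120)]
enhanced = equalize_color_vectors(dim)
```

Because each pixel is multiplied by a single scalar, the ratios between its channels are preserved; the results would be rounded to integers before being stored as an image.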
An approach is proposed for computing the number of clusters into which a color image should be segmented using fuzzy c-means.
In this paper we introduce a method for color image segmentation that automatically computes the number of clusters into which the data (the pixels) are divided using fuzzy c-means. In several works the number of clusters is defined by the user. In others, the number of clusters is computed from the number of dominant colors, which is determined with unsupervised neural networks (NN) trained with the image’s colors; the number of dominant colors is defined by the number of the most activated neurons. The drawbacks of this approach are as follows: (1) the NN must be trained every time a new image is given, and (2) despite employing different color spaces, the intensity data of colors are used, so the undesired effects of non-uniform illumination may affect the computed number of dominant colors. Our proposal consists of processing the images with an unsupervised NN trained beforehand with chromaticity samples of different colors; the number of neurons with the highest activation occurrences defines the number of clusters into which the image is segmented. Because the NN is trained with chromatic data of colors, it can be employed to process any image without retraining, and our approach is, to some extent, robust to non-uniform illumination. We perform experiments with the images of the Berkeley segmentation database, using competitive NN and self-organizing maps; we compute the quantitative evaluation of the segmented images and compare it with that of related works using the probabilistic Rand index and variation of information metrics.