Author: Eric Sadit Téllez Avila
Sentiment analysis (SA) is a task related to understanding people's feelings in written text. The starting point is to identify the polarity level (positive, neutral, or negative) of a given text, moving on to identifying emotions or whether a text is humorous. This task has been the subject of several research competitions in a number of languages, including English, Spanish, and Arabic. In this contribution, we propose an SA system, namely EvoMSA, that unifies our participating systems in various SA competitions, making it domain independent and multilingual by processing text using only language-independent techniques.
EvoMSA is based on Genetic Programming and works by combining the outputs of text classifiers to produce the final prediction. We analyzed EvoMSA on different SA competitions to provide a global overview of its performance. The results indicate that EvoMSA is competitive, obtaining top rankings in several SA competitions. Furthermore, we analyzed EvoMSA's components to measure their contribution to the performance; the aim was to help a practitioner or newcomer implement a competitive SA classifier. Finally, it is worth mentioning that EvoMSA is available as open source software.
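The idea of combining classifier outputs with an evolved expression can be illustrated with a toy sketch. This is not EvoMSA's code or its Genetic Programming engine: it is a plain random search over small expression trees, a simplified stand-in that omits the selection, crossover, and mutation of real GP. The function set, the tree encoding, and all names (`random_tree`, `search_combiner`, and so on) are assumptions made for illustration only.

```python
import random

# Hypothetical function set for internal nodes; each maps a list of numbers to one number.
FUNCTIONS = {"sum": sum, "min": min, "max": max}

def random_tree(n_inputs, depth, rng):
    """Build a random expression tree over classifier outputs.

    A leaf is an int (index of a base classifier); an internal node is
    a tuple (function_name, [left_child, right_child]).
    """
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_inputs)
    name = rng.choice(sorted(FUNCTIONS))
    children = [random_tree(n_inputs, depth - 1, rng) for _ in range(2)]
    return (name, children)

def evaluate(tree, outputs):
    """Evaluate a tree on one example; outputs[i] is the decision value of classifier i."""
    if isinstance(tree, int):
        return outputs[tree]
    name, children = tree
    return FUNCTIONS[name]([evaluate(child, outputs) for child in children])

def accuracy(tree, X, y):
    """Fraction of examples whose combined score has the same sign as the label (+1 or -1)."""
    hits = sum(
        (1 if evaluate(tree, row) >= 0 else -1) == label
        for row, label in zip(X, y)
    )
    return hits / len(y)

def search_combiner(X, y, n_candidates=200, depth=3, seed=0):
    """Return the most accurate tree among the single classifiers and random trees."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    candidates = list(range(n_inputs))  # each base classifier on its own
    candidates += [random_tree(n_inputs, depth, rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda tree: accuracy(tree, X, y))

# Made-up decision values of three base classifiers on six texts, with gold labels.
X = [
    [0.9, -0.2, 0.4], [0.1, 0.8, -0.3], [-0.7, -0.1, 0.2],
    [-0.4, 0.3, -0.9], [0.6, 0.5, -0.1], [-0.2, -0.6, -0.5],
]
y = [1, 1, -1, -1, 1, -1]
best = search_combiner(X, y)
```

Because the candidate pool always includes each base classifier alone, the selected combiner is never less accurate on the training data than the best single classifier; whether it generalizes is a separate question that needs held-out data.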
Social network analysis for the study of financial markets has become a topic of research and tool development that allows financial agents to use people's opinions to increase the accuracy of market predictions. Our research focuses on predicting the trend of financial indices using opinion mining, based on the analysis of English-language blogs specialized in finance. The comments posted on these blogs are classified according to their opinion about the market trend (upward, stable, or downward). Different machine learning and text mining techniques are evaluated for classifying the comments made over a period of three months. The results show that this analysis can be incorporated as a factor in the decision making of financial agents and can improve the accuracy of their projections.
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no index outperforms an exhaustive scan of the database.
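The exhaustive scan that indexes try to beat, and one simple way to quantify how concentrated the distances are, can be sketched as follows. This is a minimal illustration, not the method of the paper: the function names, the Euclidean distance, and the relative-contrast measure (d_max - d_min) / d_min are assumptions chosen for the example.

```python
import heapq
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def exhaustive_knn(database, query, k=1, dist=euclidean):
    """Baseline search: compare the query against every object.

    Returns the k closest objects as (distance, index) pairs, nearest first.
    """
    scored = ((dist(obj, query), i) for i, obj in enumerate(database))
    return heapq.nsmallest(k, scored)

def relative_contrast(database, query, dist=euclidean):
    """(d_max - d_min) / d_min for one query.

    Values near zero mean the distances are concentrated, so the nearest
    and farthest objects are almost equally far from the query.
    """
    distances = [dist(obj, query) for obj in database]
    d_min, d_max = min(distances), max(distances)
    return math.inf if d_min == 0 else (d_max - d_min) / d_min

database = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
neighbors = exhaustive_knn(database, (0.9, 0.9), k=2)
contrast = relative_contrast(database, (0.0, 1.0))
```

When the relative contrast is small for typical queries, an index has little distance information to prune with, which is the situation the abstract describes.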
In the Genetic Programming (GP) community there has been great interest in developing semantic genetic operators. These types of operators use information about the phenotype to create offspring. The most recent approaches to semantic GP include the GP framework based on the alignment of error space, the geometric semantic genetic operators, and backpropagation genetic operators. Our contribution proposes two semantic operators based on projections in the phenotype space. By construction, the proposed operators guarantee that the offspring's fitness is at least as good as the fitness of the best parent, using the Euclidean distance as fitness. The proposed semantic operators increase the learning capabilities of GP. These operators are compared against a traditional GP and Geometric Semantic GP on the human oral bioavailability regression problem and 13 classification problems. The results show that a GP system with our novel semantic operators has the best performance in the training phase on all the problems tested.
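The guarantee stated above follows from a general property of orthogonal projection, which the sketch below illustrates. It is not the paper's operators: it assumes, for illustration, that an offspring's semantics is the linear combination a*p1 + b*p2 of two parents' output vectors that lies closest to the target. Since each parent is itself such a combination, the offspring cannot be farther from the target than either parent. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def projection_crossover(p1, p2, target):
    """Project the target onto the span of two parents' semantics.

    p1, p2 and target are output vectors over the same training cases.
    Returns (a, b, child) with child = a*p1 + b*p2 minimising the
    Euclidean distance to target (least squares via normal equations).
    """
    g11, g12, g22 = dot(p1, p1), dot(p1, p2), dot(p2, p2)
    r1, r2 = dot(p1, target), dot(p2, target)
    det = g11 * g22 - g12 * g12
    if abs(det) > 1e-12:
        # Cramer's rule on the 2x2 normal equations.
        a = (r1 * g22 - r2 * g12) / det
        b = (r2 * g11 - r1 * g12) / det
    elif g11 > 0:
        # Collinear parents: project onto the line spanned by p1.
        a, b = r1 / g11, 0.0
    elif g22 > 0:
        a, b = 0.0, r2 / g22
    else:
        a, b = 0.0, 0.0
    child = [a * x + b * y for x, y in zip(p1, p2)]
    return a, b, child

p1 = [1.0, 0.0, 1.0]
p2 = [0.0, 1.0, 1.0]
target = [1.0, 2.0, 3.0]
a, b, child = projection_crossover(p1, p2, target)
```

In this small example the target lies in the span of the parents, so the offspring matches it exactly; in general the projection only guarantees a training error no worse than that of the better parent, which says nothing about performance on unseen data.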
Luis Luis Pellegrin, Octavio Loyola Gonzalez, Jose Ortiz Bejar, Miguel Angel Medina Perez, Andres Eduardo Gutierrez Rodriguez, Eric Sadit Téllez Avila, Mario Graff Guerrero, Sabino Miranda Jimenez, Daniela Moctezuma, Mauricio Alfonso Garcia Limon, Alicia Morales Reyes, Carlos Alberto Reyes Garcia, Eduardo Morales Manzanares, Hugo Jair Escalante (2019)
This paper describes the design of the 2017 RedICA: Text-Image Matching (RICATIM) challenge, including the dataset generation, a complete analysis of the results, and descriptions of the top-ranked methods. The academic challenge explores the feasibility of a novel binary image classification scenario, where each instance corresponds to the concatenation of learned representations of an image and a word. Instances are labeled as positive if the word is relevant for describing the visual content of the image, and negative otherwise. This novel approach to the image classification problem poses an alternative scenario where any text-image pair can be represented in such a space, so any word could be considered for describing an image. The proposed methods are diverse and competitive, showing considerable improvements over the proposed baselines.
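The instance construction described above can be sketched in a few lines. The feature values, identifiers, and function names below are made up for illustration and do not reproduce the challenge's data pipeline; the sketch only shows how an image representation and a word representation are concatenated into one instance with a binary relevance label.

```python
def make_instance(image_vec, word_vec):
    """Concatenate an image representation and a word representation."""
    return list(image_vec) + list(word_vec)

def build_dataset(image_feats, word_embs, pairs, relevant):
    """Build (instance, label) examples for binary text-image matching.

    image_feats: dict image_id -> feature vector (hypothetical values)
    word_embs:   dict word -> embedding vector (hypothetical values)
    pairs:       iterable of (image_id, word) pairs to represent
    relevant:    set of (image_id, word) pairs where the word describes the image
    """
    dataset = []
    for image_id, word in pairs:
        instance = make_instance(image_feats[image_id], word_embs[word])
        label = 1 if (image_id, word) in relevant else 0
        dataset.append((instance, label))
    return dataset

# Toy data: two images with 3-dimensional features, two words with 2-dimensional embeddings.
image_feats = {"img1": [0.1, 0.7, 0.2], "img2": [0.9, 0.0, 0.3]}
word_embs = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
pairs = [("img1", "dog"), ("img1", "car"), ("img2", "car")]
relevant = {("img1", "dog"), ("img2", "car")}
dataset = build_dataset(image_feats, word_embs, pairs, relevant)
```

Any binary classifier can then be trained on these fixed-length vectors, and because every image-word pair maps into the same space, any word with an embedding can be scored against any image.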