537 results, page 1 of 10

A simple approach to multilingual polarity classification in twitter

Eric Tellez; Sabino Miranda Jimenez; Mario Graff; Daniela Moctezuma; Ranyart Rodrigo Suarez Ponce de Leon; Oscar Sánchez Siordia (2017)

Sentiment analysis has recently received a lot of attention due to the interest in mining the opinions of social media users. It consists of determining the polarity of a given text, i.e., its degree of positiveness or negativeness. Traditionally, sentiment analysis algorithms have been tailored to a specific language, given the complexity of handling the many lexical variations and errors introduced by the people generating content. In this contribution, our aim is to provide a simple-to-implement and easy-to-use multilingual framework that can serve as a baseline for sentiment analysis contests and as a starting point for building new sentiment analysis systems. We evaluate our approach in eight different languages, three of which correspond to important international contests, namely SemEval (English), TASS (Spanish), and SENTIPOLC (Italian). Within the competitions, our approach reaches medium to high positions in the rankings, whereas in the remaining languages it outperforms the reported results.

Article

Multilingual sentiment analysis; Error-robust text representations; Opinion mining; ENGINEERING AND TECHNOLOGY; TECHNOLOGICAL SCIENCES; COMPUTER TECHNOLOGY; ARTIFICIAL INTELLIGENCE; ARTIFICIAL INTELLIGENCE

A Case Study of Spanish Text Transformations for Twitter Sentiment Analysis

Oscar Sánchez Siordia; Eric Tellez; Sabino Miranda Jimenez; Mario Graff; Daniela Moctezuma; Elio Atenógenes Villaseñor García (2017)

Sentiment analysis is a text mining task that determines the polarity of a given text, i.e., its positiveness or negativeness. Recently, it has received a lot of attention given the interest in opinion mining on micro-blogging platforms. These new forms of textual expression present new challenges for text analysis because of the use of slang and of orthographic and grammatical errors, among others. Along with these challenges, a practical sentiment classifier should be able to handle large workloads efficiently. The aim of this research is to identify, within a large set of combinations, which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes have the most impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology is to exhaustively analyze all combinations of text transformations and their respective parameters to find out what common characteristics the best-performing classifiers have. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS’15 datasets, respectively.
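The combination of word-based n-grams and character-based q-grams described in this abstract can be illustrated with a small tokenizer. This is only a minimal sketch: the function names, the `w:`/`q:` prefixes, and the default sizes are assumptions made for illustration and are not taken from the article, which also covers text transformations, token weighting, and an SVM classifier that are not shown here.

```python
def word_ngrams(text, n):
    """Return word n-grams of a lower-cased, whitespace-tokenized text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text, q):
    """Return character q-grams over the whole lower-cased text."""
    s = " ".join(text.lower().split())  # collapse repeated whitespace
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def combined_tokens(text, word_sizes=(1, 2), char_sizes=(3,)):
    """Concatenate word n-grams and character q-grams into one token list.

    The prefixes keep the two token families apart, so the word "dia"
    and the 3-gram "dia" are counted as different features.
    """
    tokens = []
    for n in word_sizes:
        tokens.extend("w:" + g for g in word_ngrams(text, n))
    for q in char_sizes:
        tokens.extend("q:" + g for g in char_qgrams(text, q))
    return tokens


if __name__ == "__main__":
    clean = set(combined_tokens("muy buen dia"))
    noisy = set(combined_tokens("muy buenn dia"))  # misspelled variant
    # Character q-grams still overlap even though the word "buen" does not match.
    print(sorted(clean & noisy))
```

The resulting token list could then be fed to any bag-of-features weighting scheme and classifier; the character q-grams are what give the representation some tolerance to misspellings.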

Article

Sentiment Analysis; Error-robust text representations; Opinion mining; ENGINEERING AND TECHNOLOGY; TECHNOLOGICAL SCIENCES; COMPUTER TECHNOLOGY; ARTIFICIAL INTELLIGENCE; ARTIFICIAL INTELLIGENCE

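Método probabilista para clasificación de polaridad: negación e intensificación en análisis de sentimientos

Probabilistic method for polarity classification: negation and intensification in sentiment analysis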

SAMARA GRETEL VILLALBA OSORNIO (2016)

Sentiment Analysis (SA) is an area that uses natural language processing and machine learning techniques to extract subjective information from texts. Many problems in SA are still open, and one of them is negation handling. Negation is a linguistic phenomenon present in all human languages. In written documents, negation is signalled by marks or negative particles, which invert the truth value of a sentence. In traditional text classification, semantic information is lost, and with it the capacity to recognize linguistic phenomena such as negation and intensification. To correctly understand the meaning of a text, it is necessary to identify and treat these phenomena. The aim of this work is to take negation and intensification into account in order to improve polarity classification in opinion texts. To that end, a probabilistic approach is proposed that modifies Multinomial Naive Bayes (MNB) so that linguistic information about negation and intensification can be added to the texts, thereby improving their classification. The proposed method depends little on the language and on the kind or topic of the texts. Experiments were performed on English and Spanish texts in several domains, such as movies, hotels, books, and electronics. The results were compared with those published in related work on state-of-the-art methods.
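To make the idea of negation-aware Naive Bayes concrete, the sketch below pairs a plain Multinomial Naive Bayes classifier (with add-one smoothing) with a common negation-marking preprocessing step that prefixes the tokens following a negative particle with `NOT_`. This is a swapped-in, widely used technique shown for illustration only; it is not the modification proposed in the thesis. The negator list, the scope of two tokens, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict

NEGATORS = {"not", "no", "never"}  # illustrative list (assumption)


def mark_negation(tokens, scope=2):
    """Prefix up to `scope` tokens after a negator with NOT_."""
    marked, remaining = [], 0
    for tok in tokens:
        if tok in NEGATORS:
            marked.append(tok)
            remaining = scope
        elif remaining > 0:
            marked.append("NOT_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked


class MultinomialNB:
    """Plain Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set (hypothetical data, for illustration only).
train = [
    ("good movie", "pos"),
    ("great hotel", "pos"),
    ("not bad book", "pos"),
    ("bad movie", "neg"),
    ("not good hotel", "neg"),
]
docs = [mark_negation(text.split()) for text, _ in train]
labels = [label for _, label in train]
model = MultinomialNB().fit(docs, labels)

print(model.predict(mark_negation("not good movie".split())))
```

Because "good" and "NOT_good" become separate features, the classifier can learn opposite polarities for a word and its negated form, which a plain bag of words cannot do.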

Master thesis

Opinion Mining; Sentiment Analysis; Information Transfer; Negation Handling; ENGINEERING AND TECHNOLOGY; TECHNOLOGICAL SCIENCES; COMPUTER TECHNOLOGY; CHARACTER RECOGNITION SYSTEMS

Análisis de documentos de opinión usando la representación word2vec

Analysis of opinion documents using the word2vec representation

ANTONIO DE JESUS GARCIA CHAVEZ (2018)

Sentiment analysis is the area of study that involves the use and processing of information such as feelings, emotions, and attitudes related to specific objects, people, services, places, events, or topics. One of the main tasks in the area is polarity detection in opinion documents, that is, sentiment analysis at the document level, for which it is usual to use syntactic as well as semantic information. To take advantage of these opinions, it is often convenient to use automated classification techniques that facilitate the task. In this work, we propose to use semantics as the information with which to classify the polarity of opinion documents. A neural network called word2vec is used to model, by means of word vectors, semantically representative relationships between the words of a given text corpus. These word vectors are used within a measure of distance between documents called Word Mover’s Distance (WMD). The proposed procedure classifies the polarity of documents using k-nearest neighbours: the classifier receives as input a set of opinion documents with unknown polarities, as well as a certain number of neighbouring documents with which to compare them through the WMD measure. The output of the algorithm is the polarity classification of the set of unknown input documents. To evaluate the proposed procedure, 12 semantic spaces were constructed from combinations of the opinion corpora used. Two Spanish-language corpora were used: the first consists of film reviews obtained from the MuchoCine website, with a total of 2000 documents, and the second of reviews from the TripAdvisor website, with a total of 10845 documents. Experimental results show that the proposed procedure has difficulty correctly classifying documents from the MuchoCine corpus, while it successfully classifies documents from TripAdvisor. In the latter case, it was also shown that the quality of the results can be increased by varying the number of neighbouring documents consulted and the size of the vote.
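The k-nearest-neighbour procedure over a document distance can be sketched as follows. Two simplifications are made and should be read as assumptions: the word vectors are a hypothetical hand-written 2-D table rather than a trained word2vec model, and the distance is the relaxed Word Mover's Distance (a cheap lower bound of WMD in which each word simply travels to its nearest word in the other document) instead of the exact WMD, which requires an optimal-transport solver.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, standing in for trained word2vec embeddings.
VECTORS = {
    "excellent": (1.0, 0.9),
    "good": (0.9, 1.0),
    "clean": (0.8, 0.7),
    "bad": (-0.9, -1.0),
    "dirty": (-0.8, -0.7),
    "terrible": (-1.0, -0.9),
}


def nbow(tokens):
    """Normalized bag of words over the tokens that have a vector."""
    known = [t for t in tokens if t in VECTORS]
    return {w: c / len(known) for w, c in Counter(known).items()}


def one_sided_cost(a, b):
    """Cost of moving each word of `a` to its nearest word in `b`."""
    return sum(
        weight * min(math.dist(VECTORS[w], VECTORS[v]) for v in b)
        for w, weight in a.items()
    )


def relaxed_wmd(doc_a, doc_b):
    """Relaxed WMD: the larger of the two one-sided lower bounds."""
    a, b = nbow(doc_a), nbow(doc_b)
    if not a or not b:
        return math.inf  # no comparable words
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def knn_polarity(doc, labeled_docs, k=3):
    """Majority vote among the k labelled documents closest to `doc`."""
    neighbours = sorted(labeled_docs, key=lambda item: relaxed_wmd(doc, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy labelled documents (hypothetical data).
labeled = [
    (["excellent", "clean"], "pos"),
    (["good"], "pos"),
    (["bad", "dirty"], "neg"),
    (["terrible"], "neg"),
]

print(knn_polarity(["good", "clean"], labeled, k=3))
```

Varying `k` here corresponds to varying the number of neighbouring documents consulted, which the abstract reports as one way to improve results on the TripAdvisor corpus.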

Master thesis

Sentiment analysis; Word2vec; ENGINEERING AND TECHNOLOGY; TECHNOLOGICAL SCIENCES; COMPUTER TECHNOLOGY; ARTIFICIAL INTELLIGENCE

Speech-acts Based Analysis for Requirements Discovery from Online Discussions

ITZEL MORALES RAMIREZ (2018)

Online discussions about software applications and services that take place on web-based communication platforms represent an invaluable knowledge source for diverse software engineering tasks, including requirements elicitation. The amount of research work on developing effective tool-supported analysis methods is rapidly increasing, as part of so-called software analytics. Textual messages in app store reviews, tweets, and online discussions taking place in mailing lists and user forums are analysed by combining natural language processing techniques, to filter out irrelevant data, with text mining and machine learning algorithms, to classify messages into different categories, such as bug report and feature request.

Our research objective is to exploit a linguistic technique based on speech-acts for the analysis of online discussions, with the ultimate goal of discovering requirement-relevant information. In this paper, we present a revised and extended version of the speech-acts based analysis technique, which we previously presented at CAiSE 2017, together with a detailed experimental characterisation of its properties. The datasets used in the experimental evaluation are taken from a widely used open source software project (161120 textual comments), as well as from an industrial project in the home energy management domain. We make them available for experiment replication purposes. On these datasets, our approach is able to successfully classify messages into Feature/Enhancement and Other, with F-measures of 0.81 and 0.84, respectively. We also found evidence of an association between types of speech-acts and categories of issues, and of a correlation between some of the speech-acts and issue priority, thus motivating further research on the exploitation of our speech-acts based analysis technique in semi-automated multi-criteria requirements prioritisation.
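For readers unfamiliar with the metric reported above, the F-measure is the weighted harmonic mean of precision and recall. The helper below is a minimal sketch; the confusion counts in the usage line are hypothetical and are not taken from the article's experiments.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one class (illustration only).
print(round(f_measure(tp=80, fp=20, fn=20), 2))
```

With `beta=1.0` precision and recall are weighted equally, which is the usual reading of "F-measure" when no other weighting is stated.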

Article

/www.sciencedirect.com/science/article/pii/S0306437917306087/

Requirements engineering; Speech-acts analysis; Sentiment analysis; Classification techniques; Online discussions; ENGINEERING AND TECHNOLOGY; TECHNOLOGICAL SCIENCES; COMPUTER TECHNOLOGY; DATA BANKS; DATA BANKS

Análisis de rentabilidad de la producción de ganado bovino de engorda en el noreste del Estado de México: Estudio de caso Municipio de Tepetlaoxtoc

MARIA DEL ROCIO PRIETO CORNEJO (2011)

Thesis (Master of Science, specialist in Economics). Colegio de Postgraduados, 2011.

Profitability analysis of beef cattle production in the northeast of the State of Mexico: a case study of Tepetlaoxtoc township

Beef cattle production is one of the most important livestock activities in Mexico, yet the consumer market for this commodity is being supplied mostly by imports from the United States. Surveys were conducted in the municipality of Tepetlaoxtoc to analyze how competitive our producers are and to identify the main disadvantages they face. The results were that all production units are profitable and competitive at private (domestic) prices, meaning that the added value they generate covers all the costs of the domestic factors of production and leaves extraordinary earnings. However, not all production units have a comparative advantage in relation to our neighbor, the United States. This is the case of the small production unit; the other two do have a comparative advantage, since for each currency unit invested the medium-sized production unit saves 0.38 currency units and the large production unit saves 0.67 currency units.

Master thesis

Analysis; Beef cattle; Competitive; Profitable; Value-added; Master's degree; Economics; SOCIAL SCIENCES

Two approaches for multilingual question answering: Merging passages vs. Merging answers

RITA MARINA ACEVES PEREZ; MANUEL MONTES Y GOMEZ; LUIS VILLASEÑOR PINEDA; LUIS ARTURO UREÑA LOPEZ (2008)

One major problem in multilingual Question Answering (QA) is the integration of information obtained from different languages into a single ranked list. This paper proposes two different architectures to overcome this problem. The first performs the information merging at the passage level, whereas the second does it at the answer level. In both cases, we applied a set of traditional merging strategies from cross-lingual information retrieval. Experimental results demonstrate the appropriateness of these merging strategies for the task of multilingual QA, as well as the advantages of multilingual QA over the traditional monolingual approach.
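The abstract does not name the merging strategies it applies, so the sketch below only illustrates two classic ones from cross-lingual information retrieval: round-robin interleaving and merging by normalized score. The function names, the `(answer, score)` tuple layout, and the sample lists are assumptions made for illustration; the same functions would apply equally to passages or to answers.

```python
from itertools import zip_longest


def round_robin_merge(ranked_lists):
    """Interleave the lists rank by rank, skipping items already taken."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item[0] not in seen:
                seen.add(item[0])
                merged.append(item[0])
    return merged


def normalized_score_merge(ranked_lists):
    """Scale each list by its top score, then sort all items together.

    Normalizing makes scores from different per-language systems
    comparable; an item found in several lists keeps its best score.
    """
    best = {}
    for ranked in ranked_lists:
        if not ranked:
            continue
        top = max(score for _, score in ranked)
        for answer, score in ranked:
            norm = score / top if top else 0.0
            best[answer] = max(best.get(answer, 0.0), norm)
    return sorted(best, key=lambda a: best[a], reverse=True)


# Hypothetical per-language candidate lists: (answer, system score).
english = [("1969", 8.0), ("1972", 4.0)]
spanish = [("1969", 0.9), ("1968", 0.6)]

print(round_robin_merge([english, spanish]))
print(normalized_score_merge([english, spanish]))
```

Round-robin ignores scores entirely and trusts only ranks, while score normalization lets a strong candidate from one language outrank a weak one from another.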

Article

Multilingual Question Answering; Cross-Lingual Information Retrieval; Information Merging; PHYSICAL-MATHEMATICAL AND EARTH SCIENCES; MATHEMATICS; COMPUTER SCIENCE