Author: ALBERTO TELLEZ VALERO

Validación de respuestas reconociendo la implicación textual

ALBERTO TELLEZ VALERO (2009)

A question answering system is a kind of search engine that retrieves concrete information from large collections of text documents. What characterizes this type of system is that user requests are expressed as questions, for which specific pieces of information (i.e., text fragments instead of complete documents) must be returned as answers. Unfortunately, in many cases the current performance of these systems has not been as expected. Such is the case for Spanish, where, to date, the best system of this kind has correctly answered only 53% of the questions from a given test set in this language. To improve this performance, this thesis presents an answer validation method. This method allows creating a system that labels each answer from the question answering systems as valid or erroneous. In particular, the answer validation system uses a classifier based on supervised learning to label the answers. The principal characteristic of the system is that it uses novel attributes to evaluate textual entailment, along with attributes that verify the compatibility between question and answer. This combination of attributes allows the system to select valid answers for the questions while discarding the erroneous ones. Experiments on a set of questions and answers in Spanish show the effectiveness of the system. The results are encouraging since they outperform those achieved by other similar systems and, above all, because they increase the best performance reached in Spanish question answering. This last result is mainly due to applying the answer validation system to combine the answers from multiple question answering systems.

Un sistema de búsqueda de respuestas es un tipo de motor de búsqueda que permite recuperar información concreta a partir de grandes colecciones de documentos de texto. La característica de este tipo de sistemas es que la petición del usuario es expresada como una pregunta para la cual piezas específicas de información (i.e., fragmentos de texto en lugar de documentos completos) son retornadas como una respuesta. Desafortunadamente el desempeño actual de estos sistemas en muchos casos no ha resultado ser el esperado. Tal como ocurre en el español, donde hasta la fecha el mejor sistema de esta clase sólo ha contestado correctamente a un 53% de las preguntas de un conjunto de prueba en este idioma. Con el propósito de mejorar dicho desempeño en esta tesis se presenta un método de validación de respuestas. Este método permite crear un sistema que etiqueta como válida o errónea a cada una de las respuestas de los sistemas de búsqueda de respuestas. En particular, el sistema de validación de respuestas utiliza un clasificador basado en aprendizaje supervisado para etiquetar cada respuesta. La característica principal del sistema es que emplea atributos novedosos para evaluar la implicación textual junto con atributos que verifican la compatibilidad entre pregunta-respuesta. Esta combinación de atributos le permite al sistema seleccionar respuestas válidas para las preguntas mientras descarta las erróneas. Los experimentos en preguntas y respuestas en español muestran la efectividad del sistema. Los resultados obtenidos son motivadores, éstos superan a los alcanzados por otros sistemas similares. Pero sobre todo, estos resultados permiten incrementar el mejor desempeño alcanzado en la búsqueda de respuestas en español. Esto último principalmente por utilizar el sistema de validación de respuestas para combinar las respuestas de múltiples sistemas de búsqueda de respuestas.

Doctoral thesis

Information retrieval; Natural language processing; Artificial intelligence; CIENCIAS FÍSICO MATEMÁTICAS Y CIENCIAS DE LA TIERRA; MATEMÁTICAS; CIENCIA DE LOS ORDENADORES

Using machine learning for extracting information from natural disaster news reports

Usando aprendizaje automático para extraer información de noticias de desastres naturales

ALBERTO TELLEZ VALERO MANUEL MONTES Y GOMEZ LUIS VILLASEÑOR PINEDA (2009)

Disasters caused by natural phenomena have been present throughout human history; nevertheless, their consequences grow greater each time. This tendency will not be reversed in the coming years; on the contrary, natural phenomena are expected to increase in number and intensity due to global warming. Because of this situation, it is of great interest to have sufficient data related to natural disasters, since these data are absolutely necessary to analyze their impact as well as to establish links between their occurrence and their effects. In response to this need, in this paper we describe a system based on Machine Learning methods that improves the acquisition of natural disaster data. This system automatically populates a natural disaster database by extracting information from online news reports. In particular, it extracts information about five different types of natural disasters: hurricanes, earthquakes, forest fires, floods, and droughts. Experimental results on a collection of Spanish news reports show the effectiveness of the proposed system for detecting relevant documents about natural disasters (reaching an F-measure of 98%), as well as for extracting relevant facts to be inserted into a given database (reaching an F-measure of 76%).
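As a reminder of what the F-measure figures above summarize, the sketch below computes the measure from precision and recall. The abstract does not report the underlying precision and recall values or the weighting used, so the balanced setting (beta = 1) and the counts in the usage note are assumptions chosen only for illustration.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not state which weighting was used).
    """
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

With hypothetical counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so `f_measure(8, 2, 2)` is also approximately 0.8.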

Los desastres causados por fenómenos naturales han estado presentes desde el principio de la historia del hombre; sin embargo, sus consecuencias son cada vez mayores. Esta tendencia podría no ser revertida en los próximos años; al contrario, se espera que los fenómenos naturales puedan incrementar en número e intensidad debido al calentamiento global. A causa de esta situación es de gran interés tener suficientes datos relacionados a los desastres naturales, ya que estos datos son absolutamente necesarios para analizar su impacto así como para establecer conexiones entre su ocurrencia y sus efectos. En correspondencia con esta necesidad, en este artículo describimos un sistema basado en métodos de Aprendizaje Automático que mejora la adquisición de datos de desastres naturales. Este sistema automáticamente llena una base de datos de desastres naturales con la información extraída de noticias de periódicos en línea. En particular, este sistema permite extraer información acerca de cinco tipos de desastres naturales: huracanes, temblores, incendios forestales, inundaciones y sequías. Los resultados experimentales en una colección de noticias en Español muestran la eficacia del sistema propuesto tanto para detectar documentos relevantes sobre desastres naturales (alcanzando una medida-F de 98%), así como para extraer hechos relevantes para ser insertados en una base de datos dada (alcanzando una medida-F de 76%). Palabras claves: Aprendizaje Automático, Extracción de Información, Clasificación Temática de Textos, Desastres Naturales, Bases de Datos.

Article

Machine Learning; Information Extraction; Text Categorization; Natural Disasters; Databases; Aprendizaje Automático; Extracción de Información; Clasificación Temática de Textos; Desastres Naturales; Bases de Datos; CIENCIAS FÍSICO MATEMÁTICAS Y CIENCIAS DE LA TIERRA; MATEMÁTICAS; CIENCIA DE LOS ORDENADORES

Towards multi-stream question answering using answer validation

ALBERTO TELLEZ VALERO MANUEL MONTES Y GOMEZ LUIS VILLASEÑOR PINEDA (2010)

Motivated by the continuous growth of the Web in the number of sites and users, several search engines attempt to extend their traditional functionality by incorporating question answering (QA) facilities. This extension seems natural, but it is not straightforward, since current QA systems still achieve poor performance rates for languages other than English. Based on the fact that retrieval effectiveness has previously been improved by combining evidence from multiple search engines, in this paper we propose a method that takes advantage of the outputs of several QA systems. This method is based on an answer validation approach that decides on the correctness of answers based on their entailment with a support text, and therefore reduces the influence of answer redundancies and system confidences. Experimental results on Spanish are encouraging: evaluated over a set of 190 questions from the CLEF 2006 collection, our method correctly answered 63% of the questions, outperforming the best participating QA system (53%) by a relative increase of 19%. In addition, when five answers per question were considered, our method obtained the correct answer for 73% of the questions. In this case, it outperformed traditional multi-stream techniques by generating a better ranking of the set of answers presented to the users.
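The 19% relative increase quoted above follows from the two reported accuracies (63% versus 53%). A minimal sketch of the arithmetic, where the function name is only illustrative:

```python
def relative_increase(new_value: float, baseline: float) -> float:
    """Relative improvement of new_value over baseline, as a fraction."""
    return (new_value - baseline) / baseline


# Accuracies reported in the abstract, in percentage points.
gain = relative_increase(63, 53)
print(f"{gain:.1%}")
```

The absolute gain is 10 percentage points; divided by the 53% baseline, this gives roughly 0.189, which rounds to the 19% stated in the abstract.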

Article

Question answering; Information fusion; Answer validation; Textual entailment; CIENCIAS FÍSICO MATEMÁTICAS Y CIENCIAS DE LA TIERRA; MATEMÁTICAS; CIENCIA DE LOS ORDENADORES

Learning to select the correct answer in multi-stream question answering

ALBERTO TELLEZ VALERO Manuel Montes y Gómez Luis Villaseñor Pineda (2011)

Question answering (QA) is the task of automatically answering a question posed in natural language. Currently, there exist several QA approaches, and, according to recent evaluation results, most of them are complementary. That is, different systems are relevant for different kinds of questions. This fact suggests that a pertinent combination of various systems should make it possible to improve on the individual results. This paper focuses on this problem, namely, the selection of the correct answer from a given set of responses produced by different QA systems. In particular, it proposes a supervised multi-stream approach that decides on the correctness of answers based on a set of features that describe: (i) the compatibility between question and answer types, (ii) the redundancy of answers across streams, and (iii) the overlap and non-overlap information between the question–answer pair and the support text. Experimental results are encouraging: evaluated over a set of 190 questions in Spanish and using answers from 17 different QA systems, our multi-stream QA approach reached an estimated QA performance of 0.74, significantly outperforming the estimated performance of the best individual system (0.53) as well as the result of the best traditional multi-stream QA approach (0.60).
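To make feature groups (ii) and (iii) more concrete, the following is a rough sketch of how redundancy and overlap scores might be computed for one candidate answer. It is not the authors' implementation: the tokenization, the normalization, and all function names are hypothetical simplifications, and feature group (i), type compatibility, is omitted because it would require a question and answer type classifier that the abstract does not describe.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"\w+", text.lower())


def redundancy(answer: str, stream_answers: list[str]) -> float:
    """Fraction of streams whose answer equals `answer` after normalization."""
    if not stream_answers:
        return 0.0
    normalized = " ".join(tokenize(answer))
    matches = sum(1 for a in stream_answers
                  if " ".join(tokenize(a)) == normalized)
    return matches / len(stream_answers)


def overlap(question: str, answer: str, support_text: str) -> float:
    """Fraction of question-answer tokens that also occur in the support text."""
    pair_tokens = set(tokenize(question)) | set(tokenize(answer))
    if not pair_tokens:
        return 0.0
    support_tokens = set(tokenize(support_text))
    return len(pair_tokens & support_tokens) / len(pair_tokens)


def feature_vector(question: str, answer: str, support_text: str,
                   stream_answers: list[str]) -> dict[str, float]:
    """Hypothetical feature dictionary for one candidate answer."""
    ov = overlap(question, answer, support_text)
    return {
        "redundancy": redundancy(answer, stream_answers),
        "overlap": ov,
        "non_overlap": 1.0 - ov,
    }
```

Feature dictionaries of this kind, labeled as valid or erroneous, could then be passed to any supervised classifier; which learner and which exact features the paper uses is not stated in the abstract.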

Article

Data fusion; Multi-stream QA; Textual entailment; Answer validation; CIENCIAS FÍSICO MATEMÁTICAS Y CIENCIAS DE LA TIERRA; MATEMÁTICAS; CIENCIA DE LOS ORDENADORES