Johnny Torres - Academia.edu
Papers by Johnny Torres
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
In universities worldwide, instructors may spend a significant amount of time reviewing homework and group projects submitted by their students. Web-based technologies, like Google Docs, have provided a platform for students to write documents collaboratively. Currently, those platforms provide limited information on the individual contribution made by each student. Previous studies have focused on the quantitative aspects of individuals' contributions in collaborative writing, while the quality aspect has received less attention. In this paper, we propose a new model to measure not only quantitative input but also the quality of the content that has been contributed to a document written collaboratively in Spanish. Based on topic-modeling techniques, we use an adaptive non-negative matrix factorization (NMF) model to extract topics from the content of the document, and give higher grades to the students who contributed them. Using Google documents submitted by students to the academic system of our university as part of their projects, experimental results show that, compared to baseline methods such as edit or word counts, our model provides a better approximation to the scores given by human reviewers. Therefore, our model can be used as part of an automatic grading subsystem within the academic system, to provide a baseline score of students' contributions in collaborative documents. This will allow instructors to reduce the workload associated with reviewing and grading documents and focus their time on more relevant tasks.
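As a rough illustration of the topic-extraction step mentioned in this abstract, the sketch below factorizes a small segment-by-term count matrix with plain NMF using the standard multiplicative update rules. It is not the adaptive NMF variant of the paper, and the matrix `v`, the function names, and the choice of two topics are assumptions made only for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, used to check the fit."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Plain NMF (V ~ W H) with multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(m)]
    return w, h

# Hypothetical data: rows are text segments, columns are term counts.
v = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]
w, h = nmf(v, k=2)
print("segment-topic weights:", w)
print("reconstruction error:", frobenius_error(v, w, h))
```

Each row of `w` gives the topic weights of one segment. How such weights would be turned into a per-student grade is not specified in the abstract and is left out here.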
2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), 2017
Recommender systems have proven to successfully influence user decisions on several applications and websites such as Amazon, eBay, Netflix, and Spotify, among others. Largely, these systems rely on collaborative filtering to make useful recommendations but can suffer from cold-start issues, i.e., lacking enough information about new users. By harnessing the popularity of social media platforms (e.g., Twitter, Facebook), knowledge about users can be extracted, including the intent of users who engage in conversations. This can provide additional information to applications using recommender systems in order to overcome issues like cold start. To that end, this research will tackle several challenges of natural language understanding in the context of conversations on social media platforms, such as clustering, classification, and summarization of conversations. Thus, the models developed will allow knowledge to be extracted from microblog conversations for use in several applications.
Advances in Intelligent Systems and Computing, 2018
Nowadays, a growing number of people publicly share information about their fitness activities on social media platforms like Twitter or Facebook. These social networks can furnish people with useful information to get an overview of different geographic areas where people can practice different sport-related activities. In this study, we analyze 14 million tweets to identify places to perform fitness activities and uncover their aspects from twitterers’ opinions. To this end, we apply clustering analysis to uncover places where twitterers perform fitness activities, and then train a text classifier that achieves an F1 score of 76% in discriminating the aspects of fitness places. Using this information, recommender systems can provide useful information to local people or tourists who are looking for places to exercise.
The exponential growth in the use of social networks allows users to communicate directly with their audience and make an impact. Political leaders increasingly use these platforms to interact with their followers. This project applies unsupervised machine learning techniques to find out which topics political leaders place in public opinion through social networks and to determine how these relate to the headlines of digital newspapers over a given period of time. Data from political leaders on the Twitter platform were collected to carry out experiments applying document clustering techniques to extract the relevant topics, and the results were evaluated against data from the digital editions of the newspapers El Universo and El Telegrafo for the period covered by the study. In this way, it was possible to identify that the media agenda corresponds to ...
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
Each year, natural disasters cause severe damage to infrastructure and lead to significant losses of human lives. Non-governmental organizations (NGOs) and activists use social media networks to publicize and organize relief efforts, but are often unable to efficiently gather volunteers and resources. Harnessing the power of crowdsourced social media services (e.g., Twitter, Facebook, Google+), we can provide trustworthy channels that contribute to the relief efforts led by NGOs and activists. This research project seeks to determine the best approach to build a social recommender model based on the dynamics of human behavior exhibited by fine-grained geospatial footprints of citizens on social networks. Thus, the aim is to provide location-aware recommendations of trustworthy activists, enabling citizens to contribute more efficiently during events such as natural disasters.
The massive data generated by users in online platforms, such as social networks, create challenges for text classification tasks based on supervised learning. Supervised learning often requires a lot of feature engineering or a significant amount of annotated data to achieve good results. However, the scarcity of annotated data is a critical issue, and manual annotation can be both costly and time-consuming. Semi-supervised learning requires far less annotated data and achieves performance similar to that of supervised approaches. In this paper, we introduce a semi-supervised neural architecture for multi-label settings that combines deep learning representation and k-means clustering. The results show that the semi-supervised approach can leverage large-scale unlabeled data and achieve better results than baseline unsupervised as well as supervised methods.
Data mining provides invaluable information that can be used to improve processes in diverse areas such as medicine, commerce, and computing, among others. Within computing, the infrastructure of web servers, multimedia servers, and file servers needs to be prepared to handle large volumes of data efficiently. One of the challenges is to optimize data access through the design and use of caches, so as to minimize server usage and network traffic. This work evaluates several data mining techniques for finding data access patterns that designers of cache management algorithms can use to make more effective eviction decisions, with the consequent improvement in cache performance. The results of the evaluations are presented systematically and can be used by other researchers in the...
Multi-task learning is a framework that enforces different learning tasks to share their knowledge to improve their generalization performance. It is a hot and active domain that strives to handle several core issues; particularly, which tasks are correlated and similar, and how to share the knowledge among correlated tasks. Existing works usually do not distinguish the polarity and magnitude of feature weights and commonly rely on linear correlation, due to three major technical challenges in: 1) optimizing the models that regularize feature weight polarity, 2) deciding whether to regularize sign or magnitude, 3) identifying which tasks should share their sign and/or magnitude patterns. To address them, this paper proposes a new multi-task learning framework that can regularize feature weight signs across tasks. We innovatively formulate it as a biconvex inequality constrained optimization with slacks and propose a new efficient algorithm for the optimization with theoretical guara...
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
The growing interest of our society in health issues and physical activities is reflected in social media services where people share information about their daily activities and accomplishments. Fitness activities have become a routine part of many people's daily lives. Previous studies have analyzed people's commitment to enhancing their lifestyle and reaching health objectives using data published on social platforms such as Twitter. However, there is a lack of metrics designed to quantify how users are engaged in physical activities in different areas of a given city. In this study, we collect and analyze 55K tweets posted by people in Ecuador, through different mobile applications, reflecting their participation in fitness and sport activities. Aggregating individual posts at city level, we uncover geographical patterns and dynamics of citizen activities in cities and states. Thus, we illustrate the potential of people's geolocated posts on social media services as sociometers of cities' health and fitness activities.
Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 2017
Nowadays, microblog platforms provide a medium to share content and interact with other users. With the large-scale data generated on these platforms, the origin of and reasons for users' engagement in conversations have attracted the attention of the research community. In this paper, we analyze the factors that might spark conversations on Twitter, for the English and Spanish languages. Using a corpus of 2.7 million tweets, we reconstruct existing conversations, then extract several contextual and content features. Based on the features extracted, we train and evaluate several predictive models to identify tweets that will spark a conversation. Our findings show that conversations are more likely to be initiated by users with a high activity level and popularity. For less popular users, the type of content generated is a more important factor. Experimental results show that the best predictive model is able to obtain an average score of F1 = 0.80. We made the dataset scripts and code used in this paper available to the research community via GitHub.
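For readers unfamiliar with the F1 metric reported in this and several other abstracts, the following minimal sketch computes it for a binary task such as "does this tweet spark a conversation?". The function name and the toy labels are assumptions for illustration; the abstract reports an average score, and the averaging scheme is not specified there.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = tweet sparked a conversation, 0 = it did not.
truth = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
print(f1_score(truth, predicted))
```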
Companion Proceedings of The 2019 World Wide Web Conference, 2019
The role of social networks during natural disasters is becoming crucial to share relevant information and coordinate relief actions. With the reach of social networks, any user around the world can interact in crisis events as these unfold. A large part of the information posted during a disaster uses the native language of the place where the disaster occurred. However, there are also users from other parts of the world who can comment about the event, often in another language. In this work, we conducted a study of crisis-related tweets about the earthquake that occurred in Ecuador in April 2016. To that end, we introduce a new annotated dataset in both Spanish and English with approximately 8K tweets; half of them belong to conversations. We evaluate several neural architectures to identify crisis-related tweets in a multi-lingual setting, and we found that deep contextual multi-lingual embeddings outperform other strong baseline models. We then explore the type of conversations that occur from the perspective of different languages. The results show that certain types of conversations occur more in the native language and others in a foreign language. Conversations from foreign countries seek to gather situation awareness and give emotional support, while in the affected country the conversations aim mainly at humanitarian aid.
Expert Systems with Applications, 2020
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019
In this paper, we propose the use of a Convolutional Neural Network (CNN) to identify offensive tweets. We use an end-to-end model (i.e., no preprocessing) and fine-tune pretrained embeddings (FastText) during training for learning words' representation. We compare the proposed CNN model to a baseline model, such as Linear Regression, and several neural models. The results show that CNN outperforms other models, and stands as a simple but strong baseline in comparison to other systems submitted to the Shared Task.
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media, 2018
This paper addresses the problem of automatic recognition of emotions in text-only conversational datasets for the EmotionX challenge. Emotion is a human characteristic expressed through several modalities (e.g., auditory, visual, tactile); therefore, trying to detect emotions only from the text becomes a difficult task even for humans. This paper evaluates several neural architectures based on Attention Models, which allow extracting relevant parts of the context within a conversation to identify the emotion associated with each utterance. Empirical results show the effectiveness of the attention model for the EmotionPush dataset compared to the baseline models, and other cases show better results with simpler models.
2016 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2016
Citizens engage in online discussions with more frequency each day, producing content relevant locally and globally. Finding influencers, who drive the agenda of such content on Twitter, has become a challenging task. An important factor that boosts a user's influence is geographic proximity to their peers [1]. Based on this finding from previous work, we propose ProximityRank, an extension of the TwitterRank [2] algorithm that brings distance into the equation. ProximityRank exhibits a higher accuracy in ranking users' influence because it takes into account geographic proximity among users, in addition to the similarity of topics in their tweets. Using a dataset of 2.8M tweets, we conduct experiments in different scenarios showing that ProximityRank outperforms previous techniques in the quality of recommendations about whom to follow.
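The idea of combining topic similarity with geographic proximity in a random-walk ranking can be sketched as follows. This is only an illustrative PageRank-style iteration, not the ProximityRank or TwitterRank formulation from the papers: the weighting `topic_sim / (1 + distance / scale_km)`, the parameter names, and the toy data are all assumptions.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def proximity_weighted_rank(follows, location, topic_sim,
                            damping=0.85, scale_km=100.0, iterations=50):
    """Rank users by a random walk from followers to followees.

    Edge weight (assumed form): topic similarity discounted by distance.
    """
    users = sorted(location)
    n = len(users)
    weights = {}
    for follower, followees in follows.items():
        raw = {}
        for f in followees:
            d = haversine_km(location[follower], location[f])
            raw[f] = topic_sim[(follower, f)] / (1.0 + d / scale_km)
        total = sum(raw.values())
        weights[follower] = {f: x / total for f, x in raw.items()} if total > 0 else {}
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new = {u: (1 - damping) / n for u in users}
        for u in users:
            out = weights.get(u, {})
            if out:
                for f, x in out.items():
                    new[f] += damping * rank[u] * x
            else:
                # Users who follow nobody spread their rank evenly.
                for v in users:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Hypothetical users: "a" follows "b" (nearby) and "c" (far away).
location = {"a": (-2.19, -79.89), "b": (-0.18, -78.47), "c": (40.42, -3.70)}
follows = {"a": ["b", "c"]}
topic_sim = {("a", "b"): 1.0, ("a", "c"): 1.0}
print(proximity_weighted_rank(follows, location, topic_sim))
```

With equal topic similarity, the nearby followee receives most of the follower's rank, which is the intuition the abstract describes.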
2016 Third International Conference on eDemocracy & eGovernment (ICEDEG), 2016
The popularity of social networks, such as Twitter, has given users around the world the ability to share information and express opinions or sentiments about any topic. Twitter has become the preferred social network platform used by researchers for measuring the popularity or influence of users in social networks. This study seeks to extend the analysis of influential users to the spatial context of Ecuador, applying computational intelligence techniques in order to identify influential users and calculate their ranking. The results show that a careful selection and normalization of features found in the Twitter user's profile allows us to detect influential users with a high degree of accuracy, and then calculate the ranking only over those users. By filtering out non-influential users, this approach provides a quicker method for determining the ranking than previous techniques.
Langmuir, 2004
... 4) Aviram, A.; Ratner, MA Chem. Phys. Lett. 1974, 29, 277−283. ... 24) Price, DW; Dirk, SM; Maya, F.; Tour, JM Tetrahedron 2003, 59, 2497−2518. ...
HIV Clinical Trials, 2008
To evaluate the satisfaction with self-injected enfuvirtide (ENF) and the clinical outcome of HIV-infected patients without very advanced disease. ESPPE is a multicenter observational study that included 103 evaluated patients showing baseline characteristics predictive of positive outcome: CD4 >100 cells/mm3, viral load (VL) <100,000 copies/mL, previous treatment with a maximum of 10 antiretroviral drugs, and concomitant use of 2 active drugs. By using validated surveys, patients were questioned 6 months after the prescription of ENF about their quality of life (QoL), acceptance of self-injections, and adherence to the treatment. At 6 months, the mean CD4 increase was 121 cells/mm3 (p < .05) and 65% (intent-to-treat, ENF stopped=failure) had VL <50 copies/mL (p < .001). Fourteen patients discontinued the treatment, mostly due to intolerance (6). The majority (>89%) assessed all items relating to QoL as "excellent," "very good," or "good." The treatment satisfaction index on a visual analog scale scored a median of 8.1 out of 10; when participants were asked about the interference of injections with their daily activities, 87% answered "never" or "only sometimes." Effectiveness and patients' perception of ENF remained good when ENF was used in patients without very advanced disease. QoL was not impaired after ENF use.
2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), 2017
Wikipedia, as the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles can be disputed, giving rise to discussions that are recorded in the related talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages in order to determine the type of opinions expressed in them. We apply the annotation schema to a corpus that includes a collection of discussions about 148 topics drawn from 25 Spanish Wikipedia talk pages. We make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to automatically identify the annotation labels. A linear support vector classifier (LinearSVC) performs better than other baseline models, and achieves an F1 score of 0.71 in our experiments.
Expert Systems with Applications, 2019
The massive amounts of data on social media networks can be overwhelming for users; for this reason, recommending relevant content becomes an essential task to avoid information overload. In this paper, we propose a new task for recommending users that might be interested in joining conversations on specific domains. To that end, we introduce a new corpus that contains conversation threads from popular users on Twitter on domains such as politics, sports, or humanitarian activism. Modeling short-text conversations on microblogs can be difficult because user-generated data is unstructured and noisy. Previous works focused on recommending content to users based on latent factor models and collaborative filtering methods. We propose a state-of-the-art recommendation model based on a sequence-to-sequence neural architecture that encodes the text of users’ profiles and the conversations’ context using several variants of recurrent neural networks. The experimental results show that our method provides as much as 20% higher recall compared to baseline methods. Moreover, we use an end-to-end learning framework that allows downstream applications to use recommender systems (RSs) that generalize better to new content by using pre-trained embeddings, thus being useful across domains or events.
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
In universities worldwide, instructors may spend a significant amount of time reviewing homework ... more In universities worldwide, instructors may spend a significant amount of time reviewing homework and group projects submitted by their students. Web-based technologies, like Google Docs, have provided a platform for students to write documents collaboratively. Currently, those platforms provide limited information on the individual contribution made by each student. Previous studies have focused on the quantitative aspects of individuals' contribution in collaborative writing, while the quality aspect has received less attention. In this paper, we propose a new model to measure not only quantitative input but also the quality of the content that has been contributed to a document written collaboratively in Spanish language. Based on topics-modeling techniques, we use an adaptive non-negative matrix factorization (NMF) model to extract topics from the content of the document, and grade higher students making those contributions. Using Google documents submitted by students to the academic system of our university as part of their projects, experimental results show that compared to other baseline methods such as edits or words count, our model provide a better approximation to the scores given by human reviewers. Therefore, our model can be used as part of an automatic grading subsystem within the academic system, to provide a baseline score of students' contribution in collaborative documents. This will allow instructors to reduce their workload associated with revision and grading of documents and focus their time on more relevant tasks.
2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), 2017
Recommender systems have proven to successfully influence user decisions on several applications ... more Recommender systems have proven to successfully influence user decisions on several applications and websites such as Amazon, eBay, Netflix, Spotify, among others. Largely, these systems rely on collaborative filtering to make useful recommendations but could suffer from cold start issues, i.e., lacking enough information for new users. By harnessing the popularity of social media (e.g. Twitter, Facebook) and other social media platforms, knowledge about users can be extracted, including intent of users that engage in conversations. This can provide additional information to applications using recommender systems in order to overcome issues like cold start. To that end, this research will tackle several challenges of natural language understanding in the context of conversations on social media platforms, such as: clustering, classification and summarization of conversations. Thus, the models developed will allow to extract knowledge from microblogs conversations that could be used in several applications.
Advances in Intelligent Systems and Computing, 2018
Nowadays, a growing number of people publicly share information about their fitness activities on... more Nowadays, a growing number of people publicly share information about their fitness activities on social media platforms like Twitter or Facebook. These social networks can furnish people with useful information to get an overview of different geographic areas where people can practice different sport-related activities. In this study, we analyze 14 million tweets to identify places to perform fitness activities and uncovering their aspects from twitterers’ opinions. To this end, we apply clustering analysis to uncover places where twitterers perform fitness activities, and then train a text classifier that achieves a score F1 of \(76\%\) to discriminate the aspects of fitness places. Using this information, recommender systems can provide useful information to local people or tourists that look for places to do exercise.
El crecimiento exponencial del uso de las redes sociales permite a los usuarios comunicarse direc... more El crecimiento exponencial del uso de las redes sociales permite a los usuarios comunicarse directamente con la audiencia y causar impacto. Lideres politicos hacen uso cada vez mas de dichas plataformas para interactuar con sus seguidores. En el presente proyecto se emplean tecnicas de aprendizaje de maquina no supervisado para conocer cuales son las tematicas que los lideres politicos colocan en la opinion publica a traves de las redes sociales y, determinar como se relacionan los titulares de periodicos digitales en un determinado periodo de tiempo. Se ha obtenido datos de lideres politicos en la plataforma Twitter para llevar a cabo experimentos que permitan aplicar tecnicas de clustering de documentos para extraer los topicos relevantes y, los resultados obtenidos se han evaluado con datos de las publicaciones digitales de diario El Universo y diario El Telegrafo en el periodo que comprende el estudio. De esta forma se ha podido identificar que la agenda mediatica corresponde a ...
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
Each year, natural disasters cause severe damage to infrastructure and lead to significant losses... more Each year, natural disasters cause severe damage to infrastructure and lead to significant losses of human lives. Non-governmental organizations (NGOs) and activists use social media networks to disseminate and organize relief efforts, but are often unable to efficiently gather volunteers and resources. Harnessing the power of crowdsourced social media services (e.g. Twitter, Facebook, Google+), we can provide trustworthy channels that can contribute in the relief efforts lead by NGOs and activists. This research project seeks to determine the best approach to build a social recommender model based on the dynamics of human behavior exhibited by fine-grained geospatial footprints of citizens on social networks. Thus, the aim is to provide location-aware recommendations of trustworthy activists, enabling citizens to contribute more efficiently during events, such as, natural disasters.
The massive data generated by users in online platforms, such as social networks, create challeng... more The massive data generated by users in online platforms, such as social networks, create challenges for text classification tasks based on supervised learning. Supervised learning often requires a lot of feature engineering or a significant amount of annotated data to achieve good results. However, the scarcity of annotated data is a critical issue, and manual annotation can be both costly and time-consuming. Semi-supervised learning requires far less annotated data and achieve similar performance as supervised approaches. In this paper, we introduce a semi-supervised neural architecture for muti-label settings, that combines deep learning representation and k-means clustering. The results show that the semi-supervised approach can leverage large-scale unlabeled data and achieve better results compared to baseline unsupervised as well as supervised methods.
La mineria de datos aporta una informacion invaluable que puede ser usada para mejorar los proces... more La mineria de datos aporta una informacion invaluable que puede ser usada para mejorar los procesos en diversas areas tales como la medicina, comercio, informatica, entre otras. Dentro del area informatica, la infraestructura de servidores web, servidores multimedia y servidores de archivos, necesita estar preparada para el manejo de grandes volumenes de datos de manera eficiente. Uno de los retos consiste en optimizar el acceso a los datos mediante el diseno y uso de caches, de tal manera que permita minimizar el uso de los servidores y el trafico de red. El presente trabajo evalua algunas tecnicas de mineria de datos que permiten encontrar patrones de acceso a datos que pueden ser utilizados por disenadores de algoritmos de gestion de caches para tomar decisiones de desalojo mas efectivas y su consecuente mejora en el rendimiento de la cache. Se presentan los resultados de las evaluaciones de una manera sistematica, los cuales pueden ser aprovechados por otros investigadores en el...
Multi-task learning is a framework that enforces different learning tasks to share their knowledg... more Multi-task learning is a framework that enforces different learning tasks to share their knowledge to improve their generalization performance. It is a hot and active domain that strives to handle several core issues; particularly, which tasks are correlated and similar, and how to share the knowledge among correlated tasks. Existing works usually do not distinguish the polarity and magnitude of feature weights and commonly rely on linear correlation, due to three major technical challenges in: 1) optimizing the models that regularize feature weight polarity, 2) deciding whether to regularize sign or magnitude, 3) identifying which tasks should share their sign and/or magnitude patterns. To address them, this paper proposes a new multi-task learning framework that can regularize feature weight signs across tasks. We innovatively formulate it as a biconvex inequality constrained optimization with slacks and propose a new efficient algorithm for the optimization with theoretical guara...
2017 Fourth International Conference on eDemocracy & eGovernment (ICEDEG), 2017
The growing interest of our society for health issues and physical activities is reflected on soc... more The growing interest of our society for health issues and physical activities is reflected on social media services where people share information about their daily activities and accomplishments. Fitness activities has become a routine present in many people's daily activities. Previous studies have analyzed people commitment to enhance their lifestyle and reach health objectives using data published in social platforms such as Twitter. However, there is a lack of designed metrics to quantify how users are engaged in physical activities in different areas of a given city. In this study, we collect and analyze 55K tweets posted by people in Ecuador, through different mobile applications, reflecting their participation on fitness and sport activities. Aggregating individual posts at city level, we uncover geographical patterns and dynamics of citizen activities in cities and states. Thus, we illustrate the potential of geolocated posts of people on social media services as sociometers of cities' health and fitness activities.
Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 2017
Nowadays, microblog platforms provide a medium to share content and interact with other users. With the large-scale data generated on these platforms, the origin of and reasons for users' engagement in conversations have attracted the attention of the research community. In this paper, we analyze the factors that might spark conversations on Twitter, for the English and Spanish languages. Using a corpus of 2.7 million tweets, we reconstruct existing conversations, then extract several contextual and content features. Based on the extracted features, we train and evaluate several predictive models to identify tweets that will spark a conversation. Our findings show that conversations are more likely to be initiated by users with a high activity level and popularity. For less popular users, the type of content generated is a more important factor. Experimental results show that the best predictive model obtains an average score of F1 = 0.80. We have made the dataset, scripts, and code used in this paper available to the research community via GitHub.
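For readers unfamiliar with the metric reported in this abstract, F1 is the harmonic mean of precision and recall. The following is a minimal sketch of the per-class computation only; the function name and the counts are hypothetical and are not taken from the paper, and the abstract does not say how its average was formed.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall for one class."""
    if true_positives == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only for illustration (not results from the paper).
print(f1_score(true_positives=80, false_positives=20, false_negatives=20))
```

With these illustrative counts, precision and recall are both 0.8, so the F1 value is also approximately 0.8.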
Companion Proceedings of The 2019 World Wide Web Conference, 2019
The role of social networks during natural disasters is becoming crucial for sharing relevant information and coordinating relief actions. With the reach of social networks, any user around the world can interact in crisis events as they unfold. A large part of the information posted during a disaster uses the native language of the place where the disaster occurred. However, there are also users from other parts of the world who comment on the event, often in another language. In this work, we conducted a study of crisis-related tweets about the earthquake that occurred in Ecuador in April 2016. To that end, we introduce a new annotated dataset in both Spanish and English with approximately 8K tweets; half of them belong to conversations. We evaluate several neural architectures to identify crisis-related tweets in a multi-lingual setting, and we find that deep contextual multi-lingual embeddings outperform other strong baseline models. We then explore the types of conversations that occur from the perspective of different languages. The results show that certain types of conversations occur more in the native language and others in a foreign language. Conversations from foreign countries seek to gather situational awareness and give emotional support, while in the affected country the conversations focus mainly on humanitarian aid.
Expert Systems with Applications, 2020
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019
In this paper, we propose the use of a Convolutional Neural Network (CNN) to identify offensive tweets. We use an end-to-end model (i.e., no preprocessing) and fine-tune pretrained embeddings (FastText) during training to learn word representations. We compare the proposed CNN model to a baseline model, such as Linear Regression, and to several neural models. The results show that the CNN outperforms the other models and stands as a simple but strong baseline in comparison to other systems submitted to the Shared Task.
Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media, 2018
This paper addresses the problem of automatic recognition of emotions in text-only conversational datasets for the EmotionX challenge. Emotion is a human characteristic expressed through several modalities (e.g., auditory, visual, tactile); therefore, detecting emotions from text alone is a difficult task even for humans. This paper evaluates several neural architectures based on Attention Models, which allow extracting relevant parts of the context within a conversation to identify the emotion associated with each utterance. Empirical results show the effectiveness of the attention model for the EmotionPush dataset compared to the baseline models, while other cases show better results with simpler models.
2016 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2016
Citizens engage in online discussions more frequently each day, producing content that is relevant locally and globally. Finding influencers, who drive the agenda of such content on Twitter, has become a challenging task. An important factor that boosts a user's influence is geographic proximity to their peers [1]. Based on this finding from previous work, we propose ProximityRank, an extension of the TwitterRank [2] algorithm that brings distance into the equation. ProximityRank exhibits higher accuracy in ranking users' influence because it takes into account geographic proximity among users, in addition to the similarity of topics in their tweets. Using a dataset of 2.8M tweets, we conduct experiments in different scenarios showing that ProximityRank outperforms previous techniques in the quality of recommendations about whom to follow.
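The abstract does not give the ProximityRank formulation, so the following is only a simplified sketch of the general idea it describes: a PageRank-style random walk whose edge weights combine topic similarity with geographic proximity. The function names, the `1 / (1 + distance / scale_km)` proximity weighting, the `damping` and `scale_km` parameters, and the sample data are all assumptions made for illustration, not the authors' method.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def proximity_rank(follows, topic_sim, location, damping=0.85, iterations=50, scale_km=100.0):
    """Rank users with a random walk over the follower graph (illustrative only).

    follows[u]  -> list of users that u follows (rank flows from u to them)
    topic_sim   -> dict mapping (u, v) to a topic similarity in [0, 1]
    location[u] -> (lat, lon) of user u
    """
    users = list(location)
    rank = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) / len(users) for u in users}
        for u in users:
            # Assumed weighting: topic similarity discounted by distance.
            weights = {
                v: topic_sim.get((u, v), 0.0)
                / (1.0 + haversine_km(location[u], location[v]) / scale_km)
                for v in follows.get(u, [])
            }
            total = sum(weights.values())
            if total == 0.0:
                # User with no weighted out-links: spread its rank uniformly.
                for v in users:
                    nxt[v] += damping * rank[u] / len(users)
            else:
                for v, w in weights.items():
                    nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank


# Hypothetical data: "a" follows "b" (far away) and "c" (nearby), same topic similarity.
location = {"a": (-2.19, -79.89), "b": (-0.18, -78.47), "c": (-2.20, -79.90)}
follows = {"a": ["b", "c"]}
topic_sim = {("a", "b"): 0.5, ("a", "c"): 0.5}
ranks = proximity_rank(follows, topic_sim, location)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy setup the nearby user "c" receives a larger share of "a"'s rank than the distant user "b", which is the qualitative effect the abstract attributes to adding distance to the ranking.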
2016 Third International Conference on eDemocracy & eGovernment (ICEDEG), 2016
The popularity of social networks, such as Twitter, has given users around the world the ability to share information and express opinions or sentiments about any topic. Twitter has become the preferred social network platform used by researchers for measuring the popularity or influence of users in social networks. This study seeks to extend the analysis of influential users to the spatial context of Ecuador, applying computational intelligence techniques to identify influential users and calculate their ranking. The results show that a careful selection and normalization of features found in the Twitter user's profile allows us to detect influential users with a high degree of accuracy, and then calculate the ranking only over those users. By filtering out non-influential users, this approach provides a quicker method for determining the ranking than previous techniques.
Langmuir, 2004
... 4) Aviram, A.; Ratner, MA Chem. Phys. Lett. 1974, 29, 277−283. ... 24) Price, DW; Dirk, SM; Maya, F.; Tour, JM Tetrahedron 2003, 59, 2497−2518. ...
HIV Clinical Trials, 2008
The aim was to evaluate satisfaction with self-injected enfuvirtide (ENF) and the clinical outcome of HIV-infected patients without very advanced disease. ESPPE is a multicenter observational study that included 103 evaluated patients showing baseline characteristics predictive of positive outcome: CD4 >100 cells/mm3, viral load (VL) <100,000 copies/mL, previous treatment with a maximum of 10 antiretroviral drugs, and concomitant use of 2 active drugs. Using validated surveys, patients were questioned 6 months after the prescription of ENF about their quality of life (QoL), acceptance of self-injections, and adherence to the treatment. At 6 months, the mean CD4 increase was 121 cells/mm3 (p < .05) and 65% (intent-to-treat, ENF stopped = failure) had VL <50 copies/mL (p < .001). Fourteen patients discontinued the treatment, mostly due to intolerance (6). The majority (>89%) assessed all items relating to QoL as "excellent," "very good," or "good." The treatment satisfaction index on a visual analog scale scored a median of 8.1 out of 10; when participants were asked about the interference of injections with their daily activities, 87% answered "never" or "only sometimes." Effectiveness and patients' perception of ENF remained good when ENF was used in patients without very advanced disease. QoL was not impaired after ENF use.
2017 IEEE Second Ecuador Technical Chapters Meeting (ETCM), 2017
Wikipedia, as the largest online encyclopedia, is edited collaboratively by hundreds of users. The content of some articles can be disputed, giving rise to discussions that are recorded in the related talk pages. In this paper, we propose an annotation schema for Spanish Wikipedia talk pages in order to determine the type of opinions expressed in them. We apply the annotation schema to a corpus that includes a collection of discussions about 148 topics drawn from 25 Spanish Wikipedia talk pages. We make the resulting dataset publicly available for download on GitHub. Furthermore, we train and evaluate supervised machine learning models to automatically identify the annotation labels. A Linear Support Vector classifier (LinearSVC) performs better than other baseline models, achieving F1 = 0.71 in our experiments.
Expert Systems with Applications, 2019
The massive amounts of data on social media networks can be overwhelming for users; for this reason, recommending relevant content becomes an essential task to avoid information overload. In this paper, we propose a new task for recommending users that might be interested in joining conversations on specific domains. To that end, we introduce a new corpus that contains conversation threads from popular users on Twitter in domains such as politics, sports, or humanitarian activism. Modeling short-text conversations on microblogs can be difficult because user-generated data is unstructured and noisy. Previous works focused on recommending content to users based on latent factor models and collaborative filtering methods. We propose a state-of-the-art recommendation model based on a sequence-to-sequence neural architecture that encodes the text of users’ profiles and the conversations’ context using several variants of recurrent neural networks. The experimental results show that our method provides as much as 20% higher recall compared to baseline methods. Moreover, we use an end-to-end learning framework that allows downstream applications to use recommender systems (RSs) that generalize better to new content by using pre-trained embeddings, thus being useful across domains or events.