Michael Tadesse | Dalian University of Technology

Papers by Michael Tadesse

Multi-granularity Convolutional Neural Network with Feature Fusion and Refinement for User Profiling

Lecture Notes in Computer Science, 2019

User profiling is an important research topic in social media analysis, with great value in both research and industry. Existing research on user profiling has mostly focused on manually handcrafted features for user attribute prediction and has partly overlooked the social relations of users. To address this problem, we propose a multi-granularity convolutional neural network model with feature fusion and refinement. Our model leverages the convolution mechanism to automatically extract users' latent semantic features with respect to their attributes from social texts. We also combine different machine learning methods using the stacking mechanism for feature refinement. The proposed model captures the social relations of users by combining semantic context and social network information, and improves the performance of attribute classification. We evaluate our model on the dataset from the SMP CUP 2016 competition. The experimental results demonstrate that the proposed model is effective in automatic user attribute classification, with a particular focus on fine-grained user information.

Chinese medical relation extraction based on multi-hop self-attention mechanism

International Journal of Machine Learning and Cybernetics, 2020

The medical literature is the most important channel for demonstrating academic achievements and conducting academic exchanges. The massive body of medical literature has become a huge treasure trove of knowledge, and it is necessary to automatically extract the implicit medical knowledge it contains. Medical relation extraction aims to automatically extract medical relations from medical text for various kinds of medical research. However, few studies address Chinese medical literature. Currently, the popular methods are based on neural networks, which focus on the semantic information of one aspect of the sentence. However, the relation between entities is determined by complex semantic information in the sentence, which cannot be represented by a single sentence vector. In this paper, we propose an attention-based model that extracts multi-aspect semantic information for Chinese medical relation extraction using a multi-hop attention mechanism. The model generates a separate weight vector for the sentence at each attention step, and thus a different semantic representation of the sentence at each step. Our model is evaluated on Chinese medical literature from the China National Knowledge Infrastructure (CNKI). It achieves an F1 score of 93.19% on the therapeutic relation task and 73.47% on the causal relation task.
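
The idea of one weight vector per attention step can be illustrated with a minimal sketch. This is not the authors' model: the dot-product scoring, the fixed query vector per hop, and the names `multi_hop_attention`, `token_vectors`, and `hop_queries` are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_attention(token_vectors, hop_queries):
    """Return one (weights, sentence_vector) pair per attention hop.

    token_vectors: list of equal-length float lists, one per token.
    hop_queries:   list of query vectors, one per hop (hypothetical stand-in
                   for whatever learned parameters a real model would use).
    """
    dim = len(token_vectors[0])
    views = []
    for query in hop_queries:
        # Score each token against this hop's query with a dot product.
        scores = [sum(t * q for t, q in zip(tok, query)) for tok in token_vectors]
        weights = softmax(scores)
        # The sentence vector for this hop is the weighted sum of token vectors.
        sentence_vector = [
            sum(w * tok[j] for w, tok in zip(weights, token_vectors))
            for j in range(dim)
        ]
        views.append((weights, sentence_vector))
    return views
```

Calling the function with two query vectors yields two different weightings of the same tokens, which is the sense in which a sentence receives several semantic representations rather than a single vector.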

Interactive Self-Attentive Siamese Network for Biomedical Sentence Similarity

IEEE Access, 2020

The determination of semantic similarity between sentences is an important component in natural language processing (NLP) tasks such as text retrieval and text summarization. Many approaches have been proposed for estimating sentence similarity, and Siamese neural networks (SNN) provide a better approach. However, the sentence semantic representation generated by sharing weights in the SNN without any attention mechanism ignores the different contributions of different words to the overall sentence semantics. Furthermore, applying attention within only a single sentence neglects the interactive semantic influence on similarity estimation. To address these issues, this paper proposes an interactive self-attention (ISA) mechanism and integrates it with an SNN. The resulting interactive self-attentive Siamese neural network (ISA-SNN) is used to verify the effectiveness of ISA. The proposed model obtains the weights of words in a single sentence by means of self-attention and extracts inherent interactive semantic information between sentences via interactive attention to enhance the sentence semantic representation. Without feature engineering, it outperforms other existing methods on three biomedical benchmark datasets (a Pearson correlation coefficient of 0.656 and 0.713/0.658 on DBMI and CDD-ful/-ref, respectively).
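
For readers unfamiliar with the reported metric, the Pearson correlation coefficient between predicted and reference similarity scores can be computed as in the following sketch. The function name `pearson` and its inputs are illustrative assumptions; this is the standard textbook formula, not code from the paper.

```python
import math


def pearson(predicted, reference):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)
    # Covariance and variances, all left unnormalized: the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)
```

A value close to 1 means the model's similarity scores rise and fall together with the human-annotated scores; a value near 0 means there is no linear relationship.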

Detection of Suicide Ideation in Social Media Forums Using Deep Learning

Algorithms, 2019

Suicide ideation expressed in social media has an impact on language usage. Many at-risk individuals use social forum platforms to discuss their problems or get access to information on similar tasks. The key objective of our study is to present ongoing work on the automatic recognition of suicidal posts. We address the early detection of suicide ideation through deep learning and machine learning-based classification approaches applied to Reddit social media. For this purpose, we employ a combined LSTM-CNN model and compare it with other classification models. Our experiment shows that the combined neural network architecture with word embedding techniques achieves the best relevance classification results. Additionally, our results support the strength and ability of deep learning architectures to build an effective model for suicide risk assessment in various text classification tasks.

Detection of Depression-Related Posts in Reddit Social Media Forum

IEEE Access, 2019

Depression is viewed as the largest contributor to global disability and a major reason for suicide. It has an impact on language usage as reflected in written text. The key objective of our study is to examine Reddit users' posts to detect any factors that may reveal the depression attitudes of relevant online users. For this purpose, we employ Natural Language Processing (NLP) techniques and machine learning approaches to train on the data and evaluate the efficiency of our proposed method. We identify a lexicon of terms that are more common among depressed accounts. The results show that our proposed method can significantly improve performance accuracy. The best single feature is the bigram, which with the Support Vector Machine (SVM) classifier detects depression with 80% accuracy and an F1 score of 0.80. The strength and effectiveness of the combined features (LIWC+LDA+bigram) are most successfully demonstrated with the Multilayer Perceptron (MLP) classifier, which gives the top performance for depression detection, reaching 91% accuracy and an F1 score of 0.93. According to our study, better performance can be achieved by proper feature selection and by combining multiple features.

Index terms: natural language processing, machine learning, Reddit, social networks, depression.
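
Accuracy and F1 measure different things, which is why the abstract reports both. The sketch below shows the standard definitions for a binary task; the function name `accuracy_and_f1` and the label convention (1 for the positive class) are assumptions for illustration and are not taken from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision: share of predicted positives that are right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall (0.0 if both are 0).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1
```

Because F1 ignores true negatives while accuracy counts them, the two scores can differ on the same predictions, especially when the classes are imbalanced.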

Personality Predictions Based on User Behavior on the Facebook Social Media Platform

IEEE Access, 2018

With the development of social networks, a wide variety of approaches have been developed to define users' personalities based on their social activities and language use habits. These approaches differ in their machine learning algorithms, data sources, and feature sets. The goal of this paper is to investigate the predictability of the personality traits of Facebook users based on different features and measures of the Big 5 model. We examine the presence of social network structures and linguistic features relative to personality interactions using the myPersonality project data set. We analyze and compare four machine learning models and compute the correlation between each feature set and the personality traits. The prediction accuracy results show that, even when tested on the same data set, the personality prediction system built on the XGBoost classifier outperforms the average baseline for all feature sets, with a highest prediction accuracy of 74.2%. The best prediction performance was reached for the extraversion trait using the individual social network analysis feature set, which achieved a higher personality prediction accuracy of 78.6%.

Index terms: Big 5, feature analysis, predicting personality, social behavior, social networks.
