Abu Shoeb | Shahjalal University of Science and Technology

Papers by Abu Shoeb

Computational methods to understand the association between emojis and emotions

Emojis have become ubiquitous in digital communication due to their visual appeal as well as their ability to vividly express human emotion, among other factors. They are also heavily used in customer surveys and feedback forms. Hence, there is a need for methods and resources that shed light on their meaning and communicative role. In this work, we seek to explore the connection between emojis and emotions by employing new resources and methodologies. First, we compile a unique corpus of ~20.8 million emoji-centric tweets, such that we can capture rich emoji semantics using a comparably small dataset. We then train a model to generate interpretable word vectors and show how domain-specific emoji embeddings give better emotion prediction than vanilla embeddings such as GloVe and Word2Vec. Second, we conduct annotation experiments for a set of 150 popular emojis. This gives 1,200 emoji-emotion pairs of human ratings of association concerning 8 basic human emotions such as anger, a...

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

An Extended Algorithm to Enhance the Performance of the Gridbus Broker with Data Restoring Technique

2009 International Conference on Computer Engineering and Technology, 2009

Various types of Grids have been developed to support different types of applications. The Gridbus broker, which mainly focuses on the Data Grid, mediates access to distributed resources by discovering suitable data and computational resources, monitoring jobs, accessing local or remote data sources during job execution, and collecting and presenting results. In this paper, we present an enhanced version of Grid Service Broker scheduling on Global Data Grids with a job restoration point. In the present version of the scheduling algorithm, if any job associated with an application fails during its runtime, the scheduling algorithm just marks the job as failed and resets it. It does not keep track of the percentage of the work that has already been done by the job. In contrast, our proposed enhanced algorithm utilizes a restore point to store the proportion of the task completed by an executor. In case of failure, it restarts the job from that restore point rather than from its initial point. From the experimental results, we have found that our proposed algorithm increases the performance of the present scheduling algorithm.
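
The restore-point idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Gridbus broker's code or API: the function names `run_job` and `schedule`, the notion of a job as a count of equal work units, and the single injected failure are all hypothetical.

```python
def run_job(total_units, start_unit=0, fail_at=None):
    """Run work units from start_unit; return (units_completed, failed).

    Hypothetical model: a job is a sequence of equal work units, and
    fail_at is the index of the unit on which the executor fails.
    """
    completed = start_unit
    for unit in range(start_unit, total_units):
        if fail_at is not None and unit == fail_at:
            return completed, True  # failure: report progress made so far
        completed = unit + 1
    return completed, False


def schedule(total_units, fail_at, use_restore_point):
    """Return the total number of work units executed, including redone work."""
    done, failed = run_job(total_units, 0, fail_at)
    executed = done
    if failed:
        # With a restore point the job resumes where it stopped;
        # without one it is reset and starts again from unit 0.
        restart = done if use_restore_point else 0
        done, _ = run_job(total_units, restart)
        executed += done - restart
    return executed


print(schedule(10, fail_at=6, use_restore_point=False))
print(schedule(10, fail_at=6, use_restore_point=True))
```

In this toy model, a job that fails on unit 6 of 10 redoes the first six units when it is reset, while the restore-point variant only executes the remaining four.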

Assessing Emoji Use in Modern Text Processing Tools

ArXiv, 2021

Emojis have become ubiquitous in digital communication, due to their visual appeal as well as their ability to vividly convey human emotion, among other factors. This also leads to an increased need for systems and tools to operate on text containing emojis. In this study, we assess this support by considering test sets of tweets with emojis, based on which we perform a series of experiments investigating the ability of prominent NLP and text processing tools to adequately process them. In particular, we consider tokenization, part-of-speech tagging, dependency parsing, as well as sentiment analysis. Our findings show that many systems still have notable shortcomings when operating on text containing emojis.

Is Private Browsing in Modern Web Browsers Really Private?

ArXiv, 2018

Web browsers are the most common tools for performing various activities over the internet. Along with normal mode, all modern browsers have a private browsing mode. The name of the mode varies from browser to browser, but the purpose of the private mode remains the same in every browser. In normal browsing mode, the browser keeps track of users' activity and related data such as browsing histories, cookies, auto-filled fields, temporary internet files, etc. In private mode, it is said that no information is stored while browsing or all information is destroyed after closing the current private session. However, some researchers have already disproved this claim by performing various tests in the most popular browsers. I also have some personal experience where private mode browsing fails to keep all browsing information private. In this position paper, I take the position against private browsing. By examining various facts, it is proved that the private browsing mode is not really private ...

A Comparative Study on I/O Performance between Compute and Storage Optimized Instances of Amazon EC2

2014 IEEE 7th International Conference on Cloud Computing, 2014

Cloud computing infrastructure helps users to minimize cost by outsourcing data and computation on demand. Due to varying user needs in terms of computation power, storage capacity, etc., cloud providers offer various machines to choose from to best meet the intended need. In this paper, we disprove several common conceptions regarding the performance and cost of the cloud by experimenting on instances of two different families (compute optimized and storage optimized) of the most popular cloud platform, Amazon Elastic Compute Cloud (EC2). Our analysis shows the interesting finding that, for machines of the same configuration, storage optimized instances have lower disk read/write speed than compute optimized instances, which does not completely reflect the claim made by Amazon in all cases. Additionally, storage optimized instances show notable performance differences among themselves. We also find that the I/O performance of the same instance type varies over different time periods.

Are Emojis Emotional? A Study to Understand the Association between Emojis and Emotions

ArXiv, 2020

Given the growing ubiquity of emojis in language, there is a need for methods and resources that shed light on their meaning and communicative role. One conspicuous aspect of emojis is their use to convey affect in ways that may otherwise be non-trivial to achieve. In this paper, we seek to explore the connection between emojis and emotions by means of a new dataset consisting of human-solicited association ratings. We additionally conduct experiments to assess to what extent such associations can be inferred from existing data, such that similar associations can be predicted for a larger set of emojis. Our experiments show that this succeeds when high-quality word-level information is available.

EmoTag – Towards an Emotion-Based Analysis of Emojis

Proceedings - Natural Language Processing in a Deep Learning World

Despite being a fairly recent phenomenon, emojis have quickly become ubiquitous. Besides their extensive use in social media, they are now also invoked in customer surveys and feedback forms. Hence, there is a need for techniques to understand their sentiment and emotion. In this work, we provide a method to quantify the emotional association of basic emotions such as anger, fear, joy, and sadness for a set of emojis. We collect and process a unique corpus of 20 million emoji-centric tweets, such that we can capture rich emoji semantics using a comparably small dataset. We evaluate the induced emotion profiles of emojis with regard to their ability to predict word affect intensities as well as sentiment scores.

EmoTag1200: Understanding the Association between Emojis and Emotions

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Performance Analysis of MPI (mpi4py) on Diskless Cluster Environment in Ubuntu

International Journal of Computer Applications

Nowadays, cluster computing has become a pressing need for processing large-scale data. For computations on large amounts of data that require long execution times, the run time can be reduced by using multiple processors and distributing tasks through cluster computing. Cluster computing is the technique of sharing two or more computers' resources through a network (usually a local area network) in order to take advantage of the parallel processing power of those computers. Clusters of computers are usually deployed to improve processing speed and/or reliability and scalability over that provided by a single computer. In this paper, we propose a high-performance computing approach on the Linux platform (Ubuntu) using a parallel programming environment with the collaboration of multiple nodes for large-scale computational work.

Effect of Homogeneous and Heterogeneous Network Structure on Alchemi Based Grid Computing Platform

The modern world is evolving from an era of personal computing to one of collaborative computing. In the last few years, Grid computing has been established as a means of collaboration in many fields. This paper is concerned with Alchemi, a .NET-based desktop Grid computing framework. Alchemi uses unutilized processing power and resources, and by combining a number of PCs it creates a virtual supercomputer. Depending on the hosts' configuration, we can define the network of PCs that eventually serves as a grid platform as a homogeneous network or a heterogeneous network. A heterogeneous network can be defined as a LAN of machines working together with different hardware and/or software configurations and protocols. In the same way, we can define a homogeneous network as a network of PCs with the same processing power and the same protocol. This paper inspects the effect of heterogeneous and homogeneous networks on a grid computing platform. We created a test bed where the homogeneous and heterogeneous networks have the same total processing power. We executed a simple computational application and recorded the results for different numbers of threads and different sizes of that application. The data show that for smaller numbers of tasks both networks perform almost the same, but for bigger tasks the homogeneous network performs considerably better as the task size increases. Based on this result, we suggest that a grid platform should preferably be a homogeneous network.

Spam Campaign Cluster Detection Using Redirected URLs and Randomized Sub-Domains

A substantial majority of the email sent every day is spam. Spam emails cause many problems if someone acts on them or clicks on the link provided in the email body. The problems may include infecting the user's personal machine with malware, stealing personal information, capturing credit card information, etc. Since spam emails are generated as part of a very limited number of spam campaigns, it is useful to cluster spam messages into campaigns, so as to identify which campaigns are the largest. This enables investigators to focus their attention on the largest clusters as the most significant. In this paper, we present a method to cluster spam emails into spam campaigns. In our approach, the redirected URL has been chosen as the primary field for cluster formation. Our study shows that a huge number of URLs arriving in spam email eventually point to a much smaller set of redirected URLs. Our multilevel clustering method grouped 90% of our half million spam emails into 4 spam campaigns. In addition to redirected URLs, we also use randomized subdomains, which come as a given URL in the email body, for campaign identification. We believe that our model can be applied in real time to quickly detect major campaigns.
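
As a rough illustration of clustering on the redirected URL, with a fallback that ignores randomized subdomains, consider the sketch below. The abstract does not spell out the multilevel method, so this is an assumption about how the two signals might be combined, not the authors' algorithm. The helper names, the precomputed `redirects` mapping, and the naive "last two host labels" rule for the base domain are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def base_domain(url):
    """Naive base domain: the last two labels of the host name (an assumption)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def cluster_emails(emails, redirects):
    """Group email ids into candidate campaigns.

    emails: iterable of (email_id, url_in_body) pairs.
    redirects: dict mapping a URL to its final redirected URL, assumed
    to have been resolved beforehand (no network access happens here).
    """
    clusters = defaultdict(list)
    for email_id, url in emails:
        final_url = redirects.get(url)
        # Level 1: cluster on the redirected URL when one is known.
        # Level 2: otherwise strip the (possibly randomized) subdomain.
        key = final_url if final_url is not None else base_domain(url)
        clusters[key].append(email_id)
    return dict(clusters)


emails = [
    (1, "http://a.example/x"),
    (2, "http://b.example/y"),
    (3, "http://qz81.spam.test/p"),
    (4, "http://k2j9.spam.test/p"),
]
redirects = {
    "http://a.example/x": "http://shop.test/buy",
    "http://b.example/y": "http://shop.test/buy",
}
print(cluster_emails(emails, redirects))
```

Here emails 1 and 2 fall into one cluster because their links resolve to the same target, and emails 3 and 4 are grouped because their hosts differ only in the subdomain.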

Runtime thread rescheduling: An extended scheduling algorithm to enhance the performance of the Gridbus broker

2008 IEEE International Multitopic Conference, 2008

Grid computing is rapidly becoming a requirement for modern-day computing, where large amounts of data need to be processed. The Gridbus broker focuses on the Data Grid and schedules jobs depending on data and compute resources. In the current scheduling process, a job is assigned to an executor depending on the compute resource and data resource available at the time of deployment. One major problem is that, if there is an idle higher-grade compute resource available after the scheduling, it doesn't take the ...

File based GRID thread implementation in the .NET-based Alchemi Framework

2008 IEEE International Multitopic Conference, 2008

Nowadays, grid computing is considered one of the emerging technologies in which jobs are distributed across a network or the Internet. Among the several software toolkits that help us implement a grid environment, Alchemi is a widely used, open-source toolkit that runs on the Windows operating system in the .NET Framework. The node which requests an application to be performed is called the owner. The node that receives the requested application and sends the result back to the owner is called the manager. An ...

Runtime thread rescheduling: An extended scheduling algorithm to enhance the performance of the Gridbus broker

… , 2008. INMIC 2008. …, 2008

Grid computing is becoming a requirement for processing large amounts of data nowadays. The Gridbus broker schedules jobs depending on data and compute resources. The current scheduling process does not reassign a job from a lower-grade compute resource to a higher-grade compute resource if a higher-grade compute resource becomes available. In this paper, we propose a technique to reassign a thread to a higher-grade executor by preempting the thread on the lower-grade executor, using a data restoration technique that tracks the progress the thread has made so far on the lower-grade compute resource. This is done only if an idle higher-grade compute resource is available. This approach improves the performance as well as the reliability of the Grid to a considerable extent.
