Ehab Hassanein | Cairo University
Papers by Ehab Hassanein
MATTER: International Journal of Science and Technology, Nov 19, 2018
Annotation adds metadata to pages so that they become more meaningful and readable for machines. Many semantic annotation tools have been developed and have proved successful in multiple languages, but Arabic is not among them. We present AnnoBic, an Arabic semantic annotation tool for RSS feeds.
Lecture Notes in Computer Science, Nov 19, 2022
arXiv (Cornell University), Jun 19, 2022
Validation of compliance rules against process data is a fundamental functionality for business process management. Over the years, the problem has been addressed for different types of process data, i.e., process models, process event data at runtime, and event logs representing historical execution. Several approaches have been proposed to tackle compliance checking over process logs. These approaches have been based on different data models and storage technologies, including relational databases, graph databases, and proprietary formats. Graph-based encoding of event logs is a promising direction that turns several process analytics tasks into queries on the underlying graph. Compliance checking is one class of such analysis tasks. In this paper, we argue that encoding log data as graphs alone is not enough to guarantee efficient processing of queries on this data. Efficiency is important due to the interactive nature of compliance checking. Thus, compliance checking would benefit from sub-linear scanning of the data. Moreover, as more data are added, e.g., when new batches of logs arrive, the data size should grow sub-linearly to optimize both storage space and querying time. We propose two encoding methods using graph representation, realized in Neo4J, and show the benefits of these encodings on a special class of queries, namely timed order compliance rules. Compared to a baseline encoding, our experiments show up to a 5x speedup in querying time as well as a 3x reduction in graph size.
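To make the linear-scan baseline concrete, here is a minimal Python sketch of one possible reading of a timed order rule: every occurrence of one activity must be followed, in the same case, by another activity within a time limit. The abstract does not define the rule class or the encodings, so the rule semantics, the function name `violating_cases`, and the toy log are all assumptions for illustration. This is not the paper's Neo4J encoding.

```python
from collections import defaultdict


def violating_cases(events, first, second, max_gap):
    """Return ids of cases where some `first` event is not followed by
    a `second` event within `max_gap` time units (hypothetical rule semantics).

    events: iterable of (case_id, activity, timestamp) tuples.
    """
    # Group events by case; this touches every event once (a linear scan).
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    violations = []
    for case_id, trace in by_case.items():
        trace.sort(key=lambda event: event[0])  # order each trace by time
        for i, (start, activity) in enumerate(trace):
            if activity != first:
                continue
            satisfied = any(
                later_activity == second and later_time - start <= max_gap
                for later_time, later_activity in trace[i + 1:]
            )
            if not satisfied:
                violations.append(case_id)
                break  # one violation is enough to flag the case
    return violations


# Toy event log (made up): c1 complies, c2 is too slow, c3 never does "B".
log = [
    ("c1", "A", 0), ("c1", "B", 5),
    ("c2", "A", 0), ("c2", "B", 20),
    ("c3", "A", 0),
]
flagged = violating_cases(log, first="A", second="B", max_gap=10)
```

Because the sketch reads every event before answering, its cost grows linearly with the log, which is the behavior the abstract argues a better encoding should avoid.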
International Journal of Information Technology and Language Studies, May 2, 2021
Intelligent Multi‐modal Data Processing
International Journal of Advanced Computer Science and Applications, 2019
Due to the proliferation of big data with large volume, velocity, complexity, and distribution among remote servers, it became obvious that traditional relational databases are unsuitable for meeting the requirements of such data. This led to the emergence of a novel technology among organizations and business enterprises: NoSQL datastores. Today such datastores have become popular alternatives to traditional relational databases, since their schema-less data models can handle huge amounts of structured, semi-structured, and unstructured data with high speed and immense distribution. These datastores are of four basic types, and numerous instances have been developed under each type. This implies the need to understand the differences among them and how to select the most suitable one for any given data. Unfortunately, research efforts in the literature either consider differences from a theoretical point of view (without real use cases) or address performance issues such as speed and storage, which is insufficient to give researchers deep insight into the mapping of a given data structure to a given NoSQL datastore type. Hence, this paper provides a qualitative comparison among three popular datastores of different types (Redis, Neo4j, and MongoDB) using a real use case of each type, translated to the others. It thus highlights the inherent differences among them, and hence which data structures each of them suits best.
Future Computing and Informatics Journal, Dec 1, 2018
Many developing countries are now experiencing a revolution in e-government, aiming to deliver smooth and simple services to their citizens. However, governmental sectors face many challenges in using e-government services and infrastructure, improving current services, or developing new ones: rapidly growing data and applications, IT budget costs, software licensing and support, and difficulties in migrating, integrating, and managing software and hardware. These challenges may lead to the failure of e-government projects. Therefore, a solution is needed to overcome them, and cloud computing plays a vital role in solving these problems. This paper presents e-government obstacles and cloud computing features. It also proposes an abstract hybrid model for adapting cloud computing in e-government that overcomes these challenges. The proposed hybrid model identifies three patterns of cloud computing: Local Governmental Cloud (LGC), Regional Governmental Cloud (RGC), and Wide Governmental Cloud (WGC). The model determines how an entity connects to each of the three clouds and what the relation between them is. In addition, it assesses the readiness of the services that need to migrate to the cloud. Finally, a set of recommended cloud aspects and their values is suggested for each of the three clouds to ensure implementation of the sorted services.
International Journal of Advanced Computer Science and Applications
Apparently, most life activities that people perform depend on their unique characteristics. Personal characteristics vary across people, so they perform tasks in different ways based on their skills. People have different mental, psychological, and behavioral features that affect most life activities. The same is true of students at various educational levels. Students have different features that affect their academic performance. The academic score is the main indicator of a student's performance. However, other factors such as personality features, intelligence level, and basic personal data can have a great influence on the student's performance. This means that the academic score is not the only indicator that can be used in predicting students' performance. Consequently, an approach based on personal data, personality features, and intelligence quotient is proposed to predict the performance of university undergraduates. Five machine learning techniques were used in the proposed approach. To evaluate the performance of the proposed approach, a real student dataset was used, and various performance measures were computed. Several experiments were performed to determine the impact of various features on student performance. The proposed approach gave promising results when tested on the dataset.
International Journal of Recent Technology and Engineering (IJRTE), 2019
Software Engineering (SE) is the application of essentials to deal with the analysis, design, development, testing, deployment, and management of software systems, known collectively as the Software Development Life Cycle (SDLC). Requirements Engineering (RE) is responsible for the most critical task in the SDLC, which is transforming the requirements and wishes of the software users into complete, accurate, and formal specifications. One of the main responsibilities of RE is the creation of a software requirements document that exactly, reliably, and totally defines the functional and non-functional properties of the system to be developed. At some point in the RE process, the requirements are written in a Natural Language (NL). On the one hand, NLs are flexible, common, and popular. On the other hand, NLs are widely recognized as inherently ambiguous. Ambiguity arises in a requirements document when a piece of text can be interpreted in distinct ways. This may lead to erroneous software that is too exp...
International Journal of Recent Technology and Engineering (IJRTE), 2019
The evolution of cloud computing over the past few years is potentially one of the major advances in the history of computing. Cloud computing theoretically provides all computing needs as services. Accordingly, a large number of cloud service providers exist, and the number is constantly increasing. This makes it difficult for a user to find a relevant service provider and calls for a specialized search engine that helps users select suitable services matching their needs. Towards this goal, we developed a search engine that crawls the web sites of various service providers, extracts service attributes from their JavaScript Object Notation (JSON) files, and normalizes the attributes in a service table. Those attributes are clustered using one of three different algorithms (K-means, K-medoids, and ISODATA). The requirements of a given user are then matched against the centroids of the various clusters to help obtain the closest match. In this paper, we compared th...
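The matching step described above (comparing a user's requirements with cluster centroids) can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance over attributes already normalized to [0, 1]. The abstract does not state the distance measure or the attribute set, so the function name `nearest_centroid`, the three attributes, and all values are hypothetical.

```python
import math


def nearest_centroid(requirement, centroids):
    """Return (index, distance) of the centroid closest to the requirement vector."""
    best_index, best_distance = -1, math.inf
    for index, centroid in enumerate(centroids):
        distance = math.dist(requirement, centroid)  # Euclidean distance
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Hypothetical normalized attributes: (price, storage, availability).
centroids = [(0.2, 0.3, 0.9), (0.8, 0.9, 0.95), (0.5, 0.5, 0.5)]
user_requirement = (0.25, 0.35, 0.85)

match_index, match_distance = nearest_centroid(user_requirement, centroids)
```

Once the closest cluster is known, only the services inside that cluster need to be ranked against the request, rather than every crawled service.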
Proceedings of the 10th International Conference on Informatics and Systems - INFOS '16, 2016
Reusability of software components can save effort, time, and cost. Deciding which software components to reuse is not an easy task. Before starting a new project, or while working on one, similarity checks between the project's requirements and the requirements of already developed projects can be carried out to identify reusable components. In this paper, we propose a framework for measuring similarity between requirements documents with the aim of improving reusability.
2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 2015
Lecture Notes in Business Information Processing, 2016
Lecture Notes in Computer Science, 2016
Event logs are invaluable sources of information about the actual execution of processes. Most process mining and postmortem analysis techniques depend on logs. All these techniques require a case ID to correlate the events. Real-life logs rarely originate from a centrally orchestrated process execution. Hence, the case ID is often missing; such logs are known as unlabeled logs. Correlating unlabeled events is a challenging problem that has received little attention in the literature. Moreover, the few approaches addressing this challenge support acyclic business processes only. In this paper, we build on our previous work and propose an approach to deduce the case ID for unlabeled event logs produced by cyclic business processes. As a result, a set of ranked labeled logs is generated. We evaluate our approach using real-life logs.
Finding frequent itemsets is one of the most important fields of data mining. The Apriori algorithm is the most established algorithm for finding frequent itemsets in a transactional dataset; however, it needs to scan the dataset many times and to generate many candidate itemsets. When the dataset is huge, both memory use and computational cost can be very expensive. In addition, a single processor's memory and CPU resources are very limited, which makes the algorithm inefficient. Parallel and distributed computing are effective strategies for accelerating algorithm performance. In this paper, we have implemented an efficient MapReduce Apriori algorithm (MRApriori) based on the Hadoop MapReduce model, which needs only two phases (MapReduce jobs) to find all frequent k-itemsets, and compared our proposed MRApriori algorithm with two existing algorithms, which need either one or k phases (k is the maximum length of frequent itemsets) to find the same freq...
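For readers unfamiliar with the baseline, the repeated scans and candidate generation mentioned above can be seen in a plain, single-machine Apriori sketch in Python. This is the textbook level-wise algorithm, not the MRApriori implementation, whose details the truncated abstract does not give. The function name `apriori` and the toy baskets are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the dataset for every itemset length k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy transactional dataset (made up for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
```

Each iteration of the `while` loop rescans all transactions, which is the cost that a fixed number of MapReduce phases is meant to bound.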
International Journal of Advanced Computer Science and Applications
Recommending the right resource to execute the next activity of a running process instance is of utmost importance for the overall performance of the business process, for the resource, and for the whole organization. Several approaches recommend a resource based on the task requirements and the resource capabilities. Moreover, the process execution history and logs have been used to better recommend a resource based on human-resource recommender criteria such as frequency and speed of execution. These approaches base the recommendation on the individual's execution history of the task to be allocated. In this paper, a novel approach based on the co-working history of resources is proposed. This approach considers the resources that executed the previous tasks in the currently running process instances. It then recommends the resource that has the best harmony with the rest of the resources.
Egypt's public higher education institutions (HEIs) have recognized the need to reassess their functions of teaching, research, and community service. Successful organizations are those that provide value for their stakeholders. HEIs are no different, and their management needs to identify their stakeholders' needs and to reposition their institutions towards the fulfillment of these needs. In their quest to enhance their competencies, Information Technology (IT) plays an important role in these institutions. Consequently, governance of IT (ITG) becomes a necessity. From this viewpoint, this paper aims to identify the stakeholders of Egypt's public HEIs and their needs as the first and necessary step towards the successful implementation of ITG in these institutions.
This paper presents a multi-spectral fusion system, entitled MSFMT, for improving object detection and classification results in military and terrorism domains. It relies on a combination of deep transfer learning and the Dempster-Shafer statistical method in decision-level fusion. It improves the classification results by fusing multiple sensory data extracted from multiple sources into two data types, images and videos, in night modes. It fuses multiple spectrums to show the best vision for each object or action. These spectrums are Visual Intensified (VIS) images, Near-infrared spectroscopy (NIR) images, thermal images, long wave infrared (LIWR) images, DHV, and RGB. The neural network structure is constructed from six neural networks. Each neural network is based on AlexNet pre-trained transfer neural networks for classifying spectrums. Each neural network includes two neural networks for classifying objects and actions. MSFMT system improves the classific...
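Decision-level fusion with Dempster-Shafer theory usually rests on Dempster's rule of combination, which the following minimal Python sketch illustrates for two sources. How MSFMT derives its mass values is not stated in the truncated abstract, so the two mass functions, the class labels, and the function name `combine` are assumptions for illustration only.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and its masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


TANK, TRUCK = frozenset({"tank"}), frozenset({"truck"})
EITHER = TANK | TRUCK  # mass left on the whole frame expresses uncertainty

# Hypothetical per-spectrum classifier outputs expressed as mass functions.
thermal = {TANK: 0.6, TRUCK: 0.1, EITHER: 0.3}
near_infrared = {TANK: 0.5, TRUCK: 0.2, EITHER: 0.3}

fused = combine(thermal, near_infrared)
```

With these made-up inputs, the fused mass on "tank" is higher than either source assigned on its own, which is the effect decision-level fusion aims for when sources agree.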