Zahidul Islam - Academia.edu
Papers by Zahidul Islam
This work is accepted for presentation at IEEE ICC'23. Smart grid technologies have been transforming power grid operation paradigms by integrating smart sensing devices, advanced communication networks, and powerful computing resources. In addition, data-driven applications have increased significantly in recent years, accelerating the use of smart sensors, such as phasor measurement units (PMUs), in power grid monitoring. This necessitates a well-functioning communication network (CN) that can transfer PMU measurement data to the control center even in the event of failures. This paper proposes a PMU network routing algorithm that ensures data transfer for the control center's resilient observability of the power grid. The interdependent roles of PMUs in power grid observability are first identified based on the power grid topology. Then, a failure-tolerant routing algorithm is proposed to find data transfer paths in the CN that meet the power grid monitoring needs. The resultant ...
INDIN '05. 2005 3rd IEEE International Conference on Industrial Informatics, 2005.
Data mining is a powerful tool for information discovery from huge datasets. Various sectors, including commercial, government, financial, medical, and scientific, apply data mining techniques to datasets that typically contain sensitive individual information. During this process the datasets are exposed to several parties, which can potentially lead to disclosure of sensitive information and thus to breaches of privacy. Several privacy-preserving data mining techniques have recently been proposed. In this paper we focus on data perturbation techniques, i.e., those that add noise to the data in order to prevent exact disclosure of confidential values. Some of these techniques were designed for datasets having only numerical non-class attributes and a categorical class attribute. However, natural datasets are more likely to have both numerical and categorical non-class attributes, and occasionally they contain only categorical attributes. Noise addition techniques developed for numerical attributes are not suitable for such datasets, due to the absence of a natural ordering among categorical values. In this paper we propose a new method for adding noise to categorical values, which makes use of the clusters that exist among these values. We first discuss several existing categorical clustering methods and point out the limitations they exhibit in our context. Then we present a novel approach towards clustering of categorical values and use it to perturb data while maintaining the patterns in the dataset.
Our clustering approach partitions the values of a given categorical attribute rather than the records of the dataset; additionally, it operates on a horizontally partitioned dataset, so two values may belong to the same cluster in one horizontal partition of the dataset and to two distinct clusters in another partition. Finally, we provide experimental results to evaluate our perturbation technique and to compare our clustering approach with an existing method, CACTUS.
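The value-clustering idea above can be sketched in code. The following is a hypothetical minimal illustration, not the paper's actual algorithm: values of a categorical attribute are profiled by their co-occurrence with another attribute, greedily clustered by profile overlap, and then perturbed by swapping a value only within its own cluster, so dataset-level patterns are largely preserved. All function names and the overlap threshold are assumptions.

```python
import random
from collections import defaultdict

def value_profiles(records, attr, other_attr):
    # Profile each value of `attr` by its co-occurrence counts with `other_attr`.
    profiles = defaultdict(lambda: defaultdict(int))
    for rec in records:
        profiles[rec[attr]][rec[other_attr]] += 1
    return profiles

def cluster_values(profiles, threshold=0.5):
    # Greedy clustering: a value joins a cluster when its normalized
    # co-occurrence profile overlaps the cluster seed's by >= `threshold`.
    def overlap(p, q):
        keys = set(p) | set(q)
        tp, tq = sum(p.values()), sum(q.values())
        return sum(min(p[k] / tp, q[k] / tq) for k in keys)
    clusters = []
    for v, prof in profiles.items():
        for c in clusters:
            if overlap(prof, profiles[c[0]]) >= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def perturb(records, attr, clusters, noise=0.3, rng=random.Random(42)):
    # With probability `noise`, replace a value by a random member of its cluster.
    member = {v: c for c in clusters for v in c}
    out = []
    for rec in records:
        rec = dict(rec)
        if rng.random() < noise:
            rec[attr] = rng.choice(member[rec[attr]])
        out.append(rec)
    return out
```

Because replacements stay within a cluster of statistically similar values, aggregate patterns such as class distributions per cluster are disturbed far less than under uniform random noise.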
Cornell University - arXiv, Mar 30, 2020
Data are being collected from various aspects of life and often arrive in chunks/batches. Traditional static clustering algorithms are not suitable for such dynamic datasets, i.e., when data arrive in streams of chunks/batches. If we apply a conventional clustering technique over the combined dataset every time a new batch arrives, the process is slow and wasteful. Moreover, it can be challenging to store the combined dataset in memory due to its ever-increasing size. As a result, various incremental clustering techniques have been proposed. These techniques need to efficiently update the current clustering result whenever a new batch arrives, adapting the current solution to the latest data. They also need the ability to detect concept drift, which occurs when the clustering pattern of a new batch differs significantly from older batches. Sometimes a clustering pattern may drift temporarily in a single batch while subsequent batches do not exhibit the drift; incremental clustering techniques therefore need to distinguish between temporary and sustained drifts. In this paper, we propose an efficient incremental clustering algorithm called UIClust, designed to cluster streams of data chunks even in the presence of temporary or sustained concept drifts. We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm on real and synthetic datasets, using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
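The batch-update-plus-drift-check pattern described above can be sketched as follows. This is a toy illustration under stated assumptions, not UIClust itself: centroids are updated online for each batch, and a batch whose per-point SSE far exceeds the running average is flagged as drift and excluded from the update. The class name, the drift factor, and the averaging rule are all assumptions.

```python
import math

def nearest(point, centroids):
    # Index of the closest centroid by Euclidean distance.
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

class IncrementalClusterer:
    """Toy incremental clusterer with a simple drift check (not UIClust)."""

    def __init__(self, centroids, drift_factor=4.0):
        self.centroids = [list(c) for c in centroids]
        self.counts = [1] * len(centroids)
        self.avg_sse = None          # running per-point SSE over past batches
        self.drift_factor = drift_factor

    def process_batch(self, batch):
        sse = sum(math.dist(p, self.centroids[nearest(p, self.centroids)]) ** 2
                  for p in batch)
        per_point = sse / len(batch)
        drift = (self.avg_sse is not None
                 and per_point > self.drift_factor * self.avg_sse)
        if not drift:
            # Fold each point into its nearest centroid (online mean update).
            for p in batch:
                i = nearest(p, self.centroids)
                self.counts[i] += 1
                n = self.counts[i]
                self.centroids[i] = [c + (x - c) / n
                                     for c, x in zip(self.centroids[i], p)]
            self.avg_sse = (per_point if self.avg_sse is None
                            else 0.5 * (self.avg_sse + per_point))
        return drift
```

A real implementation would additionally decide, after several consecutive flagged batches, that the drift is sustained and re-cluster from scratch rather than keep discarding batches.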
Fossil pollen analysis covers the study of pollen obtained from various environments such as lake deposits, peatlands, river and marine sediments, glaciers, lignites, and hard coals. Lakes are among the important data sources for palynological studies of the Quaternary period. Akgöl, selected as the study area, is located in the Gölkent neighborhood of the Ferizli district in Sakarya province. The lake has a surface area of 3.5 km² and a maximum depth of 8 m. The aim of this study is to reveal the vegetation changes that have occurred around the lake over the last 1000 years by performing fossil pollen analyses on the lake's bottom sediments. Cores were retrieved from Akgöl using the piston corer with a 4x4 m platform of the İTÜ EMCOL Research and Application Center. From the core, brought to the Palynology Laboratory of the Department of Forest Botany, Faculty of Forestry, İstanbul Üniversitesi-Cerrahpaşa, 2 cm³ sediment samples were taken at 5 cm intervals. ...
IEEE Transactions on Services Computing
Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues, feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for a target domain having only a few labeled training records by transferring knowledge from a source domain having many labeled records. While existing methods often focus on one issue and leave the other for future work, TLF handles both simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, the joint distributions between domains, and the manifold consistency underlying the marginal distributions. Moreover, for manifold consistency we exploit its intrinsic properties by identifying the k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that belong to the source pivots during knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare its performance against eleven state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
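The manifold-consistency term mentioned above can be illustrated with the textbook graph-Laplacian regularizer over a k-nearest-neighbor graph. This is a generic sketch, not TLF's actual formulation (in particular, TLF's automatic choice of k is not reproduced here): the penalty is small when predictions vary smoothly across neighboring records.

```python
import numpy as np

def knn_laplacian(X, k=2):
    # Build a symmetric kNN adjacency matrix and its unnormalized
    # graph Laplacian L = D - W.
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:   # position 0 is the point itself
            W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

def manifold_penalty(f, L):
    # f^T L f = 0.5 * sum_ij W_ij (f_i - f_j)^2, i.e., the smoothness of
    # predictions f over the kNN graph.
    return float(f @ L @ f)
```

Adding such a penalty to a structural-risk objective biases the learned classifier toward labelings that respect the local geometry of the data, which is the role the manifold term plays in the framework described above.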
Journal of Global Antimicrobial Resistance
IEEE Access
Precision agriculture represents the new age of conventional agriculture, made possible by the advancement of modern technologies such as the Internet of Things. The unparalleled potential for data collection and analytics has resulted in an increase in multidisciplinary research spanning machine learning and agriculture. However, the application of machine learning techniques to agriculture seems to be out of step with core machine learning research. This gap is further exacerbated by the inherent challenges associated with agricultural data. In this work, we conduct a systematic review of a large body of academic literature published between 2000 and 2022 on the application of machine learning techniques to agriculture. We identify and discuss key data issues such as class imbalance, data sparsity, and high dimensionality. Further, we study the impact of these data issues on various machine learning approaches within the context of agriculture. Finally, we identify common pitfalls in machine learning and agriculture research, including the misapplication of machine learning evaluation techniques. To this end, this survey presents a holistic view of the state of affairs in the cross-domain of machine learning and agriculture and proposes suitable mitigation strategies to address these challenges.
INDEX TERMS Agriculture, digital farming, intelligent agriculture, machine learning, precision agriculture, precision farming.
Antibiotics
Background: Antibiotic exposure in the pediatric intensive care unit (PICU) is very high, although 50% of all antibiotics may be unnecessary. We aimed to determine the utility of simple bedside screening tools and predicting factors to avoid antibiotic overuse in the ICU among children with diarrhea and critical illness. Methods: We conducted a retrospective, single-center, case-control study that included children aged 2–59 months admitted to the PICU with diarrhea and critical illness between 2017 and 2020. Results: We compared young children who did not receive antibiotics (cases, n = 164) during their ICU stay to those treated with antibiotics (controls, n = 346). For predicting the 'no antibiotic approach', the sensitivity of a negative quick Sequential Organ Failure Assessment (qSOFA) was similar to that of the quick Pediatric Logistic Organ Dysfunction-2 (qPELOD-2) and higher than that of the Systemic Inflammatory Response Syndrome (SIRS) criteria. A negative qSOFA or qPELOD-2 score calculated during PICU adm...
ACM Computing Surveys
Data mining is the science of extracting information or “knowledge” from data. It is a task commonly executed on cloud computing resources, personal computers, and laptops. However, what about smartphones? Despite the fact that these ubiquitous mobile devices now offer levels of hardware and performance approaching those of laptops, locally executed model training using data mining methods on smartphones is still notably rare. On-device model training offers a number of advantages. It largely mitigates issues of data security and privacy, since no data is required to leave the device. It also ensures a self-contained, fully portable data mining solution requiring no cloud computing or network resources and able to operate in any location. In this article, we focus on the intersection of smartphones and data mining. We investigate the growth in smartphone performance, survey smartphone usage models in previous research, and look at recent developments in locally executed data mining on...
Journal of Business and Economic Analysis, 2018
The purpose of this paper is to develop a conceptual model of organizational effectiveness in the public sector by investigating its relationship with two concepts: 1) the importance of knowledge sharing for ensuring organizational effectiveness, and 2) the role of organizational leadership in creating a knowledge-sharing environment that results in organizational effectiveness. Knowledge sharing has been promoted as a key process for organizational performance improvement. Like its private counterpart, the public sector is serious about the effectiveness of its organizations; however, not many public managers and employees are receptive to the idea of knowledge sharing, which could be due to their distinctive character of confidentiality. Here, it could be argued that leadership has implications for creating that climate, as it has a locomotion function (i.e., facilitation of motivation and activation of employees to fulfill goals) or a cohesion function (i.e., enabling collaborati...
Communications in Computer and Information Science, 2018
Decision trees are well-known classification algorithms that are also appreciated for their capacity for knowledge discovery. In the literature, two major shortcomings of decision trees have been pointed out: (1) instability, and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike decision trees, where the whole attribute space of a dataset is used to discover the best test attribute for a node, Random Forest first selects a random subspace of attributes from which the test attribute for a node is then identified. Purely random selection of the subspace can place all or many poor-quality attributes in it, resulting in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection) where the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques took the same approach. While in this paper we use mutual information as the measure of attribute quality, the papers in the literature used information gain ratio and a t-test as the measure. The proposed technique has been evaluated on nine different datasets and shows stable performance in terms of accuracy (ensemble accuracy and individual tree accuracy) and efficiency.
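Quality-weighted subspace selection of the kind proposed above can be sketched as weighted sampling without replacement, with mutual information as the weight. This is an illustrative implementation under stated assumptions, not the paper's code; function names and the small smoothing constant are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), natural log.
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def pick_subspace(records, attrs, class_attr, size, rng=random.Random(0)):
    # Sample attributes without replacement, weighted by mutual information
    # with the class, so informative attributes enter the subspace more often.
    cls = [r[class_attr] for r in records]
    weights = {a: mutual_information([r[a] for r in records], cls) + 1e-9
               for a in attrs}
    pool, chosen = list(attrs), []
    for _ in range(min(size, len(pool))):
        total = sum(weights[a] for a in pool)
        r, acc = rng.random() * total, 0.0
        for a in pool:
            acc += weights[a]
            if acc >= r:
                chosen.append(a)
                pool.remove(a)
                break
    return chosen
```

Unlike uniform random subspacing, an attribute that carries no information about the class is almost never selected, so individual trees rarely end up with a subspace of all poor-quality attributes.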
Cornell University - arXiv, Dec 24, 2015
In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem: how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking whether an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier built from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three easily implemented measures, each capturing aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures; they are complementary to them, and support our argument that one measure is almost never enough.
Children, 2021
Congenital heart disease (CHD) is one of the most common types of birth defects, with high morbidity and mortality, particularly in severely malnourished children under five. In this study, we aim to identify the predicting factors for CHD and their outcomes. The study population consisted of 694 malnourished children under five years of age admitted between April 2015 and December 2017. Of them, 64 were cases of CHD, and the remaining 630 were without CHD. CHD was diagnosed clinically and confirmed by echocardiogram. 64% of the cases had a single defect. Cases were more likely to present with diarrhea, cough, respiratory distress, cyanosis, hypoxemia, hypoglycemia, and hypernatremia on admission. The cases also had a higher proportion of severe sepsis, bacteremia, heart failure, respiratory failure, and death, compared to those without CHD. Cough (95% CI = 1.09–18.92), respiratory distress (95% CI = 1.46–5.39), and hypoxemia (95% CI = 1.59–6.86) were found to be the independent predic...
This paper measures profit efficiency and examines the effect of access to microfinance on the performance of rice farms in Bangladesh. An extended Cobb-Douglas stochastic frontier profit function was used to assess the profit efficiency and profit loss of rice farmers in Bangladesh, using survey data from 360 farms over the 2008-2009 growing seasons. Model diagnostics reveal serious selection bias, justifying the use of a sample selection model within the stochastic frontier framework. After correcting for selectivity bias, the mean profit efficiency of microfinance borrowers and non-borrowers was estimated at 68% and 52% respectively, suggesting that a considerable share of profits was lost to profit inefficiency in rice production. The results from the inefficiency effect model show that households' age, extension visits, off-farm income, region, and farm size are significant determinants of inefficiency. Some indicative policy recommen...
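A generic Cobb-Douglas stochastic frontier profit specification of the kind described is sketched below. The symbols are illustrative; the paper's extended form additionally corrects for sample selection, which is not shown here.

```latex
\ln \pi_i \;=\; \alpha_0 \;+\; \sum_{j} \beta_j \ln p_{ji} \;+\; \sum_{k} \gamma_k \ln Z_{ki} \;+\; v_i \;-\; u_i
```

Here $\pi_i$ is the normalized profit of farm $i$, $p_{ji}$ are normalized input prices, $Z_{ki}$ are fixed factors, $v_i \sim N(0, \sigma_v^2)$ is symmetric random noise, and $u_i \geq 0$ is the one-sided inefficiency term. Profit efficiency is then typically computed as $PE_i = E[\exp(-u_i) \mid \varepsilon_i]$, where $\varepsilon_i = v_i - u_i$, which is the kind of quantity behind the 68% and 52% figures reported above.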
Alberta's lakes support important environmental, social, and economic values. The cumulative effect of water allocations over time from lakes within a watershed may impact the health of the aquatic environment, including fish and wildlife resources. Therefore, water withdrawals from lakes should be regulated so that ecosystems are preserved while balancing reliable, quality water supplies that sustain communities and economic and recreational opportunities. Accurate estimation of water availability (volume) in a lake requires a complete water balance study, which in turn requires bathymetric information for the lake. However, only a fraction of Alberta's lakes have surveyed bathymetry data. In support of provincial policy development, and to quantify the potential impacts of water withdrawal from lakes, approaches to estimating lake volume using limited available data were tested. In this study, we analysed available bathymetry data from 77 lakes and developed three different models to estimate maximum lake volume (a proxy of lake water availability) and the 5% under-ice volume (a proxy for the winter allocation limit of lake water), assuming an ice thickness of 80 cm. The models have been designed so that users can apply them based on data availability, and they can be used in the absence of site-specific data (e.g., bathymetry) to estimate volume, and subsequently water availability, in lakes.
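The kind of data-limited volume model described above can be illustrated with a simple power-law regression fitted in log space. This is a hypothetical sketch, not one of the report's three actual models: it assumes volume scales as a power law of surface area and maximum depth, with all names and the functional form being assumptions.

```python
import numpy as np

def fit_volume_model(areas, max_depths, volumes):
    # Illustrative power-law model V = a * A^b * D^c, fitted by ordinary
    # least squares on ln V = ln a + b ln A + c ln D.
    X = np.column_stack([np.ones(len(areas)),
                         np.log(areas), np.log(max_depths)])
    y = np.log(volumes)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [ln a, b, c]

def predict_volume(coef, area, max_depth):
    # Apply the fitted log-linear model and transform back to volume.
    ln_a, b, c = coef
    return float(np.exp(ln_a + b * np.log(area) + c * np.log(max_depth)))
```

In practice such a model would be fitted on the lakes with surveyed bathymetry and then applied to unsurveyed lakes where only area and depth estimates are available, mirroring the tiered "apply the model your data supports" design described above.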
4th International Conference on Wireless, Intelligent and Distributed Environment for Communication, 2022
Measuring Data Quality: Predictive Accuracy vs. Similarity of Decision Trees. Md. Zahidul Islam, Payam Mamaani Barnaghi and Ljiljana Brankovic. School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia. ...
Advanced Data Mining and Applications, 2019
Smartphones have become the ultimate 'personal' computer, yet despite this, general-purpose data mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data mining application designed specifically for Android devices that imports the Weka data mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation without the need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and negligible battery consumption. With its ability to provide easy-to-use data mining on a phone-sized screen, DataLearner is a new portable, self-contained data mining tool for remote, personalised and educational applications alike. DataLearner comprises four elements: this paper, the app available on Google Play, the GPL3-licensed source code on GitHub, and a short video on YouTube.
This work is accepted for presentation at IEEE ICC'23. : Smart grid technologies have been tr... more This work is accepted for presentation at IEEE ICC'23. : Smart grid technologies have been transforming the power grid operation paradigms by integrating smart sensing devices, advanced communication networks, and powerful computing resources. In addition, data-driven applications have significantly increased in recent years, accelerating the use of smart sensors, such as phasor measurement units (PMU), in power grid monitoring. It necessitates a well-functioning communication network (CN) for PMU measurement data transfer to the control center even in the event of failures. This paper proposes a PMU network routing algorithm to ensure data transfer for control center's resilient observability to the power grid. The interdependent roles of PMUs in power grid observability is first identified based on the power grid topology. Then, a failure-tolerant routing algorithm is proposed to find data transfer paths in the CN that meets the power grid monitoring needs. The resultant ...
INDIN '05. 2005 3rd IEEE International Conference on Industrial Informatics, 2005.
1 Data Mining is a powerful tool for information discovery from huge datasets. Various sectors, i... more 1 Data Mining is a powerful tool for information discovery from huge datasets. Various sectors, including commercial, government, financial, medical, and scientific, are applying Data Mining techniques on their datasets that typically contain sensitive individual information. During this process the datasets get exposed to several parties, which can potentially lead to disclosure of sensitive information and thus to breaches of privacy. Several Data Mining privacy preserving techniques have been recently proposed. In this paper we focus on data perturbation techniques, i.e., those that add noise to the data in order to prevent exact disclosure of confidential values. Some of these techniques were designed for datasets having only numerical non-class attributes and a categorical class attribute. However, natural datasets are more likely to have both numerical and categorical non-class attributes, and occasionally they contain only categorical attributes. Noise addition techniques developed for numerical attributes are not suitable for such datasets, due to the absence of natural ordering among categorical values. In this paper we propose a new method for adding noise to categorical values, which makes use of the clusters that exist among these values. We first discuss several existing categorical clustering methods and point out the limitations they exhibit in our context. Then we present a novel approach towards clustering of categorical values and use it to perturb data while maintaining the patterns in the dataset. 
Our clustering approach partitions the values of a given categorical attribute rather than the records of the datasets; additionally, our approach operates on the horizontally partitioned dataset and it is possible for two values to belong to the same cluster in one horizontal partition of the dataset, and to two distinct clusters in another partition. Finally, we provide some experimental results in order to evaluate our perturbation technique and to compare our clustering approach with an existing method, the so-called CACTUS.
Cornell University - arXiv, Mar 30, 2020
Data are being collected from various aspects of life. These data can often arrive in chunks/batc... more Data are being collected from various aspects of life. These data can often arrive in chunks/batches. Traditional static clustering algorithms are not suitable for dynamic datasets, i.e., when data arrive in streams of chunks/batches. If we apply a conventional clustering technique over the combined dataset, then every time a new batch of data comes, the process can be slow and wasteful. Moreover, it can be challenging to store the combined dataset in memory due to its ever-increasing size. As a result, various incremental clustering techniques have been proposed. These techniques need to efficiently update the current clustering result whenever a new batch arrives, to adapt the current clustering result/solution with the latest data. These techniques also need the ability to detect concept drifts when the clustering pattern of a new batch is significantly different from older batches. Sometimes, clustering patterns may drift temporarily in a single batch while the next batches do not exhibit the drift. Therefore, incremental clustering techniques need the ability to detect a temporary drift and sustained drift. In this paper, we propose an efficient incremental clustering algorithm called UIClust. It is designed to cluster streams of data chunks, even when there are temporary or sustained concept drifts. We evaluate the performance of UIClust by comparing it with a recently published, high-quality incremental clustering algorithm. We use real and synthetic datasets. We compare the results by using well-known clustering evaluation criteria: entropy, sum of squared errors (SSE), and execution time. Our results show that UIClust outperforms the existing technique in all our experiments.
Fosil polen analizleri göl çökelleri, turbalıklar, akarsu ve deniz sedimanları, buzullar, linyitl... more Fosil polen analizleri göl çökelleri, turbalıklar, akarsu ve deniz sedimanları, buzullar, linyitler ve taş kömürleri gibi çeşitli ortamlardan elde edilen polenlerin_x000D_ araştırılmasını kapsamaktadır. Kuvaterner dönemine ait palinolojik çalışmaların önemli veri kaynaklarından biri de göllerdir. Araştırma alanı olarak seçilen_x000D_ Akgöl, Sakarya ilinde, Ferizli ilçesinin Gölkent mahallesinde bulunmaktadır. Gölün yüzölçümü 3,5 km2_x000D_ ve maksimum derinliği 8 m’dir. Bu çalışmanın amacı:_x000D_ gölün dip sedimanlarında fosil polen analizleri yaparak gölün çevresinde son 1000 yılda meydana gelen vejetasyon değişimlerini ortaya çıkarmaktır._x000D_ Akgöl’den karot alımında İTÜ EMCOL Araştırma Uygulama Merkezi’nin 4x4 m. platformlu piston karotiyeri kullanılmıştır. İstanbul Üniversitesi-Cerrahpaşa,_x000D_ Orman Fakültesi Orman Botaniği Anabilim Dalında bulunan Palinoloji Laboratuvarı’na getirilen karot üzerinde her 5 cm’de bir 2 cm3_x000D_ lük sediman örnekleri_x000D_ alınmıştır. Bu ...
IEEE Transactions on Services Computing
Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a ... more Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues: feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for the target domain having only few labeled training records by transferring knowledge from the source domain having many labeled records. While existing methods often focus on one issue and leave the other one for the further work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as the pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, joint distributions between domains, and the manifold consistency underlying marginal distributions. Moreover, for the manifold consistency we exploit its intrinsic properties by identifying k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that are belonging to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare the performance of TLF against the performance of eleven state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
Journal of Global Antimicrobial Resistance
IEEE Access
Precision agriculture represents the new age of conventional agriculture. This is made possible b... more Precision agriculture represents the new age of conventional agriculture. This is made possible by the advancement of various modern technologies such as the internet of things. The unparalleled potential for data collection and analytics has resulted in an increase in multidisciplinary research within machine learning and agriculture. However, the application of machine learning techniques to agriculture seems to be out of step with core machine learning research. This gap is further exacerbated by the inherent challenges associated with agricultural data. In this work, we conduct a systematic review of a large body of academic literature published between 2000 and 2022, on the application of machine learning techniques to agriculture. We identify and discuss some of the key data issues such as class imbalance, data sparsity and high dimensionality. Further, we study the impact of these data issues on various machine learning approaches within the context of agriculture. Finally, we identify some of the common pitfalls in the machine learning and agriculture research including the misapplication of machine learning evaluation techniques. To this end, this survey presents a holistic view on the state of affairs in the cross-domain of machine learning and agriculture and proposes some suitable mitigation strategies to address these challenges. INDEX TERMS Agriculture, digital farming, intelligent agriculture, machine learning, precision agriculture, precision farming.
Antibiotics
Background: Antibiotic exposure in the pediatric intensive care unit (PICU) is very high, althoug... more Background: Antibiotic exposure in the pediatric intensive care unit (PICU) is very high, although 50% of all antibiotics may be unnecessary. We aimed to determine the utility of simple bedside screening tools and predicting factors to avoid antibiotic overuse in the ICU among children with diarrhea and critical illness. Methods: We conducted a retrospective, single-center, case-control study that included children aged 2–59 months who were admitted to PICU with diarrhea and critical illness between 2017 and 2020. Results: We compared young children who did not receive antibiotics (cases, n = 164) during ICU stay to those treated with antibiotics (controls, n = 346). For predicting the ‘no antibiotic approach’, the sensitivity of a negative quick Sequential Organ Failure Assessment (qSOFA) was similar to quick Pediatric Logistic Organ Dysfunction-2 (qPELOD-2) and higher than Systemic Inflammatory Response Syndrome (SIRS). A negative qSOFA or qPELOD-2 score calculated during PICU adm...
ACM Computing Surveys
Data mining is the science of extracting information or “knowledge” from data. It is a task commonly executed on cloud computing resources, personal computers and laptops. However, what about smartphones? Although these ubiquitous mobile devices now offer levels of hardware and performance approaching those of laptops, locally executed model training using data mining methods on smartphones is still notably rare. On-device model training offers a number of advantages. It largely mitigates issues of data security and privacy, since no data is required to leave the device. It also ensures a self-contained, fully portable data mining solution that requires no cloud computing or network resources and can operate in any location. In this article, we focus on the intersection of smartphones and data mining. We investigate the growth in smartphone performance, survey smartphone usage models in previous research, and look at recent developments in locally executed data mining on...
Journal of Business and Economic Analysis, 2018
The purpose of this paper is to develop a conceptual model of organizational effectiveness in the public sector by investigating its relationship with two concepts: 1) the importance of knowledge sharing for ensuring organizational effectiveness, and 2) the role of organizational leadership in creating a knowledge sharing environment that results in organizational effectiveness. Knowledge sharing has been promoted as a key process for improving organizational performance. Like its private counterpart, the public sector is serious about the effectiveness of its organizations; however, not many public managers and employees are receptive to the idea of knowledge sharing, which could be due to their distinctive character of confidentiality. Here, it could be argued that leadership has implications for creating that climate, as it has a locomotion function (i.e., facilitation of motivation and activation of employees to fulfill the goals) or a cohesion function (i.e., enabling collaborati...
Communications in Computer and Information Science, 2018
Decision Trees are well-known classification algorithms that are also appreciated for their capacity for knowledge discovery. In the literature, two major shortcomings of decision trees have been pointed out: (1) instability, and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike decision trees, where the whole attribute space of a dataset is used to discover the best test attribute for a node, in Random Forest a random subspace of attributes is first selected, from which the test attribute for a node is then identified. This random selection of an attribute subspace can produce a subspace containing many (or only) poor-quality attributes, resulting in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection) where the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. While in this paper we use mutual information as the measure of attribute quality, the papers in the literature used information gain ratio and a t-test as the measure. The proposed technique has been evaluated using nine different datasets, and a stable performance can be seen in terms of accuracy (ensemble accuracy and individual tree accuracy) and efficiency.
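The quality-proportional subspace selection described in this abstract can be sketched as follows. This is an illustrative implementation, not the paper's actual code; the attribute names, the toy dataset, and the small smoothing term (which keeps zero-information attributes selectable) are assumptions made for the sketch.

```python
import math
import random
from collections import Counter

def mutual_information(values, labels):
    """Estimate I(X; Y) in bits from co-occurrence counts of a
    categorical attribute (values) and the class (labels)."""
    n = len(values)
    joint = Counter(zip(values, labels))
    px = Counter(values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def probabilistic_subspace(attributes, records, labels, k, rng=random):
    """Draw k attributes without replacement, each with probability
    proportional to its mutual information with the class (plus a tiny
    smoothing constant so zero-MI attributes are not excluded outright)."""
    weights = {a: mutual_information([r[a] for r in records], labels) + 1e-9
               for a in attributes}
    pool = list(attributes)
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(weights[a] for a in pool)
        r = rng.random() * total
        acc = 0.0
        for a in pool:
            acc += weights[a]
            if acc >= r:
                chosen.append(a)
                pool.remove(a)
                break
    return chosen
```

With this in place, each tree in the forest would call `probabilistic_subspace` at every node instead of a uniform random draw, so informative attributes dominate the candidate set while weak ones are still occasionally explored.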
Cornell University - arXiv, Dec 24, 2015
In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem: how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking whether an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier built from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures; they are complementary to them, and support our argument that one measure is almost never enough.
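As a hypothetical illustration of the point above (not one of the paper's three measures): a OneR-style learner can score identically on original and perturbed data while discovering a completely different rule, so accuracy alone cannot certify pattern retention. The two toy datasets below are invented for the sketch.

```python
from collections import Counter

def one_r(records, labels):
    """OneR: pick the single attribute whose value -> majority-label rule
    gets the most training records right; return (attribute, rule)."""
    best = None
    for attr in records[0]:
        rule = {}
        for value in {r[attr] for r in records}:
            ys = [y for r, y in zip(records, labels) if r[attr] == value]
            rule[value] = Counter(ys).most_common(1)[0][0]
        correct = sum(rule[r[attr]] == y for r, y in zip(records, labels))
        if best is None or correct > best[0]:
            best = (correct, attr, rule)
    return best[1], best[2]

def accuracy(attr, rule, records, labels):
    return sum(rule[r[attr]] == y for r, y in zip(records, labels)) / len(labels)

labels = [0, 0, 1, 1]
# Original data: attributes A and B both predict the class perfectly.
original = [{"A": "a", "B": "p"}, {"A": "a", "B": "p"},
            {"A": "b", "B": "q"}, {"A": "b", "B": "q"}]
# "Perturbed" data: A has been scrambled, B still predicts perfectly.
perturbed = [{"A": "a", "B": "p"}, {"A": "b", "B": "p"},
             {"A": "a", "B": "q"}, {"A": "b", "B": "q"}]

attr_o, rule_o = one_r(original, labels)   # learns a rule on A
attr_p, rule_p = one_r(perturbed, labels)  # learns a rule on B
```

Both models are 100% accurate on their own data, yet the discovered pattern (which attribute drives the class) has changed entirely, which is exactly why pattern-retention measures are needed alongside prediction accuracy.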
Children, 2021
Congenital heart disease (CHD) is one of the most common types of birth defect, with high morbidity and mortality, particularly in severely malnourished children under five. In this study, we aim to identify the predicting factors for CHD and their outcomes. 694 malnourished children under five years of age, admitted between April 2015 and December 2017, constituted the study population. Of them, 64 were cases with CHD and 630 were without CHD. CHD was diagnosed clinically and confirmed by echocardiogram. 64% of the cases had a single defect. Cases were more likely to present with diarrhea, cough, respiratory distress, cyanosis, hypoxemia, hypoglycemia and hypernatremia on admission. The cases also had a higher proportion of severe sepsis, bacteremia, heart failure, respiratory failure and death, compared to those without CHD. Cough (95% CI = 1.09–18.92), respiratory distress (95% CI = 1.46–5.39) and hypoxemia (95% CI = 1.59–6.86) were found to be the independent predic...
This paper measures profit efficiency and examines the effect of access to microfinance on the performance of rice farms in Bangladesh. An extended Cobb-Douglas stochastic frontier profit function was used to assess the profit efficiency and profit loss of rice farmers in Bangladesh, using survey data from 360 farms collected throughout the 2008-2009 growing seasons. Model diagnostics reveal that serious selection bias exists, which justifies the use of a sample selection model within the stochastic frontier framework. After effectively correcting for selectivity bias, the mean profit efficiencies of the microfinance borrowers and non-borrowers were estimated at 68% and 52% respectively, suggesting that a considerable share of profits was lost due to profit inefficiencies in rice production. The results from the inefficiency effect model show that household age, extension visits, off-farm income, region and farm size are significant determinants of inefficiency. Some indicative policy recommen...
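The Cobb-Douglas stochastic frontier profit function referred to above has, in its standard textbook form, the structure sketched below. This is the generic Battese-Coelli-style specification, not necessarily the paper's exact extended model; the symbols (normalized profit, input prices, fixed factors) are assumptions.

```latex
\ln \pi_i = \beta_0 + \sum_{j} \beta_j \ln p_{ij} + \sum_{k} \gamma_k \ln z_{ik} + v_i - u_i,
\qquad v_i \sim N(0, \sigma_v^2), \quad u_i \ge 0,
```

where $\pi_i$ is the normalized profit of farm $i$, $p_{ij}$ are input prices, $z_{ik}$ are fixed factors, $v_i$ is symmetric random noise, and $u_i$ is a one-sided inefficiency term. Profit efficiency is then $PE_i = E\left[\exp(-u_i) \mid \varepsilon_i\right] \in (0, 1]$ with $\varepsilon_i = v_i - u_i$; the reported means of 68% and 52% would be sample averages of $PE_i$ for borrowers and non-borrowers.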
Alberta's lakes support important environmental, social and economic values. Cumulative water allocations over time from lakes within a watershed may impact the health of the aquatic environment, including fish and wildlife resources. Therefore, water withdrawals from lakes should be regulated in such a way that ecosystems are preserved while reliable, quality water supplies are maintained to sustain communities and economic and recreational opportunities. Accurate estimation of water availability (volume) in a lake requires a complete water balance study, which in turn requires bathymetric information for the lake. However, only a fraction of Alberta's lakes have surveyed bathymetry data. In support of provincial policy development, and to quantify the potential impacts of water withdrawal from lakes, approaches to estimating lake volume using limited available data were tested. In this study, we analysed available bathymetry data from 77 lakes and developed three different models to estimate maximum lake volume (a proxy for lake water availability) and the 5% under-ice volume (a proxy for the winter allocation limit of lake water), assuming an ice thickness of 80 cm. These models have been developed in such a way that the user can apply them based on data availability. They can be used in the absence of site-specific data (e.g., bathymetry) to estimate volume, and subsequently water availability, in lakes.
4th International Conference on Wireless, Intelligent and Distributed Environment for Communication, 2022
Measuring Data Quality: Predictive Accuracy vs. Similarity of Decision Trees. Md. Zahidul Islam, Payam Mamaani Barnaghi and Ljiljana Brankovic, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia. ...
Advanced Data Mining and Applications, 2019
Smartphones have become the ultimate 'personal' computer, yet despite this, general-purpose data mining and knowledge discovery tools for mobile devices are surprisingly rare. DataLearner is a new data mining application designed specifically for Android devices that imports the Weka data mining engine and augments it with algorithms developed by Charles Sturt University. Moreover, DataLearner can be expanded with additional algorithms. Combined, DataLearner delivers 40 classification, clustering and association rule mining algorithms for model training and evaluation, without the need for cloud computing resources or network connectivity. It provides the same classification accuracy as PCs and laptops, while doing so with acceptable processing speed and negligible battery consumption. With its ability to provide easy-to-use data mining on a phone-size screen, DataLearner is a new portable, self-contained data mining tool for remote, personalised and educational applications alike. DataLearner comprises four elements: this paper, the app available on Google Play, the GPL3-licensed source code on GitHub, and a short video on YouTube.