Jaideep Srivastava - Academia.edu
Papers by Jaideep Srivastava
Understanding the dynamics of evolution in large social networks is an important problem. In this paper, we characterize evolution in large multi-relational social networks. The proliferation of online media such as Twitter, Facebook, Orkut, and MMORPGs has created social networking data at an unprecedented scale; Sony's Everquest 2 is one such example. We used the game's multi-relational networks to reveal the dynamics of evolution in a multi-relational setting through a macroscopic study of the game network. Macroscopic analysis involves fragmenting the network into smaller portions, referred to as 'communities', and studying the dynamics within these sub-networks. From an evolutionary perspective of multi-relational network analysis, we make the following contributions. We formulate and analyze various metrics to capture evolutionary properties of networks. We find that co-evolution rates in trust-based 'communities' are approximately 60% higher than in trade-based 'communities'. We also find that trust and trade connections within 'communities' weaken as community size increases. Finally, we study the interrelation between the dynamics of trade and trust within 'communities' and report interesting results about the precursor relationship between the two.
Sleep, 2022
Study Objectives: Upper airway stimulation (UAS) therapy is effective for a subset of obstructive sleep apnea (OSA) patients with continuous positive airway pressure (CPAP) intolerance. While overall adherence is high, some patients have suboptimal adherence, which limits efficacy. Our goal was to identify therapy usage patterns during the first 3 months of therapy to enable targeted strategies for improved adherence. Methods: Therapy data were retrieved from 2098 patients for three months after device activation. Data included the mean and standard deviation (SD) of hours of use, therapy pauses, hours from midnight at which the therapy was turned ON and OFF, percentage of missing days, and stimulation amplitude. Cluster analysis was performed using Gaussian mixture models, which categorized patients into six main groups. Results: The six groups and their prevalence can be summarized as Cluster 1A: Excellent Use (34%); Cluster 1B: Excellent Use with variable timing (23%); Cluster 2A: Good Use with mis...
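The clustering step described in this abstract can be illustrated with a rough sketch: fitting a six-component Gaussian mixture to per-patient adherence features. The feature choices and distributions below are invented for illustration; only the technique (GMM-based clustering into six groups) comes from the study.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic per-patient adherence features (hypothetical stand-ins for those
# named in the abstract): mean nightly hours of use, night-to-night SD of
# hours of use, and fraction of days with no data.
X = np.column_stack([
    rng.normal(6.0, 2.0, 500),   # mean hours of use per night
    rng.gamma(2.0, 0.5, 500),    # SD of hours of use
    rng.beta(1.0, 8.0, 500),     # fraction of missing days
])

# Fit a 6-component Gaussian mixture and assign each patient to a cluster,
# mirroring the six usage-pattern groups reported in the study.
gmm = GaussianMixture(n_components=6, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)
```

Each component's mean and covariance then describe a usage pattern, and a patient's cluster label identifies which pattern their therapy use most resembles.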
Sleep, 2018
period. At 90 days, their compliance was similar to that of patients who did not change their first mask. While changing a mask may take days or weeks out of the 90-day compliance window, compliance did not appear to be decreased by the time taken to change to a second, preferred mask. There was some evidence of increased mask replacement among Medicare or Medicaid patients. Support (If Any): none.
Journal of Advertising, 2022
Data structured in the form of overlapping or non-overlapping sets is found in a variety of domains, sometimes explicitly but often subtly. For example, teams, which are of prime importance in social science studies, are "sets of individuals"; "item sets" in pattern mining are sets; and for various types of analysis in language studies, a sentence can be considered a "set or bag of words". Although building models and inference algorithms for structured data has been an important task in machine learning and statistics, research on "set-like" data remains less explored. Relationships between pairs of elements can be modeled as edges in a graph; however, for modeling relationships that involve all members of a set, a hyperedge is a more natural representation. In this work, we focus on the problem of embedding the hyperedges of a hypergraph (a network of overlapping sets) into a low-dimensional vector space. We propose a probabilistic deep-learning-based method as well as a tensor-based algebraic model, both of which capture the hypergraph structure in a principled manner without losing set-level information. Our central focus is to highlight the connection between hypergraphs (topology), tensors (algebra), and probabilistic models. We present a number of baselines, some of which adapt existing node-level embedding models to the hyperedge level, as well as sequence-based language techniques adapted for set-structured hypergraph topology. Performance is evaluated on a network of social groups and a network of word phrases. Our experiments show that, in terms of accuracy, our methods perform similarly to baselines that are not designed for hypergraphs, while our tensor-based method is more efficient than the deep-learning-based auto-encoder method. We therefore argue that the proposed methods are more generic, suitable for hypergraphs (and therefore also for graphs), while preserving accuracy and efficiency.
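For intuition about what embedding hyperedges means, here is a minimal algebraic sketch: a toy hypergraph is written as an incidence matrix, and a truncated SVD yields one low-dimensional vector per hyperedge. This is a simple baseline in the spirit of the tensor-based model, not the paper's actual algorithm; the hypergraph and dimensions are made up.

```python
import numpy as np

# Toy hypergraph: 6 nodes and 4 hyperedges (each hyperedge is a set of nodes).
hyperedges = [{0, 1, 2}, {1, 2, 3}, {3, 4}, {0, 4, 5}]
n_nodes = 6

# Incidence matrix H: one row per hyperedge, one column per node.
H = np.zeros((len(hyperedges), n_nodes))
for i, edge in enumerate(hyperedges):
    H[i, list(edge)] = 1.0

# Truncated SVD of H gives a k-dimensional vector per hyperedge; hyperedges
# that share many nodes end up with similar embeddings.
U, s, Vt = np.linalg.svd(H, full_matrices=False)
k = 2
edge_emb = U[:, :k] * s[:k]          # one k-dim vector per hyperedge
node_emb = (Vt[:k, :] * s[:k, None]).T  # one k-dim vector per node
```

Note how node-level and hyperedge-level embeddings fall out of the same decomposition, which is the connection between topology and algebra the paper emphasizes.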
IEEE Transactions on Knowledge and Data Engineering, Jun 1, 2021
Intelligent transportation systems are a key component of smart cities, and estimation and prediction of the spatiotemporal traffic state are critical to capturing the dynamics of traffic congestion, i.e., its generation, propagation, and mitigation, in order to increase operational efficiency and improve livability within smart cities. While spatiotemporal traffic data are becoming commonplace due to the wide availability of cheap sensors and the rapid deployment of IoT platforms, the data still suffer from sparsity, incompleteness, and noise, which make traffic analytics difficult. In this article, we investigate the problem of missing or noisy data in the context of real-time monitoring and forecasting of traffic congestion for road networks in a city. The road network is represented as a directed graph in which nodes are junctions (intersections) and edges are road segments. We assume that the city has deployed high-fidelity speed sensors on a subset of edges; the objectives are to infer speed readings for the remaining edges of the network and to estimate missing values in segments whose sensors have stopped generating data due to technical problems (e.g., battery, network). We propose a tensor representation for the series of road network snapshots and develop a regularized factorization method that estimates the missing values while learning the latent factors of the network. The regularizer, which incorporates spatial properties of the road network, improves the quality of the results. The learned factors, with a graph-based temporal dependency, are then used in an autoregressive algorithm to predict the future state of the road network over a long horizon. Extensive numerical experiments with real traffic data from the cities of Doha (Qatar) and Aarhus (Denmark) demonstrate that the proposed approach is well suited to imputing missing data and predicting the traffic state; it is accurate, efficient, and easily applicable to other traffic datasets.
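The imputation idea can be sketched in a stripped-down form: a plain rank-r matrix factorization fitted by gradient descent to the observed entries of a synthetic segment-by-time speed matrix. The paper's method additionally works on a tensor of network snapshots and adds a spatial regularizer, neither of which is reproduced here; the data, rank, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic speeds: 20 road segments x 48 time slots, ~40% of readings missing.
true_speeds = rng.uniform(20.0, 80.0, (20, 48))
mask = rng.random((20, 48)) > 0.4          # True where a reading is observed

# Rank-r factors fitted to the observed entries only.
r, lr = 3, 1e-4
P = rng.normal(0.0, 1.0, (20, r))
Q = rng.normal(0.0, 1.0, (48, r))
for _ in range(2000):
    E = np.where(mask, P @ Q.T - true_speeds, 0.0)  # residual on observed cells
    P, Q = P - lr * (E @ Q), Q - lr * (E.T @ P)

# The low-rank reconstruction provides estimates for every cell,
# including the missing ones.
imputed = P @ Q.T
```

In the paper, the latent factors learned this way (plus a temporal dependency) also feed an autoregressive predictor of future traffic states.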
arXiv (Cornell University), Jun 19, 2017
Assigning relevant keywords to documents is very important for efficient retrieval, clustering, and management of those documents. With the web deluged with digital documents, automation of this task is of prime importance. Keyword assignment is a broad topic of research that refers to tagging documents with keywords, key-phrases, or topics. For text documents, keyword assignment techniques have been developed under two sub-topics: automatic keyword extraction (AKE) and automatic key-phrase abstraction. However, the approaches developed in the literature for full-text documents cannot be used to assign keywords to low-text-content documents such as Twitter feeds, news clips, product reviews, or even short scholarly texts. In this work, we point out several practical challenges encountered in tagging such low-text-content documents. As a solution to these challenges, we show that the proposed approaches, which leverage knowledge from several open-source web resources, enhance the quality of the tags (keywords) assigned to low-text-content documents. The performance of the proposed approach is tested on a real-world corpus of scholarly documents whose text content ranges from the title alone (5-10 words) to the summary or abstract (100-150 words). We find that the proposed approach not only improves the accuracy of keyword assignment but also offers a computationally efficient solution usable in real-world applications.
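As a point of reference for what a bare AKE baseline looks like on short text, the sketch below simply ranks non-stopword terms by frequency. The paper's contribution, enriching such short texts with knowledge from open web resources, is not reproduced here; the stopword list and example text are illustrative.

```python
import re
from collections import Counter

# A tiny stopword list; real systems use much larger ones. Illustrative only.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "in", "to", "on", "with", "is"}

def extract_keywords(text, k=5):
    """Rank non-stopword terms by frequency and return the top k."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]

keywords = extract_keywords(
    "Tensor factorization methods for traffic state estimation and traffic prediction"
)
```

On a 10-word title like the one above, frequency signals are extremely weak (most counts are 1), which is exactly the low-text-content failure mode the paper addresses by bringing in external knowledge.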
arXiv (Cornell University), Oct 26, 2021
The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking, and signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we exploit the relationship between physical activity and sleep quality to find ways of helping people improve their sleep using machine learning techniques. People usually have several behavior modes into which their bio-functions can be divided. By performing time series clustering on activity data, we find cluster centers that correspond to the most evident behavior modes of a specific subject. Activity recipes for good sleep quality are then generated for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine that suggests a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints, e.g., their age, gender, body mass index (BMI), and resting heart rate, with the objective of improving that night's quality of sleep. This would, in turn, serve a longer-term health objective, such as lowering resting heart rate or improving the overall quality of sleep.
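The behavior-mode step can be illustrated with a simple time-series clustering sketch: daily activity profiles are clustered with k-means, and the cluster centers stand in for behavior modes. The data here are synthetic, and k-means is one plausible clustering choice for illustration, not necessarily the method used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic actigraphy: 90 days of hourly step counts (90 x 24), a hypothetical
# stand-in for the wearable activity data described in the paper.
days = rng.poisson(300.0, (90, 24)).astype(float)

# Cluster the daily profiles; each cluster center is a typical 24-hour activity
# pattern, playing the role of a "behavior mode" for this subject.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(days)
modes = km.cluster_centers_
```

In the paper's pipeline, an "activity recipe" associated with good sleep would then be generated per mode, and a new day is matched to its nearest mode before a recommendation is made.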
In an effort to curb air pollution, the city of Delhi (India), known to be one of the most populated, polluted, and congested cities in the world, ran a trial experiment in two phases of 15-day intervals. During the experiment, most four-wheeled vehicles were allowed to drive only on alternate days, based on whether their plate numbers ended in odd or even digits. While the local government of Delhi, represented by A. Kejriwal (leader of the AAP party), advocated the benefits of the experiment, the prime minister of India, N. Modi (former leader of BJP), argued that the initiative was ineffective. This led to a strong polarization of public opinion toward the Odd-Even experiment. This real-world urban experiment provided the scientific community with a unique opportunity to study the impact of political leaning on human perception at a large scale. We collect data about pollution and traffic congestion to measure the real effectiveness of the experiment. We use Twitter to...
The precision matrix is the inverse of the covariance matrix. Estimating large sparse precision matrices is an interesting and challenging problem in many fields of science and engineering, in the humanities, and in machine learning in general. Recent applications often encounter high dimensionality with a limited number of data points, leading to a number of covariance parameters that greatly exceeds the number of observations, and hence to a singular covariance matrix. Several methods have been proposed to deal with this challenging problem, but there is no guarantee that the obtained estimator is positive definite. Furthermore, in many cases, one needs to capture additional information on the setting of the problem. In this paper, we introduce a criterion that ensures the positive definiteness of the precision matrix, and we propose the inner-outer alternating direction method of multipliers as an efficient method for estimating it. We show that the convergence of the a...
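For context, the sketch below estimates a sparse precision matrix with an off-the-shelf l1-penalized estimator (graphical lasso) and checks positive definiteness. This is a standard point of comparison, not the paper's inner-outer ADMM method; the data and penalty are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)

# 50 samples of 20 variables: few observations relative to the number of
# covariance parameters, the regime discussed in the abstract.
X = rng.normal(size=(50, 20))

# l1-penalized maximum-likelihood estimate of the precision matrix.
model = GraphicalLasso(alpha=0.2).fit(X)
Theta = model.precision_

# The estimate should be symmetric and positive definite.
assert np.allclose(Theta, Theta.T, atol=1e-6)
assert np.linalg.eigvalsh(Theta).min() > 0
```

The paper's point is precisely that positive definiteness is not guaranteed by every sparse estimator, which motivates building it in as an explicit criterion.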
ArXiv, 2021
Leading up to August 2020, COVID-19 had spread to almost every country in the world, causing millions of infections and hundreds of thousands of deaths. In this paper, we first verify the assumption that clinical variables can have time-varying effects on COVID-19 outcomes. We then develop a temporal stratification approach to make daily predictions of patients' outcomes at the end of the hospital stay. Training data are segmented by the remaining length of stay, which is a proxy for the patient's overall condition, and a sequence of predictive models is built, one for each time segment. Thanks to publicly shared data, we were able to build and evaluate prototype models. Preliminary experiments show 0.98 AUROC, 0.91 F1 score, and 0.97 AUPR on continuous deterioration prediction, encouraging further development of the model as well as validation on different datasets. We also verify the key assumption that motivates our method: clinical variables could have time-v...
Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3, 2018
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex, since many variables can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical activity and sleep, social media, and mobile and health data. In this paper, we describe the design of a dashboard for the visualization of actigraphy and biometric data from a childhood obesity camp in Qatar. This dashboard enables quantitative discoveries that can be used to guide patient behavior and orient qualitative research. CONTEXT: Childhood obesity is a growing epidemic, and with technological advancements, new tools can be used to monitor and analyze lifestyle factors leading to obesity, which in turn can help in timely health behavior modifications. In this paper, we present a tool for visualization of personal health data, which can a...
Computational Social Sciences, 2017
Analysis of the performance of groups or teams is of primary importance in the field of social group studies. In this article, we target group performance analysis using computational techniques from machine learning. To understand the feature space, we use a combination of machine learning methods: decision trees, feature selection, and correlation analysis. These models are chosen for their easy interpretability. Alongside, we also propose a methodology for building group-level metrics from individual-level data, which helps us interpret the feature space at the group level and understand how factors such as attribute variety among group members affect performance. We propose a full methodology that employs machine learning models taking various group-level metrics as input, finally providing a thorough analysis of the feature space. In this research, we employ the NATO dataset collected using the game-based test-bed called SABRE. We provide a hands-on demonstration by performing a four-phase exhaustive group analysis on the SABRE dataset using Weka, a user-friendly GUI-based machine learning tool.
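A minimal sketch of the interpretable-model step: a shallow decision tree fitted to hypothetical group-level metrics (mean member skill, attribute variety, group size). The features, labels, and data below are invented; only the choice of an easily interpretable tree model mirrors the chapter, which uses Weka rather than Python.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Hypothetical group-level metrics for 200 synthetic teams, aggregated from
# individual-level data as the chapter proposes.
X = np.column_stack([
    rng.normal(0.6, 0.1, 200),             # mean member skill
    rng.uniform(0.0, 0.3, 200),            # attribute variety (SD of skill)
    rng.integers(3, 8, 200).astype(float), # group size
])
# Toy label: high mean skill and low variety -> "good" performance.
y = (X[:, 0] - X[:, 1] + rng.normal(0.0, 0.05, 200) > 0.45).astype(int)

# A shallow tree keeps the fitted model easy to read, which is why such
# models are preferred for interpreting the feature space.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
importances = tree.feature_importances_
```

Feature importances and the tree's split rules then indicate which group-level metrics (e.g., attribute variety) drive performance.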
2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016
Human Activity Recognition (HAR) is a powerful tool for understanding human behaviour. Applying HAR to wearable sensors can (1) provide new insights by enriching the feature set in health studies, and (2) enhance the personalisation and effectiveness of health, wellness, and fitness applications. Wearable devices provide an unobtrusive platform for user monitoring and, due to their increasing market penetration, feel intrinsic to the wearer. The integration of these devices into daily life provides a unique opportunity for understanding human health and wellbeing; this is referred to as the "quantified self" movement. The analysis of complex health behaviours such as sleep traditionally requires time-consuming manual interpretation by experts, necessary because of the erratic periodicity and persistent noisiness of human behaviour. In this paper, we present a robust automated human activity recognition algorithm, which we call RAHAR. We test our algorithm in the application area of sleep research by providing a novel framework for evaluating sleep quality and examining the correlation between sleep quality and an individual's physical activity. Our results improve on the state-of-the-art procedure in sleep research by 15% for area under the ROC curve and by 30% for F1 score, on average. The application of RAHAR is not limited to sleep analysis, however, and it can be used to study other health problems such as obesity, diabetes, and cardiac disease.
JMIR mHealth and uHealth, 2016
Journal of Advertising, 2020
Over the past 40 years, we have witnessed seismic shifts in advertising planning and buying processes. Due in no small part to the emergence of digital media, consumer choices have mushroomed, while advertisers understand much more about their target audiences. Advertising activities have been drastically transformed by the possibilities that technology creates for targeting and measurement, by the automation of activities via programmatic advertising, and by an overall computational approach in which algorithmic, data-driven decisions dominate. In this era, what does it mean to "do media planning" and to do it well? The present article argues for planning decisions to move away from simply purchasing exposure and instead to focus on fostering engagement through meaningful and sustained interactions with consumers. It provides an overview of the digital ecosystem that makes computational advertising possible, updates the notion of consumer engagement for this context, and reviews how measurement becomes more central to media planning decisions. Ethical and normative considerations and computational advertising as an adaptive learning system are discussed as cross-cutting issues, followed by a proposed research agenda.
Proceedings of SPIE, Dec 19, 2003
This paper provides details and implementation experiences of a multimedia programming language and associated toolkits. The language, a data-flow paradigm for multimedia streams, consists of blocks of code that can be connected through their data ports. Continuous media flows through these ports into and out of blocks, and the blocks are responsible for processing the continuous media data. Examples of such processing include capturing, displaying, storing, retrieving, and analyzing its contents. The blocks also have parameter ports that specify other pertinent parameters, such as location, and display characteristics, such as geometry. The connection topology of blocks is specified using a graphical editor called the Program Development Tool (PDT), and the geometric parameters are specified using another graphical editor called the User Interface Development Tool (UIDT). Experience with modeling multimedia presentations in our environment, and the enhancements provided by the two graphical editors, are discussed in detail.
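The block-and-port idea can be sketched in a few lines. The names below (Block, connect, push) are illustrative, not the language's actual API, and real blocks would process continuous media streams rather than strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Block:
    """A processing block with one data port in and fan-out data ports out."""
    process: Callable[[object], object]
    downstream: List["Block"] = field(default_factory=list)

    def connect(self, other: "Block") -> "Block":
        # Wire this block's output port to another block's input port.
        self.downstream.append(other)
        return other

    def push(self, item, sink):
        # Process one item and forward it along every outgoing connection;
        # terminal blocks deposit their output into the sink.
        out = self.process(item)
        if self.downstream:
            for block in self.downstream:
                block.push(out, sink)
        else:
            sink.append(out)

# Wire two blocks: a capture stage feeding an analysis stage.
capture = Block(lambda frame: frame)           # e.g., capture a frame
analyze = Block(lambda frame: frame.upper())   # e.g., analyze its contents
capture.connect(analyze)

frames = []
capture.push("frame1", frames)
```

A graphical editor like the PDT described above would construct exactly this connection topology, with parameter ports (geometry, location) set through the UIDT instead of in code.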
Understanding dynamics of evolution in large social networks is an important problem. In this pap... more Understanding dynamics of evolution in large social networks is an important problem. In this paper, we characterize evolution in large multi-relational social networks. The proliferation of online media such as Twitter, Facebook, Orkut and MMORPGs 1 have created social networking data at an unprecedented scale. Sony's Everquest 2 is one such example. We used game multi-relational networks to reveal the dynamics of evolution in a multi-relational setting by macroscopic study of the game network. Macroscopic analysis involves fragmenting the network into smaller portions for studying the dynamics within these sub-networks, referred to as 'communities'. From an evolutionary perspective of multi-relational network analysis, we have made the following contributions. Specifically, we formulated and analyzed various metrics to capture evolutionary properties of networks. We find that co-evolution rates in trust based 'communities' are approximately 60% higher than the trade based 'communities'. We also find that the trust and trade connections within the 'communities' reduce as their size increases. Finally, we study the interrelation between the dynamics of trade and trust within 'communities' and find interesting results about the precursor relationship between the trade and the trust dynamics within the 'communities'.
Sleep, 2022
Study Objectives Upper airway stimulation (UAS) therapy is effective for a subset of obstructive ... more Study Objectives Upper airway stimulation (UAS) therapy is effective for a subset of obstructive sleep apnea (OSA) patients with continuous positive airway pressure (CPAP) intolerance. While overall adherence is high, some patients have suboptimal adherence, which limits efficacy. Our goal was to identify therapy usage patterns during the first 3 months of therapy to enable targeted strategies for improved adherence. Methods Therapy data was retrieved from 2098 patients for three months after device activation. Data included mean and standard deviation (SD) of hours of use, therapy pauses, hours from midnight the therapy was turned ON and OFF, percentage of missing days, and stimulation amplitude. Cluster analysis was performed using Gaussian mixture models that categorized patients into six main groups. Results The six groups and their prevalence can be summarized as Cluster 1A: Excellent Use (34%); Cluster 1B: Excellent Use with variable timing (23%); Cluster 2A: Good Use with mis...
Sleep, 2018
period. At 90 days, their compliance was similar to those who did not change first masks. While c... more period. At 90 days, their compliance was similar to those who did not change first masks. While changing a mask may take days/weeks out of the 90 day compliance window, compliance did not appear decreased by time taken to change to a second preferred mask. There was some evidence of increased mask replacement in Medicare or Medicaid patients. Support (If Any): none.
Journal of Advertising, 2022
Data structured in the form of overlapping or non-overlapping sets is found in a variety of domai... more Data structured in the form of overlapping or non-overlapping sets is found in a variety of domains, sometimes explicitly but often subtly. For example, teams, which are of prime importance in social science studies are "sets of individuals"; "item sets" in pattern mining are sets and for various types of analysis in language studies a sentence can be considered as a "set or bag of words". Although building models and inference algorithms for structured data has been an important task in the fields of machine learning and statistics, research on "set-like" data still remains less explored. Relationships between pairs of elements can be modeled as edges in a graph. However, for modeling relationships that involve all members of a set, a hyperedge is a more natural representation. In this work, we focus on the problem of embedding hyperedges in a hypergraph (a network of overlapping sets) to a low dimensional vector space. We propose a probabilistic deep-learning based method as well as a tensor-based algebraic model, both of which capture the hypergraph structure in a principled manner without loosing set-level information. Our central focus is to highlight the connection between hypergraphs (topology), tensors (algebra) and probabilistic models. We present a number of baselines, some of which adapt existing node-level embedding models to the hyperedge-level, as well as sequence based language techniques which are adapted for set structured hypergraph topology. The performance is evaluated with a network of social groups and a network of word phrases. Our experiments show that accuracy wise our methods perform similar to those of baselines which are not designed for hypergraphs. Moreover, our tensor based method is more efficient than deep-learning based auto-encoder method. 
We, therefore, argue that the proposed methods are more generic methods suitable for hypergraphs (and therefore also for graphs) that preserve accuracy and efficiency.
IEEE Transactions on Knowledge and Data Engineering, Jun 1, 2021
Intelligent transportation systems are a key component in smart cities, and the estimation and pr... more Intelligent transportation systems are a key component in smart cities, and the estimation and prediction of the spatiotemporal traffic state is critical to capture the dynamics of traffic congestion, i.e., its generation, propagation and mitigation, in order to increase operational efficiency and improve livability within smart cities. And while spatiotemporal data related to traffic is becoming common place due to the wide availability of cheap sensors and the rapid deployment of IoT platforms, the data still suffer some challenges related to sparsity, incompleteness, and noise which makes the traffic analytics difficult. In this article, we investigate the problem of missing data or noisy information in the context of real-time monitoring and forecasting of traffic congestion for road networks in a city. The road network is represented as a directed graph in which nodes are junctions (intersections) and edges are road segments. We assume that the city has deployed high-fidelity sensors for speed reading in a subset of edges; and the objective is to infer the speed readings for the remaining edges in the network; and to estimate the missing values in the segments for which sensors have stopped generating data due to technical problems (e.g., battery, network, etc.). We propose a tensor representation for the series of road network snapshots, and develop a regularized factorization method to estimate the missing values, while learning the latent factors of the network. The regularizer, which incorporates spatial properties of the road network, improves the quality of the results. The learned factors, with a graph-based temporal dependency, are then used in an autoregressive algorithm to predict the future state of the road network with a large horizon. 
Extensive numerical experiments with real traffic data from the cities of Doha (Qatar) and Aarhus (Denmark) demonstrate that the proposed approach is appropriate for imputing the missing data and predicting the traffic state. It is accurate and efficient and can easily be applied to other traffic datasets.
arXiv (Cornell University), Jun 19, 2017
Assigning relevant keywords to documents is very important for efficient retrieval, clustering an... more Assigning relevant keywords to documents is very important for efficient retrieval, clustering and management of the documents. Especially with the web corpus deluged with digital documents, automation of this task is of prime importance. Keyword assignment is a broad topic of research which refers to tagging of document with keywords, key-phrases or topics. For text documents, the keyword assignment techniques have been developed under two sub-topics: automatic keyword extraction (AKE) and automatic key-phrase abstraction. However, the approaches developed in the literature for full text documents cannot be used to assign keywords to low text content documents like twitter feeds, news clips, product reviews or even short scholarly text. In this work, we point out several practical challenges encountered in tagging such low text content documents. As a solution to these challenges, we show that the proposed approaches which leverage knowledge from several open source web resources enhance the quality of the tags (keywords) assigned to the low text content documents. The performance of the proposed approach is tested on real world corpus consisting of scholarly documents with text content ranging from only the text in the title of the document (5-10 words) to the summary text/abstract (100-150 words). We find that the proposed approach not just improves the accuracy of keyword assignment but offer a computationally efficient solution which can be used in real world applications.
arXiv (Cornell University), Oct 26, 2021
The quality of sleep has a deep impact on people's physical and mental health. People with insuff... more The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking. Signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we utilize the relationship between physical activity and sleep quality to find ways of assisting people improve their sleep using machine learning techniques. People usually have several behavior modes that their bio-functions can be divided into. Performing time series clustering on activity data, we find cluster centers that would correlate to the most evident behavior modes for a specific subject. Activity recipes are then generated for good sleep quality for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine for suggesting a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints, i.e. their age, gender, body mass index (BMI), resting heart rate, etc., with the objective of the recommendation being the improvement of that night's quality of sleep. This would in turn serve a longer-term health objective, like lowering heart rate, improving the overall quality of sleep, etc.
In an effort to curb air pollution, the city of Delhi (India), known to be one of the most populated, polluted, and congested cities in the world, ran a trial experiment in two phases of 15-day intervals. During the experiment, most four-wheeled vehicles were constrained to move on alternate days based on whether their plate numbers ended in odd or even digits. While the local government of Delhi, represented by A. Kejriwal (leader of the AAP party), advocated for the benefits of the experiment, the prime minister of India, N. Modi (leader of the BJP), maintained that the initiative was ineffective. This has led to a strong polarization of public opinion towards the Odd-Even experiment. This real-world urban experiment provided the scientific community with a unique opportunity to study the impact of political leaning on human perception at a large scale. We collect data about pollution and traffic congestion to measure the real effectiveness of the experiment. We use Twitter to...
The precision matrix is the inverse of the covariance matrix. Estimating large sparse precision matrices is an interesting and challenging problem in many fields of science, engineering, and the humanities, and in machine learning problems in general. Recent applications often encounter high dimensionality with a limited number of data points, leading to a number of covariance parameters that greatly exceeds the number of observations, and hence to a singular covariance matrix. Several methods have been proposed to deal with this challenging problem, but there is no guarantee that the obtained estimator is positive definite. Furthermore, in many cases, one needs to capture some additional information on the setting of the problem. In this paper, we introduce a criterion that ensures the positive definiteness of the precision matrix and we propose the inner-outer alternating direction method of multipliers as an efficient method for estimating it. We show that the convergence of the a...
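The singularity and positive-definiteness issues mentioned above can be illustrated with a toy estimator: when n < p the sample covariance is singular, so a shrinkage term makes it invertible, soft-thresholding the inverse induces sparsity, and eigenvalue flooring restores positive definiteness. This is a simple stand-in for intuition only, not the paper's inner-outer ADMM; all parameter names and values are assumptions.

```python
import numpy as np

def sparse_precision(X, shrink=0.1, thresh=0.05, floor=1e-4):
    """Toy sparse precision estimator with a positive-definiteness guarantee:
    1) shrinkage covariance (handles the n < p singularity),
    2) invert, 3) soft-threshold off-diagonals for sparsity,
    4) clip eigenvalues at `floor` so the result stays positive definite."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    S = (1 - shrink) * S + shrink * np.eye(p) * np.trace(S) / p
    P = np.linalg.inv(S)
    T = np.sign(P) * np.maximum(np.abs(P) - thresh, 0.0)  # soft-threshold
    np.fill_diagonal(T, np.diag(P))                       # keep diagonal intact
    T = (T + T.T) / 2                                     # enforce symmetry
    w, V = np.linalg.eigh(T)
    return V @ np.diag(np.maximum(w, floor)) @ V.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))   # n = 50 observations, p = 100 variables
P = sparse_precision(X)
```

Step 3 alone can push eigenvalues negative, which is precisely why a thresholding-style estimator needs an explicit positive-definiteness correction such as step 4.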
ArXiv, 2021
Leading up to August 2020, COVID-19 had spread to almost every country in the world, causing millions of infections and hundreds of thousands of deaths. In this paper, we first verify the assumption that clinical variables could have time-varying effects on COVID-19 outcomes. Then, we develop a temporal stratification approach to make daily predictions of patients' outcomes at the end of the hospital stay. Training data is segmented by the remaining length of stay, which is a proxy for the patient's overall condition. Based on this, a sequence of predictive models is built, one for each time segment. Thanks to the publicly shared data, we were able to build and evaluate prototype models. Preliminary experiments show 0.98 AUROC, 0.91 F1 score and 0.97 AUPR on continuous deterioration prediction, encouraging further development of the model as well as validation on different datasets. We also verify the key assumption which motivates our method. Clinical variables could have time-v...
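The temporal stratification idea above can be sketched as: bucket records by remaining length of stay, fit one model per bucket, and route each prediction through the bucket's model. The bucket boundaries, the single synthetic feature, and the tiny logistic fit below are all hypothetical illustrations, not the paper's clinical models or data.

```python
import numpy as np

def segment(remaining_days):
    """Assumed buckets: 0-2 days, 3-7 days, 8+ days of stay remaining."""
    return 0 if remaining_days <= 2 else (1 if remaining_days <= 7 else 2)

def fit_logistic(x, y, lr=0.1, steps=500):
    """One-feature logistic regression by gradient descent; returns (w, b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
models = {}
for seg in range(3):
    # Synthetic data: the feature grows noisier (less predictive) the
    # further the patient is from discharge, mimicking time-varying effects.
    x = rng.normal(size=200)
    y = (x + rng.normal(scale=0.5 + seg, size=200) > 0).astype(float)
    models[seg] = fit_logistic(x, y)

def predict(remaining_days, x):
    """Route the prediction through the model for the matching time segment."""
    w, b = models[segment(remaining_days)]
    return 1 / (1 + np.exp(-(w * x + b)))
```

Fitting per segment lets each model's coefficients differ across the stay, which is the operational meaning of "time-varying effects" here.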
Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 3, 2018
Obesity is one of the major health risk factors behind the rise of non-communicable conditions. Understanding the factors influencing obesity is very complex since there are many variables that can affect the health behaviors leading to it. Nowadays, multiple data sources can be used to study health behaviors, such as wearable sensors for physical activity and sleep, social media, mobile and health data. In this paper we describe the design of a dashboard for the visualization of actigraphy and biometric data from a childhood obesity camp in Qatar. This dashboard allows quantitative discoveries that can be used to guide patient behavior and orient qualitative research. CONTEXT Childhood obesity is a growing epidemic, and with technological advancements, new tools can be used to monitor and analyze lifestyle factors leading to obesity, which in turn can help in timely health behavior modifications. In this paper we present a tool for visualization of personal health data, which can a...
Computational Social Sciences, 2017
Analysis of the performance of groups or teams is of primary importance in the field of social group studies. In this article we target group performance analysis using computational techniques from machine learning. In order to understand the feature space, we use a combination of machine learning methods: decision trees, feature selection, and correlation analysis. These models are chosen for their easy interpretability. Alongside, we also propose a methodology to build group-level metrics from individual-level data. This helps us interpret the feature space at the group level and understand how factors like attribute variety among group members affect performance. We propose a full methodology that employs machine learning models taking various group-level metrics as input, finally providing a thorough analysis of the feature space. In this research we employ the NATO dataset collected using the game-based test-bed called SABRE. We provide a hands-on experience by performing a four-phase exhaustive group analysis on the SABRE dataset using the Weka software, a user-friendly, GUI-based machine learning tool.
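Rolling individual-level attributes up to group-level metrics, as described above, can be sketched as aggregate statistics plus a "variety" measure of heterogeneity among members. The attribute names (`skill`, `role`) and the specific metrics below are hypothetical illustrations, not the SABRE dataset's schema.

```python
import numpy as np

def group_metrics(members):
    """Build group-level metrics from a list of per-member attribute dicts."""
    skills = np.array([m["skill"] for m in members], dtype=float)
    roles = [m["role"] for m in members]
    return {
        "skill_mean": skills.mean(),                   # central tendency
        "skill_spread": skills.std(),                  # within-group dispersion
        "role_variety": len(set(roles)) / len(roles),  # attribute variety
    }

# Hypothetical three-member team.
team = [
    {"skill": 7.0, "role": "scout"},
    {"skill": 5.0, "role": "analyst"},
    {"skill": 9.0, "role": "scout"},
]
m = group_metrics(team)
```

Metrics of this shape can then be fed, one row per group, into interpretable models such as the decision trees mentioned above to see how variety relates to performance.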
2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016
Human Activity Recognition (HAR) is a powerful tool for understanding human behaviour. Applying HAR to wearable sensors can (1) provide new insights by enriching the feature set in health studies, and (2) enhance the personalisation and effectiveness of health, wellness, and fitness applications. Wearable devices provide an unobtrusive platform for user monitoring, and due to their increasing market penetration, feel intrinsic to the wearer. The integration of these devices in daily life provides a unique opportunity for understanding human health and wellbeing. This is referred to as the "quantified self" movement. The analysis of complex health behaviours such as sleep traditionally requires time-consuming manual interpretation by experts. This manual work is necessary due to the erratic periodicity and persistent noisiness of human behaviour. In this paper, we present a robust automated human activity recognition algorithm, which we call RAHAR. We test our algorithm in the application area of sleep research by providing a novel framework for evaluating sleep quality and examining the correlation between sleep quality and an individual's physical activity. Our results improve on the state-of-the-art procedure in sleep research by 15% for area under ROC and by 30% for F1 score on average. However, the application of RAHAR is not limited to sleep analysis; it can be used for understanding other health problems such as obesity, diabetes, and cardiac disease.
JMIR mHealth and uHealth, 2016
Journal of Advertising, 2020
Over the past 40 years, we have witnessed seismic shifts in advertising planning and buying processes. Due in no small part to the emergence of digital media, consumer choices have mushroomed, while advertisers understand much more about target audiences. Advertising activities have been drastically transformed by the possibilities that technology creates for targeting and measurement, automation of activities via programmatic advertising, and an overall computational approach in which algorithmic, data-driven decisions dominate. In this era, what does it mean to "do media planning" and to do it well? The present article argues for planning decisions to move away from simply purchasing exposure to instead focusing on fostering engagement through meaningful and sustained interactions with consumers. It provides an overview of the digital ecosystem that makes computational advertising possible, updates the notion of consumer engagement for this context, and reviews how measurement becomes more central to media planning decisions. Ethical and normative considerations and computational advertising as an adaptive learning system are discussed as crosscutting issues, followed by a proposed research agenda.
This paper provides details and implementation experiences of a multimedia programming language and associated toolkits. The language, a data-flow paradigm for multimedia streams, consists of blocks of code that can be connected through their data ports. Continuous media flows through these ports into and out of blocks. The blocks are responsible for the processing of continuous media data. Examples of such processing include capturing, displaying, storing, retrieving and analyzing their contents. The blocks also have parameter ports that specify other pertinent parameters, such as location, and display characteristics such as geometry, etc. The connection topology of blocks is specified using a graphical editor called the Program Development Tool (PDT) and the geometric parameters are specified by using another graphical editor called the User Interface Development Tool (UIDT). Experience with modeling multimedia presentations in our environment and the enhancements provided by the two graphical editors are discussed in detail.
Proceedings of SPIE, Dec 19, 2003