Shen-shyang Ho - Academia.edu
Papers by Shen-shyang Ho
Uncertainty in Artificial Intelligence, 2005
A martingale framework for concept change detection based on testing data exchangeability was recently proposed (Ho, 2005). In this paper, we describe the proposed change-detection test based on Doob's Maximal Inequality and show that it is an approximation of the sequential probability ratio test (SPRT). The relationship between the threshold value used in the proposed test ...
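A minimal sketch of how a martingale change-detection test of this kind can work, assuming the randomized power martingale form M_n = ∏ ε·p_i^(ε−1) built from conformal p-values. The strangeness measure (distance to the mean of the current window, a single-cluster stand-in for the clustering-based variant on unlabeled data), the values ε = 0.92 and λ = 20, and the function name `detect_changes` are illustrative assumptions, not the paper's settings. Under exchangeability M is a nonnegative martingale with M_0 = 1, so Doob's maximal inequality bounds the false-alarm probability by 1/λ.

```python
import numpy as np


def detect_changes(stream, eps=0.92, lam=20.0, seed=0):
    """Indices at which a randomized power martingale reaches lam.

    Assumed strangeness: distance of each point to the mean of all points
    seen since the last reset (a single-cluster stand-in for unlabeled data).
    """
    rng = np.random.default_rng(seed)
    changes, window, M = [], [], 1.0
    for n, x in enumerate(stream):
        window.append(np.atleast_1d(np.asarray(x, dtype=float)))
        pts = np.array(window)
        alphas = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
        a_new = alphas[-1]
        theta = rng.uniform()
        # randomized p-value: stranger points plus a random share of ties
        p = (np.sum(alphas > a_new) + theta * np.sum(alphas == a_new)) / len(alphas)
        M *= eps * p ** (eps - 1.0)  # power martingale factor eps * p^(eps-1)
        # Doob: P(max M >= lam) <= 1/lam while the data stay exchangeable
        if M >= lam:
            changes.append(n)
            window, M = [], 1.0  # restart the test after a detected change
    return changes
```

On a synthetic stream whose mean jumps after 200 points, the new points are consistently the strangest, their p-values are small, and the martingale grows until it crosses λ shortly after the jump. Raising λ trades detection delay for fewer false alarms.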
International Joint Conference on Artificial Intelligence, 2005
A martingale framework is proposed to enable a support vector machine (SVM) to adapt to time-varying data streams. The adaptive SVM is a one-pass incremental algorithm that (i) does not require a sliding window on the data stream, (ii) does not require monitoring the performance of the classifier as data points are streaming, and (iii) works well for high ...
Abstract: Current techniques for cyclone detection and tracking employ NCEP (National Centers for Environmental Prediction) models from in-situ measurements. This solution does not provide global coverage, unlike remote satellite observations. However, it is impractical to use a single Earth-orbiting satellite to detect and track events such as cyclones continuously because of its limited spatial and temporal coverage. One ...
International Joint Conference on Artificial Intelligence, 2007
The martingale framework for detecting changes in data streams, currently only applicable to labeled data, is extended here to unlabeled data using the clustering concept. The one-pass incremental change-detection algorithm (i) does not require a sliding window on the data stream, (ii) does not require monitoring the performance of the clustering algorithm as data points are streaming, and ...
IEEE International Conference on Data Mining, 2000
A practical issue in existing transduction methods is their expensive and inefficient computation compared to induction methods. This has hindered the use of transduction methods in temporal and real-time data mining. In this paper, we introduce a fast incremental transductive confidence machine (TCM) based on an adiabatic incremental support vector machine (SVM) such that critical information from ...
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD 08, 2008
Current techniques for cyclone detection and tracking employ NCEP (National Centers for Environmental Prediction) models from in-situ measurements. This solution does not provide true global coverage, unlike remote satellite observations. However, it is impractical to use a single Earth-orbiting satellite to detect and track events such as cyclones continuously because of its limited spatial and temporal coverage. One solution to alleviate such persistent problems is to use heterogeneous sensor data from multiple orbiting satellites. However, this solution introduces new challenges, such as varying spatial and temporal resolution across satellite sensor data, the need to establish correspondence between features from different satellite sensors, and the lack of definitive indicators for cyclone events in some sensor data.
Optical Pattern Recognition XX, 2009
Abstract: We describe an automated remote cyclone detection and tracking approach using heterogeneous data from multiple satellites. A single Earth-orbiting satellite has been used in the past to detect and track events such as cyclones, but this suffers from major drawbacks due to limited spatio-temporal coverage. Our novel approach addresses the challenges of using heterogeneous data from multiple data sources for knowledge discovery, tracking, and mining of cyclones. Moreover, it offers better detection performance and spatio-temporal resolution. Our solution generalizes to multiple sensor measurement modalities. Our approach consists of: (i) feature extraction from each sensor measurement, (ii) an ensemble classifier for cyclone detection, and (iii) knowledge sharing between the different remote sensor measurements. Our extensive experimental results demonstrate (i) the superior performance of our cyclone detector compared to previous work on preprocessed historical data, (ii) stable performance of our cyclone detector when it is applied to different geographical regions (the Western Pacific Ocean and the North Atlantic Ocean), (iii) meaningful knowledge derived from the cyclone detector output, and (iv) that the output of our automated cyclone detection and tracking solution closely matches the cyclone best-track information from the National Hurricane Center.
Proceedings of the International Joint Conference on Neural Networks, 2003
This paper describes a novel active learning strategy using universal p-value measures of confidence based on algorithmic randomness and transductive inference. The early stopping criterion for active learning is based on the bias-variance tradeoff for classification. It corresponds to the learning instance at which the boundary bias becomes positive, which requires switching from active to random selection of learning examples. The sign of the boundary bias and the increase in classification error are two manifestations of the same phenomenon, i.e., over-training. The experimental results on a non-separable two-class classification problem show the feasibility and usefulness of our novel approach. Our hybrid learning strategy achieves performance competitive with standard nearest neighbor methods using far fewer training examples.
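A minimal sketch of the generic transductive confidence machine recipe behind p-value-driven active learning. It assumes a k-nearest-neighbour strangeness (summed distances to same-class neighbours divided by those to other-class neighbours) and an assumed query rule that picks the pool example whose two largest p-values are closest. The function names and the query rule are illustrative, not the paper's implementation, and the bias-based switch to random sampling is not shown.

```python
import numpy as np


def strangeness(X, y, i, k=1):
    """k-NN strangeness of example i: summed distances to its k nearest
    same-class neighbours divided by those to its k nearest other-class ones."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # exclude the example itself
    same = np.sort(d[y == y[i]])[:k]
    other = np.sort(d[y != y[i]])[:k]
    return same.sum() / other.sum()


def p_values(X_train, y_train, x_new, labels, k=1):
    """Transductive p-value of x_new under each candidate label."""
    pvals = {}
    for lab in labels:
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, lab)
        # recompute every strangeness value with x_new provisionally labelled lab
        alphas = np.array([strangeness(X, y, i, k) for i in range(len(y))])
        # fraction of examples (including x_new) at least as strange as x_new
        pvals[lab] = float(np.mean(alphas >= alphas[-1]))
    return pvals


def select_query(X_train, y_train, pool, labels, k=1):
    """Index of the pool example whose two largest p-values are closest,
    i.e. the example the current labelled set is least confident about."""
    def gap(x):
        top = sorted(p_values(X_train, y_train, x, labels, k).values(), reverse=True)
        return top[0] - top[1]
    return min(range(len(pool)), key=lambda j: gap(pool[j]))
```

With two well-separated clusters, a pool point midway between them receives nearly equal p-values under both labels and is queried first, while points inside a cluster get one clearly dominant p-value.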
Seventh IEEE International Conference on Data Mining (ICDM 2007), 2007
We present a novel machine learning algorithm to identify relevant objects from a large amount of data. This approach is driven by linear discrimination based on the Nonlinear Rescaling (NR) method and transductive inference. The NR algorithm for linear discrimination (NRLD) computes both the primal and the dual approximation at each step. The dual variables associated with the given labeled dataset provide important information about the objects in the dataset and play a key role in ordering these objects. A confidence score based on a transductive inference procedure using NRLD is used to rank and identify the relevant objects from a pool of unlabeled data. Experimental results on an unbalanced protein dataset for the drug target prioritization and identification problem illustrate the feasibility of the proposed identification algorithm.
IEEE Transactions on Neural Networks and Learning Systems, Jan 13, 2015
Multivariate variable-length sequence data are becoming ubiquitous with technological advances in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting where we have knowledge about whether some of the data sequences are similar o...
Tracking a cyclone continuously with a single orbiting satellite is impractical because of its limited spatial and temporal coverage. One solution is to use multiple orbiting satellites for cyclone tracking. However, data from some orbiting satellites do not provide features as useful as those from other satellites for identifying cyclones. Moreover, satellite data containing strong cyclone-discriminating features are affected by coarse temporal resolution and object occlusion, while satellite data containing weak cyclone features lack positive examples for cyclone identification. In this paper, we propose a methodology for spatial-temporal knowledge transfer to enable cyclone identification and detection using data with weak features in a multiple data sources setting. This approach also minimizes the negative effect of coarse temporal resolution and occlusion that arises when only the satellite data containing strong cyclone-discriminating features are used. Experimental results are p...
A large amount of archived, unannotated satellite data is publicly available. The automated retrieval of satellite data from these public domains based on ad-hoc user requests is extremely useful to scientists for analysis and as evidence to support scientific hypotheses on weather phenomena. One efficient approach for such data/information retrieval is to use the publicly available information about weather events (presented as text) to assist in better and more accurate identification of the relevant satellite data. Furthermore, by identifying the weather event in the satellite datasets that are available at finer temporal scales, one can fill in the significant "information gaps" in the text data. In this paper, we describe an approach for cross-media cyclone track summarization and cyclone eye automated annotation using publicly available satellite data and cyclone information on the World Wide Web. Using a hurricane event as an example, we show (i) automated cyclon...
Lecture Notes in Computer Science, 2010
One challenge in Earth science research is the accurate and efficient ad-hoc query and retrieval of Earth science satellite sensor data based on user-defined criteria to study and analyze atmospheric events such as tropical cyclones. The problem can be formulated as a spatiotemporal join query to identify the spatio-temporal location where moving sensor objects and dynamic atmospheric event objects intersect, either precisely or within a user-defined proximity. In this paper, we describe an efficient query and retrieval framework to handle the problem of identifying the spatio-temporal intersecting positions for satellite sensor data retrieval. We demonstrate the effectiveness of our proposed framework using sensor measurements from QuikSCAT (wind field measurement) and TRMM (precipitation vertical profile measurements) satellites, and the trajectories of the tropical cyclones occurring in the North Atlantic Ocean in 2009.
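To make the join predicate concrete, here is a brute-force sketch, assuming both the sensor footprint and the cyclone are given as time-sorted (t, lat, lon) samples, the cyclone position is linearly interpolated at each sensor timestamp, and proximity is great-circle distance on a spherical Earth. This is not the paper's framework: a practical system would add spatial and temporal indexing, and the function names and data are hypothetical.

```python
import math
from bisect import bisect_right

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def interpolate(track, t):
    """Linearly interpolate (lat, lon) at time t on a track of (t, lat, lon)
    tuples with strictly increasing times; None outside the track's lifetime.
    Longitude is interpolated naively (no dateline handling)."""
    times = [p[0] for p in track]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last timestamp
        return track[-1][1], track[-1][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)


def spatiotemporal_join(sensor, event, radius_km):
    """(t, distance_km) for each sensor sample within radius_km of the event."""
    hits = []
    for t, lat, lon in sensor:
        pos = interpolate(event, t)
        if pos is None:  # event not active at this time
            continue
        d = haversine_km(lat, lon, pos[0], pos[1])
        if d <= radius_km:
            hits.append((t, d))
    return hits
```

Setting `radius_km` near zero approximates the "precise" intersection case, while a larger value gives the user-defined proximity variant.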
Proceedings of the 4th ACM SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS - SPRINGL '11, 2011
One main concern that deters individuals from participating in the collection of personal location history records is the disclosure of their location and related information when a user queries for statistical or pattern mining results derived from these records. In this paper, we investigate how the privacy goal that the inclusion of one's location history in a statistical database with ...
Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - GIS '09, 2009
We present an automated cyclone tracking system that uses images from multiple satellite sources. The system tracks cyclones using infrared images from a Geostationary Operational Environmental Satellite (GOES), precipitation images derived from five satellite sources, and ocean surface wind field satellite images. The system consists of three main components: (i) data preprocessing steps for each data source, (ii) cyclone eye detection algorithms for each data source, and (iii) a filter-based tracker that integrates the eye detection results from each data source. Experimental results show that our prototype system is operationally feasible and has better performance than our prior cyclone tracking system.
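The abstract does not say which filter the tracker uses. The sketch below assumes a constant-velocity Kalman filter in a local planar frame, where each data source's eye detection is fused as a separate measurement update with its own (assumed) noise variance. The class name `EyeTracker`, the isotropic process noise `q`, and the noise values are hypothetical choices for illustration only.

```python
import numpy as np


class EyeTracker:
    """Constant-velocity Kalman filter over the state (x, y, vx, vy)."""

    def __init__(self, x0, y0, q=0.01):
        self.s = np.array([x0, y0, 0.0, 0.0])  # state estimate
        self.P = np.eye(4) * 10.0              # broad initial uncertainty
        self.q = q                             # assumed isotropic process noise
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])  # detections observe position only

    def predict(self, dt):
        """Propagate the state dt time units ahead."""
        F = np.eye(4)
        F[0, 2] = F[1, 3] = dt
        self.s = F @ self.s
        self.P = F @ self.P @ F.T + self.q * np.eye(4)

    def update(self, z, r):
        """Fuse one eye detection z = (x, y) from a source with noise variance r."""
        S = self.H @ self.P @ self.H.T + r * np.eye(2)
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ (np.asarray(z, dtype=float) - self.H @ self.s)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Calling `update` once per available source after each `predict` lets less reliable sources (larger `r`) contribute less, and a source that has no detection at a given time step is simply skipped.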
Intelligent Information Management, 2011
... to accuracy performance, can be further established between information theory and statistical learning theory ... Additional relations that link the strangeness and the Bayesian approach using the likelihood ... specific/given working set W. Test data is not merely a passive collection ...
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '10, 2010
... 91109, wqt@pacific.jpl.nasa.gov; W. Timothy Liu, Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr., 300-323, Pasadena, CA 91109, wtliu@jpl.nasa.gov. ABSTRACT: The Earth Observing System Data ...
2014 IEEE 15th International Conference on Mobile Data Management, 2014
In this paper, we propose a general smartphone user activity prediction framework that uses the general concept of partial repetitive behavior (instead of the stronger periodicity condition) for similarity scoring, together with landmark behaviors (representative behaviors that identify groups of similar behavior vectors). Prediction of the next-day(s) behavior is based on a weighted sum of the most similar behavior vectors related to the landmark behavior of the next-day(s) behavior. These behavior vectors are selected based on the likely partial repetition of the next-day behavior and similarity in the eigenbehavior feature space. Our proposed prediction algorithm allows one to categorically quantify the frequency of a target behavior, such as no behavior, normal behavior, and high-frequency behavior, or other more refined categorizations based on user preference. Extensive experiments are carried out using the Nokia Mobile Data Challenge (MDC) dataset to demonstrate the feasibility of our proposed approach and its generality using arbitrary call activity, voice call activity, short message activity, media consumption, and app usage data types.
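The core "weighted sum of the most similar behavior vectors" step can be sketched as follows, under simplifying assumptions: plain cosine similarity on raw daily count vectors stands in for the paper's partial-repetition scoring, landmark behaviors, and eigenbehavior space; the weighted sum is normalized to a weighted average; and the category thresholds in `categorize` are hypothetical.

```python
import numpy as np


def predict_next_day(days, k=3):
    """days: (n_days, n_bins) array of daily activity counts, oldest first.

    Finds the k past days most similar (cosine) to the latest day and returns
    the similarity-weighted average of the days that followed them."""
    days = np.asarray(days, dtype=float)
    today, past = days[-1], days[:-1]  # each candidate day must have a successor
    norms = np.linalg.norm(past, axis=1) * np.linalg.norm(today)
    sims = np.divide(past @ today, norms, out=np.zeros(len(past)), where=norms > 0)
    top = np.argsort(sims)[::-1][:k]   # indices of the k most similar past days
    w = sims[top]
    if w.sum() <= 0:  # no informative similarity: fall back to a plain average
        return days[top + 1].mean(axis=0)
    return (w[:, None] * days[top + 1]).sum(axis=0) / w.sum()


def categorize(total, normal_max=10):
    """Map a predicted daily count to 'none', 'normal', or 'high' (assumed thresholds)."""
    if total < 0.5:
        return "none"
    return "normal" if total <= normal_max else "high"
```

For a user who alternates quiet and busy days, the latest quiet day matches earlier quiet days, so the forecast is built from the busy days that followed them.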
BMC Genomics, 2014
Background: Clonal expansion is a process in which a single organism reproduces asexually, giving rise to a diversifying population. It is pervasive in nature, from within-host pathogen evolution to emergent infectious disease outbreaks. Standard phylogenetic tools rely on full-length genomes of individual pathogens or population consensus sequences (phased genotypes). Although high-throughput sequencing technologies are able to sample population diversity, the short sequence reads inherent to them preclude assessing whether two reads originate from the same clone (unphased genotypes). This obstacle severely limits the application of phylogenetic methods and investigation of within-host dynamics of acute infections using this rich data source.