Yijun Ran - Academia.edu

Papers by Yijun Ran

Epidemic Dynamics of Two-Pathogen Spreading for Pairwise Models

Mathematics

In the real world, pathogens do not exist in isolation. The transmission of one pathogen may be affected by the presence of other pathogens, and certain pathogens generate multiple strains with different spreading features. Hence, the behavior of multi-pathogen transmission has attracted much attention in epidemiological research. In this paper, we use the pairwise approximation method to formulate two-pathogen models capturing cross-immunity, super-infection, and co-infection phenomena, in which each pathogen follows a susceptible-infected-susceptible (SIS) mechanism. For each model, we calculate the basic reproduction number, analyze the stability of equilibria, and discuss the differences from the mean-field approach. We demonstrate that simulations are in good agreement with the analytical results.
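As a point of reference for the mean-field approach mentioned in the abstract, here is a minimal sketch of a single-pathogen SIS model under the homogeneous mean-field assumption. It is not the paper's two-pathogen pairwise model. The equation di/dt = beta * <k> * i * (1 - i) - gamma * i, the function names, and the parameter values are all assumptions chosen for illustration.

```python
def basic_reproduction_number(beta, gamma, mean_degree):
    """R0 for the homogeneous mean-field SIS model (assumed form)."""
    return beta * mean_degree / gamma


def simulate_sis_mean_field(beta, gamma, mean_degree, i0=0.01, dt=0.01, steps=20000):
    """Forward-Euler integration of di/dt = beta*<k>*i*(1-i) - gamma*i.

    beta: per-link transmission rate, gamma: recovery rate,
    mean_degree: average number of contacts <k>, i0: initial infected fraction.
    Returns the infected fraction after steps*dt time units.
    """
    i = i0
    for _ in range(steps):
        i += dt * (beta * mean_degree * i * (1.0 - i) - gamma * i)
    return i


# Hypothetical parameters: R0 = 0.1 * 4 / 0.2 = 2, so the infection persists
# and the infected fraction settles near 1 - 1/R0.
r0 = basic_reproduction_number(beta=0.1, gamma=0.2, mean_degree=4)
endemic_level = simulate_sis_mean_field(beta=0.1, gamma=0.2, mean_degree=4)
```

In this simplified setting the infection dies out when R0 is below 1 and approaches the endemic level 1 - 1/R0 when R0 is above 1. The pairwise models in the paper track pairs of connected nodes in addition to single-node states, so their thresholds generally differ from this one.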

Peeking Strategy for Online News Diffusion Prediction Via Machine Learning

SSRN Electronic Journal, 2022

Measuring similarity in co-occurrence data using ego-networks

Chaos: An Interdisciplinary Journal of Nonlinear Science, 2020

The co-occurrence association is widely observed in many empirical data sets. Mining the information in co-occurrence data is essential for advancing our understanding of systems such as social networks, ecosystems, and brain networks. Measuring the similarity of entities is one of the important tasks, which can usually be achieved using a network-based approach. Here we show that traditional methods based on the aggregated network can introduce unwanted indirect relationships. To cope with this issue, we propose a similarity measure based on the ego network of each entity, which effectively considers the change of an entity's centrality from one ego network to another. The proposed index is easy to calculate and has a clear physical meaning. Using two different data sets, we compare the new index with existing ones. We find that the new index outperforms the traditional network-based similarity measures, and it can sometimes surpass the embedding method. Meanwhile, the measure given by the new index is only weakly correlated with those given by other methods, hence providing a different dimension to quantify similarities in co-occurrence data. Altogether, our work extends the network-based similarity measure and can potentially be applied in several related tasks.

Co-occurrence data refer to the type of data where multiple entities occur simultaneously in a single instance, such as the co-tags in a folksonomy, the co-authors of a scientific paper, the co-activation of brain regions under a stimulus, and more. Measuring similarity between entities is fundamental to analyzing co-occurrence data, allowing us to further explore social, brain, or scientific systems. Using the ego network composed of the co-occurrence relationships as the backbone, we propose a network-based similarity measure. The new approach outperforms traditional ones and can sometimes surpass the machine-learning-based embedding method, providing a good tool for tasks such as community detection, link prediction, and recommendation.

I. INTRODUCTION

Many tasks in computer science, such as knowledge management [1,2], community detection [3,4], natural language processing [5,6], and link prediction [7,8], require a measure of similarity between two entities. This can be achieved via different methods based on the nature of the problem analyzed. The similarity would be most straightforward to calculate if the features of the two entities are already mapped into a high-dimensional space. Nevertheless, the embedding itself is usually a hard problem and in many cases lacks a clear physical explanation. Hence, other methods that do not directly use feature vectors are also widely used because of their simplicity and interpretability. For example, if two entities can be expressed by a string, their similarity can be quantified by the minimum number of operations required to...
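The indirect-relationship issue raised in the abstract can be illustrated with a small sketch. It does not implement the paper's similarity index, which is not given in this excerpt. The helper names are hypothetical, and the ego network is assumed here to be the network aggregated only from the instances that contain the ego entity, which may differ from the paper's definition.

```python
from itertools import combinations


def aggregate(instances):
    """Build the aggregated co-occurrence network as {entity: set of neighbors}."""
    adj = {}
    for inst in instances:
        for entity in inst:
            adj.setdefault(entity, set())
        # Link every pair of entities that appear in the same instance.
        for a, b in combinations(sorted(set(inst)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def ego_network(instances, ego):
    """Assumed definition: aggregate only the instances that contain `ego`."""
    return aggregate([inst for inst in instances if ego in inst])


# Toy co-occurrence data: A and C never appear in the same instance.
instances = [{"A", "B"}, {"B", "C"}, {"C", "D"}]

full = aggregate(instances)
# In the aggregated network A and C share the neighbor B, so a
# common-neighbor similarity would treat them as related.
shared_in_full = full["A"] & full["C"]

ego_a = ego_network(instances, "A")
# In A's ego network, C does not appear at all.
c_in_ego_a = "C" in ego_a
```

Here the aggregated network links A and C indirectly through B, while the ego network of A contains only the entities that actually co-occurred with A.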

The maximum capability of a topological feature in link prediction

Link prediction aims to predict links of a network that are not directly visible, with profound applications in biological and social systems. Despite the intensive use of topological features in this task, it is unclear to what extent a particular feature can be leveraged to infer missing links. Here, we show that the maximum capability of a topological feature follows a simple mathematical expression, which is independent of how an index gauges the feature. Hence, a family of indexes associated with one topological feature shares the same performance limit. A feature's capability is lifted in supervised prediction, which in general gives better results than unsupervised prediction. The universality of the pattern uncovered is empirically verified on 550 structurally diverse networks, and it can be applied to feature selection and to the analysis of network characteristics associated with a topological feature in link prediction.
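To make the idea of an index gauging a topological feature concrete, here is a minimal sketch of unsupervised link prediction on a toy graph. The common-neighbor index and the AUC metric are standard choices assumed for illustration. The excerpt does not say which indexes or metrics the paper uses, and the sketch does not reproduce the paper's expression for the performance limit.

```python
from itertools import combinations


def build_adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbors}."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Score a node pair by the number of neighbors they share."""
    return len(adj[u] & adj[v])


def auc(missing_scores, nonexistent_scores):
    """Probability that a missing link outscores a nonexistent one (ties count 0.5)."""
    wins = 0.0
    for m in missing_scores:
        for n in nonexistent_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(missing_scores) * len(nonexistent_scores))


# Hypothetical toy network: hide one true edge and try to recover it.
nodes = range(5)
true_edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)}
missing = {(1, 2)}
observed = true_edges - missing

adj = build_adjacency(nodes, observed)
nonexistent = [p for p in combinations(nodes, 2) if p not in true_edges]

toy_auc = auc(
    [common_neighbors(adj, u, v) for u, v in missing],
    [common_neighbors(adj, u, v) for u, v in nonexistent],
)
print(toy_auc)  # 0.875
```

Any other index built on the same feature (here, shared neighbors) could be swapped in for `common_neighbors` without changing the evaluation code.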

Predicting Scientist Collaboration by Multiple Motif Features

IEEE Transactions on Computational Social Systems

Predicting future links with new nodes in temporal academic networks

Journal of Physics: Complexity

Most real-world systems evolve over time, with entities and the interactions between them being added and removed: new entities or relationships appear and old ones vanish. While most network evolutionary models can provide an iterative process for constructing global properties, they cannot capture the evolutionary mechanisms of real systems. Link prediction is hence proposed to predict future links, which can also help us understand the evolution law of real systems. The aim of link prediction is to uncover missing links from known parts of the network or to quantify the likelihood of the emergence of future links from the current structure of the network. However, almost all existing studies ignore that old nodes tend to disappear and new nodes appear over time in real networks, especially in social networks. This makes link prediction more challenging, since the new nodes do not have pre-existing structural information. To solve the temporal link prediction pro...

A novel similarity measure for mining missing links in long-path networks

Network information mining is the study of network topology, which answers a large number of application-based questions about the structural evolution and the function of a real system. For example, the questions can be related to how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns over the whole process of the evolution is not trivial. Link prediction is one of the most important technologies in network information mining, and it can help us understand the real system's evolution law. Link prediction aims to uncover missing links or to quantify the likelihood of the emergence of nonexistent links from known network structures. Currently, most existing methods of link prediction focus on short-path networks that usually have a myriad of closed triangular structures. However, these algorithms on highly sparse or long-path networks have poor per...

CasSeqGCN: Combining Network Structure and Temporal Sequence to Predict Information Cascades

ArXiv, 2021

One important task in the study of information cascades is to predict the future recipients of a message given its past spreading trajectory. While the network structure serves as the backbone of the spreading, an accurate prediction can hardly be made without knowledge of the dynamics on the network. The temporal information in the spreading sequence captures many hidden features, but predictions based on the sequence alone have their limitations. Recent efforts have started to explore the possibility of combining both the network structure and the temporal feature for a more accurate prediction. Nevertheless, it is still a challenge to efficiently and optimally associate these two interdependent factors. Here, we propose a new end-to-end prediction method, CasSeqGCN, in which the structure and temporal feature are simultaneously taken into account. A cascade is divided into multiple snapshots which record the network topology and the state of nodes. The graph convolutional network (GCN) is ...

A generalized linear threshold model for an improved description of the spreading dynamics

Chaos, 2020

Many spreading processes in real life can be considered as a complex contagion, and the linear threshold (LT) model is often applied as a representative model for this mechanism. Despite its intensive usage, the LT model suffers from several limitations in describing the time evolution of the spreading. First, the discrete time step that captures the speed of the spreading is vaguely defined. Second, the synchronous updating rule makes the nodes infected in batches, which cannot take individual differences into account. Finally, the LT model is incompatible with existing models for the simple contagion. Here, we consider a generalized linear threshold (GLT) model for the continuous-time stochastic complex contagion process that can be efficiently implemented by the Gillespie algorithm. The time in this model has a clear mathematical definition, and the updating order is rigidly defined. We find that the traditional LT model systematically underestimates the spreading speed and t...
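The combination of a threshold rule with Gillespie-style continuous time can be sketched as follows. This is a simplified illustration, not the paper's GLT model, whose exact rates are not given in this excerpt. It assumes that an inactive node becomes active at a constant rate once the fraction of its active neighbors reaches a threshold. The function name and parameters are hypothetical.

```python
import random


def gillespie_threshold(adj, seeds, threshold, rate=1.0, rng=None):
    """Continuous-time threshold contagion simulated with the Gillespie algorithm.

    adj: {node: set of neighbors}, seeds: initially active nodes,
    threshold: required fraction of active neighbors (assumed rule),
    rate: activation rate of each eligible node (assumed constant).
    Returns the final active set and a list of (time, number_active) events.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    t = 0.0
    history = [(t, len(active))]
    while True:
        # Nodes that are inactive and whose active-neighbor fraction meets the threshold.
        eligible = sorted(
            v for v in adj
            if v not in active and adj[v]
            and sum(1 for u in adj[v] if u in active) / len(adj[v]) >= threshold
        )
        if not eligible:
            break
        total_rate = rate * len(eligible)
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        # All eligible nodes share the same rate, so pick one uniformly.
        active.add(rng.choice(eligible))
        history.append((t, len(active)))
    return active, history


# Hypothetical example: a path graph 0-1-2-3 seeded at node 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
final_active, events = gillespie_threshold(path, seeds={0}, threshold=0.5)
```

Because nodes activate one at a time at random exponential intervals, there is no synchronous batch update, and each event has a well-defined time stamp.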
