Vwani Roychowdhury - Academia.edu
Papers by Vwani Roychowdhury
A large number of Independent Component Analysis (ICA) algorithms are based on the minimization of the statistical mutual information between the reconstructed signals, in order to achieve source separation. While it has been demonstrated that a global minimum of such a cost function results in the separation of the statistically independent sources, it is an open problem to show that the cost function has a unique minimum (up to scaling and permutations of the signals). Without such a result, there is no guarantee that the related ICA algorithms will not get stuck in local minima and, hence, return signals that are statistically dependent. We derive a novel result showing that for the special case of mixtures of two independent and identically distributed (i.i.d.) signals with symmetric, nearly Gaussian probability density functions, the objective function has no local minima. This result is shown to yield a useful extension of the well-known entropy power inequality.
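A rough numerical illustration of the claim (not the paper's proof): for a whitened mixture of two i.i.d., symmetric, nearly Gaussian sources, one can sweep the rotation angle of a 2 x 2 orthogonal unmixing matrix and track a histogram-based estimate of the mutual information between the two outputs; under the stated conditions the curve should dip only at the separating angles (up to permutation and sign). The source distribution, the histogram estimator, and the angle grid below are illustrative assumptions.

```python
# Illustration only (not the paper's proof): mutual information of the outputs
# of a rotation-parameterized unmixing matrix, for two i.i.d. nearly Gaussian
# sources. Source pdf, MI estimator, and angle grid are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two i.i.d., symmetric, nearly Gaussian sources: Gaussian plus a small
# uniform perturbation.
s = rng.standard_normal((2, n)) + 0.3 * rng.uniform(-1.0, 1.0, (2, n))

# Mix with a rotation, then whiten; after whitening only a rotation remains
# to be resolved by the ICA contrast.
th0 = 0.7
A = np.array([[np.cos(th0), -np.sin(th0)], [np.sin(th0), np.cos(th0)]])
x = A @ s
evals, evecs = np.linalg.eigh(np.cov(x))
xw = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T @ x

def mi_hist(a, b, bins=60):
    """Plug-in mutual-information estimate (in nats) from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

angles = np.linspace(0.0, np.pi, 91)
mi = [mi_hist(*(np.array([[np.cos(t), -np.sin(t)],
                          [np.sin(t),  np.cos(t)]]) @ xw)) for t in angles]
print("angle minimizing the estimated MI:",
      round(float(angles[int(np.argmin(mi))]), 3))
```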
CoRR, Feb 14, 2008
We initiate the study of an interesting aspect of sponsored search advertising, namely the consequences of broad match, a feature where an ad of an advertiser can be mapped to a broader range of relevant queries, and not necessarily to the particular keyword(s) that ad is associated with. Starting with a very natural setting for the strategies available to the advertisers, and via a careful look through the algorithmic lens, we first propose solution concepts for the game originating from the strategic behavior of advertisers as they try to optimize their budget allocation across various keywords. Next, we consider two broad match scenarios based on factors such as information asymmetry between advertisers and the auctioneer, and the extent of the auctioneer's control over budget splitting. In the first scenario, the advertisers have full information about broad match and the relevant parameters, and can reapportion their own budgets to utilize the extra information; in particular, the auctioneer has no direct control over budget splitting. We show that the same broad match may lead to different equilibria, one leading to a revenue improvement, another to a revenue loss. This leaves the auctioneer in a dilemma: whether to broad-match or not. This motivates us to consider another broad match scenario, where the advertisers have information only about the current scenario, and the allocation of the budgets unspent in the current scenario is in the control of the auctioneer. We observe that the auctioneer can always improve his revenue by judiciously using broad match. Thus, information seems to be a double-edged sword for the auctioneer.
There has been a rich interplay in recent years between (i) empirical investigations of real-world dynamic networks, (ii) analytical modeling of the microscopic mechanisms that drive the emergence of such networks, and (iii) harnessing of these mechanisms to either manipulate existing networks or engineer new networks for specific tasks. We continue in this vein, and study the deletion phenomenon in the web by following two different sets of websites (each comprising more than 150,000 pages) over a one-year period. Empirical data show that there is a significant deletion component in the underlying web networks, but the deletion process is not uniform. This motivates us to introduce a new mechanism of preferential survival (PS), where nodes are removed according to a degree-dependent deletion kernel. We use the mean-field rate equation approach to study a general dynamic model driven by Preferential Attachment (PA), Double PA (DPA), and a tunable PS, where c nodes (c < 1) are deleted per node added to the network, and verify our predictions via large-scale simulations. One of our results shows that, unlike in the case of uniform deletion, the PS kernel, when coupled with the standard PA mechanism, can lead to heavy-tailed power-law networks even in the presence of extreme turnover in the network. Moreover, a weak DPA mechanism, coupled with PS, can help make the network even more heavy-tailed, especially in the limit when deletion and insertion rates are almost equal and the overall network growth is minimal. The dynamics reported in this work can be used to design and engineer stable ad hoc networks and explain the stability of the power-law exponents observed in real-world networks.
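A minimal simulation sketch of growth with degree-dependent deletion, in the spirit of the PA-plus-PS dynamics described above; the specific attachment and deletion kernels, the deletion rate, and the sizes are illustrative assumptions rather than the paper's calibrated model.

```python
# Sketch of preferential attachment (PA) with a "preferential survival" style
# deletion step (low-degree nodes are removed more readily). The kernels,
# deletion rate c, and sizes are illustrative assumptions.
import random
import networkx as nx

def grow_with_preferential_survival(steps=5_000, m=2, c=0.6, seed=1):
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)
    next_id = m + 1
    for _ in range(steps):
        # PA step: a new node attaches to m existing nodes, chosen with
        # probability proportional to degree (edge-end sampling trick).
        ends = [u for e in G.edges() for u in e]
        targets = set()
        while ends and len(targets) < m:
            targets.add(rng.choice(ends))
        G.add_node(next_id)
        G.add_edges_from((next_id, t) for t in targets)
        next_id += 1
        # Deletion step: with probability c, remove one node, picking
        # low-degree nodes preferentially (high-degree nodes tend to survive).
        if rng.random() < c and G.number_of_nodes() > m + 1:
            nodes = list(G.nodes())
            weights = [1.0 / (1 + G.degree(v)) for v in nodes]
            G.remove_node(rng.choices(nodes, weights=weights, k=1)[0])
    return G

G = grow_with_preferential_survival()
degrees = sorted((d for _, d in G.degree()), reverse=True)
print("nodes:", G.number_of_nodes(), "largest degrees:", degrees[:5])
```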
Proceedings of 1994 IEEE International Symposium on Information Theory, 1994
Abstract - We propose a novel approach for a rigorous analysis of the nonlinear asymmetric dynamics of Linsker's unsupervised Hebbian learning network. The results provide for the first time comprehensive explanations of the origin of the various structured connection patterns and of the roles of the different system parameters of the model. Our theoretical predictions are corroborated by numerical simulations.
Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004., 2004
Living organisms are complex, and are typically composed of many interacting subsystems. In order to understand the complex genetic networks present in the whole cell, it is of crucial importance to first understand the dynamic behavior of modular genetic circuits. Recently a few subsystems on the genetic level, namely genetic repressilators (oscillators) and bi-stable gene circuits, have been constructed and manipulated. In the former case, while the mathematical model predicts simple oscillations, it has been observed that the period varies considerably from one oscillation to another. Realizing that laboratory biochemical experiments take place in space in a distributed way, we study coupled repressilators. Let the repressilator of Elowitz and Leibler [1] be described by the following set of 6 ordinary differential equations:
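The abstract is cut off before the equations themselves. For reference, the standard dimensionless form of the Elowitz-Leibler repressilator consists of three mRNA/protein pairs (the paper's notation and parameter values may differ):

```latex
% Standard dimensionless repressilator model (Elowitz & Leibler, 2000):
% three repressor genes arranged in a cycle; (m_i, p_i) are the mRNA and
% protein concentrations of gene i, and p_j is the protein repressing gene i.
\begin{aligned}
\frac{dm_i}{dt} &= -m_i + \frac{\alpha}{1 + p_j^{\,n}} + \alpha_0,\\[2pt]
\frac{dp_i}{dt} &= -\beta\,(p_i - m_i),
\qquad (i,j) \in \{(\mathrm{lacI},\mathrm{cI}),\;(\mathrm{tetR},\mathrm{lacI}),\;(\mathrm{cI},\mathrm{tetR})\}.
\end{aligned}
```

Here α is the maximal transcription rate, α₀ the transcriptional leakiness, n the Hill coefficient, and β the ratio of protein to mRNA decay rates; the three (m, p) pairs give the six ordinary differential equations referred to above.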
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2005
We adopt a system-theoretic approach and explore the model of retinal ganglion cells as linear filters followed by a maximum-likelihood Bayesian predictor. We evaluate the model by using cross-validation, i.e., first the model parameters are estimated using a training set, and then the prediction error is computed (by comparing the stochastic rate predicted by the model with the rate code of the response) for a test set. As in system identification theory, we present spatially uniform stimuli to the retina, whose temporal intensity is drawn independently from a Gaussian distribution, and we simultaneously record the spike trains from multiple neurons. The optimal linear filter for each cell is obtained by maximizing the mutual information between the filtered stimulus values and the output of the cell (as measured in terms of a stochastic rate code). Our results show that the model presented in this paper performs well on the test set, and it outperforms the identity Bayesian model ...
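A schematic of the cross-validation loop described above, on synthetic data: a spike-triggered-average filter and a histogram-based rate map stand in for the paper's mutual-information-maximizing fit, and mean squared error between predicted and observed spike counts stands in for its prediction-error measure; all of these substitutions are assumptions for illustration.

```python
# Schematic cross-validation of a "linear filter + pointwise rate predictor"
# model on synthetic data. The spike-triggered-average filter and histogram
# rate map below are illustrative stand-ins for the paper's MI-based fit.
import numpy as np

rng = np.random.default_rng(0)
T, L = 40_000, 20                          # time bins, filter length

stim = rng.standard_normal(T)              # temporally white Gaussian stimulus
true_filt = np.exp(-np.arange(L) / 5.0) * np.sin(np.arange(L) / 3.0)
rate_true = np.exp(np.convolve(stim, true_filt, mode="full")[:T] - 2.0)
spikes = rng.poisson(rate_true)            # toy ganglion-cell response

def lagged(stim):
    return np.array([stim[t - L:t][::-1] for t in range(L, len(stim))])

def fit(stim, spikes):
    X, y = lagged(stim), spikes[L:]
    filt = X.T @ y / y.sum()               # spike-triggered average filter
    proj = X @ filt
    edges = np.quantile(proj, np.linspace(0, 1, 21))
    idx = np.clip(np.searchsorted(edges, proj) - 1, 0, 19)
    # Rate map: mean observed spike count in each filtered-stimulus bin.
    rate_map = np.array([y[idx == b].mean() if np.any(idx == b) else y.mean()
                         for b in range(20)])
    return filt, edges, rate_map

def predict(stim, filt, edges, rate_map):
    idx = np.clip(np.searchsorted(edges, lagged(stim) @ filt) - 1, 0, 19)
    return rate_map[idx]

split = T // 2
filt, edges, rate_map = fit(stim[:split], spikes[:split])   # training set
pred = predict(stim[split:], filt, edges, rate_map)         # test set
mse = np.mean((pred - spikes[split + L:]) ** 2)
print("test-set mean squared error:", round(float(mse), 4))
```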
Independent Component Analysis, 2000
We introduce a novel approach to the blind signal separation (BSS) problem that is capable of jointly estimating the probability density function (pdf) of the source signals and the unmixing matrix. We demonstrate that, using a kernel density estimation based Projection Pursuit (PP) algorithm, it is possible to extract, from instantaneous mixtures, independent sources that are arbitrarily
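A minimal two-source sketch in the spirit of this approach: whiten the mixtures, then search the remaining rotation angle for the direction that minimizes the sum of kernel-density-estimated marginal entropies (for whitened data and orthogonal rotations this matches minimizing mutual information up to a constant). The source pdfs, the default kernel bandwidth, and the coarse angle grid are illustrative assumptions.

```python
# Minimal 2-source sketch of kernel-density-based projection pursuit for BSS:
# whiten, then pick the rotation minimizing the sum of KDE marginal entropies.
# Source pdfs, bandwidth and angle grid are illustrative assumptions.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n = 5_000
s = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])  # toy sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                            # mixing
x = A @ s

# Whitening reduces the unmixing problem to a single rotation angle.
evals, evecs = np.linalg.eigh(np.cov(x))
xw = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T @ x

def marginal_entropy(u, subsample=1_000):
    u = u[:subsample]                       # keep the KDE evaluation cheap
    kde = gaussian_kde(u)
    return float(-np.mean(np.log(kde(u))))  # plug-in entropy estimate (nats)

angles = np.linspace(0.0, np.pi / 2, 46)
scores = []
for th in angles:
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    y = R @ xw
    scores.append(marginal_entropy(y[0]) + marginal_entropy(y[1]))

best = angles[int(np.argmin(scores))]
print("estimated unmixing rotation (radians):", round(float(best), 3))
```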
Lecture Notes in Computer Science, 2007
One natural constraint in the sponsored search advertising framework arises from the fact that there is a limit on the number of available slots, especially for the popular keywords, and as a result, a significant pool of advertisers is left out. We study the emergence of diversification in the adword market triggered by such capacity constraints, in the sense that new market mechanisms, as well as new for-profit agents, are likely to emerge to combat, or to profit from, the opportunities created by shortages in ad-space inventory. We propose a model where the additional capacity is provided by for-profit agents (or mediators), who compete for slots in the original auction, draw traffic, and run their own sub-auctions. The quality of the additional capacity provided by a mediator is measured by its fitness factor. We compute revenues and payoffs for all the different parties at a symmetric Nash equilibrium (SNE) when the mediator-based model is operated by a mechanism currently being used by Google and Yahoo!, and then compare these numbers with those obtained at a corresponding SNE for the same mechanism, but without any mediators involved in the auctions. Such calculations allow us to determine the value of the additional capacity. Our results show that the revenue of the auctioneer, as well as the social value (i.e., efficiency), always increases when mediators are involved; moreover, even the payoffs of all the bidders will increase if the mediator has a high enough fitness. Thus, our analysis indicates that there are significant opportunities for diversification in the internet economy, and we should expect it to continue to develop a richer structure, with room for different types of agents and mechanisms to coexist.
Computing Research Repository, 2007
We investigate market forces that would lead to the emergence of new classes of players in the sponsored search market. We report a 3-fold diversification triggered by two inherent features of the sponsored search market, namely, capacity constraints and the collusion-vulnerability of current mechanisms. In the first scenario, we present a comparative study of two models motivated by capacity
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2005
There is considerable recent interest in both (i) modelling the retinal ganglion cells, so that the models can generate output that approximates the actual response of the retina (such models will help design retinal prosthetics); and (ii) understanding how relevant information is encoded in the spike patterns generated by the ganglion cells (these neuronal codes will help understand how the brain analyzes visual scenes). Since the signals (as captured by the interspike intervals, ISIs) are fundamentally stochastic, any modelling or analysis tool will have to track, and make assumptions about, the fluctuations or noise inherently present in these signals. Even though there has been recent work claiming that the fluctuations are fractal in nature, showing long-range dependencies, almost all modelling and analysis work continues to assume Poisson fluctuations. The widespread use of the Poisson model is partly for the sake of convenience, and partly due to the fact that those claiming a fractal nature of ISI ar...
Proceedings of the National Academy of Sciences, 2003
High-dimensional data sets generated by high-throughput technologies, such as DNA microarrays, are often the outputs of complex networked systems driven by hidden regulatory signals. Traditional statistical methods for computing low-dimensional or hidden representations of these data sets, such as principal component analysis and independent component analysis, ignore the underlying network structures and provide decompositions based purely on a priori statistical constraints on the computed component signals. The resulting decomposition thus provides a phenomenological model for the observed data and does not necessarily contain physically or biologically meaningful signals. Here, we develop a method, called network component analysis, for uncovering hidden regulatory signals from outputs of networked systems, when only partial knowledge of the underlying network topology is available. The a priori network structure information is first tested for compliance with a set of identifiability criteria. For networks that satisfy the criteria, the signals from the regulatory nodes and their strengths of influence on each output node can be faithfully reconstructed. This method is first validated experimentally by using the absorbance spectra of a network of various hemoglobin species. The method is then applied to microarray data generated from the yeast Saccharomyces cerevisiae, and the activities of various transcription factors during the cell cycle are reconstructed by using recently discovered connectivity information for the underlying transcriptional regulatory networks.
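A compact sketch of the decomposition that network component analysis performs: the data matrix E is modeled as E ≈ A P, where the zero pattern of A (which regulators connect to which outputs) is fixed by the known topology, and the nonzero strengths of A together with the hidden regulatory signals P are estimated, here by alternating constrained least squares. The data sizes, initialization, and iteration count are illustrative, and the paper's identifiability checks are omitted.

```python
# Sketch of the E ≈ A P decomposition used in network component analysis:
# A's zero pattern (the known topology) is fixed; nonzero weights and the
# hidden regulatory signals P are fit by alternating least squares.
# Sizes, initialization, and iteration count are illustrative; the paper's
# identifiability checks are omitted.
import numpy as np

def nca_als(E, mask, n_iter=200, seed=0):
    """E: (genes x samples) data; mask: (genes x regulators) 0/1 topology."""
    rng = np.random.default_rng(seed)
    n_genes, n_reg = mask.shape
    A = mask * rng.standard_normal(mask.shape)
    for _ in range(n_iter):
        # Update P given A (ordinary least squares).
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update each row of A given P, using only that gene's allowed regulators.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P

# Tiny synthetic check.
rng = np.random.default_rng(1)
mask = (rng.random((30, 3)) < 0.4).astype(float)
A_true = mask * rng.standard_normal(mask.shape)
P_true = rng.standard_normal((3, 50))
E = A_true @ P_true + 0.01 * rng.standard_normal((30, 50))
A_hat, P_hat = nca_als(E, mask)
print("relative reconstruction error:",
      round(float(np.linalg.norm(E - A_hat @ P_hat) / np.linalg.norm(E)), 4))
```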
Proceedings of the National Academy of Sciences, 2004
Cells adjust gene expression profiles in response to environmental and physiological changes through a series of signal transduction pathways. Upon activation or deactivation, the terminal regulators bind to or dissociate from DNA, respectively, and modulate transcriptional activities on particular promoters. Traditionally, individual reporter genes have been used to detect the activity of the transcription factors. This approach works well for simple, non-overlapping transcription pathways. For complex transcriptional networks, more sophisticated tools are required to deconvolute the contribution of each regulator. Here, we demonstrate the utility of network component analysis in determining multiple transcription factor activities based on transcriptome profiles and available information regarding network connectivity. We used the Escherichia coli carbon source transition from glucose to acetate as a model system. Key results from this analysis were either consistent with physiology or verified by independent measurements.
Proceedings of the National Academy of Sciences, 2008
We use sequential large-scale crawl data to empirically investigate and validate the dynamics that underlie the evolution of the structure of the web. We find that the overall structure of the web is defined by an intricate interplay between experience or entitlement of the pages (as measured by the number of inbound hyperlinks a page already has), inherent talent or fitness of the pages (as measured by the likelihood that someone visiting the page would give a hyperlink to it), and the continual high rates of birth and death of pages on the web. We find that the web is conservative in judging talent and the overall fitness distribution is exponential, showing low variability. The small variance in talent, however, is enough to lead to experience distributions with high variance: The preferential attachment mechanism amplifies these small biases and leads to heavy-tailed power-law (PL) inbound degree distributions over all pages, as well as over pages that are of the same age. The balancing act between experience and talent on the web allows newly introduced pages with novel and interesting content to grow quickly and surpass older pages. In this regard, it is much like what we observe in high-mobility and meritocratic societies: People with entitlement continue to have access to the best resources, but there is just enough screening for fitness that allows for talented winners to emerge and join the ranks of the leaders. Finally, we show that the fitness estimates have potential practical applications in ranking query results.
Physical Review E, 2004
Unlike the well-studied models of growing networks, where the dominant dynamics consist of insertions of new nodes and connections, and rewiring of existing links, we study ad hoc networks, where one also has to contend with rapid and random deletions of existing nodes (and, hence, the associated links). We first show that dynamics based only on the well-known preferential attachment of new nodes do not lead to a sufficiently heavy-tailed degree distribution in ad hoc networks. In particular, the magnitude of the power-law exponent increases rapidly (from 3) with the deletion rate, becoming ∞ in the limit of equal insertion and deletion rates. We then introduce a local and universal compensatory rewiring dynamic, and show that even in the limit of equal insertion and deletion rates true scale-free structures emerge, where the degree distributions obey a power law with a tunable exponent, which can be made arbitrarily close to -2. These results provide the first known evidence of the emergence of scale-free degree distributions purely due to dynamics, i.e., in networks of almost constant average size. The dynamics discovered in this paper can be used to craft protocols for designing highly dynamic Peer-to-Peer networks, and also to account for the power-law exponents observed in existing popular services.
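A minimal sketch of a roughly constant-size ad hoc network with random node deletion and a local compensatory rewiring step in the spirit of the dynamic described above: each neighbor orphaned by a deletion adds one replacement edge chosen preferentially by degree, and one new node joins per deleted node. The specific compensation rule and parameters are illustrative assumptions, not the paper's exact protocol.

```python
# Constant-size ad hoc network sketch: uniform deletion, local compensatory
# rewiring by the orphaned neighbors, and one preferential insertion per step.
# The compensation rule and parameters are illustrative assumptions.
import random
import networkx as nx

def ad_hoc_dynamics(n=1_000, m=3, steps=5_000, seed=2):
    rng = random.Random(seed)
    G = nx.barabasi_albert_graph(n, m, seed=seed)   # initial network
    next_id = n
    for _ in range(steps):
        # Deletion: a uniformly random node leaves unannounced.
        victim = rng.choice(list(G.nodes()))
        orphans = list(G.neighbors(victim))
        G.remove_node(victim)
        # Compensatory rewiring: each orphaned neighbor adds one edge to a
        # degree-preferential target (edge-end sampling trick).
        ends = [u for e in G.edges() for u in e]
        for v in orphans:
            if ends:
                t = rng.choice(ends)
                if t != v:
                    G.add_edge(v, t)
        # Insertion at the same rate: one new node with m preferential edges.
        targets = set()
        while ends and len(targets) < m:
            targets.add(rng.choice(ends))
        G.add_node(next_id)
        G.add_edges_from((next_id, t) for t in targets)
        next_id += 1
    return G

G = ad_hoc_dynamics()
degrees = sorted((d for _, d in G.degree()), reverse=True)
print("nodes:", G.number_of_nodes(), "largest degrees:", degrees[:5])
```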
Physical Review E, 2005
This paper develops a framework for analyzing and designing dynamic networks comprising different classes of nodes that coexist and interact in one shared environment. We consider ad hoc (i.e., nodes can leave the network unannounced, and no node has any global knowledge about the class identities of other nodes) preferentially grown networks, where different classes of nodes are characterized by different sets of local parameters used in the stochastic dynamics that all nodes in the network execute. We show that multiple scale-free structures, one within each class of nodes, and with tunable power-law exponents (as determined by the sets of parameters characterizing each class), emerge naturally in our model. Moreover, the coexistence of the scale-free structures of the different classes of nodes can be captured by succinct phase diagrams, which show a rich set of structures, including stable regions where different classes coexist in heavy-tailed (i.e., exponent between 2 and 3) and light-tailed (i.e., exponent > 3) states, and sharp phase transitions. The topology of the emergent networks is also shown to display a complex structure, akin to the distribution of different components of an alloyed material; e.g., nodes with a light-tailed scale-free structure get embedded toward the outside of the network, and have most of their edges connected to nodes belonging to the class with a heavy-tailed distribution. Finally, we show how the dynamics formulated in this paper can serve as an essential part of ad hoc networking protocols, which can lead to the formation of robust and efficiently searchable networks (including the well-known Peer-to-Peer (P2P) networks) even under very dynamic conditions.
Physical Review E, 2006
Due to the ubiquity of time series with long-range correlation in many areas of science and engineering, analysis and modeling of such data is an important problem. While the field seems to be mature, three major issues have not been satisfactorily resolved. (i) Many methods have been proposed to assess long-range correlation in time series. Under what circumstances do they yield consistent results? (ii) The mathematical theory of long-range correlation concerns the behavior of the correlation of the time series for very large times. A measured time series is finite, however. How can we relate the fractal scaling break at a specific time scale to important parameters of the data? (iii) An important technique in assessing long-range correlation in a time series is to construct a random walk process from the data, under the assumption that the data are like a stationary noise process. Due to the difficulty in determining whether a time series is stationary or not, however, one cannot be 100% sure whether the data should be treated as a noise or a random walk process. Is there any penalty if the data are interpreted as a noise process while in fact they are a random walk process, and vice versa? In this paper, we seek to gain important insights into these issues by examining three model systems, the autoregressive process of order 1, on-off intermittency, and Lévy motions, and by considering an important engineering problem, target detection within sea-clutter radar returns. We also provide a few rules of thumb to safeguard against misinterpretations of long-range correlation in a time series, and discuss the relevance of this study to pattern recognition.
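The random-walk construction mentioned in point (iii) is the first step of detrended fluctuation analysis (DFA), one standard way of assessing long-range correlation; a compact first-order DFA sketch follows (the paper's specific estimators, model systems, and sea-clutter data are not reproduced). The example also hints at the "penalty" in question: feeding an actual random walk into a noise-type analysis shifts the estimated exponent by roughly one.

```python
# Compact first-order DFA sketch: integrate the (assumed noise-like) series
# into a random-walk profile, detrend it linearly in windows, and read the
# scaling exponent from the slope of log F(s) versus log s.
import numpy as np

def dfa(x, scales):
    profile = np.cumsum(np.asarray(x, float) - np.mean(x))  # random-walk step
    F = []
    for s in scales:
        n_win = len(profile) // s
        segs = profile[:n_win * s].reshape(n_win, s)
        t = np.arange(s)
        resid = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segs]
        F.append(np.sqrt(np.mean(np.square(resid))))
    return np.polyfit(np.log(scales), np.log(F), 1)[0]      # DFA exponent alpha

rng = np.random.default_rng(0)
scales = [16, 32, 64, 128, 256, 512]
noise = rng.standard_normal(50_000)
print("white noise     -> alpha ~ 0.5 :", round(float(dfa(noise, scales)), 2))
print("its running sum -> alpha ~ 1.5 :", round(float(dfa(np.cumsum(noise), scales)), 2))
```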