Josenildo da Silva - Academia.edu
Papers by Josenildo da Silva
We present DPD-HE, a privacy-preserving algorithm for mining time series data. We assume the data is split among several sites. The problem is to find all frequent subsequences of time series without revealing local data to any site. Our solution exploits density estimation and secure multiparty computation techniques to provide privacy to a given extent.
Engineering Applications of Artificial Intelligence, 2005
Multi-Agent Systems (MAS) offer an architecture for distributed problem solving. Distributed Data Mining (DDM) algorithms focus on one class of such distributed problem-solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem-solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and corresponding future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
Engineering Applications of Artificial Intelligence, 2006
In this paper we address confidentiality issues in distributed data clustering, particularly the inference problem. We present the KDEC-S algorithm for distributed data clustering, which is shown to provide mining results while preserving the confidentiality of the original data. We also present a confidentiality framework with which we can state the confidentiality level of KDEC-S. The underlying idea of KDEC-S is to use an approximation of density estimation such that the original data cannot be reconstructed to a given extent.
A growing number of applications in distributed environments involve very large data sets that are inherently distributed among a large number of autonomous sources over a network. The demand to extend data mining technology to such distributed data sets has motivated the development of several approaches to distributed data mining and knowledge discovery, of which only a few make use of agents. We briefly review existing approaches and argue for the potential added value of using agent technology in the domain of knowledge discovery, discussing both issues and benefits. We also propose an approach to distributed data clustering, outline its agent-oriented implementation, and examine potential privacy-violating attacks which agents may incur.
In this paper we address confidentiality issues in distributed data clustering, particularly the inference problem. We present a measure of inference risk as a function of reconstruction precision and the number of colluders in a distributed data mining group. We also present KDEC-S, a distributed clustering algorithm designed to provide mining results while preserving the confidentiality of the original data. The underlying idea of our algorithm is to use an approximation of density estimation such that it is not possible to reconstruct the original data with better probability than some given level.
The search for unknown frequent patterns is one of the core activities in many time series data mining processes. In this paper we present an extension of the pattern discovery problem in two directions. First, we assume data to be distributed among various participating peers, and require communication overhead to be minimized. Second, we allow participating peers to be malicious, which means that we have to address privacy issues. We present three problems along with algorithms to solve them. They are presented in increasing order of complexity according to the extensions we are pursuing, i.e. distribution and privacy constraints. As the main result we present our secure multiparty protocol for the privacy-preserving pattern discovery problem.
Spontaneous formation of peer-to-peer agent-based data mining systems seems a plausible scenario in years to come. However, the emergence of peer-to-peer environments further exacerbates privacy and security concerns that arise when performing data mining tasks. We analyze potential threats to data privacy in a peer-to-peer agent-based distributed data mining scenario, and discuss inference attacks which could compromise data privacy in a peer-to-peer distributed clustering scheme known as KDEC.
Web Intelligence and Agent Systems: An International Journal, 2006
A growing number of applications in distributed environments involve very large data sets that are inherently distributed among a large number of autonomous sources over a network. The demand to extend data mining technology to such distributed data sets has motivated the development of several approaches to distributed data mining and knowledge discovery, of which only a few make use of agents. We briefly review existing approaches and argue for the potential added value of using agent technology in the domain of knowledge discovery, discussing both issues and benefits. We also propose an approach to distributed data clustering, outline its agent-oriented implementation, and examine potential privacy-violating attacks which agents may incur.