Distributed Database Research Papers - Academia.edu

2025, ACM SIGMOD Record

In this paper we look at the application of XML data management support in scientific data analysis workflows. We describe a software infrastructure that aims to address issues associated with metadata management, data storage and management, and execution of data analysis workflows on distributed storage and compute platforms. This system couples a distributed, filter-stream based dataflow engine with a distributed XML-based data and metadata management system. We present experimental results from a biomedical image analysis use case that involves processing of digitized microscopy images for feature segmentation.
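
The paper's infrastructure is not reproduced here, but the flavor of XML-backed metadata querying is easy to sketch. A minimal illustration using Python's standard library, with an invented schema (the paper's actual metadata model is not shown):

```python
# Minimal sketch (not the paper's system): querying XML image metadata
# with Python's standard library; the schema below is hypothetical.
import xml.etree.ElementTree as ET

doc = """
<images>
  <image id="slide-001" stain="H&amp;E">
    <resolution microns-per-pixel="0.25"/>
  </image>
  <image id="slide-002" stain="IHC">
    <resolution microns-per-pixel="0.50"/>
  </image>
</images>
"""

root = ET.fromstring(doc)
# Select images whose resolution is finer than 0.3 microns/pixel.
for img in root.findall("image"):
    mpp = float(img.find("resolution").get("microns-per-pixel"))
    if mpp < 0.3:
        print(img.get("id"), img.get("stain"), mpp)
```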

2025, Circulation in Computer Science

Cloud computing is where software applications, data storage and processing capacity are accessed via the internet. This paper analyses the importance of cloud computing for software applications delivered as a service. To achieve this, a web-based application, OPMaS, was developed to provide a service that improves audit readiness and compliance, coupled with conformance to records-retention policies and automated records management, bundled using the SaaS model. The web application is meant to manage the personnel of any given organization, providing strong functionality developed in modules such as personnel information management, training information management, leave management, resume management, appraisal management, document management, reporting, and payroll management. This applies to many third-world countries, especially Nigeria, where interest in e-commerce is fast growing. The system was evaluated using...

2025, Information & Security: An International Journal

2025

We address the problem of maintaining distributed database consistency in the presence of failures while maximizing database availability. A network partitioning is a failure that splits the distributed system into a number of parts, no part being able to communicate with any other. Formalizations of various notions in this context are developed, and two measures for the performance of protocols in the presence of a network partitioning are introduced. A general optimality theory is developed for two classes of protocols, centralized and decentralized. Optimal protocols are produced in all cases.

2025

In metropolitan areas the deployment of optical fiber and Gigabit Ethernets leads to an expansion of packet-switched networks with large available capacities. These capacities could thus be employed to carry traffic from existing GSM/UMTS base stations and PBXs (Private Branch Exchanges). Carrying this traffic requires an emulation of E1/T1 telephone circuits over Ethernets, i.e. E1/T1 PDH/SDH signals from base stations need to be packetized and encapsulated in a circuit emulation adapter (CEA) before being sent over the Ethernet. The key problem with this emulation is to adjust the buffer play-out rate in the receiving CEA to the rate of the sending CEA. This synchronization is necessary: (i) to avoid long-term receiver buffer under- or overflows, since base stations are up for weeks and months, and (ii) to preserve the frequency of PDH signals across the Ethernet. TIK/ETH and Siemens have started the CoP (Circuit-over-Packets) project to address this problem and to build a new CEA demonstrator. This thesis documents laboratory measurements of the synchronization between pairs of CEA demonstrators over metropolitan Gigabit Ethernets, i.e. we have measured the Maximum Relative Time Interval Error (MRTIE) between the PDH signal that goes into the sending CEA and the PDH signal that comes out of the receiving CEA. To make measurements reproducible, we have employed the network emulator RplTrc [2] to emulate traffic patterns of Gigabit Ethernets. RplTrc has also been developed by TIK/ETH. To get a feel for the achievements of the CoP project, we have additionally compared measurements of the CoP's CEA to measurements with three commercially available CEAs ("product A, B, C"). From our measurements, we conclude that the CoP's CEA generally outperforms the CEA C. The CoP's CEA also shows better performance in long-term synchronization stability than the CEA B. However, the synchronization algorithm of the CEA B is more robust to variable network conditions. Finally, we have found that the CEA A shows a higher quality level for short-term synchronization stability than the CoP's CEA, while long-term behaviors are comparable. The CEA A is slightly more robust to different traffic scenarios, but equally sensitive to the limited delay (im)precision of RplTrc [2]. We explain this slightly better performance of the CEA A with the arithmetic limitations of the Siemens board used to implement the CoP's CEA demonstrator.
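
The thesis does not publish the CoP synchronization algorithm, but the control problem it measures can be sketched. A hedged toy version in Python of a proportional play-out controller that keeps the receiver's jitter buffer at a target fill level; the constants are illustrative, not taken from the CEA demonstrators:

```python
# Hedged sketch of the core control problem, not the CoP algorithm:
# steer the receiver's play-out rate so the jitter buffer hovers at a
# target fill level, which implicitly tracks the sender's clock.
NOMINAL_HZ = 2_048_000      # E1 bit rate
TARGET_FILL = 0.5           # keep the buffer half full
GAIN_PPM = 50               # proportional gain, in parts-per-million

def playout_rate(buffer_fill: float) -> float:
    """buffer_fill in [0, 1]; returns the adjusted play-out rate in Hz."""
    error = buffer_fill - TARGET_FILL
    return NOMINAL_HZ * (1.0 + GAIN_PPM * 1e-6 * error / TARGET_FILL)

print(playout_rate(0.55))  # slightly above nominal: drain a full buffer
print(playout_rate(0.45))  # slightly below nominal: let the buffer refill
```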

2025, Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

Spatial data analysis applications are emerging from a wide range of domains such as building information management, environmental assessments and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, there is a rapid expansion in available processing cores, through multicore machines and Cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the Cloud. We introduce Niharika, a parallel spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster. Niharika first uses a declustering technique that creates balanced spatial partitions. Then, Niharika adapts to performance heterogeneity and processing skew in the spatial dataset using dynamic load-balancing. We evaluate Niharika with three load-balancing algorithms and two different spatial datasets (both from TIGER) using Amazon EC2 instances. Niharika adapts to the performance heterogeneity in the EC2 nodes, thereby achieving excellent speedups (e.g., 63.6X using 64 cores on 16 4-core EC2 nodes, in the best case) and outperforming an approach that does not adapt.
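
Niharika's own load-balancing algorithms are evaluated in the paper; as a rough illustration of the general idea, a pull-based scheme lets faster nodes automatically claim more partitions. A minimal sketch with invented worker speeds:

```python
# A minimal pull-based load-balancing sketch (one of several strategies a
# system like Niharika could use; all names and speeds here are invented):
# workers of different speeds drain a shared queue of spatial partitions,
# so faster cores automatically take on more work.
import queue, threading, time

tasks = queue.Queue()
for partition_id in range(32):
    tasks.put(partition_id)

done = []

def worker(name: str, speed: float) -> None:
    while True:
        try:
            part = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(0.01 / speed)     # stand-in for geometry processing
        done.append((name, part))

threads = [threading.Thread(target=worker, args=(f"w{i}", s))
           for i, s in enumerate([1.0, 1.0, 2.0, 4.0])]  # heterogeneous speeds
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(1 for n, _ in done if n == "w3"), "partitions went to the fastest worker")
```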

2025, Deep Science Publishing.

Databases are arguably the most critical piece of software of modern society [1, 2]. They are a fundamental component of what has been called the Second Industrial Revolution, the revolution of information. Prior to the existence of database systems, computer programs were written specifically for an application; that is, each application needed a new program to be developed. With the advent of large mainframes and the implementation of time-sharing services, a rather inefficient implementation of a centralized database service came to be: a large computer stored the information needed by many organizations. However, even for the simplest of applications, a lot of low-level programming had to be done for each application. Each application implemented its own routines for accessing the database and managing data formats, leaving little time for the programmers to solve the problem at hand.

2025

An often-voiced concern of business practitioners relates to the value of academic research to their enterprise operations. The purpose of our research is to analyze a specific academic article that purports to provide retailers with insights into the sales impact of pricing promotions on regular-priced merchandise (Mulhern and Padgett, Journal of Marketing, October 1995). We replicated and extended the original study to test for possible effects of survey question positioning. The data was collected with the cooperation of a large Canadian hardware store, and although this is a work in progress, the preliminary findings indicate the possibility of question positioning affecting the data collected. A further contribution of our research is to demonstrate that a healthy scepticism of recommendations found in published academic research can be lessened through the use of replication and extension research. Research in marketing serves two masters: social science on the one hand and managerial practice on the other. This work-in-progress paper explores the interface between the two, using as a basis an article by Mulhern and Padgett (1995, subsequently referred to as MP). The original article explored the relationship between retail price promotions and regular-price purchases.

2025, International Joint Conference on Artificial Intelligence

We investigate three parameterized algorithmic schemes for graphical models that can accommodate trade-offs between time and space: 1) AND/OR Adaptive Caching (AOC(i)); 2) Variable Elimination and Conditioning (VEC(i)); and 3) Tree Decomposition with Conditioning (TDC(i)). We show that AOC(i) is better than the vanilla versions of both VEC(i) and TDC(i), and use the guiding principles of AOC(i) to improve the other two schemes. Finally, we show that the improved versions of VEC(i) and TDC(i) can be simulated by AOC(i), which emphasizes the unifying power of the AND/OR framework.

2025

Sentiment analysis on Twitter data has attracted much attention recently. One of the system's key features is the immediacy in communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide diversity of topics. This amount of information offers huge potential and can be harnessed to receive the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time to read through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments only. Thus, they can only process at most a few thousand tweets. Such a sample is not representative to define the sentiment polarity towards a topic due to the massive number of tweets published daily. In this paper, we go one step further and develop a novel method for sentiment le...

2025, ArXiv

Sentiment analysis (or opinion mining) on Twitter data has attracted much attention recently. One of the system's key features is the immediacy in communication with other users in an easy, user-friendly and fast way. Consequently, people tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide diversity of topics. This amount of information offers huge potential and can be harnessed to receive the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time to read through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments only. Thus, they can only process at most a few thousand tweets. Such a sample is not representative to define the sentiment polarity towards a topic due to the massive number of tweets published daily. In this paper, we go one step further and develop a...

2025, Very Large Data Bases

2025, INTERNATIONAL JOURNAL OF COMPUTER APPLICATION (IJCA)

Database security has gained wide notoriety among many individuals worldwide due to the increase in publicized incidents of the loss of or unauthorized exposure to sensitive or confidential data from major corporations, government agencies, and academic institutions. The amount of data collected, retained, and shared electronically by many institutions is steadily increasing. Consequently, the need for individuals to understand the issues, challenges, and available solutions of database security is being realized. At its core, database security strives to ensure that only authenticated users perform authorized activities at authorized times. A useful summary of the essence of database security is provided in the literature. More formally, database security encompasses the constructs of confidentiality, integrity, and availability, designated as the CIA triad. In computing discipline curricula, database security is often a topic covered in either an introductory database or an introductory computer security course. This paper proposes an outline of a database security component to be included in computer science or computer engineering undergraduate or early graduate curricula by mapping a number of sub-topics to the three constructs of database security. The sub-topics include mechanisms for individual access control, application access, vulnerability, inference, and auditing.

2025

Alhamdulillah, thanks God; without Him I am nothing. I would like to express my sincere thanks to my major adviser Dr. Mitchell L. Neilsen for his guidance, encouragement, help, and support for the completion of my thesis. Without his constant support and inspiring ideas, this thesis would have been impossible. I also would like to thank Dr. K.M. George and Dr. H. Lu for serving on my graduate committee and for their precious time spent on my thesis. I also want to thank Dr. J.P. Chandler for his generous support and help. I would also like to thank my Indonesian Government, represented by The Agency for the Assessment and Application of Technology (BPP Teknologi), for giving me an opportunity to pursue my education and sponsoring me. To Nusantara Aircraft Industries Limited (PT IPTN), where I have been working, I want to give thanks for all the support that has been given to me and for giving me the opportunity to realize my dream of pursuing higher education. Last, but certainly not least, I wish to express from the deepest of my heart my sincere thanks to my parents, the ones who always instilled in me the importance of learning and always pray to God for me. My thanks also go to all of my friends who have helped, encouraged, and advised me during my stay in Stillwater.

2025

In today's data-driven world, the integrity, confidentiality, and availability of enterprise data are critical. Oracle databases, widely adopted by large-scale organizations across various sectors, play a fundamental role in managing and storing this sensitive information. However, as the reliance on Oracle systems grows, so does the surface area for potential security threats. This paper explores the key security challenges faced in Oracle database environments and presents a comprehensive analysis of best practices and solutions to mitigate these risks effectively. Several factors contribute to the growing concern around Oracle database security, including the rise of sophisticated cyberattacks, increased regulatory demands, and the shift toward cloud-based infrastructure. Oracle databases are often targeted due to misconfigurations, unpatched vulnerabilities, over-privileged accounts, and insufficient auditing mechanisms. Among the most pressing security issues are SQL injection attacks, privilege escalation, data leakage, and insider threats. Furthermore, the complexity of Oracle's architecture often leads to implementation challenges, especially when integrating advanced security tools like Oracle Database Vault, Transparent Data Encryption (TDE), and Label Security. This paper provides an in-depth literature survey of previous research on Oracle security, highlighting known vulnerabilities and industry-recommended countermeasures. It discusses the architectural principles of Oracle's built-in security mechanisms, including authentication models, access controls, encryption, and auditing frameworks. Real-world case studies are also presented to emphasize the practical implications of these threats and the effectiveness of implemented solutions. To overcome the identified challenges, the paper recommends a layered defense strategy encompassing secure configuration, regular patching, role-based access control, data encryption, and continuous monitoring. It also stresses the importance of compliance with international security standards such as GDPR and ISO/IEC 27001. Looking ahead, emerging technologies such as machine learning for anomaly detection, blockchain for immutable auditing, and automation in security patch deployment present promising directions for strengthening Oracle database security. The paper concludes by suggesting areas for future research and enhancement, particularly in the context of Oracle's evolving cloud infrastructure and hybrid deployments. By understanding the multifaceted security landscape of Oracle databases and implementing the strategies outlined in this study, organizations can significantly reduce the risk of data breaches, ensure compliance, and maintain trust in their information systems.
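
Of the countermeasures listed, parameterized queries are the easiest to show concretely. A hedged sketch using the python-oracledb driver; the connection details are placeholders and the table is invented:

```python
# One concrete mitigation from the paper's list: defeat SQL injection with
# bind variables instead of string concatenation. Connection details are
# placeholders; assumes the python-oracledb driver and a hypothetical table.
import oracledb

conn = oracledb.connect(user="app_user", password="***", dsn="dbhost/orclpdb1")
cur = conn.cursor()

user_input = "O'Brien; DROP TABLE employees"   # hostile input stays inert

# BAD:  cur.execute("SELECT * FROM employees WHERE name = '" + user_input + "'")
# GOOD: the driver sends the value separately, never splicing it into the SQL.
cur.execute(
    "SELECT employee_id, name FROM employees WHERE name = :name",
    {"name": user_input},
)
print(cur.fetchall())
```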

2025, Science, Technology and Development

In today's data-driven world, the integrity, confidentiality, and availability of enterprise data are critical. Oracle databases, widely adopted by large-scale organizations across various sectors, play a fundamental role in managing and storing this sensitive information. However, as the reliance on Oracle systems grows, so does the surface area for potential security threats. This paper explores

2025, International Journal of Innovative Research in Science, Engineering and Technology

In a rising era of information and communication technology, data plays a crucial role in all types of cross-organizational research and business applications. Data grids rely on the coordinated sharing of and interaction across multiple autonomous database management systems to provide transparent access to heterogeneous and autonomous data resources stored in grid nodes. In this paper, we present a grid-based model which provides a uniform access interface and a distributed query mechanism to access heterogeneous and geographically distributed educational digital resources. We first present an overview of the grid-based model and then discuss the architectural view and implementation details with regard to educational resources.

2025, International Journal of Computer Application, 15(3), 1-14, ISSN: 2250-1797, 2025.

Database security has gained wide notoriety among many individuals worldwide due to the increase in publicized incidents of the loss of or unauthorized exposure to sensitive or confidential data from major corporations, government agencies, and academic institutions. The amount of data collected, retained, and shared electronically by many institutions is steadily increasing. Consequently, the need for individuals to understand the issues, challenges, and available solutions of database security is being realized. At its core, database security strives to ensure that only authenticated users perform authorized activities at authorized times. A useful summary of the essence of database security is provided in the literature. More formally, database security encompasses the constructs of confidentiality, integrity, and availability, designated as the CIA triad. In computing discipline curricula, database security is often a topic covered in either an introductory database or an introductory computer security course. This paper proposes an outline of a database security component to be included in computer science or computer engineering undergraduate or early graduate curricula by mapping a number of sub-topics to the three constructs of database security. The sub-topics include mechanisms for individual access control, application access, vulnerability, inference, and auditing.

2025, Agrekon

Linear programming models are widely used for farm-level investment decisions. The particular advantage of using this spatial decision support system is its ability to include region-wide competitive forces and local, national and international market constraints. The most apparent advantages of the optimisation technique can be summarised as follows: the technique integrates resource potential and economic determinants in predicting land-use patterns; this interactive capability determines the relative profitability and competitive advantage of each of the selected crops vis-a-vis the resource units.
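
As a worked toy version of such an optimisation model (numbers invented, not from the study), a two-crop land and water allocation can be solved with an off-the-shelf LP solver:

```python
# Toy illustration of the technique (all figures invented): choose hectares
# of two crops to maximize gross margin subject to land and water limits.
from scipy.optimize import linprog

# Gross margin per hectare: maize 500, wheat 300 (maximize => negate).
c = [-500, -300]
A_ub = [[1, 1],      # land:  x_maize +   x_wheat <= 100 ha
        [6, 2]]      # water: 6*x_maize + 2*x_wheat <= 360 ML
b_ub = [100, 360]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimum: 40 ha maize, 60 ha wheat, margin 38000
```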

2025

Many challenges facing urban and built environment researchers stem from the complexity and diversity of the urban data landscape. This landscape is typified by multiple independent organizations each holding a variety of heterogeneous data sets of relevance to the urban community. Furthermore, urban research itself is diverse and multi-faceted covering areas as disparate as health, population demographics, logistics, energy and water usage, through to socio-economic indicators associated with communities. The Australian Urban Research Infrastructure Network (AURIN) project (www.aurin.org.au) is tasked with developing an e-Infrastructure through which a range of urban and built environment research areas will be supported. This will be achieved through development and support of a common (underpinning) e-Infrastructure. This paper outlines the requirements and design principles of the e-Infrastructure and how it aims to provide seamless, secure access to diverse, distributed data sets and tools of relevance to the urban research community. We also describe the initial case studies and their implementation that are currently shaping this e-Infrastructure.

2025, IBM Systems Journal

interfaces and two sets of error messages and their codes. On retrieval, it also has to perform the processing needed to combine data from the two databases. Application development is simplified if the two database systems support a common interface. Examples of such interfaces are the Microsoft Open Database Connectivity (ODBC) suite of functions, the X/Open SQL (Structured Query Language) Call Level Interface (CLI), and the IBM Distributed Relational Database Architecture (DRDA). The application still recognizes that it is dealing with multiple data sources, but now their interfaces are the same. Integration processing, however, is still the responsibility of the application. Application development is simplified even further if all details of how to access the two database systems are delegated to a separate system. The term multidatabase system (MDBS) describes systems with this capability. An MDBS provides an integrated view of data from multiple, autonomous, heterogeneous, distributed sources. The objective is to provide the application with the view that it is dealing with a single data source. If a request requires data from multiple sources, the multidatabase system will determine what data are required from each source, retrieve the data, and perform any integration processing needed. Large user organizations consistently express a strong need for systems that provide better data connectivity and data integration. We believe that the data connectivity problem is more or less solved: applications are now able to retrieve or update data in several different databases on several different platforms. However, simply being able to "get at" the data is not enough. CORDS (a name stemming from an early group called "COnsortium for Research on Distributed Systems") is a research project focused on distributed applications. It is a collaborative effort involving IBM and several universities. More information about the project can be found in Reference 5. As part of this project, we have designed and prototyped an MDBS, called the CORDS-MDBS, that provides an integrated, relational view of multiple heterogeneous database systems. Currently, five data sources are supported: three different relational database systems, a network database system, and a hierarchical database system. In this paper, we present an overview of the architecture of the CORDS-MDBS and the current state of the prototype implementation. We describe the approaches taken in managing catalog information, schema integration, global query optimization, distributed transaction management, and interfacing to heterogeneous data sources. We also recommend that a few additional facilities be provided by database systems to ease the integration task. Although an MDBS resembles a "traditional" distributed database system, there are major differences, mainly caused by the autonomy and heterogeneity of the underlying data sources. Autonomy implies that, to a component data source (CDS), the multidatabase system is just another application with no special privileges. It has no control over, or influence on, how the data are modeled by the CDS, how requests are processed, how transaction management is handled, and so on. Simply put, when developing a multidatabase system, we cannot rely on being able to change a CDS; we have to use whatever interface and capabilities a target CDS provides. Heterogeneity implies that the CDSs may differ in terms of data models, data representation, capabilities, and interfaces. Commonly used models include flat (indexed) files, hierarchical, network, relational, or object-oriented models. Different data models provide different primitives for structuring data, but many other properties and features are typically associated with a data model. These are, for example, the constraints that can...
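
The CORDS-MDBS itself is far richer, but the mediator/adapter structure described above can be sketched in a few lines. All class and source names below are illustrative, not taken from the paper:

```python
# A minimal sketch of the MDBS idea (not the CORDS implementation): the
# application sees one interface; per-source adapters hide each component
# data source's native API, and the mediator merges the results.
class SourceAdapter:
    """Wraps one autonomous data source behind a common call."""
    def __init__(self, name, rows):
        self.name, self._rows = name, rows   # rows stand in for native access
    def select(self, predicate):
        return [r for r in self._rows if predicate(r)]

class MultiDatabase:
    def __init__(self, sources):
        self.sources = sources
    def query(self, predicate):
        # The mediator decides what each source must contribute, fetches,
        # and performs the integration processing (here: a simple union).
        out = []
        for s in self.sources:
            out.extend(s.select(predicate))
        return out

mdbs = MultiDatabase([
    SourceAdapter("relational", [{"id": 1, "city": "Ottawa"}]),
    SourceAdapter("hierarchical", [{"id": 2, "city": "Toronto"}]),
])
print(mdbs.query(lambda r: r["id"] > 0))   # one answer from two sources
```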

2025, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064)

In this paper, we present the implementation issues of a virtual backbone that supports the operations of the Uniform Quorum System (UQS) and the Randomized Database Group (RDG) mobility management schemes in an ad hoc network. The virtual backbone comprises nodes that are dynamically selected to contain databases that store the location information of the network nodes. Together with the UQS and RDG schemes, the virtual backbone allows both dynamic database residence and dynamic database access, which provide a high degree of location data availability and reliability. We introduce a Distributed Database Coverage Heuristic (DDCH), which is equivalent to the centralized greedy algorithm for virtual backbone generation, but only requires local information exchange and local computation. We show how DDCH can be employed to dynamically maintain the structure of the virtual backbone, along with database merging, as the network topology changes. We also provide means to maintain connectivity among the virtual backbone nodes. We discuss optimization issues of DDCH through simulations. Simulation results suggest that the cost of ad hoc mobility management with a virtual backbone can be far below that of the conventional link-state routing.
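
DDCH itself runs distributedly on local information; the centralized greedy algorithm it is stated to be equivalent to is easy to sketch as a standard greedy dominating-set construction:

```python
# The paper's DDCH is distributed; below is the *centralized* greedy
# heuristic it is said to be equivalent to: repeatedly pick the node that
# covers the most not-yet-covered nodes (itself plus its neighbors).
def greedy_backbone(adj):
    """adj: {node: set(neighbors)}. Returns a dominating set of nodes."""
    uncovered = set(adj)
    backbone = set()
    while uncovered:
        # Gain of a candidate = how many uncovered nodes it would cover.
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        backbone.add(best)
        uncovered -= {best} | adj[best]
    return backbone

adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
print(greedy_backbone(adj))   # {2, 4}: every node is in or adjacent to it
```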

2025, IEEE Transactions on Wireless Communications

Virtual Backbone Routing (VBR) is a scalable hybrid routing framework for ad hoc networks, which combines local proactive and global reactive routing components over a variable-sized zone hierarchy. The zone hierarchy is maintained through a novel distributed virtual backbone maintenance scheme, termed the Distributed Database Coverage Heuristic (DDCH), also presented in this paper. Borrowing from the design philosophy of the Zone Routing Protocol, VBR limits the proactive link information exchange to the local routing zones only. Furthermore, the reactive component of VBR restricts the route queries to within the virtual backbone only, thus improving the overall routing efficiency. Our numerical results suggest that the cost of the hybrid VBR scheme can be a small fraction of that of either one of the purely proactive or purely reactive protocols, with or without route caching. Since the data routes do not necessarily pass through the virtual backbone nodes, traffic congestion is considerably reduced. Yet, the average length of the VBR routes tends to be close to optimal. Compared with the traditional one-hop hierarchical protocols, our results indicate that, for a network of moderate to large size, VBR with an optimal zone radius larger than one can significantly reduce the routing traffic. Furthermore, we demonstrate VBR's improved scalability through analysis and simulations.

2025

This document, developed by the Rule Interchange Format (RIF) Working Group, specifies the Basic Logic Dialect, RIF-BLD, a format that allows logic rules to be exchanged between rule systems. The RIF-BLD presentation syntax and semantics are specified both directly and as specializations of the RIF Framework for Logic Dialects, or RIF-FLD. The XML serialization syntax of RIF-BLD is specified via a mapping from the presentation syntax. A normative XML schema is also provided.

2025, arXiv (Cornell University)

This paper presents Odyssey, a novel distributed data-series processing framework that efficiently addresses the critical challenges of exhibiting good speedup and ensuring high scalability in data series processing by taking advantage of the full computational capacity of modern distributed systems comprised of multi-core servers. Odyssey addresses a number of challenges in designing an efficient and highly scalable distributed data series index, including efficient scheduling and load-balancing without paying the prohibitive cost of moving data around. It also supports a flexible partial replication scheme, which enables Odyssey to navigate a fundamental trade-off between data scalability and good performance during query answering. Through a wide range of configurations and using several real and synthetic datasets, our experimental analysis demonstrates that Odyssey achieves its challenging goals. This paper appeared in PVLDB 2023, Volume 16.

2025, International Journal of Computer Trends and Technology

Database development in web development has significantly evolved, particularly over the past few decades, driven by the internet's growth and user needs. Initially rooted in the relational database model from the 1970s, advancements were influenced by major companies like Oracle and IBM. The complexity of databases has increased due to big data requirements, leading to the creation of NoSQL databases. Open-source systems like PostgreSQL, MySQL, and MongoDB have further accelerated web application development by offering flexible, cost-effective solutions. However, challenges like big data management and security remain prevalent. Future trends indicate that databases will become smarter through AI and machine learning, with technologies like blockchain potentially reshaping the landscape. This study seeks to explore these developments in depth.

2025

Genetic Algorithms (GA) have been widely used in many fields of optimization, one of them being the Traveling Salesman Problem (TSP). In the TSP, GAs are primarily used in cases involving many vertices, for which it is not possible to enumerate the shortest route. One of the stages in a GA is the crossover operation, which generates the offspring's chromosome from the parents'. Examples of crossover operators in GAs for the TSP are Partially Mapped Crossover (PMX), Order Crossover (OX), Cycle Crossover (CX), and others. However, when constructing the route, they do not consider the length of the route to maximize its fitness. The use of random numbers in constructing the route is likely to produce offspring (a new route) that is no better than its parent. The sequence of nodes in the route affects the length of the route. To minimize uncertainty, the crossover operation should therefore consider a method to arrange the chromosomes. This article studied incorporating two methods into the crossover stage, in order to ensure the offsp...
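
For reference, the standard Order Crossover (OX) that the article takes as a baseline looks as follows; the proposed length-aware arrangement step is not reproduced here:

```python
# A standard Order Crossover (OX) for permutation-encoded TSP tours: copy a
# slice from parent 1, then fill remaining positions in parent-2 order.
import random

def order_crossover(p1, p2):
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]              # inherit a slice from parent 1
    fill = [g for g in p2[b + 1:] + p2[:b + 1] if g not in child]
    for i in list(range(b + 1, n)) + list(range(a)):
        child[i] = fill.pop(0)                # remaining genes, parent-2 order
    return child

random.seed(1)
print(order_crossover([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]))
```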

2025

Cheese whey is a dairy industry effluent with a strong organic and saline content. The growing concern about pollution and environmental control, as well as greater knowledge about its nutritional value, has led to the addition of whey to the food chain. The purpose of this study was to develop a whey-based fruit beverage, and to compare the proximate composition and mineral content of experimental and commercial brands. From the analysis of raw materials, the proximate composition and mineral content of the experimental whey-based fruit beverage was calculated. The information related to the commercial brand was obtained from the label. The experimental beverage presented proximate composition and mineral content similar to the commercial brand, except for the high content of selenium (70 μg/100g), which could be attributed to the proximate composition of whey. The production of whey-based fruit beverages is a good source of nutrients and a viable alternative to use the whey i...

2025

Storage has been extensively studied during the past few decades (Foster et al., 1997; Jose Guimaraes, 2001). However, emerging trends in distributed computing bring new solutions to existing problems. Grid computing proposes a distributed approach to data storage. In this paper, we introduce a Grid-based system (ARCO) developed for multimedia storage of large amounts of data. The system is being developed for the Biblioteca Nacional, the National Library of Portugal. Using the Grid informational system and resource management, we propose a transparent system where terabytes of data are stored in a Beowulf cluster built of commodity components, with a backup solution and error-recovery mechanisms.

2025, ijecce.org

A database is not static but grows rapidly in size. Distributed database design raises many issues: how to allocate data, communication within the system, coordination among the individual systems, distributed transaction control and query processing, concurrency control over distributed relations, design of the global user interface, design of component systems in different physical locations, and integration of existing database systems' security. The system architecture makes use of software partitioning of the database based on data clustering, an SQMD (Single Query Multiple Database) architecture, a web services interface, and virtualization software technologies. The system allows uniform access to concurrently distributed databases using the SQMD architecture. This paper explains design strategies of distributed databases for the SQMD architecture.
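
The SQMD idea, one query broadcast to many databases with the results unioned, can be sketched with SQLite standing in for the partitioned back ends (schema and data invented):

```python
# Sketch of the SQMD (Single Query, Multiple Database) idea: one query is
# broadcast concurrently to every partition and the results are unioned.
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_partition(rows):
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE t(id INTEGER, val TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return db

partitions = [make_partition([(1, "a"), (2, "b")]),
              make_partition([(3, "c"), (4, "d")])]

def run(db, sql, args):
    return db.execute(sql, args).fetchall()

sql, args = "SELECT id, val FROM t WHERE id >= ?", (2,)
with ThreadPoolExecutor() as pool:
    parts = pool.map(lambda db: run(db, sql, args), partitions)
    results = [row for part in parts for row in part]
print(results)   # rows gathered from every partition
```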

2025, ACM Transactions on Database Systems

Many algorithms have been devised for minimizing the costs associated with obtaining the answer to a single, isolated query in a distributed database system. However, if more than one query may be processed by the system at the same time and if the arrival times of the queries are unknown, the determination of optimal query-processing strategies becomes a stochastic optimization problem. In order to cope with such problems, a theoretical state-transition model is presented that treats the system as one operating under a stochastic load. Query-processing strategies may then be distributed over the processors of a network as probability distributions, in a manner which accommodates many queries over time. It is then shown that the model leads to the determination of optimal query-processing strategies as the solution of mathematical programming problems, and analytical results for several examples are presented. Furthermore, a divide-and-conquer approach is introduced for decomposing ...

2025, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)

Most of the existing nonlinear data analysis and modelling techniques, including neural networks, become computationally prohibitively expensive when the available data set exceeds the capacity of the computer's main memory, due to slow disc access operations [1]. For data received on-line from a source with an unknown probability distribution, the question addressed in this article is how to efficiently partition it into smaller representative subsets (databases) and how to organize these data subsets in order to minimize the computational cost of the later data analysis. The proposed linear-time, on-line problem decomposition method achieves these objectives by balancing the probability distributions of the individual disjoint data subsets, each aimed at approximating the original data-source distribution. Consequently, computationally efficient statistical data analysis and neural network modelling on data subsets fitting into a computer's central memory will produce results similar to those obtained through a global, computationally infeasible data analysis.
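
A simple online variant of this balancing idea (not necessarily the authors' exact method) assigns each arriving sample to the subset currently most deficient in that sample's histogram bin:

```python
# Online distribution-balancing sketch: stream samples into k subsets so
# that each subset's histogram tracks the overall data distribution.
import random

K, BINS = 4, 10
counts = [[0] * BINS for _ in range(K)]   # per-subset histograms
subsets = [[] for _ in range(K)]

random.seed(0)
for _ in range(10_000):
    x = random.gauss(0.5, 0.15)
    b = min(max(int(x * BINS), 0), BINS - 1)
    k = min(range(K), key=lambda i: counts[i][b])   # most deficient subset
    counts[k][b] += 1
    subsets[k].append(x)

print([len(s) for s in subsets])          # near-equal subset sizes
print([c[5] for c in counts])             # near-equal mass in a middle bin
```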

2025, Net-Centric Approaches to Intelligence and National Security

On both the public Internet and private Intranets, there is a vast amount of data available that is owned and maintained by different organizations, distributed all around the world. These data resources are rich and recent; however, information gathering and knowledge discovery from them, in a particular knowledge domain, confronts major difficulties. The objective of this article is to introduce an autonomous methodology to provide for domain-specific information gathering and integration from multiple distributed sources.

2025, 2011 IEEE World Haptics Conference

2025, Lecture Notes in Computer Science

GlobData is a project that aims to design and implement a middleware tool offering the abstraction of a global object database repository. This tool, called Copla, supports transactional access to geographically distributed persistent objects independent of their location. Additionally, it supports replication of data according to different consistency criteria. For this purpose, Copla implements a number of consistency protocols offering different tradeoffs between performance and fault-tolerance. This paper presents the work on strong consistency protocols for the GlobData system. Two protocols are presented: a voting protocol and a non-voting protocol. Both these protocols rely on the use of atomic broadcast as a building block to serialize conflicting transactions. The paper also introduces the total order protocol being developed to support large-scale replication.
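
Both protocol families share one core step: once atomic broadcast fixes a single delivery order, every replica certifies and applies transactions deterministically, so replicas never diverge. A minimal sketch of that step, with the protocol details elided:

```python
# Minimal sketch (details elided): replicas apply transactions in the
# total order fixed by atomic broadcast, certifying each one identically.
def apply_in_order(delivered, state):
    """delivered: transactions in the total order fixed by atomic broadcast."""
    for txn in delivered:
        if all(state.get(k, 0) == v for k, v in txn["reads"].items()):
            state.update(txn["writes"])    # certification passed: commit
        # else: conflicting concurrent update; abort identically everywhere
    return state

txns = [{"reads": {}, "writes": {"x": 1}},
        {"reads": {"x": 0}, "writes": {"x": 2}}]   # stale read: aborts
print(apply_in_order(txns, {}))                    # {'x': 1} on every replica
```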

2025, IEEE Transactions on Systems, Man, and Cybernetics

A graphical representation tool, updated Petri nets (UPN), has been developed to model rule-base specifications for CIM databases. UPN facilitates the modeling of relationships between operations of various manufacturing application systems and the database updates and retrievals among all the respective distributed databases. Based on this representation, a hierarchical modeling technique which includes refining and aggregating rules has also been developed. An application of UPN is demonstrated in designing rule-based systems for controlling and integrating the information flow between manufacturing applications, including computer-aided design, computer-aided process planning, manufacturing resources planning, and shop floor control.

2025, 2011 11th International Conference on Intelligent Systems Design and Applications

Due to the dramatic increase of data volumes in different applications, it is becoming infeasible to keep these data in one centralized machine. It is becoming more and more natural to deal with distributed databases and networks. That is why distributed data mining techniques have been introduced. One of the most important data mining problems is data clustering. While many clustering algorithms exist for centralized databases, there is a lack of efficient algorithms for distributed databases. In this paper, an efficient algorithm is proposed for clustering distributed databases. The proposed methodology employs an iterative optimization technique to achieve a better clustering objective. The experimental results reported in this paper show the superiority of the proposed technique over a recently proposed algorithm based on a distributed version of the well known K-Means algorithm (Datta et al. 2009) [1].
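
For context, the distributed K-Means baseline the paper compares against can be sketched in one round: each site ships only per-cluster sums and counts, and a coordinator forms the global centroids:

```python
# One round of distributed K-Means (the baseline, not the paper's method):
# sites exchange sufficient statistics instead of raw points.
def local_stats(points, centroids):
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = min(range(k), key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
        counts[j] += 1
        sums[j] = [s + a for s, a in zip(sums[j], p)]
    return sums, counts

def merge(stats, centroids):
    k, dim = len(centroids), len(centroids[0])
    tot_s = [[0.0] * dim for _ in range(k)]
    tot_c = [0] * k
    for sums, counts in stats:
        for j in range(k):
            tot_c[j] += counts[j]
            tot_s[j] = [a + b for a, b in zip(tot_s[j], sums[j])]
    return [[s / c for s in tot_s[j]] if (c := tot_c[j]) else centroids[j]
            for j in range(k)]

sites = [[(0.1, 0.2), (0.2, 0.1)], [(0.9, 0.8), (0.8, 0.9)]]
centroids = [(0.0, 0.0), (1.0, 1.0)]
print(merge([local_stats(s, centroids) for s in sites], centroids))
```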

2025, Annales de Limnologie - International Journal of Limnology

2025, Semiconductor science and information devices

Data and the internet are growing rapidly, which causes problems in the management of big data. For these kinds of problems, there are many software frameworks used to increase the performance of distributed systems and to provide large-scale data storage. One of the most beneficial software frameworks used to utilize data in distributed systems is Hadoop. This paper introduces the Apache Hadoop architecture, the components of Hadoop, and their significance in managing vast volumes of data in a distributed system. The Hadoop Distributed File System enables the storage of enormous chunks of data over a distributed network. The Hadoop framework maintains fsimage and edits files, which support the availability and integrity of data. This paper includes cases of Hadoop implementation, such as weather monitoring and bioinformatics processing.
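
The fsimage/edits mechanism can be illustrated conceptually: namespace state is a checkpoint plus a replayed edit log. A toy sketch, not Hadoop code:

```python
# Conceptual sketch of how a NameNode-style fsimage checkpoint plus an
# edits log reconstruct namespace state (illustrative, not Hadoop code).
fsimage = {"/data": "dir", "/data/a.txt": "file"}     # last checkpoint
edits = [("create", "/data/b.txt", "file"),           # ops since checkpoint
         ("delete", "/data/a.txt", None),
         ("create", "/logs", "dir")]

namespace = dict(fsimage)              # start from the checkpoint...
for op, path, kind in edits:           # ...then replay the edit log
    if op == "create":
        namespace[path] = kind
    elif op == "delete":
        namespace.pop(path, None)

print(namespace)   # {'/data': 'dir', '/data/b.txt': 'file', '/logs': 'dir'}
```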

2025

Abstract. Distributed database design is a rather complex process that involves distinct aspects in achieving an adequate distribution of the data (Buretta 1997). Many of these aspects correspond to non-functional requirements (quality properties or constraints of the systems), such as availability, cost, and performance. These requirements are normally little explored in the design stages, and thus fail to assist the distribution process. This article aims to represent non-functional requirements in the early phases of distributed database design, through the integration of strategies proposed by the Requirements Engineering field. The work focuses in particular on the use of the NFR Framework and extends its catalogues of non-functional requirements in order to integrate the main aspects related to data distribution.

2025, Databases, Knowledge, and Data Applications

Every simulation is based on an appropriate model. Particularly in 3D simulation, models are often large and complex recommending the usage of database technology for an efficient data management. However, the predominant and well-known relational databases are less suitable for the hierarchical structure of 3D models. In contrast, graph databases from the NoSQL field store their contents in the nodes and edges of a mathematical graph. The open source Neo4j is such a graph database. In this paper, we introduce an approach to use Neo4j as persistent storage for 3D simulation models. For that purpose, a runtime in-memory simulation database is synchronized with the graph database back end.
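
With the official neo4j Python driver, persisting a part-of hierarchy as nodes and relationships looks roughly as follows; the Part/CONTAINS schema is invented for illustration, not taken from the paper:

```python
# Hedged sketch (schema invented, not the paper's): persisting a 3D model
# hierarchy as nodes and CONTAINS edges with the official neo4j driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "***"))

def store_assembly(tx, parent, child):
    tx.run(
        "MERGE (p:Part {name: $parent}) "
        "MERGE (c:Part {name: $child}) "
        "MERGE (p)-[:CONTAINS]->(c)",
        parent=parent, child=child,
    )

with driver.session() as session:
    for edge in [("robot", "arm"), ("arm", "gripper")]:
        session.execute_write(store_assembly, *edge)
driver.close()
```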

2025, Indian Engineering Journal

In this paper, the author seeks to determine the extent to which generative AI, particularly large language models (LLMs), can redefine database migration. The conventional techniques used for migrating data to next-generation databases entail manual scripting and mapping work, which is error-prone, cumbersome, and demands the services of an expert. This research aims at developing an integral solution based on LLMs that can assist at specific and critical phases of the migration process, especially for heterogeneous migration between distinct database platforms. The authors specifically point out how LLMs are used for analyzing the source database schema, for handling schema translation and data type mapping automatically, and for interpreting and converting other database-dependent code such as stored procedures and functions. The use of LLMs in the research also seeks to achieve a major reduction in manual work, an enhancement of accuracy, and a reduction in the overall time taken by migration processes. The paper also considers the role of LLMs in performance enhancement and security. Experiments with a modified version of a Gemini model on a sample Oracle to PostgreSQL database migration justify the proposed approach. The analysis points out significant gains in precision and performance, besides a noticeable reduction in the likelihood of errors compared with traditional techniques.
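
The schema-translation step can be sketched as a deterministic type map plus an LLM call for whatever the map cannot cover. The type correspondences below are common Oracle-to-PostgreSQL mappings; `llm` is a placeholder for any chat-completion client, not the paper's modified Gemini setup:

```python
# Sketch of the schema-translation step. A real tool needs a proper SQL
# parser; naive string replacement is shown only to convey the idea.
TYPE_MAP = {"VARCHAR2": "VARCHAR", "NUMBER": "NUMERIC",
            "DATE": "TIMESTAMP", "CLOB": "TEXT"}

def map_types(oracle_ddl: str) -> str:
    out = oracle_ddl
    for ora, pg in TYPE_MAP.items():
        out = out.replace(ora, pg)
    return out

def translate(oracle_ddl: str, llm) -> str:
    prompt = ("Translate this Oracle DDL to idiomatic PostgreSQL, "
              "preserving constraints and comments:\n" + oracle_ddl)
    return llm(prompt)   # hypothetical client call; validate before applying

print(map_types("CREATE TABLE t (id NUMBER(10), name VARCHAR2(40))"))
```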

2025, Journal of Parallel and Distributed Computing

This paper addresses the processing of a query in distributed database systems using a sequence of semijoins. The objective is to minimize the intersite data traffic incurred by a distributed query. A method is developed which accurately and efficiently estimates the size of an intermediate result of a query. This method provides the basis of the query optimization algorithm. Since the distributed query optimization problem is known to be intractable, a heuristic algorithm is developed to determine a low-cost sequence of semijoins. A cost comparison with an existing algorithm is provided. The complexity of the main features of the algorithm is analytically derived. The scheduling time for sequences of semijoins is measured for example queries using a Pascal program that implements the algorithm.
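
A back-of-the-envelope example shows why a semijoin can cut intersite traffic, under a crude uniform-selectivity assumption (the paper's estimator is more careful):

```python
# Toy semijoin cost comparison (all figures invented). Strategy 1 ships
# relation R whole; strategy 2 first ships S's join keys, then only the
# matching fraction of R (R semijoin S).
width_R, rows_R = 40, 10_000   # R at site 1: bytes per row, row count
rows_S, width_key = 2_000, 4   # S at site 2; join-key width in bytes
sel = 0.2                      # fraction of R rows with a match in S

ship_R_whole   = rows_R * width_R                 # strategy 1: 400,000 bytes
ship_keys_S    = rows_S * width_key               # strategy 2, step 1
ship_R_reduced = rows_R * sel * width_R           # strategy 2, step 2
print(ship_R_whole, int(ship_keys_S + ship_R_reduced))   # 400000 vs 88000
```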

2025, Communications of the ACM

Diverse database management systems are used in large organizations. The heterogeneous distributed database system (DDS) can provide a flexible integration of diverse databases for users and applications. This is because it allows for retrieval and update of distributed data under different data systems, giving the illusion of accessing a single centralized database system.

2025, Indian Scientific Journal Of Research In Engineering And Management

Replication in both distributed and real-time database systems is an interesting area for new researchers. In this paper, we provide an overview comparing the replication techniques available for these database systems. Data consistency and scalability are the issues considered in this paper: maintaining consistency between the actual state of a real-time object in the external environment and its images as reflected by all its replicas distributed over multiple nodes. We discuss a framework to create a replicated real-time database that preserves all timing constraints. In order to extend the idea to modelling a large-scale database, we present a general outline that improves data consistency and scalability by using an accessible algorithm applied to both databases, with the goal of lowering the degree of replication; it enables segments to have individual degrees of replication with the purpose of avoiding excessive resource usage, which together contributes to solving the scalability problem for distributed real-time database systems.

2025

Concurrency control governs the execution of concurrent transactions. Distributed database management systems enforce concurrency control to ensure serializability and isolation of transactions. A lot of research has been done in this area and a number of algorithms have been proposed. In this article, we compare a few algorithms for preserving the ACID properties (atomicity, consistency, isolation, and durability) of transactions in a DDBMS.
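
As one concrete instance of the kind of algorithm being compared, strict two-phase locking can be sketched in a few lines (deadlock detection and distribution omitted):

```python
# Minimal strict two-phase locking sketch, one classic way DDBMSs enforce
# the isolation/serializability discussed above (deadlock handling omitted).
import threading

class LockManager:
    def __init__(self):
        self._locks = {}                 # item -> owner transaction
        self._guard = threading.Lock()

    def acquire(self, txn, item):
        with self._guard:
            holder = self._locks.get(item)
            if holder is not None and holder != txn:
                return False             # caller must wait or abort
            self._locks[item] = txn
            return True

    def release_all(self, txn):          # at commit/abort: the shrink phase
        with self._guard:
            self._locks = {i: t for i, t in self._locks.items() if t != txn}

lm = LockManager()
assert lm.acquire("T1", "x")
assert not lm.acquire("T2", "x")         # T2 blocked until T1 commits
lm.release_all("T1")
assert lm.acquire("T2", "x")
```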

2025, CEUR Workshop Proceedings, 2024, 3668, pp. 120–132

This research article presents an approach to performance tuning in distributed data streaming systems through the development of the Holistic Adaptive Optimization Technique (HAOT). The importance of parameter tuning is underscored by its potential to significantly improve system performance without altering the existing design, thereby saving costs and avoiding the expenses associated with system redesign. However, traditional tuning methods often fall short by failing to optimize all components of the streaming architecture, leading to suboptimal performance. To address these shortcomings, our study introduces HAOT, a comprehensive optimization framework that dynamically integrates machine learning techniques to continuously analyze and adapt the configurations of sources, streaming engines, and sinks in real time. This holistic approach not only aims to overcome the limitations of existing parameter-tuning methods but also reduces the reliance on skilled engineers by automating the optimization process. Our results demonstrate the effectiveness of HAOT in enhancing the performance of distributed data streaming systems, thereby offering significant improvements over traditional tuning methods.
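
HAOT's learned models are not detailed in this abstract, but the shape of a holistic tuning loop, measure end-to-end and then perturb the joint source/engine/sink configuration, can be sketched; hill climbing stands in here for the machine-learning component:

```python
# Sketch of a HAOT-style feedback loop (structure only; parameter names
# and the synthetic benchmark below are invented for illustration).
import random

config = {"source.batch": 512, "engine.parallelism": 4, "sink.buffer": 1024}

def measure(cfg):                       # stand-in for a real benchmark run
    par = min(cfg["engine.parallelism"], 16)        # saturates past 16 cores
    return (par * 1000
            - abs(cfg["source.batch"] - 2048) * 0.3 + random.gauss(0, 50))

best, best_t = dict(config), measure(config)
for _ in range(200):                    # simple hill climbing over the joint config
    trial = dict(best)
    key = random.choice(list(trial))
    trial[key] = max(1, int(trial[key] * random.choice([0.5, 2])))
    t = measure(trial)
    if t > best_t:
        best, best_t = trial, t

print(best, round(best_t))
```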