Benchmarking Spreadsheet Systems

Anti-Freeze for Large and Complex Spreadsheets: Asynchronous Formula Computation

2019

Spreadsheet systems enable users to store and analyze data in an intuitive and flexible interface. Yet the scale of data being analyzed often leads to spreadsheets hanging and freezing on small changes. We propose a new asynchronous formula computation framework: instead of freezing the interface we return control to users quickly to ensure interactivity, while computing the formulae in the background. To ensure consistency, we indicate formulae being computed in the background via visual cues on the spreadsheet. Our asynchronous computation framework introduces two novel challenges: (a) How do we identify dependencies for a given change in a bounded time? (b) How do we schedule computation to maximize the number of spreadsheet cells available to the user over time? We bound the dependency identification time by compressing the formula dependency graph lossily, a problem we show to be NP-Hard. A compressed dependency table enables us to quickly identify the spreadsheet cells that ne...
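
The dependency-identification step in challenge (a) can be illustrated with a minimal sketch. It assumes a hypothetical `formulas` map from each formula cell to the cells it reads, and it walks the uncompressed dependency graph; the names `build_dependents` and `cells_to_recompute` are illustrative and not taken from the paper. Because this traversal visits every dependent, its cost grows with the size of the graph, which is the unbounded cost the abstract proposes to avoid through lossy compression.

```python
from collections import defaultdict, deque


def build_dependents(formulas):
    """Invert {cell: [cells it reads]} into {cell: {cells that read it}}."""
    dependents = defaultdict(set)
    for cell, precedents in formulas.items():
        for precedent in precedents:
            dependents[precedent].add(cell)
    return dependents


def cells_to_recompute(dependents, changed):
    """Breadth-first search for every cell that transitively reads `changed`."""
    dirty = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


# Hypothetical sheet: B1 reads A1, C1 reads B1, D1 reads A2.
formulas = {"B1": ["A1"], "C1": ["B1"], "D1": ["A2"]}
dirty = cells_to_recompute(build_dependents(formulas), "A1")
print(sorted(dirty))  # ['B1', 'C1']
```

In an asynchronous design along the lines described, the cells in `dirty` would be the ones marked with a visual cue while their values are recomputed in the background; `D1` stays available because it does not depend on the changed cell.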

Scaling up to Billions of Cells with DATASPREAD: Supporting Large Spreadsheets with Databases

2017

Spreadsheet software is the tool of choice for ad-hoc tabular data management, manipulation, querying, and visualization with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. We develop DATASPREAD, a system that holistically unifies databases and spreadsheets with a goal to work with massive spreadsheets: DATASPREAD retains all of the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the scalability and collaboration abilities of traditional relational databases. We design DATASPREAD with a spreadsheet front-end and a regular relational database back-end. To integrate spreadsheets and databases, in this paper, we develop a storage and indexing engine for spreadsheet data. We first formalize and study the problem of representing and manipulating spreadsheet data within a relational database. We demonstrate that identifying the optimal representat...

The Power of Spreadsheet Computations

We investigate the expressive power of spreadsheets. We consider spreadsheets which contain only formulas, and assume that they are small templates, which can be filled to a larger area of the grid to process input data of variable size. Therefore we can compare them to well-known machine models of computation. We consider a number of classes of spreadsheets defined by restrictions on their reference structure. Two of the classes correspond closely to parallel complexity classes: we prove a direct correspondence between the dimensions of the spreadsheet and the amount of hardware and time used by a parallel computer to compute the same function. As a tool, we produce spreadsheets which are universal in these classes, i.e. can emulate any other spreadsheet from them. In other cases we implement instances of a polynomial-time complete problem in the spreadsheets in question, which indicates that these spreadsheets are unlikely to have efficient parallel evaluation algorithms. Thus we get a picture of how the computational power of spreadsheets depends on their dimensions and the structure of their references.

Towards unifying spreadsheets with databases for ad-hoc interactive data management at scale

2018

We are witnessing the increasing availability of data across a spectrum of domains, necessitating the interactive ad-hoc management and analysis of this data, in order to put it to use. Unfortunately, interactive ad-hoc management of very large datasets presents a host of challenges, ranging from performance to interface usability. This thesis introduces a new research direction of manipulation of large datasets using an interactive interface and takes several steps in this direction. In particular, we develop DataSpread, a tool that enables users to work with arbitrarily large datasets via a direct manipulation interface. DataSpread holistically unifies spreadsheets and relational databases to leverage the benefits of both. However, this holistic integration is not trivial due to the differences in the architecture and ideologies of the two paradigms: spreadsheets and databases. We have built a prototype of DataSpread, which, in addition to motivating the underlying challenges, ...

Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management

2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018

Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. On the other hand, database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DATASPREAD, to holistically integrate spreadsheets as a frontend interface with databases as a back-end datastore, providing scalability to spreadsheets, and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we make the first step towards this vision: developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate our functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-HARD; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend our mechanisms with positional access mechanisms that don't suffer from cascading update issues, leading to constant time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, providing up to 50% reduction in storage, and up to 50% reduction in formula evaluation time.
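
The cascading-update issue mentioned above can be sketched as follows. If each stored row carried its own row number, inserting a row near the top would force every later row to be renumbered. The sketch below avoids touching stored rows by keeping a separate position-to-id mapping. The class `PositionalSheet` and its methods are hypothetical names for illustration; the abstract does not describe the paper's actual data structure, and a plain Python list still shifts its entries on insert, so this shows only the indirection idea and not the constant-time behaviour the abstract reports.

```python
import itertools


class PositionalSheet:
    """Rows live under stable ids; a separate list maps position -> id."""

    def __init__(self):
        self._rows = {}                    # stable id -> row data, never renumbered
        self._order = []                   # position -> stable id
        self._next_id = itertools.count()  # source of fresh stable ids

    def insert_row(self, position, data):
        """Insert a row at `position` without rewriting any stored row."""
        row_id = next(self._next_id)
        self._rows[row_id] = data
        self._order.insert(position, row_id)  # only the mapping shifts
        return row_id

    def get_row(self, position):
        """Fetch the row currently shown at `position`."""
        return self._rows[self._order[position]]


sheet = PositionalSheet()
sheet.insert_row(0, ["alpha"])
sheet.insert_row(1, ["gamma"])
sheet.insert_row(1, ["beta"])  # lands between the two existing rows
print([sheet.get_row(i) for i in range(3)])  # [['alpha'], ['beta'], ['gamma']]
```

The stored rows keep the ids they were given at insertion time (0, 1, 2), so a back-end table keyed on those ids would need no updates when a row is inserted in the middle.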

High performance spreadsheet simulation on a desktop grid

2008

We present a proof-of-concept prototype for high performance spreadsheet simulation called S3. Our goal is to provide a user-friendly, yet computationally powerful simulation environment for end users. Our approach is to add the power of parallel computing on a Windows-based desktop grid to popular Excel models. We show that, by using standard Web services and service-oriented architecture (SOA), one can build a fast and efficient system on a desktop grid for simulation. The complexity of parallelism can be hidden from users through a well-defined computation template. This work also demonstrates that massive computing power can be harvested by linking off-the-shelf office PCs into a desktop grid for simulation. The experimental results show that the prototype system is highly scalable. In the best case, the execution time can be reduced 13.6 times using 16 desktop PCs; the simulation time is dramatically reduced from 200 minutes to 14 minutes.
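
The reported figures can be checked with a short calculation. The helper names `speedup` and `efficiency` are illustrative, and the definitions used (serial time divided by parallel time, and speedup divided by number of workers) are the standard ones rather than anything stated in the abstract.

```python
def speedup(serial_time, parallel_time):
    """Ratio of single-machine run time to parallel run time."""
    return serial_time / parallel_time


def efficiency(speedup_factor, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup_factor / workers


# Figures quoted in the abstract: 200 min -> 14 min, 13.6x on 16 PCs.
rounded = speedup(200, 14)
print(f"speedup from rounded minutes: {rounded:.1f}x")  # 14.3x
print(f"efficiency at 13.6x on 16 PCs: {efficiency(13.6, 16):.0%}")  # 85%
```

The rounded minute figures give roughly 14.3x, slightly above the stated 13.6x. The abstract does not explain the gap; it may reflect rounding of the timings or figures taken from different runs.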

Spreadsheet as a relational database engine

Proceedings of the 2010 international conference on Management of data - SIGMOD '10, 2010

Spreadsheets are among the most commonly used applications for data management and analysis. Perhaps they are even among the most widely used computer applications of all kinds. However, the spreadsheet paradigm of computation still lacks sufficient analysis.

Abacus: A New Spreadsheet Paradigm for Reducing Errors

When spreadsheets were initially developed, computers had low-resolution screens which could hold very little information and display only text-based information. Today, although nearly every computer has a large, high-resolution color graphical display, we are stuck with the paradigm of spreadsheets as a huge array of cells in which formulas are copied and modified. Formulas cannot be seen except for a cell-at-a-time view. Cells are referred to with an arcane letter-and-number syntax that belies the relative nature of the relationship between cell names and their use. This paper explores several ideas to form a new paradigm for spreadsheets for the purpose of making them easier to use correctly.

Towards a Spreadsheet Engineering

In this paper, we report some on-going focused research, and we are also keen to set it in the context of a proposed bigger picture, as follows. There is a certain depressing pattern about the attitude of industry to spreadsheet error research and a certain pattern about conferences highlighting these issues. Is it not high time to move on from measuring spreadsheet errors to developing an armoury of disciplines and controls? In short, we propose the need to rigorously lay the foundations of a spreadsheet engineering discipline. Clearly, multiple research teams would be required to tackle such a big task. This suggests the need for both national and international collaborative research, since any given group can only address a small segment of the whole. There are already a small number of examples of such on-going international collaborative research. Having established the need for a directed research effort, the rest of the paper then attempts to act as an exemplar in demonstrati...

Google Sheets v Microsoft Excel: A Comparison of the Behaviour and Performance of Spreadsheet Users

2014

Summary of Findings: Spreadsheet technology has traditionally been limited to a single user working in an isolated desktop environment. With the advent of cloud computing, a paradigm shift has occurred in the way users share and communicate their work collaboratively in both educational and business environments. The opportunities for people to cooperate on multiple spreadsheets at the same time and in real time have grown significantly. However, the behaviour and performance of users working in this new paradigm have not been explored to a great extent in the scientific literature. Compared with desktop spreadsheet technologies such as Excel, cloud-based spreadsheet technologies have only started to be investigated in relation to user behaviour and performance.