Faster, Higher, Stronger: Redesigning Spreadsheets for Scale
Related papers
Towards unifying spreadsheets with databases for ad-hoc interactive data management at scale
2018
We are witnessing the increasing availability of data across a spectrum of domains, necessitating the interactive ad-hoc management and analysis of this data, in order to put it to use. Unfortunately, interactive ad-hoc management of very large datasets presents a host of challenges, ranging from performance to interface usability. This thesis introduces a new research direction of manipulation of large datasets using an interactive interface and takes several steps in this direction. In particular, we develop DataSpread, a tool that enables users to work with arbitrarily large datasets via a direct manipulation interface. DataSpread holistically unifies spreadsheets and relational databases to leverage the benefits of both. However, this holistic integration is not trivial due to the differences in the architecture and ideologies of the two paradigms: spreadsheets and databases. We have built a prototype of DataSpread, which, in addition to motivating the underlying challenges, ...
Scaling up to Billions of Cells with DATASPREAD: Supporting Large Spreadsheets with Databases
2017
Spreadsheet software is the tool of choice for ad-hoc tabular data management, manipulation, querying, and visualization with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. We develop DATASPREAD, a system that holistically unifies databases and spreadsheets with a goal to work with massive spreadsheets: DATASPREAD retains all of the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the scalability and collaboration abilities of traditional relational databases. We design DATASPREAD with a spreadsheet front-end and a regular relational database back-end. To integrate spreadsheets and databases, in this paper, we develop a storage and indexing engine for spreadsheet data. We first formalize and study the problem of representing and manipulating spreadsheet data within a relational database. We demonstrate that identifying the optimal representat...
2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018
Spreadsheet software is the tool of choice for interactive ad-hoc data management, with adoption by billions of users. However, spreadsheets are not scalable, unlike database systems. On the other hand, database systems, while highly scalable, do not support interactivity as a first-class primitive. We are developing DATASPREAD, to holistically integrate spreadsheets as a frontend interface with databases as a back-end datastore, providing scalability to spreadsheets, and interactivity to databases, an integration we term presentational data management (PDM). In this paper, we make the first step towards this vision: developing a storage engine for PDM, studying how to flexibly represent spreadsheet data within a database and how to support and maintain access by position. We first conduct an extensive survey of spreadsheet use to motivate our functional requirements for a storage engine for PDM. We develop a natural set of mechanisms for flexibly representing spreadsheet data and demonstrate that identifying the optimal representation is NP-HARD; however, we develop an efficient approach to identify the optimal representation from an important and intuitive subclass of representations. We extend our mechanisms with positional access mechanisms that don't suffer from cascading update issues, leading to constant time access and modification performance. We evaluate these representations on a workload of typical spreadsheets and spreadsheet operations, providing up to 50% reduction in storage, and up to 50% reduction in formula evaluation time.
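The cascading-update issue mentioned in this abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not DATASPREAD's storage engine: the class `PositionalSheet` and its methods are hypothetical names. The idea shown is indirection: rows are stored under stable ids that never change, and a separate map translates a spreadsheet position into an id, so inserting a row in the middle does not rewrite the keys of the rows below it.

```python
class PositionalSheet:
    """Hypothetical sketch: rows keyed by stable ids, with a position -> id map."""

    def __init__(self):
        self._rows = {}     # stable row id -> list of cell values
        self._order = []    # position (0-based) -> stable row id
        self._next_id = 0

    def insert_row(self, position, values):
        """Insert a row at `position`; stored rows keep their existing ids."""
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = list(values)
        # Only the positional map changes; no stored row is re-keyed.
        self._order.insert(position, row_id)
        return row_id

    def get_row(self, position):
        """Look up a row by its current spreadsheet position."""
        return self._rows[self._order[position]]

    def delete_row(self, position):
        """Remove the row at `position` without touching other stored rows."""
        row_id = self._order.pop(position)
        del self._rows[row_id]


sheet = PositionalSheet()
first = sheet.insert_row(0, ["a"])
last = sheet.insert_row(1, ["b"])
middle = sheet.insert_row(1, ["c"])  # lands between the two existing rows
```

In this sketch the positional map is a plain Python list, so an insert still shifts entries inside the map itself; a storage engine aiming for the access costs reported in the abstract would need a more suitable index structure for the map. The sketch only shows that the stored row data is never renumbered.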
SpreadMash: A Spreadsheet-Based Interactive Browsing and Analysis Tool for Data Services
Lecture Notes in Computer Science, 2008
Spreadsheets are one of the most popular end-user programming environments. Although spreadsheets provide an interactive interface for data manipulation and analysis, they are mostly used today in data entry mode and not as an interactive browsing tool for data stored in underlying data sources. In this paper, we present SpreadMash, a high-level language and tool for interactive data browsing and analysis for data services. The key innovation of SpreadMash is a repository of application building blocks called data widgets that characterize various data importation and presentation patterns in spreadsheets. Data widgets enable the separation of end-user tasks (composing data widgets) from the tasks of data architects (creating data abstractions and data widgets). Through a series of examples we illustrate how tasks that would be challenging in existing environments are facilitated by SpreadMash.
NOAH: Interactive Spreadsheet Exploration with Dynamic Hierarchical Overviews
2021
Spreadsheet systems are by far the most popular platform for data exploration on the planet, supporting millions of rows of data. However, exploring spreadsheets that are this large via operations such as scrolling or issuing formulae can be overwhelming and error-prone. Users easily lose context and suffer from cognitive and mechanical burdens while issuing formulae on data spanning multiple screens. To address these challenges, we introduce dynamic hierarchical overviews that are embedded alongside spreadsheets. Users can employ this overview to explore the data at various granularities, zooming in and out of the spreadsheet. They can issue formulae over data subsets without cumbersome scrolling or range selection, enabling users to gain a high or low-level perspective of the spreadsheet. An implementation of our dynamic hierarchical overview, NOAH, integrated within DataSpread, preserves spreadsheet semantics and look and feel, while introducing such enhancements. Our user studie...
DATA-SPREAD: Unifying Databases and Spreadsheets
2015
Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DATA-SPREAD, a data exploration tool that unifies databases and spreadsheets. DATA-SPREAD continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, Postgres. DATA-SPREAD retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spre...
Supporting the Spreadsheet Idea for Interactive Database Applications
IFIP Advances in Information and Communication Technology, 2010
Database applications allow the analysis of complex and large data. There are many analysis functions showing different relations between the data. End users often have new requirements to see data and relations that cannot be shown by the existing analysis software. They need ways to create new user interfaces that fit their requirements. Generally, users don't have programming knowledge and cannot wait until the development department has specified the corresponding software. They need a tool that can easily and quickly produce corresponding results. The tool must allow navigating via the complex data structures of databases. This paper discusses a tool that allows end users to specify interactive applications like spreadsheets. The tool supports OLAP applications and is based on the Qt Designer.
VisualSynth: Democratizing Data Science in Spreadsheets
2020
We introduce VisualSynth, a framework that wants to democratize data science by enabling naive end-users to specify the data science tasks that match their needs. In VisualSynth, the user and the spreadsheet application interact by highlighting parts of the data using colors. The colors define a partial specification of a data science task (such as data wrangling or clustering), which is then completed and solved automatically using artificial intelligence techniques. The user can interactively refine the specification until she is satisfied with the result.
Understanding Data Analysis Workflows on Spreadsheets: Roadblocks and Opportunities
2020
Spreadsheets are widely used for data management and analysis by individuals and teams with varying degrees of programming expertise across a spectrum of domains. While several papers have studied the prevalence of errors on spreadsheets and performed ethnographic studies on spreadsheet use, little is known about how spreadsheet users approach and address computational tasks on spreadsheets, especially on relatively large datasets. To understand how users analyze data on spreadsheets, we conducted a study consisting of eight common analytical tasks, with thirty-two participants. Participants developed an execution strategy for each task and then attempted to operationalize this strategy within the spreadsheet system. From examining the study results and transcripts, we identified the successful and unsuccessful strategies participants adopted in addressing the tasks. In general, we find that unsuccessful spreadsheet users had difficulties mapping spreadsheet models to their predeterm...
SpreadDB: Spreadsheet-Based User Interface for Querying and Updating Data of External Databases
Spreadsheet software is a software application capable of organizing, storing and analyzing data in tabular form. However, it has many limitations, such as poor performance on large data sets and a lack of efficient and secure data sharing. To overcome these limitations, many organizations encourage their users to store their data in databases. However, databases lack ease of use because they require users to write SQL, the standard language for querying and editing information stored in databases. Since SQL is difficult for many end users to learn, database adoption stalls in some parts of organizations. To relieve users of the burden of learning SQL, we propose SpreadDB, a spreadsheet-based user interface for querying and updating database data. SpreadDB enables users to design spreadsheet templates used to perform data queries and data updates. We also present the security measures of SpreadDB that prevent unauthorized persons from accessing and modifying database data.