Thejas Nair - Profile on Academia.edu

Papers by Thejas Nair

IEEE Data Eng. Bull., 2013

Apache Pig allows users to describe dataflows to be executed in Apache Hadoop. The distributed nature of Hadoop, together with its execution paradigms, provides many execution opportunities but also imposes constraints on the system. Given these opportunities and constraints, Pig must make decisions about how to optimize the execution of user scripts. This paper covers some of those optimization choices, focusing on ones that are specific to the Hadoop ecosystem and Pig's common use cases. It also discusses optimizations that the Pig community has considered adding in the future.

Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages

Proceedings of the 2019 International Conference on Management of Data, 2019

Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper we describe the key innovations on the journey from batch tool to fully fledged enterprise data warehousing system. We present a hybrid architecture that combines traditional MPP techniques with more recent big data and cloud concepts to achieve the scale and performance required by today's analytic applications. We explore the system by detailing enhancements along four main axes: transactions, optimizer, runtime, and federation. We then provide experimental results to demonstrate the performance of the system for typical workloads and conclude with a look at the community roadmap.