Thejas Nair - Academia.edu

Thejas Nair

Papers by Thejas Nair

Apache Pig's Optimizer

IEEE Data Eng. Bull., 2013

Apache Pig allows users to describe dataflows to be executed in Apache Hadoop. The distributed nature of Hadoop, as well as its execution paradigms, provides many execution opportunities and also imposes constraints on the system. Given these opportunities and constraints, Pig must make decisions about how to optimize the execution of user scripts. This paper covers some of those optimization choices, focusing on ones that are specific to the Hadoop ecosystem and Pig’s common use cases. It also discusses optimizations that the Pig community has considered adding in the future.

Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages

Apache Hive

Proceedings of the 2019 International Conference on Management of Data, 2019

Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper we describe the key innovations on the journey from batch tool to fully fledged enterprise data warehousing system. We present a hybrid architecture that combines traditional MPP techniques with more recent big data and cloud concepts to achieve the scale and performance required by today's analytic applications. We explore the system by detailing enhancements along four main axes: transactions, optimizer, runtime, and federation. We then provide experimental results to demonstrate the performance of the system for typical workloads and conclude with a look at the community roadmap.
