Extensions


The Cascading ecosystem supports a variety of programming languages, data sources, serializers, and tools that extend the functionality of Cascading applications.

These extensions are code contributed by the Cascading community. Many projects are available through Cascading GitHub and the Conjars Maven jar repository.

Note: Most projects are hosted on GitHub and may have multiple branches and forks as users enrich the original projects. Many are also under active development.

Supported Languages

Language extensions bring the domain-specific features of another language to Cascading.

| Language | Project | Description | Resources | License |
|---|---|---|---|---|
| Clojure | Cascalog | Clojure for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0 |
| Java | Cascading | | GitHub, Groups, Docs, Tutorials | Apache 2.0 |
| JRuby | Cascading.JRuby | From Etsy, JRuby for Cascading | GitHub, Issue Tracking | LGPL 3 |
| Clojure | PigPen | MapReduce for Clojure | GitHub | Apache 2.0 |
| PMML | Pattern | PMML for Cascading | GitHub, Groups, Issue Tracking, Docs, Tutorials | Apache 2.0 |
| PMML | JPMML-Cascading | From Openscoring, PMML for Cascading | GitHub, Groups, Issue Tracking | AGPL 3.0 |
| Python | PyCascading | From Twitter, Python for Cascading | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| Scala | Scalding | From Twitter, Scala for Cascading | GitHub, Groups, Issue Tracking, Stack Overflow, Docs, Tutorials | Apache 2.0 |
| SQL | Lingual | An ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Docs, Tutorials, Binary | Apache 2.0 |

Data Source Connectivity (Taps)

In Cascading, a tap represents a physical data source. Taps can serve as inputs (sources) and outputs (sinks) in a Cascading application.
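
To make the input and output roles concrete, here is a toy model in plain Java. It is only a sketch: `RecordTap`, `MemoryTap`, and `copyUpperCased` are hypothetical names invented for this illustration and are not part of the Cascading API, whose taps wrap real storage such as files or database tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TapSketch {
    /** Hypothetical stand-in for a tap: a data location that can be read from or written to. */
    interface RecordTap {
        List<String> read();
        void write(List<String> records);
    }

    /** In-memory tap used only for illustration; it holds its records in a list. */
    static class MemoryTap implements RecordTap {
        private final List<String> store = new ArrayList<>();

        MemoryTap(List<String> initial) {
            store.addAll(initial);
        }

        @Override
        public List<String> read() {
            return new ArrayList<>(store); // defensive copy
        }

        @Override
        public void write(List<String> records) {
            store.clear();
            store.addAll(records);
        }
    }

    /** Reads every record from the source tap, upper-cases it, and writes the result to the sink tap. */
    static void copyUpperCased(RecordTap source, RecordTap sink) {
        List<String> out = new ArrayList<>();
        for (String record : source.read()) {
            out.add(record.toUpperCase(Locale.ROOT));
        }
        sink.write(out);
    }

    public static void main(String[] args) {
        RecordTap in = new MemoryTap(Arrays.asList("alpha", "beta")); // acts as the input
        RecordTap out = new MemoryTap(new ArrayList<String>());       // acts as the output
        copyUpperCased(in, out);
        System.out.println(out.read()); // prints [ALPHA, BETA]
    }
}
```

The processing step only depends on the interface, so swapping one data source for another does not change the logic. That separation is the idea the projects in this section rely on.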

| Data Source | Project | Description | Resources | License |
|---|---|---|---|---|
| Accumulo | Cascading.Accumulo | Accumulo data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Cassandra | Cascading-Cassandra | Cassandra data source for Cascading | GitHub, Issue Tracking | Apache 2.0, Eclipse |
| Derby | Cascading-JDBC | Derby data source for Cascading via JDBC | GitHub, Issue Tracking | Apache 2.0 |
| Elasticsearch | elasticsearch-hadoop | Elasticsearch data source for Cascading | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| ElephantDB | ElephantDB | ElephantDB data source for Cascading | GitHub, Issue Tracking | Custom |
| ArangoDB | Guacaphant | Allows you to tap ArangoDB | GitHub, Issue Tracking, Conjars | MIT |
| H2 | Cascading-JDBC | H2 data source for Cascading via JDBC | GitHub, Issue Tracking | Apache 2.0 |
| HBase | Cascading.HBase | HBase data source for Cascading | GitHub, Tutorial | Apache 2.0 |
| Hive | Cascading-Hive | Integrate and run Hive in Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Hive | Cascading.Hive | Hive data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| JDBC | Cascading-JDBC | Provides support for reading/writing data to/from an RDBMS via JDBC drivers | GitHub, Issue Tracking | Apache 2.0 |
| Kafka | Cascading-Local | Provides integration with Apache Kafka | GitHub, Issue Tracking | Apache 2.0 |
| Oracle | Cascading-JDBC | Oracle data source for Cascading via JDBC | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| Memcached | Cascading.Memcached | Memcached data source for Cascading | GitHub | Apache 2.0 |
| MongoDB | Cascading-Mongomigrate | MongoDB data source for Cascading | GitHub | Apache 2.0 |
| MySQL | Cascading-JDBC | MySQL data source for Cascading via JDBC | GitHub, Issue Tracking | Apache 2.0 |
| Neo4j | Cascading.Neo4j | Neo4j data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| OpenCSV | Cascading-OpenCSV | A robust CSV parser | GitHub, Issue Tracking | Apache 2.0 |
| Parquet | Parquet-mr | Parquet data source for Cascading | GitHub, Groups, Issue Tracking | Apache 2.0 |
| PostgreSQL | Cascading-JDBC | PostgreSQL data source for Cascading via JDBC | GitHub, Issue Tracking | Apache 2.0 |
| Redshift | Cascading-JDBC | Amazon Redshift data source for Cascading via JDBC | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| S3 | Cascading-Local | Provides integration with Amazon S3 | GitHub, Issue Tracking | Apache 2.0 |
| SimpleDB | Cascading.SimpleDB | From Scale Unlimited, SimpleDB data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Solr | Cascading.Solr | From Scale Unlimited, Solr data source for Cascading | GitHub, Issue Tracking | Custom |
| Splunk | Tbana | Splunk data source for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Teradata | Cascading-JDBC | Teradata data source for Cascading via JDBC | GitHub, Issue Tracking, Tutorial | Apache 2.0 |

Serializers

Serializers integrate with Cascading by translating data objects into formats that can be stored and later reconstructed.
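
The store-and-reconstruct round trip can be sketched with Java's built-in object serialization. This is an illustration of the general idea only: the `PageView` class is a hypothetical record type, and the projects listed in this section use their own formats (Avro, Kryo, Thrift, and so on) rather than `java.io.Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTripSketch {
    /** Hypothetical record type used only for this illustration. */
    static class PageView implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final int hits;

        PageView(String url, int hits) {
            this.url = url;
            this.hits = hits;
        }
    }

    /** Translates the object into a byte array that could be written to disk or sent over a network. */
    static byte[] serialize(PageView view) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(view);
        }
        return bytes.toByteArray();
    }

    /** Reconstructs an equivalent object from the stored bytes. */
    static PageView deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (PageView) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PageView original = new PageView("/index.html", 42);
        PageView copy = deserialize(serialize(original));
        System.out.println(copy.url + " " + copy.hits); // prints /index.html 42
    }
}
```

Formats differ mainly in encoded size, speed, and how well they tolerate schema changes, which is one reason a workflow might prefer one of the serializers below over the default.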

| Serializer | Project | Description | Resources | License |
|---|---|---|---|---|
| Avro | Cascading.Avro | From Scale Unlimited, data serialization for Apache Avro | GitHub, Issue Tracking | Apache 2.0 |
| JSON | Cascading.JSON | JavaScript Object Notation (JSON) utility classes for Cascading | GitHub, Issue Tracking | GNU |
| Kryo | Cascading.Kryo | Provides a drop-in Kryo serialization for your Cascading (or Hadoop) workflow | GitHub, Issue Tracking | Eclipse |
| Protocol Buffers | Cascading2-protobufs | From Square, library for working with Protocol Buffers | GitHub, Issue Tracking | MIT |
| Thrift | Cascading-Thrift | Serializer and raw comparator for using TBase and TEnum objects in Hadoop | GitHub, Issue Tracking | Custom |

Tools

Cascading tools help create, debug, maintain, and otherwise support Cascading applications and functionality.

| Project | Description | Resources | License |
|---|---|---|---|
| Activator Scalding | From Typesafe, an integration between Scalding and Typesafe Activator | GitHub, Issue Tracking | Apache 2.0 |
| Bixo | Web mining toolkit that runs as a series of Cascading pipes | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| Cascading-helpers | From Square, functions, filters, and other tools for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Cascading-dbmigrate | Tool to migrate relational databases into Hadoop | GitHub, Issue Tracking | Apache 2.0 |
| Cascading_ext | From LiveRamp, a collection of tools to build, debug, and run data workflows | GitHub, Issue Tracking | Apache 2.0 |
| Cascading-simhash | Simhashing is an algorithm that calculates “group id” (minimum hash) content | GitHub, Issue Tracking | GPL 3 |
| Cascading-tube | Tiny wrapper around Hadoop for chaining operations | GitHub, Issue Tracking | Apache 2.0 |
| Cascading.utils | Set of utilities for Cascading workflows for various projects | GitHub, Issue Tracking | Apache 2.0 |
| Conjecture | From Etsy, a framework for building machine learning models in Hadoop using Scalding | GitHub, Issue Tracking | MIT |
| Fluid | A fluent API for Cascading | GitHub, Issue Tracking | Apache 2.0 |
| Jading | From Etsy, a build and execution tool for Cascading.JRuby that handles packaging for execution on Hadoop | GitHub, Issue Tracking, Tutorial | Custom |
| Lein-Cascading | Leiningen is for automating Clojure projects | GitHub, Issue Tracking | Apache 2.0 |
| Lingual | An ANSI SQL shell and JDBC driver to migrate workloads and export data on/off Hadoop | GitHub, Groups, Issue Tracking, Binary | Apache 2.0 |
| Load | A command line interface for load testing and benchmarking | GitHub, Issue Tracking, Binary | Apache 2.0 |
| Multitool | A command line interface for building data processing jobs | GitHub, Binary | Apache 2.0 |
| Riffle | Library for executing collections of dependent processes as a single process | GitHub, Issue Tracking | Apache 2.0 |
| Plunger | From Hotels.com, a unit testing framework for Cascading applications that simplifies automated tests for cascades, flows, assemblies, and operations | GitHub, Issue Tracking | Apache 2.0 |
| ScaldingUnit | Scalding unit testing library for test-driven development | GitHub, Issue Tracking | Apache 2.0 |
| Scalding-REPL | From Twitter, REPL environment to prototype Scalding code and explore data sets with Scalding | GitHub, Issue Tracking, Tutorial | Apache 2.0 |
| Scaldual | From Twitter, Scaldual makes it easier for Scalding users to take advantage of Lingual | GitHub, Issue Tracking | Apache 2.0 |

If a project is missing from these lists, let us know by emailing the mailing list.