Apache Hadoop 3.4.1 – C API libhdfs

Overview

libhdfs is a JNI-based C API for Hadoop’s Distributed File System (HDFS). It exposes a subset of the HDFS APIs in C for manipulating HDFS files and the filesystem. libhdfs is part of the Hadoop distribution and comes precompiled in $HADOOP_HDFS_HOME/lib/native/libhdfs.so. libhdfs is compatible with Windows and can be built on Windows by running mvn compile within the hadoop-hdfs-project/hadoop-hdfs directory of the source tree.

The APIs

The libhdfs APIs are a subset of the Hadoop FileSystem APIs.

The header file for libhdfs describes each API in detail and is available in $HADOOP_HDFS_HOME/include/hdfs.h.

A Sample Program

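#include <fcntl.h>   /* O_WRONLY, O_CREAT */
#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* strlen */
#include "hdfs.h"

int main(int argc, char **argv) {
    /* "default" selects the NameNode from the Hadoop configuration files. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        exit(-1);
    }
    const char* writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!";
    /* Write the string including its terminating NUL byte. */
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (const void*)buffer, (tSize)(strlen(buffer) + 1));
    if (num_written_bytes < 0) {
        fprintf(stderr, "Failed to write to %s\n", writePath);
        exit(-1);
    }
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}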

To build the sample, see the CMake file for test_libhdfs_ops.c in the libhdfs source directory (hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt), or use a command like:

gcc above_sample.c -I$HADOOP_HDFS_HOME/include -L$HADOOP_HDFS_HOME/lib/native -lhdfs -o above_sample
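The file written by the sample program can be read back with the same small set of calls. The following is a minimal sketch rather than part of the Hadoop distribution: it assumes the sample has already created /tmp/testfile.txt, that the configuration files name a reachable NameNode, and that a 256-byte buffer is acceptable. The variable names (readPath, readFile) are illustrative.

```c
#include <fcntl.h>   /* O_RDONLY */
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(void) {
    /* "default" selects the NameNode from the Hadoop configuration files. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        return EXIT_FAILURE;
    }

    /* Assumption: this is the path the sample program wrote to. */
    const char *readPath = "/tmp/testfile.txt";
    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
    if (!readFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", readPath);
        hdfsDisconnect(fs);
        return EXIT_FAILURE;
    }

    char buffer[256];
    tSize n;
    int status = EXIT_SUCCESS;

    /* hdfsRead returns the number of bytes read, 0 at end of file,
     * and -1 on error. Echo each chunk to stdout as it arrives. */
    while ((n = hdfsRead(fs, readFile, buffer, (tSize)sizeof(buffer))) > 0) {
        fwrite(buffer, 1, (size_t)n, stdout);
    }
    if (n < 0) {
        fprintf(stderr, "Failed to read %s\n", readPath);
        status = EXIT_FAILURE;
    }

    hdfsCloseFile(fs, readFile);
    hdfsDisconnect(fs);
    return status;
}
```

It should build with the same include and library flags as the write sample. Because the write sample stores the terminating NUL byte along with the text, that byte is echoed as well.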

Common Problems

The most common problem is that CLASSPATH is not set properly when running a program that uses libhdfs. Set it to include all the Hadoop jars needed to run Hadoop itself, as well as the configuration directory that contains hdfs-site.xml. libhdfs now supports wildcard entries in CLASSPATH.
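One way to catch this early is a small preflight check that runs before the first libhdfs call. The sketch below uses only the C standard library; classpath_problem and check_classpath_env are hypothetical helper names, not libhdfs functions, and the two checks (variable set, at least one .jar or wildcard entry) are a heuristic assumption rather than a rule libhdfs enforces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libhdfs: returns NULL when the given
 * CLASSPATH value passes two basic sanity checks, or a short description
 * of the problem otherwise. */
const char *classpath_problem(const char *cp) {
    if (cp == NULL || *cp == '\0') {
        return "CLASSPATH is unset or empty";
    }
    /* Heuristic: a configuration directory alone is not enough, so look
     * for at least one jar file or wildcard entry. */
    if (strstr(cp, ".jar") == NULL && strchr(cp, '*') == NULL) {
        return "CLASSPATH has no .jar or wildcard entries";
    }
    return NULL;
}

/* Hypothetical helper: checks the process environment and reports any
 * problem on stderr. Returns 0 when CLASSPATH looks usable, -1 otherwise. */
int check_classpath_env(void) {
    const char *problem = classpath_problem(getenv("CLASSPATH"));
    if (problem != NULL) {
        fprintf(stderr, "libhdfs preflight: %s\n", problem);
        return -1;
    }
    return 0;
}
```

Calling check_classpath_env() at the top of main, before hdfsConnect, turns a missing CLASSPATH into a clear message instead of a JVM startup failure. The check cannot confirm that the listed jars are the right ones, only that something plausible is present.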

Thread Safe

libhdfs is thread safe.