CombineFileInputFormat (Apache Hadoop Main 3.4.1 API)


@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class CombineFileInputFormat<K,V>
extends org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>
implements InputFormat<K,V>
An abstract InputFormat that returns CombineFileSplits from the InputFormat.getSplits(JobConf, int) method. Splits are constructed from the files under the input paths. A split cannot contain files from different pools, but it may contain blocks from different files. If a maxSplitSize is specified, blocks on the same node are combined to form a single split, and any blocks left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, blocks from the same rack are combined into a single split, and no attempt is made to create node-local splits. If maxSplitSize is equal to the block size, this class behaves much like the default splitting behaviour in Hadoop: each block is a locally processed split. Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to construct RecordReaders for CombineFileSplits.
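The node-then-rack grouping described above can be illustrated with a small stand-alone sketch. This is not Hadoop's implementation and uses no Hadoop classes: the CombineSketch class, its Block type, and the greedy packing rule are all assumptions made for illustration, and the sketch ignores pools, minimum split sizes, and block replicas. A maxSplitSize of 0 is taken here to mean "not specified".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified illustration of node-then-rack block combining; not Hadoop's code. */
class CombineSketch {

    /** One block of a file, stored on one node in one rack (hypothetical model). */
    static final class Block {
        final String file;
        final long size;
        final String node;
        final String rack;

        Block(String file, long size, String node, String rack) {
            this.file = file;
            this.size = size;
            this.node = node;
            this.rack = rack;
        }
    }

    /** Groups blocks into splits. maxSplitSize == 0 is treated as "not specified". */
    static List<List<Block>> combine(List<Block> blocks, long maxSplitSize) {
        List<List<Block>> splits = new ArrayList<>();
        List<Block> leftovers = new ArrayList<>();

        if (maxSplitSize > 0) {
            // Pass 1: pack blocks that live on the same node.
            for (List<Block> onNode : groupBy(blocks, true).values()) {
                leftovers.addAll(pack(onNode, maxSplitSize, splits));
            }
        } else {
            // No limit given: skip the node-local pass entirely.
            leftovers.addAll(blocks);
        }

        // Pass 2: pack whatever remains with other blocks in the same rack.
        for (List<Block> inRack : groupBy(leftovers, false).values()) {
            List<Block> rest = pack(inRack, maxSplitSize, splits);
            if (!rest.isEmpty()) {
                splits.add(rest); // final, possibly undersized, split for this rack
            }
        }
        return splits;
    }

    /** Emits a split each time the running size reaches the limit; returns unpacked blocks. */
    private static List<Block> pack(List<Block> group, long maxSplitSize,
                                    List<List<Block>> out) {
        List<Block> current = new ArrayList<>();
        long currentSize = 0;
        for (Block b : group) {
            current.add(b);
            currentSize += b.size;
            if (maxSplitSize > 0 && currentSize >= maxSplitSize) {
                out.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        return current;
    }

    /** Groups by node (byNode == true) or by rack, preserving first-seen order. */
    private static Map<String, List<Block>> groupBy(List<Block> blocks, boolean byNode) {
        Map<String, List<Block>> groups = new LinkedHashMap<>();
        for (Block b : blocks) {
            groups.computeIfAbsent(byNode ? b.node : b.rack, k -> new ArrayList<>()).add(b);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("a.txt", 128, "n1", "r1"),
            new Block("a.txt", 128, "n1", "r1"),
            new Block("b.txt", 64, "n1", "r1"),
            new Block("c.txt", 64, "n2", "r1"),
            new Block("d.txt", 64, "n3", "r2"));
        System.out.println(combine(blocks, 256).size()); // node-local pass, then rack pass
        System.out.println(combine(blocks, 0).size());   // rack-only grouping
    }
}
```

With the sample data in main, a limit of 256 yields three splits: the two a.txt blocks on node n1, then the leftover b.txt and c.txt blocks that share rack r1 (a split spanning two files), then the lone d.txt block in rack r2. With no limit, the same blocks fall into two splits, one per rack.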
See Also:
CombineFileSplit