OutputCommitter (Apache Hadoop Main 3.4.1 API) (original) (raw)


@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class OutputCommitter
extends Object
OutputCommitter describes the commit of task output for a Map-Reduce job.
The Map-Reduce framework relies on the OutputCommitter of the job to:

  1. Setup the job during initialization. For example, create the temporary output directory for the job during the initialization of the job.
  2. Cleanup the job after the job completion. For example, remove the temporary output directory after the job completion.
  3. Setup the task temporary output.
  4. Check whether a task needs a commit. This is to avoid the commit procedure if a task does not need commit.
  5. Commit of the task output.
  6. Discard the task commit.
    The methods in this class can be called from several different processes and from several different contexts. It is important to know which process and which context each is called from. Each method should be marked accordingly in its documentation. It is also important to note that not all methods are guaranteed to be called once and only once. If a method is not guaranteed to have this property the output committer needs to handle this appropriately. Also note it will only be in rare situations where they may be called multiple times for the same task.
    See Also:
    FileOutputCommitter, JobContext, TaskAttemptContext