@InterfaceAudience.Public @InterfaceStability.Stable public abstract class FileInputFormat<K,V> extends Object implements InputFormat<K,V>
A base class for file-based InputFormat.
 
 FileInputFormat is the base class for all file-based 
 InputFormats. This provides a generic implementation of
 getSplits(JobConf, int).
 Implementations of FileInputFormat can also override the
 isSplitable(FileSystem, Path) method to prevent input files
 from being split up in certain situations. Implementations that may
 deal with non-splittable files must override this method, since
 the default implementation assumes splitting is always possible.
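The override pattern described above can be sketched without a Hadoop dependency. Everything in this block is a hypothetical stand-in: `SketchFileInputFormat`, `AlwaysSplitFormat`, and `GzipAwareFormat` are illustrative names, and the hook takes a plain file name instead of the real `isSplitable(FileSystem, Path)` arguments. The suffix check is only an assumption about how a subclass might recognise stream-compressed input.

```java
import java.util.Locale;

/** Hypothetical stand-in for FileInputFormat; not a Hadoop class. */
abstract class SketchFileInputFormat {
    /** Mirrors the documented default: every file is assumed splittable. */
    protected boolean isSplitable(String filename) {
        return true;
    }
}

/** Keeps the default, so gzip files would be split like any other file. */
class AlwaysSplitFormat extends SketchFileInputFormat { }

/** Overrides the hook so gzip files are always processed as a whole. */
class GzipAwareFormat extends SketchFileInputFormat {
    @Override
    protected boolean isSplitable(String filename) {
        // gzip is a stream compression format and cannot be read from the middle
        return !filename.toLowerCase(Locale.ROOT).endsWith(".gz");
    }
}

class IsSplitableSketch {
    public static void main(String[] args) {
        System.out.println(new AlwaysSplitFormat().isSplitable("logs/part-0000.gz"));
        System.out.println(new GzipAwareFormat().isSplitable("logs/part-0000.gz"));
        System.out.println(new GzipAwareFormat().isSplitable("logs/part-0001.txt"));
    }
}
```

A real subclass would extend FileInputFormat itself and decide based on the FileSystem and Path it receives.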
| Modifier and Type | Class and Description | 
|---|---|
| static class  | FileInputFormat.Counter Deprecated.  | 
| Modifier and Type | Field and Description | 
|---|---|
| static String | INPUT_DIR_RECURSIVE | 
| static org.apache.commons.logging.Log | LOG | 
| static String | NUM_INPUT_FILES | 
| Constructor and Description | 
|---|
| FileInputFormat() | 
| Modifier and Type | Method and Description | 
|---|---|
| static void | addInputPath(JobConf conf, org.apache.hadoop.fs.Path path) Add a Path to the list of inputs for the map-reduce job. | 
| protected void | addInputPathRecursively(List<org.apache.hadoop.fs.FileStatus> result, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.PathFilter inputFilter) Add files in the input path recursively into the results. | 
| static void | addInputPaths(JobConf conf, String commaSeparatedPaths) Add the given comma separated paths to the list of inputs for the map-reduce job. | 
| protected long | computeSplitSize(long goalSize, long minSize, long blockSize) | 
| protected int | getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset) | 
| static org.apache.hadoop.fs.PathFilter | getInputPathFilter(JobConf conf) Get a PathFilter instance of the filter set for the input paths. | 
| static org.apache.hadoop.fs.Path[] | getInputPaths(JobConf conf) Get the list of input Paths for the map-reduce job. | 
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) Get the RecordReader for the given InputSplit. | 
| protected String[] | getSplitHosts(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset, long splitSize, org.apache.hadoop.net.NetworkTopology clusterMap) This function identifies and returns the hosts that contribute most for a given split. | 
| InputSplit[] | getSplits(JobConf job, int numSplits) Splits files returned by listStatus(JobConf) when they're too big. | 
| protected boolean | isSplitable(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path filename) Is the given filename splittable? Usually true, but if the file is stream compressed, it will not be. | 
| protected org.apache.hadoop.fs.FileStatus[] | listStatus(JobConf job) List input directories. | 
| protected FileSplit | makeSplit(org.apache.hadoop.fs.Path file, long start, long length, String[] hosts) A factory that makes the split for this class. | 
| protected FileSplit | makeSplit(org.apache.hadoop.fs.Path file, long start, long length, String[] hosts, String[] inMemoryHosts) A factory that makes the split for this class. | 
| static void | setInputPathFilter(JobConf conf, Class<? extends org.apache.hadoop.fs.PathFilter> filter) Set a PathFilter to be applied to the input paths for the map-reduce job. | 
| static void | setInputPaths(JobConf conf, org.apache.hadoop.fs.Path... inputPaths) Set the array of Paths as the list of inputs for the map-reduce job. | 
| static void | setInputPaths(JobConf conf, String commaSeparatedPaths) Sets the given comma separated paths as the list of inputs for the map-reduce job. | 
| protected void | setMinSplitSize(long minSplitSize) | 
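The comma-separated variants of setInputPaths and addInputPaths listed in the method summary take several paths in one string. The sketch below shows one way such a string could be broken into individual paths. It assumes that commas inside a `{...}` glob group belong to the path and are not separators, which should be checked against the Hadoop release in use. `PathListSketch` and `splitPaths` are hypothetical names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

class PathListSketch {
    /** Splits on commas that are not inside a {...} glob group (assumed rule). */
    static List<String> splitPaths(String commaSeparatedPaths) {
        List<String> paths = new ArrayList<>();
        int depth = 0; // nesting depth of currently open '{'
        int start = 0; // index where the current path begins
        for (int i = 0; i < commaSeparatedPaths.length(); i++) {
            char ch = commaSeparatedPaths.charAt(i);
            if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                if (depth > 0) {
                    depth--;
                }
            } else if (ch == ',' && depth == 0) {
                // A top-level comma ends the current path
                paths.add(commaSeparatedPaths.substring(start, i));
                start = i + 1;
            }
        }
        // Whatever follows the last top-level comma is the final path
        paths.add(commaSeparatedPaths.substring(start));
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(splitPaths("/data/a,/data/b"));
        System.out.println(splitPaths("/logs/{2021,2022}/*.log,/tmp/in"));
    }
}
```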
public static final org.apache.commons.logging.Log LOG
public static final String NUM_INPUT_FILES
public static final String INPUT_DIR_RECURSIVE
protected void setMinSplitSize(long minSplitSize)
protected boolean isSplitable(org.apache.hadoop.fs.FileSystem fs,
                  org.apache.hadoop.fs.Path filename)
Is the given filename splittable? Usually true, but if the file is
 stream compressed, it will not be. The default implementation in
 FileInputFormat always returns true. Implementations that may deal with
 non-splittable files must override this method.
 FileInputFormat implementations can override this and return
 false to ensure that individual input files are never split up
 so that Mappers process entire files.
Parameters:
 fs - the file system that the file is on
 filename - the file name to check
public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Get the RecordReader for the given InputSplit.
 It is the responsibility of the RecordReader to respect
 record boundaries while processing the logical split to present a
 record-oriented view to the individual task.
Specified by:
 getRecordReader in interface InputFormat<K,V>
Parameters:
 split - the InputSplit
 job - the job that this split belongs to
Returns:
 a RecordReader
Throws:
 IOException
public static void setInputPathFilter(JobConf conf, Class<? extends org.apache.hadoop.fs.PathFilter> filter)
Set a PathFilter to be applied to the input paths for the map-reduce job.
Parameters:
 filter - the PathFilter class to use for filtering the input paths.
public static org.apache.hadoop.fs.PathFilter getInputPathFilter(JobConf conf)
Get a PathFilter instance of the filter set for the input paths.
protected void addInputPathRecursively(List<org.apache.hadoop.fs.FileStatus> result, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.PathFilter inputFilter) throws IOException
Add files in the input path recursively into the results.
Parameters:
 result - The List to store all files.
 fs - The FileSystem.
 path - The input path.
 inputFilter - The input filter that can be used to filter files/dirs.
Throws:
 IOException
protected org.apache.hadoop.fs.FileStatus[] listStatus(JobConf job) throws IOException
List input directories.
Parameters:
 job - the job to list input paths for
Throws:
 IOException - if zero items.
protected FileSplit makeSplit(org.apache.hadoop.fs.Path file, long start, long length, String[] hosts)
A factory that makes the split for this class.
protected FileSplit makeSplit(org.apache.hadoop.fs.Path file, long start, long length, String[] hosts, String[] inMemoryHosts)
A factory that makes the split for this class.
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Splits files returned by listStatus(JobConf) when
 they're too big.
Specified by:
 getSplits in interface InputFormat<K,V>
Parameters:
 job - job configuration.
 numSplits - the desired number of splits, a hint.
Returns:
 an array of InputSplits for the job.
Throws:
 IOException
protected long computeSplitSize(long goalSize,
                    long minSize,
                    long blockSize)
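computeSplitSize is listed without a description. The sketch below shows one plausible reading of how the three arguments combine and how getSplits might then slice a file. Two things are assumptions rather than facts stated on this page: the split size is the goal size capped by the block size and raised to at least the minimum size, and the final piece is folded into the last split when it exceeds the split size by no more than a 1.1 slop factor. `SplitSizeSketch` and `slice` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSizeSketch {
    // Assumed tolerance: a tail up to 10% over splitSize stays in one split
    static final double SPLIT_SLOP = 1.1;

    /** Assumed rule: cap the goal at the block size, then apply the minimum. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Returns {start, length} pairs covering fileLength bytes; splitSize must be positive. */
    static List<long[]> slice(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while the remainder is clearly larger than one split
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // The last split takes whatever is left, possibly slightly more than splitSize
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalSize = 300 * mb;
        int numSplits = 2; // the hint passed to getSplits
        long goalSize = totalSize / numSplits;
        long splitSize = computeSplitSize(goalSize, 1L, 128 * mb);
        for (long[] s : slice(totalSize, splitSize)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Under these assumptions numSplits is only a hint: with a 128 MB block size, asking for two splits of a 300 MB file still produces three.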
protected int getBlockIndex(org.apache.hadoop.fs.BlockLocation[] blkLocations,
                long offset)
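getBlockIndex also has no description here. Going by its signature, it presumably maps a byte offset to the index of the block that contains it. The sketch below illustrates that lookup with plain arrays in place of BlockLocation objects. `BlockIndexSketch`, the array parameters, and the choice of exception are assumptions made for illustration.

```java
class BlockIndexSketch {
    /**
     * Block i covers bytes [blockOffsets[i], blockOffsets[i] + blockLengths[i]).
     * Returns the index of the block containing the given offset.
     */
    static int getBlockIndex(long[] blockOffsets, long[] blockLengths, long offset) {
        for (int i = 0; i < blockOffsets.length; i++) {
            long start = blockOffsets[i];
            long end = start + blockLengths[i];
            if (start <= offset && offset < end) {
                return i;
            }
        }
        // Assumed behaviour: an offset past the last block is a caller error
        throw new IllegalArgumentException("Offset " + offset + " is outside of the file");
    }

    public static void main(String[] args) {
        long[] offsets = {0L, 128L, 256L};
        long[] lengths = {128L, 128L, 44L};
        System.out.println(getBlockIndex(offsets, lengths, 0L));
        System.out.println(getBlockIndex(offsets, lengths, 128L));
        System.out.println(getBlockIndex(offsets, lengths, 299L));
    }
}
```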
public static void setInputPaths(JobConf conf, String commaSeparatedPaths)
Sets the given comma separated paths as the list of inputs
 for the map-reduce job.
Parameters:
 conf - Configuration of the job
 commaSeparatedPaths - Comma separated paths to be set as
        the list of inputs for the map-reduce job.
public static void addInputPaths(JobConf conf, String commaSeparatedPaths)
Add the given comma separated paths to the list of inputs for
 the map-reduce job.
Parameters:
 conf - The configuration of the job
 commaSeparatedPaths - Comma separated paths to be added to
        the list of inputs for the map-reduce job.
public static void setInputPaths(JobConf conf, org.apache.hadoop.fs.Path... inputPaths)
Set the array of Paths as the list of inputs
 for the map-reduce job.
Parameters:
 conf - Configuration of the job.
 inputPaths - the Paths of the input directories/files
 for the map-reduce job.
public static void addInputPath(JobConf conf, org.apache.hadoop.fs.Path path)
Add a Path to the list of inputs for the map-reduce job.
Parameters:
 conf - The configuration of the job
 path - Path to be added to the list of inputs for
            the map-reduce job.
public static org.apache.hadoop.fs.Path[] getInputPaths(JobConf conf)
Get the list of input Paths for the map-reduce job.
Parameters:
 conf - The configuration of the job
Returns:
 the list of input Paths for the map-reduce job.
protected String[] getSplitHosts(org.apache.hadoop.fs.BlockLocation[] blkLocations, long offset, long splitSize, org.apache.hadoop.net.NetworkTopology clusterMap) throws IOException
This function identifies and returns the hosts that contribute
 most for a given split.
Parameters:
 blkLocations - The list of block locations
 offset -
 splitSize -
Throws:
 IOException
Copyright © 2022 Apache Software Foundation. All rights reserved.