Package org.apache.sysds.runtime.io
Class IOUtilFunctions
java.lang.Object
    org.apache.sysds.runtime.io.IOUtilFunctions
public class IOUtilFunctions extends Object
-
-
Nested Class Summary
static class IOUtilFunctions.CountRowsTask
-
Field Summary
static String EMPTY_TEXT_LINE
static org.apache.hadoop.fs.PathFilter hiddenFileFilter
static String LIBSVM_DELIM
static String LIBSVM_INDEX_DELIM
-
Constructor Summary
IOUtilFunctions()
-
Method Summary
static int baToInt(byte[] ba, int off)
static long baToLong(byte[] ba, int off)
static int baToShort(byte[] ba, int off)
static void checkAndRaiseErrorCSVEmptyField(String row, boolean fill, boolean emptyFound)
static void checkAndRaiseErrorCSVNumColumns(String fname, String line, String[] parts, long ncol)
static void closeSilently(Closeable io)
static void closeSilently(org.apache.hadoop.mapred.RecordReader<?,?> rr)
static int countNnz(String[] cols)
    Returns the number of non-zero entries but avoids the expensive string to double parsing.
static int countNnz(String[] cols, int pos, int len)
    Returns the number of non-zero entries but avoids the expensive string to double parsing.
static int countNumColumnsCSV(org.apache.hadoop.mapred.InputSplit[] splits, org.apache.hadoop.mapred.InputFormat informat, org.apache.hadoop.mapred.JobConf job, String delim)
    Counts the number of columns in a given collection of CSV file splits.
static int countTokensCSV(String str, String delim)
    Counts the number of tokens defined by the given delimiter, respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
static void deleteCrcFilesFromLocalFileSystem(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
    Delete the CRC files from the local file system associated with a particular file and its metadata file.
static <T> T get(Future<T> in)
static byte[] getBytes(ByteBuffer buff)
static org.apache.hadoop.fs.FileSystem getFileSystem(String fname)
static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration conf)
static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname)
static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname, org.apache.hadoop.conf.Configuration conf)
static org.apache.hadoop.fs.Path[] getMetadataFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file)
static String getPartFileName(int pos)
static org.apache.hadoop.fs.Path[] getSequenceFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file)
static int getUTFSize(String value)
    Returns the serialized size in bytes of the given string value, following the modified UTF-8 specification as used by Java's DataInput/DataOutput.
static void intToBa(int val, byte[] ba, int off)
static boolean isObjectStoreFileScheme(org.apache.hadoop.fs.Path path)
static boolean isSameFileScheme(org.apache.hadoop.fs.Path path1, org.apache.hadoop.fs.Path path2)
static void longToBa(long val, byte[] ba, int off)
static FileFormatPropertiesMM readAndParseMatrixMarketHeader(String filename)
static String[] readMatrixMarketHeader(String filename)
static void shortToBa(int val, byte[] ba, int off)
static org.apache.hadoop.mapred.InputSplit[] sortInputSplits(org.apache.hadoop.mapred.InputSplit[] splits)
static String[] split(String str, String delim)
    Splits a string by a specified delimiter into all tokens, including empty tokens.
static String[] splitByFirst(String str, String delim)
static String[] splitCSV(String str, String delim)
    Splits a string by a specified delimiter into all tokens, including empty tokens, while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
static String[] splitCSV(String str, String delim, String[] tokens, Set<String> naStrings)
    Splits a string by a specified delimiter into all tokens, including empty tokens, while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
static InputStream toInputStream(String input)
static String toString(InputStream input)
-
-
-
Field Detail
-
hiddenFileFilter
public static final org.apache.hadoop.fs.PathFilter hiddenFileFilter
-
EMPTY_TEXT_LINE
public static final String EMPTY_TEXT_LINE
- See Also:
- Constant Field Values
-
LIBSVM_DELIM
public static final String LIBSVM_DELIM
- See Also:
- Constant Field Values
-
LIBSVM_INDEX_DELIM
public static final String LIBSVM_INDEX_DELIM
- See Also:
- Constant Field Values
-
-
Method Detail
-
getFileSystem
public static org.apache.hadoop.fs.FileSystem getFileSystem(String fname) throws IOException
- Throws:
IOException
-
getFileSystem
public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname) throws IOException
- Throws:
IOException
-
getFileSystem
public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.conf.Configuration conf) throws IOException
- Throws:
IOException
-
getFileSystem
public static org.apache.hadoop.fs.FileSystem getFileSystem(org.apache.hadoop.fs.Path fname, org.apache.hadoop.conf.Configuration conf) throws IOException
- Throws:
IOException
-
isSameFileScheme
public static boolean isSameFileScheme(org.apache.hadoop.fs.Path path1, org.apache.hadoop.fs.Path path2)
-
isObjectStoreFileScheme
public static boolean isObjectStoreFileScheme(org.apache.hadoop.fs.Path path)
-
getPartFileName
public static String getPartFileName(int pos)
-
closeSilently
public static void closeSilently(Closeable io)
-
closeSilently
public static void closeSilently(org.apache.hadoop.mapred.RecordReader<?,?> rr)
-
checkAndRaiseErrorCSVEmptyField
public static void checkAndRaiseErrorCSVEmptyField(String row, boolean fill, boolean emptyFound) throws IOException
- Throws:
IOException
-
checkAndRaiseErrorCSVNumColumns
public static void checkAndRaiseErrorCSVNumColumns(String fname, String line, String[] parts, long ncol) throws IOException
- Throws:
IOException
-
split
public static String[] split(String str, String delim)
Splits a string by a specified delimiter into all tokens, including empty tokens. NOTE: This method is meant as a faster drop-in replacement for the regular string split.
- Parameters:
str - string to split
delim - delimiter
- Returns:
string array
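To illustrate what keeping empty tokens means, here is a minimal sketch. It is not the SystemDS implementation: the class name SplitSketch and the literal (non-regex) delimiter scan are assumptions made for this example. The contrast with String.split, which interprets the delimiter as a regular expression and drops trailing empty strings, is standard Java behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the SystemDS code.
final class SplitSketch {
    static String[] split(String str, String delim) {
        if (delim.isEmpty())
            throw new IllegalArgumentException("delimiter must not be empty");
        List<String> tokens = new ArrayList<>();
        int from = 0;
        int to;
        // Scan for each literal occurrence of the delimiter (no regex).
        while ((to = str.indexOf(delim, from)) >= 0) {
            tokens.add(str.substring(from, to)); // may be the empty string
            from = to + delim.length();
        }
        // Remainder after the last delimiter; empty if the input ends with it.
        tokens.add(str.substring(from));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Four tokens: "a", "", "b", ""
        System.out.println(split("a,,b,", ",").length);   // prints 4
        // String.split drops the trailing empty token.
        System.out.println("a,,b,".split(",").length);    // prints 3
    }
}
```

Preserving empty tokens matters for delimited matrix or frame rows, where the position of a token determines its column.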
-
splitCSV
public static String[] splitCSV(String str, String delim)
Splits a string by a specified delimiter into all tokens, including empty tokens, while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
- Parameters:
str - string to split
delim - delimiter
- Returns:
string array of tokens
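The RFC4180 rules referred to above can be sketched as follows: a delimiter inside a double-quoted field does not split, and a doubled quote inside a quoted field is an escaped quote. This is a simplified illustration under stated assumptions, not the SystemDS implementation. The class name CsvSplitSketch is hypothetical, and the choice to leave the surrounding quotes in the returned tokens is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the SystemDS code.
final class CsvSplitSketch {
    static String[] splitCSV(String str, String delim) {
        if (delim.isEmpty())
            throw new IllegalArgumentException("delimiter must not be empty");
        List<String> tokens = new ArrayList<>();
        boolean inQuotes = false;
        int from = 0;
        int i = 0;
        while (i < str.length()) {
            if (str.charAt(i) == '"') {
                // Toggle the quoted state. An escaped quote ("") toggles
                // twice in a row, which leaves the state unchanged.
                inQuotes = !inQuotes;
                i++;
            } else if (!inQuotes && str.startsWith(delim, i)) {
                // Delimiter outside quotes ends the current token.
                tokens.add(str.substring(from, i));
                i += delim.length();
                from = i;
            } else {
                i++;
            }
        }
        // Last token; an unterminated quote swallows the rest of the line.
        tokens.add(str.substring(from));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Input line: 1,"a,b",,"x""y"
        for (String t : splitCSV("1,\"a,b\",,\"x\"\"y\"", ","))
            System.out.println("[" + t + "]");
    }
}
```

For the input line in main, the sketch yields four tokens: the plain field 1, the quoted field containing a comma, an empty token, and the quoted field containing an escaped quote.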
-
splitCSV
public static String[] splitCSV(String str, String delim, String[] tokens, Set<String> naStrings)
Splits a string by a specified delimiter into all tokens, including empty tokens, while respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
- Parameters:
str - string to split
delim - delimiter
tokens - array for tokens, length needs to match the number of tokens
naStrings - the strings to map to null value
- Returns:
string array of tokens
-
countTokensCSV
public static int countTokensCSV(String str, String delim)
Counts the number of tokens defined by the given delimiter, respecting the rules for quotes and escapes defined in RFC4180, with robustness for various special cases.
- Parameters:
str - string to split
delim - delimiter
- Returns:
number of tokens split by the given delimiter
-
readAndParseMatrixMarketHeader
public static FileFormatPropertiesMM readAndParseMatrixMarketHeader(String filename) throws DMLRuntimeException
- Throws:
DMLRuntimeException
-
countNnz
public static int countNnz(String[] cols)
Returns the number of non-zero entries but avoids the expensive string to double parsing. This function is guaranteed to never underestimate.
- Parameters:
cols - string array
- Returns:
number of non-zeros
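One way to obtain a count that never underestimates without parsing doubles is to compare each token against a few literal spellings of zero and count everything else. The sketch below illustrates that idea only. The class name CountNnzSketch and the particular spellings treated as zero (empty string, "0", "0.0") are assumptions of this example, not a statement of what SystemDS checks.

```java
// Hypothetical stand-in for illustration only; not the SystemDS code.
final class CountNnzSketch {
    // Counts tokens that are not one of the assumed literal zero spellings.
    // Unrecognized zero spellings (e.g. "0.00") are counted as non-zero,
    // so the result can overestimate but never underestimate.
    static int countNnz(String[] cols) {
        int nnz = 0;
        for (String col : cols) {
            if (!col.isEmpty() && !col.equals("0") && !col.equals("0.0"))
                nnz++;
        }
        return nnz;
    }

    // Exact count via parsing, for comparison; empty tokens count as zero.
    static int exactNnz(String[] cols) {
        int nnz = 0;
        for (String col : cols) {
            if (!col.isEmpty() && Double.parseDouble(col) != 0)
                nnz++;
        }
        return nnz;
    }

    public static void main(String[] args) {
        String[] row = { "1", "0", "", "0.0", "2.5", "0.00" };
        System.out.println(countNnz(row)); // prints 3 ("0.00" is overcounted)
        System.out.println(exactNnz(row)); // prints 2
    }
}
```

An overestimate is generally the safe direction when the count is used to size or choose a sparse representation ahead of the actual parse.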
-
countNnz
public static int countNnz(String[] cols, int pos, int len)
Returns the number of non-zero entries but avoids the expensive string to double parsing. This function is guaranteed to never underestimate.
- Parameters:
cols - string array
pos - starting array index
len - ending array index
- Returns:
number of non-zeros
-
getUTFSize
public static int getUTFSize(String value)
Returns the serialized size in bytes of the given string value, following the modified UTF-8 specification as used by Java's DataInput/DataOutput. See the Java docs: docs/api/java/io/DataInput.html#modified-utf-8
- Parameters:
value - string value
- Returns:
string size for modified UTF-8 specification
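The modified UTF-8 size can be computed per char: one byte for U+0001 to U+007F, two bytes for NUL and U+0080 to U+07FF, and three bytes for U+0800 and above (surrogates are encoded individually). The following is a minimal sketch of that computation. The class name UtfSizeSketch is hypothetical, and including the 2-byte length prefix that DataOutput.writeUTF emits is an assumption of this example. The sketch checks itself against DataOutputStream.writeUTF.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration only; not the SystemDS code.
final class UtfSizeSketch {
    static int utfSize(String value) {
        int size = 2; // assumed: 2-byte length prefix as written by writeUTF
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 0x0001 && c <= 0x007F)
                size += 1;
            else if (c >= 0x0800)
                size += 3;
            else
                size += 2; // NUL and 0x0080..0x07FF
        }
        return size;
    }

    // Reference value: number of bytes actually produced by writeUTF.
    static int writtenSize(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(value);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\0"; // 1 + 2 + 3 + 2 bytes, plus prefix
        System.out.println(utfSize(s) + " " + writtenSize(s)); // prints 10 10
    }
}
```

Computing the size up front avoids serializing the string just to learn how many bytes it will occupy.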
-
toInputStream
public static InputStream toInputStream(String input)
-
toString
public static String toString(InputStream input) throws IOException
- Throws:
IOException
-
sortInputSplits
public static org.apache.hadoop.mapred.InputSplit[] sortInputSplits(org.apache.hadoop.mapred.InputSplit[] splits)
-
countNumColumnsCSV
public static int countNumColumnsCSV(org.apache.hadoop.mapred.InputSplit[] splits, org.apache.hadoop.mapred.InputFormat informat, org.apache.hadoop.mapred.JobConf job, String delim) throws IOException
Counts the number of columns in a given collection of CSV file splits. This primitive stops as soon as a row with more than 0 columns is found and hence is robust against empty file splits etc.
- Parameters:
splits - input splits
informat - input format
job - job configuration
delim - delimiter
- Returns:
the number of columns in the collection of CSV file splits
- Throws:
IOException - if IOException occurs
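The scan-until-first-usable-row behavior described above can be sketched without Hadoop by modeling each split as a list of rows. This is an illustration under stated assumptions, not the SystemDS implementation: the class name CountColumnsSketch, the list-of-lists stand-in for input splits, the skipping of blank rows, the plain (not quote-aware) column count, and the return value of -1 when no row is found are all choices made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the SystemDS code.
final class CountColumnsSketch {
    // Each inner list models the rows of one file split.
    static int countNumColumns(List<List<String>> splits, String delim) {
        for (List<String> split : splits) {
            for (String row : split) {
                if (row.trim().isEmpty())
                    continue; // blank row, keep looking
                // First usable row decides the column count; stop here.
                // Pattern.quote makes the delimiter literal; limit -1
                // keeps trailing empty columns.
                return row.split(Pattern.quote(delim), -1).length;
            }
            // Empty split: fall through to the next one.
        }
        return -1; // assumed sentinel: no row found in any split
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Collections.<String>emptyList(),      // empty split
            Arrays.asList("1,2,3", "4,5,6"));     // first split with data
        System.out.println(countNumColumns(splits, ",")); // prints 3
    }
}
```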
-
getSequenceFilePaths
public static org.apache.hadoop.fs.Path[] getSequenceFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file) throws IOException
- Throws:
IOException
-
getMetadataFilePaths
public static org.apache.hadoop.fs.Path[] getMetadataFilePaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file) throws IOException
- Throws:
IOException
-
deleteCrcFilesFromLocalFileSystem
public static void deleteCrcFilesFromLocalFileSystem(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Delete the CRC files from the local file system associated with a particular file and its metadata file.
- Parameters:
fs - the file system
path - the path to a file
- Throws:
IOException - thrown if error occurred attempting to delete crc files
-
baToShort
public static int baToShort(byte[] ba, int off)
-
baToInt
public static int baToInt(byte[] ba, int off)
-
baToLong
public static long baToLong(byte[] ba, int off)
-
shortToBa
public static void shortToBa(int val, byte[] ba, int off)
-
intToBa
public static void intToBa(int val, byte[] ba, int off)
-
longToBa
public static void longToBa(long val, byte[] ba, int off)
-
getBytes
public static byte[] getBytes(ByteBuffer buff)
-
get
public static <T> T get(Future<T> in)
-
-