Package org.apache.sysds.runtime.util
Class DataConverter
java.lang.Object
    org.apache.sysds.runtime.util.DataConverter
public class DataConverter extends Object
This class provides methods to read and write matrix blocks from and to HDFS using different data formats. This functionality is used mainly for CP read/write and for exporting in-memory matrices to HDFS (before executing MR jobs).
-
-
Constructor Summary
Constructor   Description
DataConverter()
-
Method Summary
All Methods   Static Methods   Concrete Methods
Modifier and Type   Method
    Description

static org.apache.commons.math3.linear.Array2DRowRealMatrix   convertToArray2DRowRealMatrix(MatrixBlock mb)
    Helper method that converts a SystemDS matrix variable (varname) into an Array2DRowRealMatrix, which is useful for invoking Apache Commons Math.
static org.apache.commons.math3.linear.BlockRealMatrix   convertToBlockRealMatrix(MatrixBlock mb)
static boolean[]   convertToBooleanVector(MatrixBlock mb)
static DenseBlock   convertToDenseBlock(MatrixBlock mb)
static DenseBlock   convertToDenseBlock(MatrixBlock mb, boolean deep)
static List<Double>   convertToDoubleList(MatrixBlock mb)
static double[][]   convertToDoubleMatrix(MatrixBlock mb)
    Creates a two-dimensional double matrix of the input matrix block.
static double[]   convertToDoubleVector(MatrixBlock mb)
static double[]   convertToDoubleVector(MatrixBlock mb, boolean deep)
static double[]   convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)
static FrameBlock   convertToFrameBlock(String[][] data)
    Converts a two-dimensional string array into a frame block of value type string.
static FrameBlock   convertToFrameBlock(String[][] data, Types.ValueType[] schema)
static FrameBlock   convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)
static FrameBlock   convertToFrameBlock(MatrixBlock mb)
    Converts a matrix block into a frame block of value type double.
static FrameBlock   convertToFrameBlock(MatrixBlock mb, Types.ValueType vt)
    Converts a matrix block into a frame block of a given value type.
static FrameBlock   convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)
static int[]   convertToIntVector(MatrixBlock mb)
static long[]   convertToLongVector(MatrixBlock mb)
static MatrixBlock   convertToMatrixBlock(double[][] data)
    Creates a dense Matrix Block and copies the given double matrix into it.
static MatrixBlock   convertToMatrixBlock(double[] data, boolean columnVector)
    Creates a dense Matrix Block and copies the given double vector into it.
static MatrixBlock   convertToMatrixBlock(int[][] data)
    Converts an integer matrix to a MatrixBlock.
static MatrixBlock   convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)
static MatrixBlock   convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen)
    NOTE: this method also ensures the specified matrix dimensions
static MatrixBlock   convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)
static MatrixBlock   convertToMatrixBlock(CTableMap map)
static MatrixBlock   convertToMatrixBlock(CTableMap map, int rlen, int clen)
    NOTE: this method also ensures the specified matrix dimensions
static MatrixBlock   convertToMatrixBlock(FrameBlock frame)
    Converts a frame block with arbitrary schema into a matrix block.
static MatrixBlock[]   convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)
static String[][]   convertToStringFrame(FrameBlock frame)
    Converts a frame block with arbitrary schema into a two-dimensional string array.
static TensorBlock   convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)
static int[]   convertVectorToIndexList(MatrixBlock mb)
static void   copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)
static int[]   getTensorDimensions(ExecutionContext ec, CPOperand dims)
static MatrixBlock   readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen)
static MatrixBlock   readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS)
static MatrixBlock   readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz)
static MatrixBlock   readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS)
static MatrixBlock   readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties)
static MatrixBlock   readMatrixFromHDFS(ReadProperties prop)
    Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory.
static TensorBlock   readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema)
static BitSet   toBitSet(double[] data)
static double[]   toDouble(float[] data)
static double[]   toDouble(int[] data)
static double[]   toDouble(long[] data)
static double[]   toDouble(String[] data)
static double[]   toDouble(BitSet data, int len)
static float[]   toFloat(double[] data)
static int[]   toInt(double[] data)
static long[]   toLong(double[] data)
static String[]   toString(double[] data)
static String   toString(TensorBlock tb)
static String   toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal)
    Returns a string representation of a tensor
static String   toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)
static String   toString(FrameBlock fb)
static String   toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
static String   toString(MatrixBlock mb)
static String   toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
    Returns a string representation of a matrix
static void   writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc)
static void   writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties)
static void   writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag)
static void   writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc)
-
-
-
Method Detail
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
IOException
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties) throws IOException
- Throws:
IOException
-
writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag) throws IOException
- Throws:
IOException
-
writeTensorToHDFS
public static void writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties) throws IOException
- Throws:
IOException
-
readTensorFromHDFS
public static TensorBlock readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema) throws IOException
- Throws:
IOException
-
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(ReadProperties prop) throws IOException
Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory. For expected dense matrices we directly copy value- or block-at-a-time into the target matrix. In contrast, for sparse matrices, we append (column-value)-pairs and do a final sort if required in order to prevent large reorg overheads and increased memory consumption in case of unordered inputs.
DENSE MxN input:
* best/average/worst: O(M*N)
SPARSE MxN input:
* best (ordered, or binary block w/ clen<=blen): O(M*N)
* average (unordered): O(M*N*log(N))
* worst (descending order per row): O(M * N^2)
NOTE: providing an exact estimate of 'expected sparsity' can prevent a full copy of the result matrix block (required for changing sparse->dense, or vice versa)
- Parameters:
prop - read properties
- Returns:
matrix block
- Throws:
IOException - if IOException occurs
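The append-then-sort strategy described above for sparse inputs can be illustrated with a minimal, self-contained sketch. It is not the SystemDS implementation: the class SparseRowSketch and its members are hypothetical names introduced only for illustration, and the sketch models a single sparse row as a list of (column, value) pairs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of one sparse row; not the SystemDS data structure.
class SparseRowSketch {
    // One (column, value) pair of the row.
    static final class Cell {
        final int col;
        final double val;
        Cell(int col, double val) { this.col = col; this.val = val; }
    }

    private final List<Cell> cells = new ArrayList<>();
    private boolean sorted = true; // stays true while input arrives in column order
    private int maxCol = -1;       // largest column index seen so far

    // Appends at the end instead of inserting in order, so no entries are shifted.
    void append(int col, double val) {
        if (col < maxCol) {
            sorted = false;
        }
        maxCol = Math.max(maxCol, col);
        cells.add(new Cell(col, val));
    }

    // Final sort by column index; skipped entirely for already ordered input.
    void sortIfRequired() {
        if (!sorted) {
            cells.sort(Comparator.comparingInt(c -> c.col));
            sorted = true;
        }
    }

    // Column indexes in their current storage order.
    int[] columns() {
        int[] out = new int[cells.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = cells.get(i).col;
        }
        return out;
    }
}
```

Appending and sorting once keeps unordered input at O(N*log(N)) per row in this sketch, whereas inserting each pair at its sorted position would shift existing entries on every out-of-order cell.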
-
convertToDoubleMatrix
public static double[][] convertToDoubleMatrix(MatrixBlock mb)
Creates a two-dimensional double matrix of the input matrix block.
- Parameters:
mb - matrix block
- Returns:
2d double array
-
convertToBooleanVector
public static boolean[] convertToBooleanVector(MatrixBlock mb)
-
convertVectorToIndexList
public static int[] convertVectorToIndexList(MatrixBlock mb)
-
convertToIntVector
public static int[] convertToIntVector(MatrixBlock mb)
-
convertToLongVector
public static long[] convertToLongVector(MatrixBlock mb)
-
convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb)
-
convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb, boolean deep)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep)
-
convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)
-
convertToDoubleList
public static List<Double> convertToDoubleList(MatrixBlock mb)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[][] data)
Creates a dense Matrix Block and copies the given double matrix into it.
- Parameters:
data - 2d double array
- Returns:
matrix block
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(int[][] data)
Converts an integer matrix to a MatrixBlock.
- Parameters:
data - int matrix input that is converted to a double MatrixBlock
- Returns:
the constructed MatrixBlock
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[] data, boolean columnVector)
Creates a dense Matrix Block and copies the given double vector into it.
- Parameters:
data - double array
columnVector - if true, create matrix with single column. if false, create matrix with single row
- Returns:
matrix block
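The effect of the columnVector flag on the resulting shape can be sketched with plain arrays. This is an illustration under assumptions, not the library code: VectorShapeSketch and toMatrix are hypothetical names, and a double[][] stands in for the dense matrix block.

```java
// Hypothetical illustration of the columnVector flag using plain arrays.
class VectorShapeSketch {
    // Copies data into an n x 1 array (columnVector == true)
    // or a 1 x n array (columnVector == false).
    static double[][] toMatrix(double[] data, boolean columnVector) {
        int rows = columnVector ? data.length : 1;
        int cols = columnVector ? 1 : data.length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < data.length; i++) {
            if (columnVector) {
                out[i][0] = data[i];
            } else {
                out[0][i] = data[i];
            }
        }
        return out;
    }
}
```

For a three-element input, the sketch yields a 3 x 1 array when columnVector is true and a 1 x 3 array otherwise.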
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions
- Parameters:
map - map of matrix index keys and double values
rlen - number of rows
clen - number of columns
- Returns:
matrix block
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions
- Parameters:
map - ?
rlen - number of rows
clen - number of columns
- Returns:
matrix block
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(FrameBlock frame)
Converts a frame block with arbitrary schema into a matrix block. Since a matrix block only supports value type double, we do a best-effort conversion of non-double types, which might result in errors for non-numerical data.
- Parameters:
frame - frame block
- Returns:
matrix block
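Why a best-effort conversion to double can fail on non-numerical data is easiest to see in a small sketch. The code below is an assumption-laden illustration, not the SystemDS conversion logic: BestEffortParseSketch and toDoubles are hypothetical names, cells are modelled as strings, and the handling of nulls or booleans in the real method is not covered.

```java
// Hypothetical illustration: every cell is parsed as a double.
class BestEffortParseSketch {
    // Numeric strings convert cleanly; Double.parseDouble throws
    // NumberFormatException for non-numerical content such as "abc".
    static double[][] toDoubles(String[][] cells) {
        double[][] out = new double[cells.length][];
        for (int r = 0; r < cells.length; r++) {
            out[r] = new double[cells[r].length];
            for (int c = 0; c < cells[r].length; c++) {
                out[r][c] = Double.parseDouble(cells[r][c]);
            }
        }
        return out;
    }
}
```

In this sketch a single non-numerical cell aborts the whole conversion, which is one plausible reading of "might result in errors" above.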
-
convertToStringFrame
public static String[][] convertToStringFrame(FrameBlock frame)
Converts a frame block with arbitrary schema into a two-dimensional string array.
- Parameters:
frame - frame block
- Returns:
2d string array
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data)
Converts a two-dimensional string array into a frame block of value type string. If the given array is null or of length 0, we return an empty frame block.
- Parameters:
data - 2d string array
- Returns:
frame block
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema)
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb)
Converts a matrix block into a frame block of value type double.
- Parameters:
mb - matrix block
- Returns:
frame block of type double
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType vt)
Converts a matrix block into a frame block of a given value type.
- Parameters:
mb - matrix block
vt - value type
- Returns:
frame block
-
convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)
-
convertToTensorBlock
public static TensorBlock convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)
-
convertToMatrixBlockPartitions
public static MatrixBlock[] convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)
-
convertToArray2DRowRealMatrix
public static org.apache.commons.math3.linear.Array2DRowRealMatrix convertToArray2DRowRealMatrix(MatrixBlock mb)
Helper method that converts a SystemDS matrix variable (varname) into an Array2DRowRealMatrix, which is useful for invoking Apache Commons Math.
- Parameters:
mb - matrix object
- Returns:
matrix as a commons-math3 Array2DRowRealMatrix
-
convertToBlockRealMatrix
public static org.apache.commons.math3.linear.BlockRealMatrix convertToBlockRealMatrix(MatrixBlock mb)
-
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)
-
copyToDoubleVector
public static void copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)
-
toString
public static String toString(MatrixBlock mb)
-
toString
public static String toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a matrix.
- Parameters:
mb - matrix block
sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise it will be a rectangular string with all values of the matrix block
separator - separator string between each element in a row, or between the columns in sparse format
lineseparator - separator string between each row
rowsToPrint - maximum number of rows to print, -1 for all
colsToPrint - maximum number of columns to print, -1 for all
decimal - number of decimal places to print, -1 for default
- Returns:
matrix as a string
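The difference between the sparse and the rectangular output can be sketched on a plain double[][]. This is a simplified illustration, not the library's formatter: MatrixToStringSketch and format are hypothetical names, the 1-based row and column indexes in sparse mode are an assumption, and the rowsToPrint, colsToPrint, and decimal limits are left out.

```java
// Hypothetical illustration of sparse vs. rectangular matrix printing.
class MatrixToStringSketch {
    static String format(double[][] m, boolean sparse, String separator, String lineSeparator) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < m.length; r++) {
            if (sparse) {
                // One "row col value" line per non-zero cell (1-based indexes assumed).
                for (int c = 0; c < m[r].length; c++) {
                    if (m[r][c] != 0.0) {
                        sb.append(r + 1).append(separator)
                          .append(c + 1).append(separator)
                          .append(m[r][c]).append(lineSeparator);
                    }
                }
            } else {
                // One line per matrix row containing every value, zeros included.
                for (int c = 0; c < m[r].length; c++) {
                    if (c > 0) {
                        sb.append(separator);
                    }
                    sb.append(m[r][c]);
                }
                sb.append(lineSeparator);
            }
        }
        return sb.toString();
    }
}
```

For a mostly empty matrix the sparse form is much shorter, because cells equal to 0.0 produce no output at all.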
-
toString
public static String toString(TensorBlock tb)
-
toString
public static String toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a tensor.
- Parameters:
tb - tensor block
sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise it will be a rectangular string with all values of the tensor block
separator - separator string between each element in a row, or between the columns in sparse format
lineseparator - separator string between each row
leftBorder - characters placed at the start of a new dimension level
rightBorder - characters placed at the end of a new dimension level
rowsToPrint - maximum number of rows to print, -1 for all
colsToPrint - maximum number of columns to print, -1 for all
decimal - number of decimal places to print, -1 for default
- Returns:
tensor as a string
-
toString
public static String toString(FrameBlock fb)
-
toString
public static String toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
-
toString
public static String toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)
-
getTensorDimensions
public static int[] getTensorDimensions(ExecutionContext ec, CPOperand dims)
-
toDouble
public static double[] toDouble(float[] data)
-
toDouble
public static double[] toDouble(long[] data)
-
toDouble
public static double[] toDouble(int[] data)
-
toDouble
public static double[] toDouble(BitSet data, int len)
-
toDouble
public static double[] toDouble(String[] data)
-
toFloat
public static float[] toFloat(double[] data)
-
toInt
public static int[] toInt(double[] data)
-
toLong
public static long[] toLong(double[] data)
-
toBitSet
public static BitSet toBitSet(double[] data)
-
toString
public static String[] toString(double[] data)
-
-