Class DatasetSplitter
java.lang.Object
org.apache.lucene.classification.utils.DatasetSplitter
Utility class for creating training / test / cross validation indexes from the original index.
Constructor Summary
Constructors
DatasetSplitter(double testRatio, double crossValidationRatio)
    Creates a DatasetSplitter given the sizes of the test and cross validation indexes.
Method Summary
void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames)
    Split a given index into 3 indexes for training, test and cross validation tasks respectively.
Constructor Details
DatasetSplitter
public DatasetSplitter(double testRatio, double crossValidationRatio)
Creates a DatasetSplitter given the sizes of the test and cross validation indexes.
Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
Method Details
split
public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, String classFieldName, String... fieldNames) throws IOException
Splits a given index into three indexes, for training, test and cross validation tasks respectively.
Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this must be indexed with sorted doc values
fieldNames - names of fields that need to be put in the new indexes, or null if all should be used
Throws:
IOException - if any writing operation fails on any of the indexes
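The two ratios described above determine how many documents end up in each of the three output indexes. The following is a minimal, library-free sketch of that arithmetic only. It is not Lucene's DatasetSplitter implementation: the class RatioSplitSketch, its Parts holder, the round-down rule, the fill order, and the check that the ratios sum to at most 1.0 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: partitions document ids by ratio.
 * Hypothetical names; this is NOT the Lucene DatasetSplitter code.
 */
final class RatioSplitSketch {

    /** Holds the three id lists produced by a split. */
    static final class Parts {
        final List<Integer> training = new ArrayList<>();
        final List<Integer> test = new ArrayList<>();
        final List<Integer> crossValidation = new ArrayList<>();
    }

    static Parts split(int docCount, double testRatio, double crossValidationRatio) {
        // Each ratio is documented as a double between 0.0 and 1.0.
        if (testRatio < 0.0 || testRatio > 1.0
                || crossValidationRatio < 0.0 || crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("each ratio must be between 0.0 and 1.0");
        }
        // Assumption of this sketch: the two ratios together cannot exceed the whole index.
        if (testRatio + crossValidationRatio > 1.0) {
            throw new IllegalArgumentException("ratios must not sum to more than 1.0");
        }

        // Assumption of this sketch: sizes are rounded down, the remainder goes to training.
        int testSize = (int) (docCount * testRatio);
        int cvSize = (int) (docCount * crossValidationRatio);

        Parts parts = new Parts();
        for (int docId = 0; docId < docCount; docId++) {
            if (parts.test.size() < testSize) {
                parts.test.add(docId);              // fill the test partition first
            } else if (parts.crossValidation.size() < cvSize) {
                parts.crossValidation.add(docId);   // then the cross validation partition
            } else {
                parts.training.add(docId);          // everything else is training data
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Parts p = split(1000, 0.25, 0.125);
        System.out.println("training=" + p.training.size()
                + " test=" + p.test.size()
                + " crossValidation=" + p.crossValidation.size());
    }
}
```

With the real class, the equivalent inputs would be passed as new DatasetSplitter(testRatio, crossValidationRatio), followed by a call to split(...) with the source IndexReader and three target Directory instances. How the library orders, groups, or rounds documents is not stated in this page and may differ from the sketch.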