- Our data file format is similar to libsvm's. Data is stored in a sparse
representation, with one element per line. Each line begins with an
integer label, which must be either 1 or -1 because we only support binary
classification. The label is followed by a list of features, each of the
form featureID:featureValue, so a line looks like
<label> <index1>:<value1> <index2>:<value2> ...
Feature indices must appear in increasing order.
- The test/validation data format is the same as the training data format. In particular, each element must still have a label, although these labels are used only to compute accuracy, precision, and recall statistics. If you don't have labels for the elements, just provide placeholder labels (any valid value, 1 or -1).
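If your unlabeled elements are already written as feature lists, a placeholder label can be prepended to each line. The following is a minimal sketch in Python; the function name `add_placeholder_labels` is hypothetical and not part of this tool. Keep in mind that accuracy, precision, and recall computed against placeholder labels carry no meaning.

```python
def add_placeholder_labels(lines, placeholder=1):
    """Prepend a placeholder label to each unlabeled feature line.

    Hypothetical helper, not part of the tool itself. Each input line is
    expected to hold only '<index>:<value>' pairs separated by spaces.
    """
    if placeholder not in (1, -1):
        raise ValueError("placeholder label must be 1 or -1")
    labeled = []
    for line in lines:
        # One output line per input line, so element order is preserved.
        # An element with no listed features becomes a bare label.
        labeled.append(f"{placeholder} {line.strip()}".rstrip())
    return labeled


if __name__ == "__main__":
    for row in add_placeholder_labels(["0:1 1:2", "0:100 1:200"]):
        print(row)
```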
Suppose there are two elements, each with two features. The first has Feature0 = 1 and Feature1 = 2 and belongs to class 1; the second has Feature0 = 100 and Feature1 = 200 and belongs to class -1. Then the data file would look like:
1 0:1 1:2
-1 0:100 1:200
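To check a file against the format described above before training, a small reader can help. This is a rough sketch in Python, not the tool's own loader; the names `parse_line` and `parse_data` are hypothetical, and it assumes that "increasing order" means strictly increasing indices and that feature values are plain numbers.

```python
def parse_line(line):
    """Parse one '<label> <index>:<value> ...' line into (label, {index: value})."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = int(tokens[0])
    if label not in (1, -1):
        raise ValueError(f"label must be 1 or -1, got {label}")
    features = {}
    prev_index = None
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        # Assumption: indices must be strictly increasing within a line.
        if prev_index is not None and index <= prev_index:
            raise ValueError(f"feature index {index} is not in increasing order")
        features[index] = float(value_text)
        prev_index = index
    return label, features


def parse_data(text):
    """Parse a whole data file given as a string, one element per line."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for label, features in parse_data("1 0:1 1:2\n-1 0:100 1:200\n"):
        print(label, features)
```

Storing each element as a dictionary keeps the sparse representation: only the features listed on a line take up space.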