- Training (svm_train) flags
- 0: normalized-linear
- 1: normalized-polynomial
- 2: RBF Gaussian
- 3: Laplacian
rank_ratio: approximation ratio between 0 and 1.
hyper_parm: the C parameter of the SVM. This is the same as the libsvm "-c" parameter.
gamma: the gamma value, if you use the RBF kernel. This is the same as the libsvm "-g" parameter.
poly_degree: the degree, if you use the normalized-polynomial kernel. This is the same as the libsvm "-d" parameter.
model_path: the location to save the training model and checkpoints to. Be sure that this path is EMPTY before training a new model: svm_train will interpret any of its checkpoints left in this directory as checkpoints for the current model.
failsafe: If failsafe is set to true, the program periodically writes checkpoints to model_path, and if the program fails, it restarts from the last checkpoint.
save_interval: the checkpoint interval for failsafe. Every save_interval seconds, the program writes a checkpoint. If PSVM fails (for example, because a machine goes down), it restarts from the last checkpoint on the next execution.
max_iteration: Because PSVM uses the Interior Point Method, training needs many iterations. Iteration stops when ((surrogate_gap < surrogate_gap_threshold and primal residual < feasible_threshold and dual residual < feasible_threshold) or iterations > max_iteration). The default values handle most cases.
negative_weight: For unbalanced data, set a weight greater than one for one of the classes. For example, with 100 positive samples and 10 negative samples, we suggest setting negative_weight to 10.
- Others: simply run svm_train to get a description of each parameter. They are rarely needed unless you are familiar with the details of the algorithm.
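Two of the flags above describe small rules that may be easier to read as code: the stopping condition given under max_iteration and the class-weight suggestion given under negative_weight. The Python below is only an illustrative sketch, not PSVM's actual implementation. The function names are hypothetical, the threshold values in the usage lines are placeholders rather than PSVM defaults, and the positive-to-negative ratio rule is an assumption generalized from the single 100-to-10 example above.

```python
def should_stop(surrogate_gap, primal_residual, dual_residual, iteration,
                surrogate_gap_threshold, feasible_threshold, max_iteration):
    """Return True when the stopping condition described for max_iteration holds.

    Hypothetical helper: mirrors the documented condition, not PSVM source code.
    """
    converged = (
        surrogate_gap < surrogate_gap_threshold
        and primal_residual < feasible_threshold
        and dual_residual < feasible_threshold
    )
    # Stop on convergence, or once the iteration count exceeds the cap.
    return converged or iteration > max_iteration


def suggested_negative_weight(num_positive, num_negative):
    """Suggest negative_weight as the positive-to-negative sample ratio.

    Assumption: generalizes the documented example (100 positive, 10 negative -> 10).
    """
    if num_positive <= 0 or num_negative <= 0:
        raise ValueError("both classes need at least one sample")
    return num_positive / num_negative


# Placeholder thresholds for illustration only (not PSVM defaults).
print(should_stop(1e-6, 1e-9, 1e-9, iteration=5,
                  surrogate_gap_threshold=1e-3,
                  feasible_threshold=1e-6,
                  max_iteration=200))
print(suggested_negative_weight(100, 10))
```

Note that the iteration cap only triggers once the count is strictly greater than max_iteration, matching the wording of the condition above.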
- Predicting (svm_predict) flags
model_path: the path of the model used for prediction.
output_path: where to write the prediction output.
- Selection of rank_ratio decides the reduced matrix dimension (p = n*rank_ratio). Higher values yield higher accuracy at the cost of increased memory usage and training time. If you are unsure what value to use, we recommend a value of [...], where n is the number of training samples. You should also consider the number of machines and the memory available when setting