Introduction

This project implements the algorithm described in the paper "PSVM: Parallelizing Support Vector Machines on Distributed Computers". It is a version of SVM that supports all kernels and can run in parallel on multiple machines.

We migrated it from Google's large-scale computing infrastructure to MPI so that everyone can use and run it. Please note that this open source project is a 20% project (we work on it part time), and it is still in beta. :)

Software available at https://github.com/openbigdatagroup/psvm.

If you have any questions, please feel free to contact us. You can also ask questions in the PSVM-users group.

Why PSVM

Although widely used, Support Vector Machines (SVMs) suffer from a well-known scalability problem in both memory use and computation time.

PSVM achieves memory reduction and computation speedup via a row-based parallel Incomplete Cholesky Factorization (ICF) algorithm and a parallel Interior-Point Method (IPM).
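To give a feel for what ICF does, here is a minimal serial sketch in Python. It is not the PSVM implementation: the function names, the RBF kernel, the gamma value, and the tolerance are all assumptions made for illustration. It builds an n-by-p factor H such that H times its transpose approximates the kernel matrix, without ever storing the full n-by-n matrix. In a row-based parallel layout, each machine would presumably hold only a subset of the rows of H; here all rows live in one process.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def incomplete_cholesky(points, kernel, p, tol=1e-10):
    """Pivoted incomplete Cholesky factorization (serial sketch).

    Returns H as a list of n rows with at most p columns, so that
    H * H^T approximates the kernel matrix K[i][j] = kernel(x_i, x_j).
    Kernel entries are computed on demand, one column per iteration.
    """
    n = len(points)
    # Residual diagonal of K - H * H^T; starts as the diagonal of K.
    diag = [kernel(x, x) for x in points]
    H = [[0.0] * p for _ in range(n)]
    rank = 0
    for k in range(p):
        # Pick the row with the largest remaining residual as the pivot.
        pivot = max(range(n), key=lambda i: diag[i])
        if diag[pivot] <= tol:
            break  # K is already approximated well enough
        root = math.sqrt(diag[pivot])
        # This loop over rows is the part a row-based scheme would split
        # across machines (an assumption about the layout, not PSVM code).
        for i in range(n):
            dot = sum(H[i][j] * H[pivot][j] for j in range(k))
            H[i][k] = (kernel(points[i], points[pivot]) - dot) / root
        for i in range(n):
            diag[i] -= H[i][k] ** 2
        rank += 1
    return [row[:rank] for row in H]


def approximation_error(points, kernel, H):
    """Largest absolute entry of K - H * H^T (for checking small cases)."""
    n = len(points)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            approx = sum(a * b for a, b in zip(H[i], H[j]))
            worst = max(worst, abs(kernel(points[i], points[j]) - approx))
    return worst
```

For example, calling incomplete_cholesky(points, rbf_kernel, p) with p equal to the number of points should reproduce the kernel matrix up to rounding, while a smaller p trades accuracy for an n-by-p factor instead of an n-by-n matrix.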

Empirical studies show that PSVM effectively reduces training time for large-scale tasks while maintaining high training accuracy.

Let n denote the number of training instances, p the reduced matrix dimension after factorization (p is significantly smaller than n), and m the number of machines. PSVM reduces the memory requirement from O(n^2) to O(np/m), and improves computation time to O(np^2/m).
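To make these orders of magnitude concrete, the following back-of-the-envelope sketch compares the storage for a full n-by-n kernel matrix with the per-machine share of an n-by-p factor. The helper names, the 8-byte double-precision entries, and the sample values of n, p, and m are assumptions for illustration only; big-O notation hides constant factors, so real memory use will differ.

```python
BYTES_PER_ENTRY = 8  # assumption: double-precision floats


def full_kernel_bytes(n):
    """Storage for a dense n-by-n kernel matrix, the O(n^2) case."""
    return n * n * BYTES_PER_ENTRY


def factor_bytes_per_machine(n, p, m):
    """Per-machine storage when an n-by-p factor is split over m machines,
    the O(np/m) case (ignores any bookkeeping overhead)."""
    return n * p * BYTES_PER_ENTRY // m


if __name__ == "__main__":
    n, p, m = 1_000_000, 1_000, 100  # hypothetical problem size
    print(full_kernel_bytes(n))               # 8000000000000 (about 8 TB)
    print(factor_bytes_per_machine(n, p, m))  # 80000000 (about 80 MB)
```

With these hypothetical numbers, the full kernel matrix would not fit in the memory of a typical single machine, while each machine's share of the factor is small.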

Additionally, unlike other algorithmic approaches (Joachims, 2006; Chu et al., 2006), PSVM supports kernels.

Documentation

Citing PSVM

If you wish to publish any work based on PSVM, please cite our paper as:

Edward Y. Chang, Kaihua Zhu, Hao Wang, Hongjie Bai, Jian Li, Zhihuan Qiu, and Hang Cui,
PSVM: Parallelizing Support Vector Machines on Distributed Computers. NIPS 2007.

The BibTeX entry is:

@InProceedings{psvm,
  author =   {Edward Y. Chang and Kaihua Zhu and Hao Wang and Hongjie Bai and Jian Li and Zhihuan Qiu and Hang Cui},
  title =    {PSVM: Parallelizing Support Vector Machines on Distributed Computers},
  booktitle =    {NIPS},
  year =     {2007},
  note = {Software available at \url{http://openbigdatagroup.github.io/psvm}}
}

Acknowledgment

We would like to thank the National Science Foundation for grant IIS-0535085, which made it possible to start this project at UCSB in 2006.