Installation


In order to run PSVM, you should install GCC and an MPI library (MPICH), as described below.

We use Ubuntu 12.04 (or above) for the following examples. It should work with other Linux distributions as well.

If you wish to run PSVM in parallel, please also follow the Cluster Setup section below.

GCC Setup


To compile the MPI source code in the next step, you need the GCC compiler. On Ubuntu, install it together with make and the other standard build tools:

sudo apt-get install build-essential

MPI library Setup


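PSVM's parallel infrastructure is based on MPI, so MPICH must be installed before running PSVM. Download the latest release from the MPICH website; the commands below use version 3.2.

tar xvf mpich-3.2.tar.gz && cd mpich-3.2
# --disable-fortran skips the Fortran bindings, which are not needed to build PSVM;
# without it, configure aborts when no Fortran compiler is installed.
./configure --disable-fortran
make
sudo make install
# Refresh the shared-library cache so the newly installed MPICH libraries are found.
sudo ldconfig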
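Before building PSVM, you can confirm that the compiler toolchain and the MPI wrappers are on your PATH. The following is a minimal POSIX shell sketch; the function name require_cmds is hypothetical, and it assumes MPICH installed its mpicc and mpiexec wrappers into a directory on PATH (by default /usr/local/bin when configure is run without --prefix).

```shell
#!/bin/sh
# require_cmds (hypothetical helper): report every command in the
# argument list that cannot be found on PATH.
# Returns 0 if all commands are present, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (commented out so loading this file has no side effects):
# require_cmds gcc g++ make mpicc mpiexec
```

If any line reports "missing", revisit the corresponding setup step before continuing.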

PSVM Setup


The code of PSVM is available at: https://github.com/openbigdatagroup/psvm. Either clone the repository or download it as a compressed archive.

git clone https://github.com/openbigdatagroup/psvm.git

The folder structure is:

Enter the psvm source directory and compile PSVM with make:

cd psvm
make
This generates the binaries svm_train and svm_predict.
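To confirm the build succeeded before moving on, a short check like the following can be run from the psvm source directory. It is only a sketch: the helper name check_binaries is hypothetical, and it assumes make places svm_train and svm_predict in the current directory, as the ./svm_train and ./svm_predict commands in this guide suggest.

```shell
#!/bin/sh
# check_binaries (hypothetical helper): verify that each named file
# inside directory $1 exists as a regular file and is executable.
check_binaries() {
    dir=$1
    shift
    status=0
    for bin in "$@"; do
        if [ -f "$dir/$bin" ] && [ -x "$dir/$bin" ]; then
            echo "ok:      $dir/$bin"
        else
            echo "missing: $dir/$bin" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, run from the psvm source directory after `make`:
# check_binaries . svm_train svm_predict
```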

Next, train a sample model with PSVM on the bundled test dataset, and then use the model to predict results:

./svm_train -rank_ratio 0.1 -kernel_type 2 -hyper_parm 1 -gamma 0.01 data/splice
./svm_predict data/splice.t

Finally, check the output. It should look like the following:

========== Predict Accuracy ==========
Accuracy          : 0.864368
Positive Precision: 0.875899
Positive Recall   : 0.861185
Negative Precision: 0.852305
Negative Recall   : 0.867816
If you see output like this, congratulations! PSVM has been successfully installed on your computer.
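When re-running this check later (for example after a rebuild), it can be handy to extract the accuracy programmatically instead of reading it by eye. The sketch below assumes svm_predict prints an "Accuracy          : <value>" line exactly as in the sample output above; the helper name accuracy_at_least and the 0.85 threshold are illustrative assumptions, not values prescribed by PSVM.

```shell
#!/bin/sh
# accuracy_at_least (hypothetical helper): read svm_predict output on
# stdin, take the number on the "Accuracy" line, and exit 0 if it is
# >= the threshold in $1, 1 if it is lower, 2 if no such line exists.
accuracy_at_least() {
    awk -v min="$1" '
        # Matches e.g. "Accuracy          : 0.864368". Uses [ \t] rather
        # than [[:space:]] so it also works with older mawk versions.
        /^Accuracy[ \t]*:/ {
            split($0, parts, ":")
            acc = parts[2] + 0
            found = 1
        }
        END {
            if (!found) {
                print "no Accuracy line found" > "/dev/stderr"
                exit 2
            }
            print "accuracy = " acc
            if (acc >= min + 0) exit 0
            exit 1
        }'
}

# Example (0.85 is an arbitrary threshold for illustration):
# ./svm_predict data/splice.t | accuracy_at_least 0.85
```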

Cluster Setup


To run PSVM in parallel, you need to set up a cluster environment. In this part, we build the cluster step by step.
Assume we have 4 nodes with the host names master, node0, node1 and node2.

1. Define hostnames in /etc/hosts

Add the following entries to /etc/hosts on every node, adjusting the IP addresses to match your network:

#/etc/hosts
127.0.0.1     localhost
192.168.133.100 master
192.168.133.101 node0
192.168.133.102 node1
192.168.133.103 node2
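If you prefer to script this step on each node, the sketch below appends the cluster entries to a hosts file only when a hostname is not already listed, so running it twice does not create duplicates. The function name add_cluster_hosts is hypothetical; it writes to /etc/hosts by default (which requires root) but accepts another file path, which is useful for a dry run. The IP addresses are the example ones above.

```shell
#!/bin/sh
# add_cluster_hosts (hypothetical helper): append "IP hostname" pairs
# to a hosts file, skipping any hostname that already has an entry.
# Usage: add_cluster_hosts [file]   (defaults to /etc/hosts)
add_cluster_hosts() {
    file=${1:-/etc/hosts}
    while read -r ip name; do
        # A hostname counts as present if it appears as a whole,
        # whitespace-delimited field after some address in the file.
        if grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$file"; then
            echo "skip:  $name already in $file"
        else
            printf '%s %s\n' "$ip" "$name" >> "$file"
            echo "added: $ip $name"
        fi
    done <<EOF
192.168.133.100 master
192.168.133.101 node0
192.168.133.102 node1
192.168.133.103 node2
EOF
}

# Example (as root): add_cluster_hosts
```

Note that Ubuntu's installer usually maps the machine's own hostname to 127.0.1.1. In that case the helper reports the name as already present, and you may want to edit that line by hand so the name resolves to the LAN address instead.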

2. Create an account

Create an account with the same username on every computer:

sudo adduser psvmer

3. Create a shared directory across the cluster

To run PSVM in parallel on the cluster, the binary files svm_train and svm_predict must be identical on all computers. To achieve this, we run an NFS server on the master node that shares a directory containing the binaries with the slave nodes.

First, install NFS on each computer.

On the master node:
sudo apt-get install nfs-kernel-server
On each slave node:
sudo apt-get install nfs-common

Second, share the directory across the computers. Create it on every node (on the master it holds the shared files; on the slaves it serves as the mount point):

mkdir /home/psvmer/shared_dir

On the master node, export the directory and restart the NFS server:
echo "/home/psvmer/shared_dir *(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo service nfs-kernel-server restart

After that, on each slave node, mount the shared directory:
sudo mount master:/home/psvmer/shared_dir /home/psvmer/shared_dir
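A mount made with the mount command above does not survive a reboot. One common way to make it persistent is an /etc/fstab entry on each slave node; the sketch below appends such a line only if the mount point is not already listed. The helper name add_nfs_fstab is hypothetical, the "defaults" mount options are an assumption you may want to tune, and writing to the real /etc/fstab requires root.

```shell
#!/bin/sh
# add_nfs_fstab (hypothetical helper): append an NFS entry for the
# shared directory to an fstab file unless that mount point is
# already listed. Usage: add_nfs_fstab [file]  (defaults to /etc/fstab)
add_nfs_fstab() {
    file=${1:-/etc/fstab}
    src=master:/home/psvmer/shared_dir
    dst=/home/psvmer/shared_dir
    # Field 2 of an fstab line is the mount point; comment lines are ignored.
    if awk -v d="$dst" '$1 !~ /^#/ && $2 == d { found = 1 } END { exit !found }' "$file"; then
        echo "skip:  $dst already in $file"
    else
        # Fields: device, mount point, type, options, dump, pass.
        printf '%s %s nfs defaults 0 0\n' "$src" "$dst" >> "$file"
        echo "added: $src -> $dst"
    fi
}

# Example (as root on a slave node), then mount every fstab entry:
# add_nfs_fstab && mount -a
```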

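4. Install SSH Server

On each node:

sudo apt-get install openssh-server

5. Set up passwordless SSH communication from master to slaves

Log in to the master node with the account you created:

su psvmer
Generate an RSA key (press Enter at each prompt to accept the default key file and an empty passphrase):
ssh-keygen -t rsa
Send the public key to all slave nodes:
ssh-copy-id -i ~/.ssh/id_rsa.pub psvmer@node0
ssh-copy-id -i ~/.ssh/id_rsa.pub psvmer@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub psvmer@node2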
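Once the keys are copied, it is worth checking that every slave accepts the key without prompting, and that psvmer has the same numeric UID everywhere: with its default sec=sys authentication, NFS checks access by numeric UID, so mismatched UIDs on the shared directory lead to permission errors (adduser --uid lets you choose a fixed UID when creating the account). A minimal sketch follows; the function name check_nodes is hypothetical. It relies on OpenSSH's BatchMode option, which makes ssh fail instead of asking for a password, and lets you override the ssh command through the SSH variable for testing.

```shell
#!/bin/sh
# check_nodes (hypothetical helper): for each host, confirm that
#  1) ssh logs in without a password prompt (BatchMode=yes makes ssh
#     fail rather than prompt), and
#  2) the given user has the same numeric UID there as locally.
# The ssh command can be overridden via $SSH (e.g. for testing).
check_nodes() {
    user=$1
    shift
    local_uid=$(id -u "$user") || return 1
    status=0
    for host in "$@"; do
        if ! remote_uid=$(${SSH:-ssh} -o BatchMode=yes "$host" id -u "$user"); then
            echo "FAIL $host: passwordless ssh not working" >&2
            status=1
        elif [ "$remote_uid" != "$local_uid" ]; then
            echo "FAIL $host: uid $remote_uid differs from local uid $local_uid" >&2
            status=1
        else
            echo "ok   $host (uid $remote_uid)"
        fi
    done
    return "$status"
}

# Example, run as psvmer on the master node:
# check_nodes psvmer node0 node1 node2
```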