To run PSVM, you need to install the GCC compiler and the MPI library (mpich), as described below.
To run PSVM in parallel, also follow "How to set up cluster environment" below.
To compile the MPI source code in the next step, you need the GCC compiler installed on your computer:
sudo apt-get install build-essential
MPI library Setup
The parallel infrastructure is based on MPI, so mpich must be installed before running PSVM. You can download the latest version from the MPICH website. Then unpack, build, and install it:
tar xvf mpich-3.2.tar.gz && cd mpich-3.2
./configure
make
sudo make install
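Before moving on, it can help to confirm that the build tools and the MPI wrappers are actually on your PATH. The following is a minimal sketch, not part of PSVM: the helper name missing_tools is made up for illustration, and it assumes that gcc and make come from build-essential and that mpicc and mpiexec were installed by mpich.

```shell
# Print the name of every listed tool that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of PSVM or MPICH.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# gcc and make come from build-essential; mpicc and mpiexec come from MPICH.
missing=$(missing_tools gcc make mpicc mpiexec)
if [ -z "$missing" ]; then
    echo "All prerequisites found."
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
fi
```

If a tool is reported as missing, repeat the corresponding installation step above before compiling PSVM.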
The PSVM source code is available at https://github.com/openbigdatagroup/psvm. Either clone the repo or download the compressed file.
git clone https://github.com/openbigdatagroup/psvm.git
The folder structure is:
- src: source code of svm
- data: training and testing data
- docker: Dockerfile and scripts
Compile PSVM by running make in the root of the psvm directory.
After that, binary files svm_train and svm_predict are generated.
Next, train a sample model on the splice dataset in the data folder, then use the model to predict on the test set:
./svm_train -rank_ratio 0.1 -kernel_type 2 -hyper_parm 1 -gamma 0.01 data/splice
./svm_predict data/splice.t
Finally, check that the output looks like this:
========== Predict Accuracy ==========
Accuracy : 0.864368
Positive Precision: 0.875899
Positive Recall : 0.861185
Negative Precision: 0.852305
Negative Recall : 0.867816
If it does, congratulations! PSVM is successfully installed on your computer.
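If you want to check the result from a script rather than by eye, the accuracy value can be pulled out of the report with awk. This is only a sketch: extract_accuracy is a hypothetical helper, and it assumes the "Accuracy : value" layout of the sample output above.

```shell
# Read svm_predict output on stdin and print the value of the "Accuracy" line.
# Assumes the "Accuracy : <value>" layout shown in the sample output.
extract_accuracy() {
    awk -F':' '/^Accuracy/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Demonstrate on two lines copied from the sample output.
printf 'Accuracy : 0.864368\nPositive Precision: 0.875899\n' | extract_accuracy
```

Assuming svm_predict writes its report to standard output, the same helper could be used as ./svm_predict data/splice.t | extract_accuracy.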
To take advantage of PSVM's parallelism, you need to set up a cluster environment. In this part, we build the cluster step by step.
Assume we have 4 nodes with these host names: master, node0, node1, node2.
1. Define hostnames in /etc/hosts
Assuming we have the following network environment:
#/etc/hosts
127.0.0.1 localhost
192.168.133.100 master
192.168.133.101 node0
192.168.133.102 node1
192.168.133.103 node2
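A quick way to confirm that every host name maps to the address you expect is to look each one up in the hosts file. The sketch below uses a hypothetical helper, lookup_host, and runs against a temporary copy of the example entries so that it does not depend on your real /etc/hosts.

```shell
# Print the IP address that a hosts-format file assigns to a host name.
# Usage: lookup_host FILE NAME   (hypothetical helper for illustration)
lookup_host() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Demonstrate on a temporary copy of the example entries.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
127.0.0.1 localhost
192.168.133.100 master
192.168.133.101 node0
192.168.133.102 node1
192.168.133.103 node2
EOF
for h in master node0 node1 node2; do
    echo "$h -> $(lookup_host "$hosts_file" "$h")"
done
rm -f "$hosts_file"
```

On a real node you could call lookup_host /etc/hosts node0 instead, or use getent hosts node0 to ask the system resolver directly.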
2. Create account
Create an account with the same username on every computer, like this:
sudo adduser psvmer
3. Create shared directory among the cluster
To run PSVM in parallel on a cluster, the binary files svm_train and svm_predict must be available on every computer. To this end, we use the NFS service on the master node to share a directory containing the binaries with the slave nodes.
First, install NFS on each computer.
For the master node:
sudo apt-get install nfs-server
For each slave node:
sudo apt-get install nfs-client
Second, share the directory across the computers.
On the master node, export the directory and restart the NFS server:
echo "/home/psvmer/shared_dir *(rw,sync)" | sudo tee -a /etc/exports
sudo service nfs-kernel-server restart
After that, on each slave node, mount the shared directory:
sudo mount master:/home/psvmer/shared_dir /home/psvmer/shared_dir
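Entries in /etc/exports are sensitive to whitespace: the option list must follow the client directly, with no space before the opening parenthesis or inside the list. The sketch below checks a candidate line before you append it. The helper valid_exports_line is hypothetical and only covers the simple one-client form used in this guide, not the full exports syntax.

```shell
# Succeed if LINE looks like "<path> <client>(<options>)" with no whitespace
# between the client and its option list or inside the list.
# Covers only the simple one-client form used in this guide.
valid_exports_line() {
    printf '%s\n' "$1" |
        grep -Eq '^/[^[:space:]]+[[:space:]]+[^[:space:]()]+\([^[:space:]()]+\)$'
}

# Compare a well-formed entry with one that contains stray spaces.
for line in '/home/psvmer/shared_dir *(rw,sync)' \
            '/home/psvmer/shared_dir * (rw, sync)'; do
    if valid_exports_line "$line"; then
        echo "ok:     $line"
    else
        echo "reject: $line"
    fi
done
```

Only lines reported as ok should be appended to /etc/exports.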
4. Install SSH Server
On each node:
sudo apt-get install openssh-server
5. Set up passwordless SSH Communication from master to slave
Log in to the master node with the account you created.
Generate an RSA key:
ssh-keygen -t rsa
Send the key to every slave node, replacing slave-node with each node's host name in turn:
ssh-copy-id -i ~/.ssh/id_rsa.pub slave-node
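With several slave nodes, the copy step is repeated once per host. As a sketch, the loop below prints the command for each of the example host names used in this guide (node0, node1, node2) without executing anything. The function name print_copy_key_commands is made up for illustration.

```shell
# Print one ssh-copy-id command per slave node (dry run; nothing is executed).
# The node names passed below are the example host names used in this guide.
print_copy_key_commands() {
    for node in "$@"; do
        printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s\n' "$node"
    done
}

print_copy_key_commands node0 node1 node2
```

Once the printed commands look right, you can run them one by one, or pipe the output to sh. Afterwards, ssh node0 from the master node should log in without asking for a password.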