MPJ + CUDA
I recently experimented with MPJ Express (MPI for Java) plus CUDA.
Now you may be wondering how I used CUDA from Java.
I used the Java bindings for CUDA from http://www.jcuda.org (don't worry, I will show you how to run a simple JCUDA program).
The source code of my matrix multiplication on a GPU-accelerated cluster is here.
Mission.
In this tutorial we will write a hello world program with MPJ Express and CUDA.
We will start several MPJ processes, and each MPJ process will in turn invoke CUDA kernel(s).
You need to complete three steps to accomplish the mission:
1) Run a Hello World program in MPJ Express.
2) Run a Hello World program in JCUDA.
3) Integrate 1 and 2.
Step 1: (if you have already run MPJ programs, skip this)
Download MPJ Express from http://mpj-express.org/
We will follow version 0.35.
There are two configurations for working with MPJ Express:
(i) Multicore Configuration
(ii) Cluster Configuration
We will use the Cluster Configuration, because it also covers running all processes on a single multicore machine (just list localhost in the machines file).
Running MPJ Express Programs in the Cluster Configuration (from README)
=========================================================
1. Download MPJ Express and unpack it.
2. Set the MPJ_HOME and PATH environment variables:
export MPJ_HOME=/path/to/mpj/
export PATH=$PATH:$MPJ_HOME/bin
(These two lines can be added to ~/.bashrc)
3. Write your MPJ Express program (World.java) and save it.

4. Write a machines file (name it "machines") listing the host names or IP
addresses of all machines involved in the parallel execution (for example, localhost).
5. Start the daemons: mpjboot machines (install an SSH server first if needed)
6. Compile: javac -cp .:$MPJ_HOME/lib/mpj.jar World.java
7. Execute: mpjrun.sh -np 4 -dev niodev World
8. Stop daemons: mpjhalt machines
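A minimal World.java for step 3 might look like the sketch below. It assumes the mpiJava 1.2 style API that MPJ Express provides (MPI.Init, MPI.COMM_WORLD.Rank(), MPI.COMM_WORLD.Size(), MPI.Finalize()); the greeting helper is hypothetical and only exists so the message format can be checked without mpjrun.

```java
import mpi.MPI;

// Sketch of an MPJ Express "hello world": every process prints its rank.
public class World {

    // Hypothetical helper: builds the message so it can be tested without mpjrun.
    static String greeting(int rank, int size) {
        return "Hello from process " + rank + " of " + size;
    }

    public static void main(String[] args) throws Exception {
        // mpjrun passes its own arguments through args, so hand them to Init.
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank(); // id of this process, 0..size-1
        int size = MPI.COMM_WORLD.Size(); // number of processes started with -np
        System.out.println(greeting(rank, size));
        MPI.Finalize();
    }
}
```

With -np 4 you should see four greetings, one per rank; their order is not deterministic.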
Step 2: (if you have already run JCUDA programs, skip this)
1. Go to http://www.jcuda.org
2. Download the following (create a folder named jcuda and put everything into it):
a) Source code of all libraries
b) Binaries for Linux 32-bit (if there are no binaries for your platform, you can compile your own)
c) API documentations of all libraries
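3. Set the JCUDA_HOME, PATH and LD_LIBRARY_PATH environment variables (on Linux the JVM looks for the JCUDA native .so files on LD_LIBRARY_PATH, not PATH):
export JCUDA_HOME=/path/to/jcuda/
export PATH=$PATH:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86
(These lines can be added to ~/.bashrc)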
4. Write your JCUDA program (hello.java) and save it.

5. Compile: javac -cp .:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86/jcuda-0.2.3.jar hello.java
6. Execute: java -cp .:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86/jcuda-0.2.3.jar hello
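A minimal hello.java for step 4 could simply allocate and free a few bytes of GPU memory through the CUDA runtime API. This is a sketch assuming the class names used by current JCuda releases (jcuda.Pointer, jcuda.runtime.JCuda, jcuda.runtime.cudaError); package or method names in 0.2.3 may differ. The class is named hello only to match the compile and run commands above.

```java
import jcuda.Pointer;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaError;

// Sketch of a JCUDA "hello world": allocate 4 bytes on the GPU, then free them.
public class hello {
    public static void main(String[] args) {
        Pointer devicePtr = new Pointer();
        int status = JCuda.cudaMalloc(devicePtr, 4); // 4 bytes of device memory
        if (status != cudaError.cudaSuccess) {
            System.err.println("cudaMalloc failed: " + cudaError.stringFor(status));
            System.exit(1);
        }
        System.out.println("Allocated device pointer: " + devicePtr);
        JCuda.cudaFree(devicePtr);
    }
}
```

If this prints a pointer, both the JCUDA jar and its native libraries were found.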
Step 3: (finally, what we were waiting for)
Now that we have completed Steps 1 and 2, Step 3 is very easy :)
1. Write your MPJ + CUDA program (HelloWorld.java) and save it.

2. Write a machines file (name it "machines") listing the host names or IP
addresses of all machines involved in the parallel execution (for example, localhost).
3. Start the daemons: mpjboot machines (install an SSH server first if needed)
4. Compile: javac -cp .:$MPJ_HOME/lib/mpj.jar:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86/jcuda-0.2.3.jar HelloWorld.java
5. Execute: mpjrun.sh -cp .:$JCUDA_HOME/JCuda-All-0.2.3-bin-linux-x86/jcuda-0.2.3.jar -np 4 -dev niodev HelloWorld
6. Stop the daemons: mpjhalt machines
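Putting the two together, HelloWorld.java could start MPJ, let each process pick a GPU and do a small host-to-device-to-host copy. This is a sketch, not the author's program: it assumes the MPJ Express API from step 1 and the current JCuda runtime API (cudaGetDeviceCount, cudaSetDevice, cudaMalloc, cudaMemcpy). It copies data instead of launching a custom kernel, because a kernel launch additionally needs a compiled PTX/cubin module loaded through the driver API. The round-robin pickDevice policy is hypothetical.

```java
import java.util.Arrays;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaMemcpyKind;
import mpi.MPI;

// Sketch: every MPJ process uses one GPU and round-trips a small array through it.
public class HelloWorld {

    // Hypothetical policy: spread ranks round-robin over the GPUs of a node.
    static int pickDevice(int rank, int deviceCount) {
        return deviceCount > 0 ? rank % deviceCount : -1;
    }

    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();

        int[] count = new int[1];
        JCuda.cudaGetDeviceCount(count); // count stays 0 if the call fails
        int device = pickDevice(rank, count[0]);
        if (device < 0) {
            System.out.println("Process " + rank + ": no CUDA device found");
            MPI.Finalize();
            return;
        }
        JCuda.cudaSetDevice(device);

        // Host data tagged with the rank, so each process's output is distinguishable.
        int n = 4;
        long bytes = (long) n * Sizeof.FLOAT;
        float[] hostIn = new float[n];
        for (int i = 0; i < n; i++) {
            hostIn[i] = rank * 10 + i;
        }
        float[] hostOut = new float[n];

        // Host -> device -> host round trip.
        Pointer dev = new Pointer();
        JCuda.cudaMalloc(dev, bytes);
        JCuda.cudaMemcpy(dev, Pointer.to(hostIn), bytes, cudaMemcpyKind.cudaMemcpyHostToDevice);
        JCuda.cudaMemcpy(Pointer.to(hostOut), dev, bytes, cudaMemcpyKind.cudaMemcpyDeviceToHost);
        JCuda.cudaFree(dev);

        System.out.println("Hello from process " + rank + " of " + size
                + " on GPU " + device + ": " + Arrays.toString(hostOut));
        MPI.Finalize();
    }
}
```

Because the device index is rank modulo the local GPU count, several processes on one node simply share GPUs when -np exceeds the number of devices.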
That's all.
------------
Known issues:
1. Driver API functions in JCUDA may not work with the prebuilt binaries.
Solution: compile the binaries yourself:
a) make USEDRVAPI=1
b) make USEDRVAPI=1 emu=1 (binaries for emulation mode)
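2. When running MPJ with CUDA, a runtime error "can not link shared library" may occur when using Driver API functions.
Solution: explicitly load the CUDA runtime library somewhere in the class:
static {
    // Load the CUDA runtime before any JCUDA call; adjust the path/version to your installation
    System.load("/usr/local/cuda/lib64/libcudart.so.2");
}
Or load whichever library is relevant for your installation.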
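To check both issues at once, a small driver API smoke test can be run through mpjrun.sh after rebuilding with USEDRVAPI=1. This sketch applies the workaround from issue 2, but only loads the library if the file exists, so a wrong path gives a warning instead of an UnsatisfiedLinkError. It assumes the class names of current JCuda releases (jcuda.driver.JCudaDriver, CUdevice, CUcontext, CUresult); the class name DriverCheck and its helpers are hypothetical, and the library path is the one from the workaround above.

```java
import java.io.File;

import jcuda.driver.CUcontext;
import jcuda.driver.CUdevice;
import jcuda.driver.CUresult;
import jcuda.driver.JCudaDriver;

// Sketch: verify that driver API calls work, applying the explicit-load workaround first.
public class DriverCheck {

    // Workaround from issue 2 (adjust path/version to your installation).
    static {
        if (!loadIfPresent("/usr/local/cuda/lib64/libcudart.so.2")) {
            System.err.println("Warning: could not load libcudart explicitly");
        }
    }

    /** Loads a native library by absolute path; returns false instead of throwing. */
    static boolean loadIfPresent(String path) {
        if (!new File(path).isFile()) {
            return false;
        }
        try {
            System.load(path);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    /** Throws if a driver API call did not return CUDA_SUCCESS. */
    static void check(int result, String call) {
        if (result != CUresult.CUDA_SUCCESS) {
            throw new IllegalStateException(call + " failed: " + CUresult.stringFor(result));
        }
    }

    public static void main(String[] args) {
        check(JCudaDriver.cuInit(0), "cuInit");
        int[] count = new int[1];
        check(JCudaDriver.cuDeviceGetCount(count), "cuDeviceGetCount");
        System.out.println("Driver API sees " + count[0] + " device(s)");
        if (count[0] == 0) {
            return;
        }
        CUdevice device = new CUdevice();
        check(JCudaDriver.cuDeviceGet(device, 0), "cuDeviceGet");
        CUcontext context = new CUcontext();
        check(JCudaDriver.cuCtxCreate(context, 0, device), "cuCtxCreate");
        System.out.println("Created a context on device 0");
        check(JCudaDriver.cuCtxDestroy(context), "cuCtxDestroy");
    }
}
```

If cuInit already fails, the binaries were most likely built without USEDRVAPI=1 or the CUDA driver library is not on the library path.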