Controlling the Cluster
The NIST statistical suite is designed for interactive input. We modified it to accept all of its parameters from the command line (some parameters were also hard-coded), and added the following parameters:
- Start. The index of the first processed file.
- End. The index of the last processed file.
- Exp. The index of the current experiment.
- Other. Other parameters that are experiment specific.
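As an illustration, the added parameters could be parsed along these lines (the struct layout and the parse_params function are our sketch, not the suite's actual code):

```c
#include <stdlib.h>

/* Our sketch of the added parameters; the suite's actual
   variable names may differ. */
struct params {
    int start; /* index of the first processed file */
    int end;   /* index of the last processed file */
    int exp;   /* index of the current experiment */
};

/* Parse start, end, and exp from argv[1..3]; any further
   arguments are the experiment-specific "other" parameters.
   Returns 0 on success, -1 if too few arguments were given. */
int parse_params(int argc, char *argv[], struct params *p)
{
    if (argc < 4)
        return -1;
    p->start = atoi(argv[1]);
    p->end   = atoi(argv[2]);
    p->exp   = atoi(argv[3]);
    return 0;
}
```

With this, each invocation of the modified suite can be told exactly which file range and experiment to process, so many instances can run in parallel without colliding.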
All the intermediate files (a couple dozen for each input file) are suffixed with the file index and the experiment index. After a file is processed, it is deleted along with all of its intermediate files to free space on the Linux cluster; only the final result file is kept.
At this point, the question was how to start the NIST statistical suite on each processor of the Linux cluster. We had several options:
- Connect to each machine using PuTTY and run the command line. This approach required too much user interaction, so it was discarded.
- Connect to all the machines using SSH and start a program that executes scripts, which is what we opted to do.
We wrote the Cluster program (Listing Three) to invoke the Scheduler program (Listing Four) on each node of the Linux cluster. The Scheduler executes script files prepared by the script generator program. The Cluster program uses the fork() function to create a child process for each connection. The SSH connections to the Linux nodes are set up for automatic login, and the Cluster program runs under the Cygwin emulator on the Windows machine.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define START 1   /* index of the first node; adjust to the cluster */
#define END   17  /* one past the index of the last node */

int main(void)
{
    int i;
    pid_t pid;
    char cmd[100];

    for (i = START; i < END; i++) {
        pid = fork();
        if (pid == -1) {
            printf("error on node %d\n", i);
        } else if (pid == 0) {
            /* child: start the Scheduler on node i over SSH */
            sprintf(cmd, "ssh username@node%d.cluster.com ./script/scheduler", i);
            system(cmd);
            exit(0);
        }
        /* parent: keep forking for the remaining nodes */
    }
    while (wait(NULL) > 0)  /* wait for all children to finish */
        ;
    return 0;
}
#include <stdio.h>
#include <stdlib.h>

/* Busy-wait delay that reduces the chance of two nodes
   grabbing the same status file at the same moment. */
int CPU_sleep(void)
{
    int i, j, k, sum = 0;
    for (i = 0; i < 100; i++)
        for (j = 0; j < 100; j++)
            for (k = 0; k < 10; k++)
                sum = sum + j * j * k;
    return sum;
}

int main(void)
{
    const char *file = "/home/script/max.txt";
    const char *filebase = "/home/script/";
    char filename[200];
    char cmd[200];
    FILE *W;
    FILE *F = fopen(file, "r");
    int N = 0;
    int max;

    if (F == NULL)
        return 1;
    fscanf(F, "%d", &max);
    fclose(F);
    while (N < max) {
        N++;
        sprintf(filename, "%s%d.do", filebase, N);
        CPU_sleep();
        W = fopen(filename, "r");
        if (W == NULL) {
            FILE *ch = fopen(filename, "w"); /* reserve the job */
            fclose(ch);
            sprintf(cmd, "/home/script/%d.sh", N);
            system(cmd);
            ch = fopen(filename, "w");
            fprintf(ch, "DONE"); /* mark the job as finished */
            fclose(ch);
        } else {
            fclose(W); /* job already taken; skip it */
        }
        /* Reread the maximum so new experiments can be
           added while the Scheduler is running. */
        F = fopen(file, "r");
        fscanf(F, "%d", &max);
        fclose(F);
        CPU_sleep();
    }
    return 0;
}
The script generator program generates the scripts executed by the Scheduler. Each script processes one job, and the number of scripts is stored in the max.txt file. The script files are named #.sh, where # is the script number. These files are transferred to the "script" directory on the Linux cluster, and their access mode is changed to executable with the chmod +x *.sh command run from a PuTTY terminal.
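A minimal sketch of such a generator is shown below. It writes the job scripts and max.txt into the current directory; the real generator's output paths and the exact nist argument layout are assumptions on our part.

```c
#include <stdio.h>

/* Sketch of a script generator: one job per script, plus a
   max.txt holding the script count for the Scheduler.
   Paths and the nist argument layout are illustrative. */
int generate_scripts(int njobs)
{
    char name[64];
    FILE *f;

    for (int n = 1; n <= njobs; n++) {
        sprintf(name, "%d.sh", n);
        f = fopen(name, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "cd\n");
        fprintf(f, "cd test\n");
        /* One nist invocation per job; the arguments here
           are placeholders, not the real experiment values. */
        fprintf(f, "nice -n 19 ./nist %d %d 1 128 > /dev/null\n", n, n);
        fclose(f);
    }

    f = fopen("max.txt", "w"); /* the Scheduler reads this count */
    if (f == NULL)
        return -1;
    fprintf(f, "%d", njobs);
    fclose(f);
    return 0;
}
```

Because each script holds exactly one job, regenerating or appending scripts and bumping the number in max.txt is all it takes to queue new work for the running Schedulers.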
The Scheduler program first opens the max.txt file to get the number of scripts. It then enters a loop that sequentially checks the status files, named "#.do". When it finds a status file that does not yet exist, it claims the job by creating that file and executes the corresponding script. After the script finishes, it rewrites the status file with the string "DONE" in it, then searches for the next unclaimed status file until the maximum is reached, at which point it stops.
The maximum is reread on every loop iteration because we sometimes increased it as we introduced new experiments (with new files).
Only one job is assigned per script, so if a node stops working for any reason, only one file goes unprocessed, and it can be processed later.
The generated scripts look like the script in Listing Five. After the Cluster program finishes, we check the status files. If any is of size 0, the corresponding script was not executed because of a node failure; we then run that script manually, or delete the empty status file(s) and rerun the Cluster program. If the Windows machine is restarted for some reason, we restart the Cluster program; in the worst case, two scripts execute at the same time on a node. Because the Linux cluster is not dedicated to our processes, we run at the lowest priority using the nice -n 19 command, which lets others keep using the Linux nodes.
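The post-run check can be sketched as follows. A 0-byte status file means a node claimed the job but died before writing "DONE"; the function name and the treatment of missing files are our additions.

```c
#include <stdio.h>

/* Scan status files 1.do .. max.do and count jobs that need a
   rerun: files that are empty (node crashed after claiming the
   job) or missing (job never started). A sketch, not the
   authors' actual check. */
int count_unfinished(int max)
{
    char name[64];
    int unfinished = 0;

    for (int n = 1; n <= max; n++) {
        sprintf(name, "%d.do", n);
        FILE *f = fopen(name, "r");
        if (f == NULL) {
            unfinished++; /* never started */
            continue;
        }
        fseek(f, 0, SEEK_END);
        if (ftell(f) == 0) {
            unfinished++; /* claimed but never finished */
            printf("job %d needs to be rerun\n", n);
        }
        fclose(f);
    }
    return unfinished;
}
```

Deleting the empty status files reported here and rerunning the Cluster program lets the Schedulers reclaim exactly those jobs.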
cd
cd test
nice -n 19 ./nist 1 1 2481 1 128 > /dev/null
Once all the status files are four bytes in size (the word DONE written in them), we transfer the result files back to the Windows machine using WinSCP, then run the unmodified Extractor program.