Cluster Computing FAQ

Submitting MATLAB jobs

How can I submit a MATLAB job to the batch cluster?

To run a MATLAB job on the RCE Batch Cluster, you must create a wrapper shell script that sets a number of environment variables and runs MATLAB in command-line mode.

To simplify this process for RCE users, HMDC has developed a program, submitMatlabBatch.sh, that automates it:

[username@rce-1 ~]$ submitMatlabBatch.sh

Usage: /opt/bin/submitMatlabBatch.sh <MATLAB_FILE>

Supply your MATLAB code file (typically FILENAME.m) as the argument. STDOUT from your job is captured in condor_submit_util/FILENAME.m.condor.out, and STDERR is captured in condor_submit_util/FILENAME.m.condor.err.
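As a rough illustration of this workflow, the following shell sketch submits a MATLAB file and prints where its output should land. The file name analysis.m and the helper function job_log_paths are hypothetical. The sketch assumes the MATLAB file sits in the current directory, and the submission step only runs where submitMatlabBatch.sh is on your PATH (that is, inside the RCE).

```shell
#!/bin/sh
# Hypothetical helper: print the STDOUT and STDERR capture files for a
# MATLAB file, using the condor_submit_util naming described above.
# Assumes the file is given by name only and lives in the current directory.
job_log_paths() {
    echo "condor_submit_util/$1.condor.out"
    echo "condor_submit_util/$1.condor.err"
}

matlab_file="analysis.m"   # assumed example file name

# Submit only when the HMDC script and the MATLAB file are both present.
if command -v submitMatlabBatch.sh >/dev/null 2>&1 && [ -f "$matlab_file" ]; then
    submitMatlabBatch.sh "$matlab_file"
fi

# Show where to look for the job's output and error messages.
job_log_paths "$matlab_file"
```

Once the job has run, you can inspect either file with a standard tool such as cat or tail.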

Viewing cluster status

How can I view the status of the cluster?

The status of the cluster can be viewed interactively as graphs of current cluster usage and usage over time. These graphs show both the number of jobs on the cluster and the owners of those jobs.

The status of the Batch Cluster can be viewed at http://batch-head.hmdc.harvard.edu/

The status of the Interactive Cluster (where "RCE Powered" jobs are run) can be viewed at http://cod-head.hmdc.harvard.edu/

If you look at the "Pool Resource (Machine) Statistics" for the past day, you will see the usage, over time, of the Interactive Cluster nodes.

If you look at "Pool User (Job) Statistics" for the past hour, you can see who currently has jobs running on the Interactive Cluster ("User" column) and how many nodes they are using ("JobsRunning Average" column).

Alternatively, you can view the cluster status from the command line in the RCE.

To view the status of the Batch Cluster, run: condor_status -pool batch-head.priv.hmdc.harvard.edu

To view the status of the Interactive Cluster (where "RCE Powered" jobs are run), run: condor_status -pool cod-head.priv.hmdc.harvard.edu

To better understand the output of the condor_status command, refer to the Condor Documentation.

For detailed information on how to check the status of jobs in the cluster, please refer to the Getting Started With Batch Processing guide.
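If you check both clusters often, the two condor_status commands above can be wrapped in a small shell function. This is only a sketch: the function names pool_host and cluster_status and the short names "batch" and "interactive" are hypothetical. The pool host names are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: map a short cluster name to its pool host.
pool_host() {
    case "$1" in
        batch)       echo "batch-head.priv.hmdc.harvard.edu" ;;
        interactive) echo "cod-head.priv.hmdc.harvard.edu" ;;
        *)           echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Print the status of one cluster. This only works inside the RCE,
# where condor_status is installed.
cluster_status() {
    host=$(pool_host "$1") || return 1
    condor_status -pool "$host"
}

# Example usage inside the RCE:
#   cluster_status batch
#   cluster_status interactive
```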

Available R packages

Which R packages are available for cluster jobs?

You can use packages from the following repositories on any cluster node without running install.packages:

  1. CRAN (http://cran.r-project.org/src/contrib)
    Updated weekly on Sundays.
  2. Gary King's R Packages (http://gking.harvard.edu/src/contrib)
    Updated daily.

Using home directory files

How do I use files in my home directory from my batch job?

When you submit a batch job, the R script is copied to a staging area and then executed by a cluster node. Because the script does not run from your home directory, relative paths will not resolve to your files, so you must set paths explicitly in your R scripts.

To set paths, add the following code to the beginning of all your R scripts, replacing <username> with your RCE user name. This tells R to find the absolute path to your home directory, then set the working directory to that path:

setwd(path.expand("~<username>"))

Use this code to address such problems as the following:

Loading required package: MASS
Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open file '<filename>', reason 'No such file or directory'
Execution halted

Note: If you use a subdirectory, include the path to the subdirectory in the setwd command referenced previously.

Tracking iteration numbers

How can I make my batch submission track iteration number?

To track the iteration number for batch submissions, do both of the following:

  • Add --args '$(Process)' to the Arguments line of your Condor submit file. This passes the process number of each run to the R process. The number ranges from 0 to one less than the number of runs.
  • Capture the argument in a variable in your R code by entering the following line: run <- commandArgs(TRUE). The R object run then contains the run number as a character string (use as.integer(run) if you need a number). You can use this object to construct appropriate output file names for your job.
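The two pieces can be combined as in the following sketch, which writes a Condor submit file and a matching R script from the shell. The file names (iterate.submit, iterate.R, result_N.csv), the R executable path /usr/bin/R, the extra R flags, the exact form of the Arguments line, and the queue count of 10 are all assumptions for illustration, not values prescribed by HMDC. The heredoc delimiter is quoted so that the shell does not try to interpret $(Process) as a command substitution.

```shell
#!/bin/sh
# Write a Condor submit file that passes the process number to R.
# The quoted 'EOF' keeps the shell from expanding $(Process).
# All file names, paths, and the queue count here are assumed examples.
cat > iterate.submit <<'EOF'
Universe   = vanilla
Executable = /usr/bin/R
Arguments  = --no-save --vanilla --args $(Process)
Input      = iterate.R
Output     = iterate.$(Process).out
Error      = iterate.$(Process).err
Log        = iterate.log
Queue 10
EOF

# Write the R script that reads the run number and names its output file.
cat > iterate.R <<'EOF'
# Read the run number passed after --args (0 to 9 for "Queue 10").
run <- as.integer(commandArgs(TRUE)[1])
# Build a per-run output file name, for example result_3.csv.
outfile <- paste0("result_", run, ".csv")
write.csv(data.frame(run = run), outfile, row.names = FALSE)
EOF

# Inside the RCE, the job would then be submitted with:
#   condor_submit iterate.submit
```

Because each run gets a different number, each run writes to its own output file and no run overwrites another's results. Remember to set the working directory in the R script as well if the output should land in your home directory.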

Multithreaded jobs

Many applications and programming libraries can use multiple CPU cores simultaneously by running multiple active threads or processes. If you want to use multiple cores, be sure to create the appropriate resource reservations so that your jobs are allocated cores for their exclusive use and do not compete with other jobs for a CPU.

We do not currently enforce any technical limits on CPU core usage in cluster computing. Instead, we ask that you observe the rule of "one CPU core per job instance" unless you have specifically reserved additional cores for your jobs.

Batch cluster jobs are only ever allocated one CPU core (and 4GB memory) per instance, although you can queue an unlimited number of job instances for high-throughput parallel computing.

For RCE Powered (interactive) jobs, you can request an allocation of up to 24 CPU cores (and up to 250GB RAM), but each job submission queues only one instance.

The following RCE Powered Applications will reserve more than one CPU core per job by default:

  • Stata/MP: 8 cores

To request a different number of CPU cores, use command-line interactive job submission:

condorInteractiveSubmit.pl -c num_cpu -x command
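As a sketch of how the placeholders might be filled in, the following shell function checks a requested core count against the 24-core interactive limit stated above and prints the resulting command instead of running it. The function name interactive_submit_cmd and the example values (4 cores, the command matlab) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: validate a core count and print the submission
# command. The 1-24 range reflects the interactive limit stated above.
interactive_submit_cmd() {
    cores="$1"
    shift
    # Reject empty or non-numeric core counts.
    case "$cores" in
        ''|*[!0-9]*)
            echo "core count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$cores" -lt 1 ] || [ "$cores" -gt 24 ]; then
        echo "core count must be between 1 and 24" >&2
        return 1
    fi
    # Print the command rather than running it (a dry run).
    echo "condorInteractiveSubmit.pl -c $cores -x $*"
}

# Example: request 4 cores for an assumed command named matlab.
interactive_submit_cmd 4 matlab
```

Printing the command first lets you confirm the core count before submitting the job yourself inside the RCE.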

Some applications and libraries require additional options to set the number of CPU cores used.

Examples:

  • To set the number of CPU cores used by the R Goto linear algebra library:
Sys.setenv(GOTO_NUM_THREADS=5)

  • To run MATLAB on a single CPU core:
matlab -singleCompThread

For assistance with setting CPU core usage for other applications, please contact HMDC Support.

In both batch and interactive cluster computing, fewer cluster resources may be available during periods of heavy use.