RCE Documentation

We designed our RCE cluster (a large, powerful pool of computers) around open standards for reliability, scalability, extensibility, and interoperability. We use hardware from major vendors and a standard, enterprise-grade Linux distribution customized to address the specific needs of our users. Our infrastructure is designed to provide the greatest possible range of options to you, rather than obliging you to restrict yourself to a narrow range of tools and methodologies. We provide a stable platform on which a wide range of technologies can be deployed.

Our computing clusters consist of two main pools of resources:

Batch processing is intended for long-running, CPU-intensive processes that can run in parallel. Batch servers enable users to perform multiple commands and functions without waiting for results from one set of instructions before beginning another, and to execute these processes without being present.

Interactive servers are intended for large processes that are memory intensive. Our interactive cluster allows users to view and engage with their jobs in real time.

Both batch and interactive servers at HMDC run on a high throughput cluster, based on HTCondor, on which users can perform extensive, time-consuming calculations without the technical limitations imposed by a typical workstation. Our computing clusters use parallel processing to enable faster execution of computation-intensive tasks. Many computing tasks can benefit from implementation in a parallel processing form. The cluster is extremely useful for the following applications:

Jobs that run for a long time: You can submit a batch processing job that executes for days or weeks and does not tie up your RCE session during that time.

Jobs that are too big to run on your desktop: You can submit jobs that require more infrastructure than your workstation provides. For example, you could use a dataset that is larger than the memory on your workstation.

Groups of dozens or hundreds of jobs that are similar: You can submit batch processing that entails multiple uses of the same program with different parameters or input data. Examples of these types of submission are simulations, sensitivity analysis, or parameterization studies.

Access to our computing clusters is available to all RCE users.

Accessing the RCE

You can access HMDC-managed systems from any modern workstation or laptop with a high-speed connection. The RCE provides a familiar, consistent user experience to all researchers; our remote desktop environment is built on CentOS Linux and Gnome. This allows sessions to be suspended and resumed at will: you can begin a session from one workstation, suspend it, move to another system, and resume the previous session, all with no disruption to your environment. If you work in a command line environment, and want the highest throughput without a graphical interface, the RCE is also accessible via a SSH connection.

Remote access provides three categories of services:

  • Research Environment (graphical desktop)
  • Secure Shell (command-line tools)
  • File Access (home directory)

This guide assumes you already have an RCE account. If you do not, please contact us at support@help.hmdc.harvard.edu to request one.

To connect to the RCE, you need a hostname and a port number. The hostname was given to you in your account confirmation email. Normally it will be rce6.hmdc.harvard.edu. Kennedy School researchers should use kennedy.fas.harvard.edu. If you're unsure which one to use, please contact us. The port number will always be 22.

If you would like to skip entering your password each time you log in, the RCE supports using public key authentication. Please see the FAQ, How do I use public key authentication? for instructions on creating a public key.
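The key-generation half of that setup can be sketched with standard OpenSSH tools. This is a generic OpenSSH example, not the RCE FAQ's exact procedure: the file name demo_key and the empty passphrase are for illustration only (in practice, accept the default ~/.ssh path and choose a passphrase).

```shell
# Generate an RSA key pair; -N '' sets an empty passphrase for this demo only.
ssh-keygen -t rsa -N '' -f demo_key

# The .pub file is the half that goes to the server. ssh-copy-id appends it
# to ~/.ssh/authorized_keys on the RCE for you, e.g.:
#   ssh-copy-id -i demo_key.pub username@rce6.hmdc.harvard.edu
cat demo_key.pub
```

The private key (demo_key) stays on your workstation and should never be copied to the server.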

Graphical Desktop

To connect to the RCE remote desktop, an NX client is required. Currently we use OpenNX for Linux and Mac systems, and NoMachine 3.5 for Windows systems.

Linux

Instructions differ depending on your distribution, so review the OpenNX download list first. The following instructions assume you are using Ubuntu.

  1. Open the Terminal application.

  2. Check your version: cat /etc/issue

  3. Import the key (replace the distro version with your own):
    wget -P ~/ http://download.opensuse.org/repositories/home:/felfert/xUbuntu_12.04/Release.key ; sudo apt-key add ~/Release.key

  4. Become "root" to add the new repository: sudo su -

  5. Create the OpenNX repository (replace the distro version with your own):
    echo "deb http://download.opensuse.org/repositories/home:/felfert/xUbuntu_12.04 ./" > /etc/apt/sources.list.d/opennx.list ; apt-get update ; exit
    Technical note: the above is three commands: the first creates a new file containing the "deb..." line, the second refreshes the list of available packages, and the third exits the "root" account.

  6. Install the client: sudo apt-get install opennx

  7. Fix a library issue: sudo ln -s /usr/lib/opennx/lib /usr/lib/opennx/lib64

  8. You can find OpenNX in your application list now.

    To invoke opennx from the command line, you need to modify your path. Edit ~/.profile and add this line to the bottom: PATH=$PATH:/usr/lib/opennx/bin

  9. Follow the instructions for Mac OS X to set up the client.

Mac OS X

Newer versions of OS X include Gatekeeper, a set of options to restrict unknown applications from being installed on the system. This can sometimes cause warnings, or prevent installation altogether. Please see the Apple KB article for more information.
OS X 10.8 and later no longer include X11, which is required for NX clients. Please download and install XQuartz instead.

  1. To begin, download and install OpenNX. Then run the Connection Wizard.

  2. Name the Session something identifiable, e.g. "RCE6" if you are migrating from the old RCE. The Port will always be 22. Move the slider to "LAN". The Host was provided in your confirmation email. It will most likely be rce6.hmdc.harvard.edu or kennedy.fas.harvard.edu. Please verify this with HMDC Support if you are unsure.

  3. Your desktop selection needs to be "Unix" : "GNOME" in order to connect. You are welcome to leave the remote desktop resolution as "Available Area", but you may find "1024x768" a bit more manageable to start with.

  4. Leave the "Enable SSL encryption of all traffic" checkbox enabled.

  5. Creating a desktop shortcut is optional. All sessions are available in a drop-down menu from the main OpenNX application.

  6. The Connection Wizard may "crash" upon completion, but it will still save your new session.

Windows

  1. To begin, download and install NoMachine 3.5. Then run the Connection Wizard.

  2. Name the Session something identifiable, e.g. "RCE6" if you are migrating from the old RCE. The Port will always be 22. Move the slider to "LAN". The Host was provided in your confirmation email. It will most likely be rce6.hmdc.harvard.edu or kennedy.fas.harvard.edu. Please verify this with HMDC Support if you are unsure.

  3. Your desktop selection needs to be "Unix" : "GNOME" in order to connect. You are welcome to leave the remote desktop resolution as "Available Area", but you may find "1024x768" a bit more manageable to start with.

  4. Leave the "Enable SSL encryption of all traffic" checkbox enabled.

  5. Creating a desktop shortcut is optional. All sessions are available in a drop-down menu from the NoMachine application.

SSH Access

  1. On Mac and Linux, use the built-in Terminal application.
    1. Enter ssh username@rce6.hmdc.harvard.edu (substituting your own user name) to connect.
    2. A message will appear asking you to confirm the server's host key. (Accept the host key.)
    3. You are then prompted for your password.
  2. On Windows, download PuTTY (putty.exe) and run it. (PuTTY does not have an installation; it is a stand-alone executable.)
    1. Enter your Host Name, leave the Port as is (22), the Connection type is SSH.
    2. Create a name for the Saved Session and click "Save".

    3. Click "Open" to connect.
    4. A message will appear asking you to confirm the server's host key. (Accept the host key.)
    5. You are prompted to enter your user name and password.
  3. You are now at an RCE command prompt. To exit type: exit
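If you connect often, an entry in your OpenSSH client configuration saves retyping the host name. This is a standard OpenSSH feature; the alias "rce" below is an arbitrary choice.

```
# ~/.ssh/config (OpenSSH client; the alias "rce" is an arbitrary choice)
Host rce
    HostName rce6.hmdc.harvard.edu
    Port 22
    User username
```

With this in place, ssh rce is equivalent to ssh username@rce6.hmdc.harvard.edu.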

Transferring Files

An SFTP client is required to transfer files to and from the RCE. We recommend FileZilla for all systems.

  1. Download and install FileZilla.

  2. Run FileZilla on your local computer.

  3. Select File → Site Manager to input a new connection.

  4. Click the New Site button. Enter the information below.

  5. Replace rce6.hmdc.harvard.edu with the name of the server you were given in your confirmation email. The connection name may be any name you wish. Be sure to use your own username.

  6. FileZilla will ask to confirm the connection. You may proceed.

  7. Drag files across the panels to upload and download to your home directory. Remember that you only have 500MB, so large files should be placed in your project space!
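If you prefer the command line, the OpenSSH sftp client performs the same transfers. A sketch of a session follows; the file and project names are examples, and you should substitute the host name from your confirmation email and your own username.

```
$ sftp username@rce6.hmdc.harvard.edu
sftp> put results.csv
sftp> get shared_space/my_project/data.dta
sftp> bye
```

put uploads into your home directory on the RCE, get downloads to your local machine, and bye closes the session.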

Moving to RCE6

The RCE6 is accessible via a desktop client and SSH (command line). For Mac and Linux, we suggest using OpenNX, and NoMachine 3.5 on Windows. (Please contact support@help.hmdc.harvard.edu if you require the NoMachine 3.5 client.)

There are three considerations before connecting to RCE6:

  1. The hostname is rce6.hmdc.harvard.edu to differentiate it while RCE5 is still in production.

  2. Complete all jobs on RCE5, and terminate your session! After moving to RCE6, you will not be able to reconnect to RCE5.

  3. The SSH keys for RCE5 should be removed in order to prevent problems in the future.
    Technical note: SSH keys can be thought of as a digital fingerprint. When RCE5 is retired, rce.hmdc.harvard.edu will point to RCE6. If you still have the old SSH key by that time, your client won't connect as the key won't match the server.

Note to Kennedy School Users

The instructions below apply to the Kennedy RCE server as well. In addition to rce.hmdc.harvard.edu, please repeat the process for kennedy.fas.harvard.edu and login.hmdc.harvard.edu. The RCE6 version for Kennedy servers is kennedy.fas.harvard.edu (login.hmdc.harvard.edu is now retired).

Clearing the RCE5 SSH key

SSH keys are all kept in one file (known_hosts). If you connect to other services besides the RCE (e.g. Odyssey cluster, SFTP for uploading/downloading files, etc.) you will see other keys in this file. Deleting them won't harm anything, but may cause your client to alert you to a new key.

Mac / Linux

  1. Mac: open Applications → Utilities → Terminal.
    Linux: the Terminal application will vary depending on your distro.

  2. Open the file in the text editor "vi" or "vim": vi ~/.ssh/known_hosts

  3. In vi, search for the RCE5 entry by typing /rce.hmdc and hitting 'return'.

  4. If an entry is found, the cursor will automatically be placed on that line.

  5. To delete the key (assuming your cursor is on the proper line), hit the d key twice. This removes the entire line.

  6. Save and exit vi by typing :wq and hitting 'return'.
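If you prefer a one-liner to an editor session, you can filter the entry out with standard tools. The sketch below works on a scratch file so it is safe to try; OpenSSH's own ssh-keygen -R rce.hmdc.harvard.edu does the same job on the real ~/.ssh/known_hosts. The key strings here are placeholders, not real keys.

```shell
# Scratch known_hosts with an RCE5 entry plus one other host (fake keys).
printf '%s\n' \
  'rce.hmdc.harvard.edu ssh-rsa EXAMPLEKEY1' \
  'odyssey.fas.harvard.edu ssh-rsa EXAMPLEKEY2' > scratch_known_hosts

# Keep every line except the RCE5 one.
grep -v '^rce.hmdc.harvard.edu ' scratch_known_hosts > scratch_known_hosts.new
mv scratch_known_hosts.new scratch_known_hosts

cat scratch_known_hosts   # only the other host's key remains
```

To do this on your real file, run the same grep against ~/.ssh/known_hosts, or simply use ssh-keygen -R rce.hmdc.harvard.edu.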

Windows

  1. Open Start menu → [username] (located by default immediately under your user icon).
    You can also open Windows Explorer (from the taskbar/dock at the bottom of the screen) and navigate to C:/Users/[username].

  2. Open the folder .ssh. By default it is not a hidden folder.

  3. Double-click on the file known_hosts to open it. You are prompted with the Open with window. Select WordPad and click "OK".

  4. Highlight and delete the rce.hmdc.harvard.edu key.

  5. Save and close the file.

  • delete-sshkey-mac-01.png
  • delete-sshkey-mac-02.png
  • delete-sshkey-mac-03.png
  • delete-sshkey-mac-04.png
  • delete-sshkey-windows-01.png
  • delete-sshkey-windows-02.png
  • delete-sshkey-windows-03.png

Working in the RCE

This guide provides information about basic features and functions that might be useful when you begin working within the RCE.

RCE Basics

Desktop

When you open the RCE, the desktop is displayed in a window. The open area in the RCE window, the icons on it, the toolbar and toolbar icons, and the workspaces together comprise the RCE. The term desktop sometimes refers to the open space in the RCE window, but this guide refers to that area as the workspace.

Mouse

RCE documentation often assumes use of a three-button mouse. The buttons are sometimes named left-mouse, middle-mouse, and right-mouse. If you use a two-button mouse, you can emulate middle-mouse by pressing left-mouse and right-mouse simultaneously. If you use a wheel mouse, the wheel functions as middle-mouse. Functions available for each mouse button are as follows:

  • Left-mouse - Select text, select items, drag items, activate items
  • Middle-mouse - Paste text, move items, move windows to the back
  • Right-mouse - Open a context menu for an item (if a menu applies)

Note: You can switch button assignments if you are left-handed. In the RCE, choose System → Preferences → Mouse → General.

Workspaces

A workspace is a distinct and separate area in the RCE, which provides a convenient tool for organizing work in progress. You can open more than one application in the default workspace, or you can open applications in different workspaces and move from one workspace to the other. Click the gray tabs in the lower-right corner of the RCE to move among your workspaces. To move open applications between workspaces, drag the application's icon from one workspace tab to another.

Directories

- jsmith
|- bin (directory for user built packages)
|- cvswork (the CVS working directory)
|- Desktop
|- lib (directory for user built packages)
|- man (directory for user built packages)
|- printjobs (documents printed to a PDF file)
|- public_html (personal web hosting by request only)
|- pylib (custom Python libraries)
|- shared_space (symbolic links to project and web spaces)

All home directories are kept on storage separate from the RCE and cluster, so that you can access files no matter which server you are logged in to.

  • Your home directory is located at ~ or /nfs/home/J/jsmith
  • Project space is located at /nfs/projects/m/my_tape_backup_project/ or /nfs/projects_nobackup/m/my_no_tape_backup_project/
  • Shortcuts to your project space are located at ~/shared_space/my_project_name/
  • Backups are located in every directory, and kept in hidden directories named .snapshot

Projects & Shared Space

Making files group writeable in your project space

If you create files in a project space, the default is to create files that your whole project group can read, but only you can modify. We run an automatic nightly process to ensure that whatever permissions you have on files in your project space, they are also granted to your project group. Any project files you can modify will become group-modifiable, and any project files you can execute will become group-executable.

In order to grant group writeable permissions immediately (to other members of your shared_space project), please do the following:

  1. Open a terminal window from Applications → Accessories → Terminal.
  2. Determine which project directory you wish to modify: ls ~/shared_space/
  3. Run the script: fixGroupPerm.sh
  4. You will be prompted for the project directory name. This is a directory located under ~/shared_space/, which you obtained in step 2.

Running this script will grant writeable permissions to all files under this location. Please give this script only one argument (i.e. one project space at a time).

You will need to run this command each time you create files in your shared project space in order to grant your collaborators the same level of access you have. If you are running a script to create the files (e.g. R code, or a Stata .do file), it may be simplest to add a call to the fixGroupPerm.sh script at the end of your code.
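fixGroupPerm.sh is an HMDC-provided script, but the operation it performs amounts to a recursive group-permission grant, which you can approximate with standard chmod. The directory and file names below are examples only.

```shell
# Create a small demo project tree.
mkdir -p demo_project/results
echo 'run 1 complete' > demo_project/results/output.txt

# Give the group the same read/write/execute bits the owner has, recursively
# -- roughly what the nightly process and the script do for project spaces.
chmod -R g=u demo_project

ls -l demo_project/results/output.txt   # group column now shows rw
```

chmod g=u copies the owner's permission bits to the group, so group members gain exactly the access the file's creator has.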

Configure your default file sharing preferences

If most or all of your work in the RCE is done in collaborative project spaces, you may want to change the default file creation mode (i.e. the file access permissions) for your RCE account so that all files you create can be modified by members of the group which owns them. If you decide to pursue this option, two important caveats apply:

  1. This change only affects the default access mode assigned to newly-created files. The application creating the file can override the default to set a more restrictive access mode on the files.
  2. The access mode on the files/directories you create applies to whichever group owns the file/directory, which may or may not be your collaborative group.

Other notes regarding file permissions:

  • You can run the command ls -l on a file to view its ownership. Group ownership will be displayed in the fourth column.
  • If you are creating files under ~/shared_space, then they should automatically be created with ownership by your collaborative group.
  • If you create files in your home directory, however, they will be owned by the "users" group and the access permissions (e.g. allow read/write/access by the group) will apply to all RCE users.
  • Restricting access to your home directory itself can help limit the impact of such an exposure.

If you understand and accept these caveats, you can proceed to change your default file creation mode for your RCE account by doing the following:

  1. Navigate to Applications → RCE Utilities → File Sharing Config Helper. (This can also be run from a terminal window with the command fileSharing.)
  2. Choose the option for 002.
  3. Terminate your current RCE session and start a new RCE session.
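The "002" option is standard umask notation: the umask is the mask subtracted from the default permission bits when files are created, and a umask of 002 leaves group write intact. A quick sketch of the effect (the file names are arbitrary):

```shell
umask 022                 # a typical restrictive default: group gets read only
touch private_default.txt
umask 002                 # collaborative setting: group keeps write access
touch group_writable.txt

ls -l private_default.txt group_writable.txt
# group_writable.txt shows rw in the group column; private_default.txt shows r-
```

New files start from mode 666 (rw for everyone) minus the umask, so 022 yields 644 and 002 yields 664.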

Installing and Using Extensions

Many programming languages and applications have extensions (also called add-ons, modules, or packages) that can be installed in a user's home directory. In this section we describe how to install extensions for a number of software packages.

Python

Do you need to use a Python library you think may not be installed on RCE6? As an RCE user, you have the ability to install Python modules locally to your home directory and use them in your projects.

Before going ahead and installing that Python module you need, determine whether that module is already installed globally.

List installed Python Modules

1. Determine The Required Python Version: RCE6 supports Python 2.6, 2.7, and 3.3. You need to determine which version of Python you'd like to develop with. Remember, some Python modules are version specific.

2. Access the Console: If you know how to access the console in the RCE via NX or SSH, skip this step, otherwise, please read #HMDCBasics.

3. Search for the Python Module: Each version of Python installed on the RCE maintains its own module path. Determine whether a Python module is installed for a specific version of Python by using pip, Python's package management utility, to list packages installed for a desired version.

Python 2.6

pip list | grep $MODULE

Example:

$ pip list | grep simplejson
simplejson (2.0.9)

Python 2.7

pip27 list | grep $MODULE

Example:

$ pip27 list | grep simplejson
$

Python 3.3

pip33 list | grep $MODULE

Example:

$ pip33 list | grep simplejson
$

 

4. Success? If your module is installed for the Python version you need, stop right there. You're done. In the examples above, a user determines that the module simplejson is only installed for Python 2.6.
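A complementary check is to ask the interpreter itself, since a successful import is the real test. This generic sketch uses the always-present standard-library json module; on the RCE you would substitute the interpreter (e.g. python2.7) and module you care about.

```shell
# Ask the interpreter directly; exit status 0 means the module imports.
python3 -c 'import json; print(json.__name__, "imports cleanly")'
```

A nonzero exit status (and an ImportError traceback) means the module is not available to that interpreter, regardless of what is installed for other Python versions.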

Installing a Python Module

1. Install the Module: If the module is not installed or installed for the wrong version of Python, install the module locally to your home directory. You can install the module for multiple versions of Python if your project requires cross-compatibility.

Python 2.6

pip install $MODULE --user

Example:

$ pip install simplejson --user

Python 2.7

pip27 install $MODULE --user

Example:

$ pip27 install simplejson --user

Python 3.3

pip33 install $MODULE --user

Example:

$ pip33 install simplejson --user
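Modules installed with --user land in your per-user site-packages directory inside your home directory, which each interpreter can report via the standard site module. Run this once per Python version you use (shown generically with python3 here):

```shell
# Show where pip's --user installs go for this interpreter.
python3 -c 'import site; print(site.getusersitepackages())'
```

Because the path embeds the Python version, a module installed with pip27 --user is invisible to Python 3.3 and vice versa, which is why the document suggests installing per version.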

 

Troubleshooting

1. Can't Find Your Module? If you're unable to locate your module using pip, you may be searching for the wrong module name. If you decided you needed to install a module because, for example, import simplejson did not work from a Python interactive console, you may have the wrong name. Often Python class names differ from Python module names. Try using the pip search feature, and when searching the Python module repository, try a more generic derivative of the module name: search for json instead of simplejson.

Python 2.6

pip search $MODULE

Example:

$ pip search json

Python 2.7

pip27 search $MODULE

Example:

$ pip27 search json

Python 3.3

pip33 search $MODULE

Example:

$ pip33 search json

 

2. Still No Dice? Try searching the PyPI repository, the official Python module index, or use Google. Very rarely, a module must be compiled manually using its setup.py script. For example, PyLucene, a popular module that provides a Python interface to Lucene, is not in the PyPI repository.

R

Installing an R Package

The RCE provides almost all stable libraries maintained in the Comprehensive R Archive Network (CRAN), and others. For a full list refer to "Which R packages are available?"

If you locate an R library that is not available in the RCE, contact us to request that we install it. When we install a library, it becomes available to all RCE users.

If you would like to install a library separately for your own personal use, follow these instructions:

  1. In R, type library(<package_name>).

    For example, to install R Commander, type the following:

    > library(Rcmdr)

    R prompts you with a warning if the package that you chose to install uses other packages that are not installed already.

  2. To install missing packages on which your target package depends:

    1. Click Yes to continue.

      The Install Missing Packages window is displayed.

    2. Click OK to continue.

      R prompts you to select a mirror site from which to download the packages' sources.

  3. Select a site from which to download the sources, and then click OK.

    The dependent packages and your target package are now installed and loaded; a package with a graphical interface, such as Rcmdr, launches once loaded.

The sequence of activity that occurs in R by performing this installation is:

> library(Rcmdr)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: car
--- Please select a CRAN mirror for use in this session ---
trying URL 'http://cran.us.r-project.org/src/contrib/RODBC_1.1-7.tar.gz'
Content type 'application/x-tar' length 79624 bytes
opened URL

downloaded 77Kb

* Installing *source* package 'RODBC' ...
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
...

The downloaded packages are in
/tmp/RtmpjJMJkz/downloaded_packages

Rcmdr Version 1.2-0
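If you install personal packages often, two optional configuration fragments can streamline the process. Both R_LIBS_USER and ~/.Rprofile are standard R mechanisms; the library path and mirror shown here are examples only, not RCE-mandated values.

```
# ~/.Renviron -- use a personal library directory for your own installs
# (create it first with: mkdir -p ~/R/library)
R_LIBS_USER=~/R/library

# ~/.Rprofile -- preselect a CRAN mirror so R stops prompting each session
options(repos = c(CRAN = "http://cran.us.r-project.org"))
```

With these in place, install.packages() writes to your personal library without prompting for a mirror, and library() finds those packages automatically.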

Running Batch Jobs

This section describes the batch processing environment in our facilities.

What is Batch Processing?

Batch processing is a procedure by which you submit a program for delayed execution. Batch processing enables you to perform multiple commands and functions without waiting for results from one command before beginning another, and to execute these processes unattended. The terms process and job are interchangeable.

The batch processing system at HMDC runs on a high throughput cluster on which you can perform extensive, time-consuming calculations without the technical limitations imposed by a typical workstation.

Why Use Batch Processing?

HMDC provides a large, powerful pool of computers that are available for you to use to conduct research. This pool is extremely useful for the following applications:

  • Jobs that run for a long time - You can submit a batch processing job that executes for days or weeks and does not tie up your RCE session during that time.

  • Jobs that are too big to run on your desktop - You can submit batch processing that requires more infrastructure than your workstation provides. For example, you could use a dataset that is larger in size than the memory on your workstation.

  • Groups of dozens or hundreds of jobs that are similar - You can submit batch processing that entails multiple uses of the same program with different parameters or input data. Examples of these types of submission are simulations, sensitivity analysis, or parameterization studies.

Condor System for Batch Processing

The Condor system enables you to submit a program for execution as batch processing, which then does not require your attention until processing is complete. The Condor project website is located at the following URL:

http://www.cs.wisc.edu/condor/

To view the user manual for this software, go to the following URL and choose a viewing option:

http://www.cs.wisc.edu/condor/manual/

Condor System Components and Terminology

A Condor system comprises a central manager and a pool. The central manager machine manages the execution of all jobs that you submit as batch processing. A pool of Condor machines associated with that central manager executes individual processes based on policies defined for each pool member. If a computing installation has multiple Condor pools or additional machine clusters dedicated to Condor system use, these pools and clusters can be associated as a flock.

Listed below are some common Condor terms and references, which are unique to Condor:

  • Cluster - A group of jobs or processes submitted together to Condor for batch processing is known as a cluster. Each job has a unique job identifier in a cluster, but shares a common cluster identifier.

  • Pool - A Condor pool comprises a single machine serving as a central manager, and an arbitrary number of machines that have joined the pool. Simply put, the pool is a collection of resources (machines) and resource requests (jobs).

  • Jobs - In a Condor system, jobs are unique processes submitted to a pool for execution and are tracked with a unique process ID number.

  • Flock - A Condor flock is a collection of Condor pools and clusters associated for managing jobs and clusters with varying priorities. A Condor flock functions in the same manner as a pool, but provides greater processing power.

When you submit batch processing to the Condor system, you use a submit description file (or submit file) to describe your jobs. This file results in a ClassAd for each job, which defines requirements and preferences for running that job. Each pool machine likewise advertises, in a machine ClassAd, the job requirements and preferences that machine can satisfy. The central manager matches job ClassAds with machine ClassAds to select the machine on which to execute a job.

Process Identification Numbers

For Condor batch processing, there are two identification numbers that are important to you:

  • Cluster number - The cluster number represents each set of executable jobs submitted to the Condor system. It is a cluster of jobs, or processes. A cluster can consist of a single job.

  • Process number - The process number represents each individual job (process) within a single cluster. Process numbers for a cluster always start at zero.

Each single job in a cluster is assigned a process identification number, called the process ID or job ID. This ID consists of both cluster and process number in the form <cluster>.<process>.

For example, if you submit a batch that consists of a single job, and your batch submission to the Condor queue is assigned cluster number 20, then your process ID is 20.0. If you submit a batch that consists of fifteen jobs that all use the same executable, and your batch submission to the Condor queue is assigned cluster number 8, then your process IDs range from 8.0 to 8.14.

Batch Basics

This short slideshow will explain the basics of Batch Processing, introducing:

  • Terminology and commands
  • What is needed for a job
  • How jobs are scheduled
Attachment: rce_workshop_-_batch_processing.pdf (73.29 KB)

Batch workflow

The workflow to submit batch processing to the Condor system is as follows:

  1. Create a directory in which to submit jobs to the Condor system.

    Make sure that the directory and files with which you plan to work are readable and writable by other users, including the Condor processes.

  2. Choose an execution environment, called a universe, for your jobs.

    At HMDC, you always use the vanilla universe. This execution environment supports processing of individual serial jobs, but has few other restrictions on the types of jobs that you can execute.

  3. Make your jobs batch ready.

    Batch processing runs in the background, meaning that you cannot input to your executable interactively. You must create a program or script that reads in your inputs from a file, and writes out your outputs to another file.

    You also must identify the full path of the executable to use for your Condor cluster. The default executable for the condor_submit_util script is the R language. In the RCE, the path to this executable is /usr/bin/R.

  4. If you choose to use the condor_submit_util script to create the submit description file (or submit file) and submit your jobs to the Condor system for batch processing automatically, skip to the next step.

    If you choose to submit your batch processing to the Condor system manually, create a submit file.

    A submit file is a plain-text file that describes a batch of jobs for the Condor software. This file contains the following descriptors:

    • Environment (vanilla)

    • Executable program path and file name

    • Program arguments

    • Input and output file names

    • Log and error file names

  5. Execute the condor_submit_util command to write the submit file and submit your program automatically to the Condor job queue.

    If you chose to write your own submit file, execute the condor_submit <submit file>.submit command to submit your jobs to the queue.

    Condor then checks the submit file for errors, creates a ClassAd object and the object attributes for that cluster, and then places this object in the queue for processing.
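The descriptors listed in step 4 can be collected into a short submit file. The sketch below is a minimal example, not an HMDC-supplied template: the universe, executable, and arguments match the RCE defaults described above, while the input/output/log file names are placeholders to replace with your own.

```
# my_analysis.submit -- minimal vanilla-universe submit description (sketch)
Universe    = vanilla
Executable  = /usr/bin/R
Arguments   = --no-save --vanilla
Input       = my_analysis.R
Output      = my_analysis.Rout
Error       = my_analysis.err
Log         = my_analysis.log
Queue
```

Submitting it with condor_submit my_analysis.submit queues one job; changing the last line to Queue 15 would queue fifteen processes in the same cluster.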

Setting up your batch environment

Before you submit any programs for batch processing, perform the following:

Create a directory in which to submit your batch processing, and then change to that directory.

Make sure that you set permission to enable the Condor software to read from and write to the directory and its contents.

For example, type the following:

> mkdir condor
> cd condor
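The two requirements above (a dedicated directory, with permissions the Condor processes can use) can be combined in one sketch; mode 775 is one reasonable choice, not a mandated value.

```shell
# Create a working directory for Condor submissions with group
# read/write/execute (775), matching the permissions note above.
mkdir -p condor
chmod 775 condor
ls -ld condor    # drwxrwxr-x ...
```

Then cd condor before creating your submit files, so the job's input, output, and log files stay together.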

You can request that a project directory be set up for you to use for batch processing. If you perform your batch processing within your home directory, the space used for your data and program files can consume much of your allotted resources.

Determining batch parameters

Before you submit dwarves.pl for batch processing, you need to determine the parameters for this submission. To use the Condor system for batch processing, you must define these parameters by assigning values to submit file arguments, which describe the jobs that you choose to submit for processing.

In the RCE, you always use the vanilla environment.

To determine the remaining submit file arguments, answer the following questions:

  • What is the executable path and file name?

    For any shell script or statistical application installed in the RCE, the condor_submit_util script can determine the full path for the executable. At the script prompt, you type in the name of your script, program, or application. The default executable in the RCE is the R language, and the path and executable name are /usr/bin/R.

  • Do you have any arguments to supply to the executable?

    Arguments are parameters that you specify for your executable. For example, the default arguments in the condor_submit_util script are --no-save and --vanilla, which specify how to launch and exit the R program. The argument --no-save specifies not to save the R workspace at exit. The argument --vanilla instructs R to not read any user or site profiles or restored data at start up and to not save data files at exit.

  • What are the input file names?

    If you are using the R program, your input file(s) will be whatever R script you want to execute.

  • What do you plan to name the output files?

    A general rule for batch processing is that you have one output file for each input file. Therefore, if you have seven input files, you expect to have seven output files after processing is complete. A useful practice is to correlate the names of input and output files.

  • How many times do you need to execute this script or program?

    A general rule for batch processing is that you execute your job one time for each input file that you use.
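Taken together, the two rules above mean that input and output files pair up one-to-one. A minimal shell sketch of the naming convention, using three iterations and the in./out. base names that the script uses by default:

```shell
# One iteration per input file; correlate output names with input names,
# so in.0 pairs with out.0, in.1 with out.1, and so on.
for i in 0 1 2; do
  : > "in.$i"    # placeholder input file for process number $i
done
ls in.*
```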

Batch example

To set up our batch processing example for use, you first download the source material, and then determine your batch processing parameters.

Downloading the Source Files

To download the source files for use in this case study:

  1. Log in to your RCE session.

  2. Open this page in a web browser in your RCE session.

  3. Click the file condor_example.tar.gz to download it to your desktop.

    You are prompted to save or open the file (Figure 1).

    Figure 1. Download dwarves Case Study Sources

  4. Click the Save to Disk option, and then click OK to save the tar file to your desktop.

    The file is downloaded to your desktop, and the Downloads window is displayed (Figure 2), listing files downloaded in your RCE session.

    Figure 2. Downloads Window

  5. Open a terminal window, and unzip the tar file in the Desktop directory. Type:

    > tar zxvf Desktop/condor_example.tar.gz
    condor_example/
    condor_example/condor_submit_util/
    condor_example/bootstrap.R

You now have a directory named condor_example in your home directory, which contains the files necessary to run our example.

Make batch ready programs

Before you submit a program for batch processing, write your program files and compile the programs if necessary. Write the input files for your submission. Then, place your program files in your working directory.

If you need assistance with making your program batch ready, .

If you need assistance with statistical questions and not technical issues, contact the HMDC Data Fellows by email at dataquest@help.hmdc.harvard.edu.

Submit with batch script

After you set up your working directory and define your batch processing parameters, you can submit your script and input files for processing. You can use the HMDC script to set up your submit file and submit your batch job, or you can create your submit file manually and submit it to the cluster.

To build a submit file automatically and submit your program for batch processing, you can use the Automated Condor Submission script in two modes: interactive or command line.

Note: If you do not specify any options when you use the script, the script enters interactive mode automatically. If you omit required options in command-line mode, the script either enters interactive mode automatically or reports an error and returns you to the command-line prompt.

Working interactively with script

When you use the script in interactive mode, you can press the Return key to accept default values. Default values appear inside square brackets at the end of each prompt.

To use the condor_submit_util script in interactive mode:

  1. Execute the condor_submit_util command.

    Type the following at the command prompt in your Condor working directory:

    > condor_submit_util
    *** No arguments specified, defaulting to interactive mode...
    *** Entering interactive mode.
    *** Press return to use default value.
    *** Some options allow the use of '--' to unset the value.
  2. The script first prompts you to define the executable program that you choose to submit for batch processing, and then requests the list of arguments to provide to that executable:

    Enter executable to submit [/usr/bin/R]: <executable name>
    Enter arguments to /usr/bin/R [--no-save --vanilla]: <arguments>

    The default argument --no-save specifies not to save the R workspace at exit. The default argument --vanilla instructs R to not read any user or site profiles or restored data at start up and to not save data files at exit.

    If you do not have any arguments to apply to your executable, then type -- to supply no arguments.

  3. Next, the script prompts you to provide a name or pattern for the input, output, log, and error files for this Condor cluster submission. You can include a relative path in these entries, if you choose:

    Enter input file base [in]: <input path and file name or pattern>
    Enter output file base [out]: <output path and file name or pattern>
    Enter log file base [log]: <log path and file name or pattern>
    Enter error file base [error]: <error path and file name or pattern>
  4. After specifying the files, the script prompts you to define the number of iterations that you choose to execute your program for processing:

    Enter number of iterations [10]: <integer>
  5. The system creates the submit file for this batch process using your responses to script prompts.

    An example submit file is shown here. To view the contents of your submit file, include the option -v (verbose) when you launch the condor_submit_util script:

    *** creating submit file '<login account name>-<date-time>.submit'

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = <output file>

    input = <input file>
    output = <output file>
    error = <error file>
    Log = <log file>
    Queue <integer>
  6. If you use the verbose option, the script prompts you to confirm that the submit file is correct. To continue, press Return or type y.

    Condor checks the submit file for errors, creates the ClassAd object for your submission, and adds that object to the end of the queue for processing. The script lists messages that report this progress in your terminal window, and includes the cluster number assigned to the batch process. For example:

    Is this correct? (Enter y or n) [yes]: y
    ] submitting job to condor...
    ] removing submit file '<login account name>-<date-time>'
    *** Job successfully submitted to cluster <cluster ID>.
  7. Finally, the script prompts whether you choose to receive email when execution of your batch processing is complete. Press Return or type y to receive email, or type n to not send email and exit the script.

    If you choose to receive email, before exiting, the script prompts you to enter the email address to which you choose to send the notification. The default email address for notification is your email account on the server on which you launched the script. For example:

    Would you like to be notified when your jobs complete? (Enter y or n)
    [yes]: y
    Please enter your email address [<your email account on this server>]:
    *** creating watch file '/nfs/fs1/projects/condor_watch/<Condor machine>.<batch cluster>.<your email>'
  8. View your job queue to ensure that your batch processing begins execution successfully.

    See for complete details about checking the queue. An example is:

    > condor_q

    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
     ID     OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
    9.0     arose    10/4  11:02   0+00:00:00 R  0   9.8  dwarves.pl
    9.1     arose    10/4  11:02   0+00:00:00 R  0   9.8  dwarves.pl
    9.2     arose    10/4  11:02   0+00:00:00 I  0   9.8  dwarves.pl
    9.3     arose    10/4  11:02   0+00:00:00 R  0   9.8  dwarves.pl

    4 jobs; 1 idle, 3 running, 0 held

Working with command line submit

When you use the script in command-line mode, you must specify all required options or the script does not execute. For example, the default number of iterations for the script is 10. If you do not have 10 input files in your working directory and you do not enter the option to specify the correct number of iterations that you plan to perform, the script does not execute and returns a message similar to the following:

> condor_submit_util -v
*** Fatal error; exiting script
*** Reason: could not find input file 'in.7'.

To use the condor_submit_util script in command-line mode:

  1. Execute the condor_submit_util command with the appropriate arguments. See for detailed information about script options.

    At a minimum, you must include the following options on the command line:

    • Executable program file name

    • Executable file arguments, or --noargs option

    • Input file, or --noinput option

    • Number of iterations, if you do not have 10 input files

    At a minimum, type the following at the command prompt from within your Condor working directory:

    > condor_submit_util -x <program> -a <arguments> -i <input files> 
  2. Condor creates a submit file and checks it for errors, creates the ClassAd object, and adds that object to the end of the queue for processing. The script supplies messages that report this progress, and includes the cluster number assigned to your Condor cluster. For example:

    > condor_submit_util -x <program> --noargs

    Submitting job(s)..........
    Logging submit event(s)..........
    10 job(s) submitted to cluster 24.

    If the script encounters a problem when creating the submit file, it enters interactive mode automatically and prompts you for the correct inputs.

  3. View your job queue to ensure that your batch processing begins execution.

    See for complete details about checking the queue.

Saving and Reusing a Submit File

When you use the script in command-line mode to submit a program for batch processing, include the option -k (keep) to save the submit file created by the utility.

You can edit and reuse that submit file to submit similar programs to the Condor queue for batch processing. You also can include Condor macros to further improve the usability of the file. See for detailed information about how to use Condor macros.

For example, if you plan to submit several iterations of a program for batch processing, you can use a single submit file for all iterations. In that submit file, you use the $(PROCESS) macro to specify unique input, output, error, and log files for each iteration.

Use of the $(PROCESS) macro requires that you develop a naming convention for files or subdirectories that includes the full range of process IDs for your iterations.

To use an existing submit file when you submit a batch process, you cannot use the script and must execute the condor_submit command instead. Type the following:

> condor_submit my.submit

Passing Arguments to the Program

You can pass arguments to the batch program using the --args flag in your submit file. For example, if you change the arguments line in your submit file to something like the following:

Arguments = --no-save --vanilla --args <arguments>

Then the contents of <arguments> will be passed in to the program as command-line arguments. The syntax for passing and handling these arguments differs depending on the statistics program in use.

Passing Arguments to R

To parse command-line arguments in R, use the following command in your R script:

args <- commandArgs(TRUE)

This puts the command-line arguments (the contents of <arguments>) into the variable args.

Script options

The Automated Condor Submission script makes the task of running jobs on the batch servers easier and more intuitive. The process it automates is described in . The script negotiates all job scheduling; it constructs the appropriate submit file for your job and calls the condor_submit command. To use this utility, you need a program to run. The format for using this script is:

condor_submit_util [OPTIONS]

In addition, the script can notify you when your job is done via email so you do not have to check the queue constantly using condor_q. In future releases, the script also will be able to keep usage data so administrators can track overall performance.

The script can be run in two ways: interactively or from the command line. When running interactively, the script prompts you for the values required to run the batch job. If you supply arguments on the command line, those arguments are used, and default values fill in any values that you do not supply.

Options

  • -h, --help
    Print help page and exit.
  • -V, --version
    Print version information and exit.
  • -v, --verbose
    Show information about what goes on during script execution.
  • -I, --Interactive
    Enter interactive mode, in which the script prompts you for the required values.
  • -s, --submitfile FILE
    Specify the name of the created submit file (default is <user-name-datetime>.submit).
  • -k, --keep
    Do not delete the created submit file.
  • -N, --Notify
    Receive notification by email when jobs are complete.
  • -x, --executable FILE
    The executable for condor to run (default is /usr/bin/R).
  • -a, --arguments ARGS
    Any arguments you want to pass to the executable (should be quoted, default is "--no-save --vanilla").
  • -i, --input [FILE|PATT]
    Either an explicit file name or base name of input files to the executable (default is in).
  • -o, --output [PATT]
    Base name of output files for the executable (default is out).
  • -e, --error [PATT]
    Base name of error files for the executable (default is error).
  • -l, --log [PATT]
    Base name of log files for the executable (default is log).
  • -n, --iterations NUM
    Number of iterations to submit (default is 10).
  • -f, --force
    Overwrite any existing files.
  • --noinput
    Use no input file for executable.
  • --noargs
    Send no arguments to executable.

Examples

  1. You have a compiled executable (named foo) that takes a data set and does some analysis. You have five different data sets to run against (named data.0, data.1 ... data.4). You want to save the submit file and be notified when the job is done.

    condor_submit_util -x foo -i "data" -n 5 -k -N
  2. You have an R program that has some random output. You want to run it 10 times to see the results.

    condor_submit_util -i random.R -n 10
  3. You have an R program that will take a long time to complete. You only need to run it once, but you want to be notified when it is done.

    condor_submit_util -i long.R -n 1 -N

Notes: The options -o, -e, and -l are base names for the implied files. The actual file names are created with a numerical extension tied to the Condor process number (0-indexed). This means that if you execute condor_submit_util -o "out" -n 3, three output files named out.0, out.1, and out.2 are created.
Also, for -i, the script first checks whether the name supplied is an actual file on disk; if not, it uses the argument as a base name, similar to -o, -e, and -l.
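The numbering described in these notes can be mimicked in shell; this only reproduces the names that condor_submit_util -o "out" -n 3 implies, without invoking Condor:

```shell
# Reproduce the implied output names for: condor_submit_util -o "out" -n 3
# Process numbers are 0-indexed, so the files are out.0, out.1, out.2.
base=out
n=3
i=0
while [ "$i" -lt "$n" ]; do
  : > "$base.$i"
  i=$((i + 1))
done
ls "$base".*
```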

    Option conventions

    Options for the condor_submit_util script are described in . For most options, there are two conventions that you can use to specify that option on the command line:

    • The -<letter> convention - Use this simple convention as a short cut.

      For example, the simple option to receive email notification when your batch processing is complete is -N.

    • The --<term> convention - Use this lengthy convention to make it easy to determine what option you use.

      For example, the lengthy option to receive email notification when your batch processing is complete is --Notify.

    Both conventions for specifying an option perform the same function. For example, to receive email notification when your batch processing is complete, the options -N and --Notify perform the same function.

    Pattern Arguments

    For file-related options, such as the output file name or the error file name, you can use a pattern-matching argument. For example, if you specify the option -i "run", Condor looks for an input file with the name run. If there is no file named run, Condor looks for a file name that begins with run., such as run.14.

    If there are multiple files with names that begin with the pattern that you specify, then for the first execution within a cluster, Condor uses the file with the name that matches first in alphanumeric order. For successive executions within a cluster, Condor uses the files with names that match successively in alphanumeric order.
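One caveat, on the assumption that "alphanumeric order" here means a plain lexical sort: run.10 sorts before run.2, so once you pass nine files, zero-padded names (run.01 ... run.10) keep the match order aligned with the numeric order. For example:

```shell
# Lexical sort places run.10 before run.2; zero-pad names to avoid this.
printf '%s\n' run.1 run.2 run.10 | LC_ALL=C sort
# prints: run.1, run.10, run.2 (in that order)
```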

    Submit to batch manually

    You use the command condor_submit to submit batch processing manually to the Condor system.

    In the RCE, you must include the attribute Universe = vanilla in every submit file. If you do not include this statement, Condor attempts to enable job checkpointing, which consumes the central manager resource.

    Perform the following to submit batch processing manually:

    1. Before you submit your program for batch processing, create a directory in which to run your submission, and then change to that directory. Make sure that you set permissions to enable the Condor software to read from and write to the directory and its contents.

      Also make sure that your program is batch ready.

    2. Create a submit file for your program.

      For information about how to create a submit file, see .

      Note: You can use the HMDC Automated Condor Submission script and include the -k option to create a submit file, and then edit and reuse that submit file for other submissions.

    3. Submit your program for batch processing.

      Type the following at the command prompt:

      > condor_submit <submit file>

      Condor then checks the submit file for errors, creates the ClassAd object, and places that object in the queue for processing. New jobs are added to the end of the queue. For example:

      > condor_submit hosttest1.submit

      Submitting job(s)..........
      Logging submit event(s)..........
      10 job(s) submitted to cluster 24.
    4. View your job queue (type condor_q) to ensure that execution begins. When your batch processing is complete, check your output for errors. Output from the example program is as follows:

      > cat out.* | grep -A 1 '^> system'
      > system("hostname -f")
      x1.hmdc.harvard.edu
      --
      > system("hostname -f")
      x2.hmdc.harvard.edu
      --
      > system("hostname -f")
      x3.hmdc.harvard.edu
      --
      > system("hostname -f")
      x1.hmdc.harvard.edu
      --
      > system("hostname -f")
      x2.hmdc.harvard.edu
      --
      > system("hostname -f")
      x3.hmdc.harvard.edu
      --
      > system("hostname -f")
      x1.hmdc.harvard.edu
      --
      > system("hostname -f")
      x2.hmdc.harvard.edu
      --
      > system("hostname -f")
      x3.hmdc.harvard.edu
      --
      > system("hostname -f")
      x1.hmdc.harvard.edu

    Submit file basics

    You send input to the Condor system using a submit file, which is a text file of <attribute> = <value> pairs. The naming convention for a submit file is <file name>.submit. Before you submit any batch processing, you first set up a directory in which to work, and create the executable script or program that you choose to submit for processing.

    Basic attributes used in the submit file include the following:

    • Universe - At HMDC you specify the vanilla universe, which supports serial job processing. HMDC does not support use of other Condor environments.

    • Executable - Type the name of your program. In the job ClassAd, this becomes the Cmd value. The default value in the RCE for this attribute is the R program.

    • Arguments - Include any arguments required as parameters for your program. When your program is executed, the Condor software issues the string assigned to this attribute as a command-line argument. In the RCE, the default arguments for the R program are --no-save and --vanilla.

    • Input - Type the name of the file or the base name of multiple files that contain inputs for your executable program.

    • Output - Type the name of the file or the base name of multiple files in which Condor can place the output from your batch job.

    • Log - Type the name of the file or the base name of multiple files in which Condor can record information about your job's execution.

    • Error - Type the name of the file or the base name of multiple files in which Condor can record errors from your job.

    • Queue - The Queue command instructs the Condor system to submit one job with the program, attributes, and input file defined above. You use this command one time for each input file that you choose to submit.

    When you specify file-related attributes (executable, input, output, log, and error), either place those files within the directory from which you execute the Condor submission or include the relative path name of the files. See  for more information about using subdirectories.

    Required attributes

    There are additional attributes that are required in your submit file when you use batch processing in the RCE. These attributes define when to write an output file and the name of the output file to write at that time. For each of these attributes, use the following specific values in your submit file:

    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = <your output file>

    Submission weights

    When you acquire a login account to the RCE, your account is assigned to a group. When you use the Automated Condor Submission script to submit batch processing, your jobs are assigned a weight (priority) based on the group to which your login account belongs. If you submit jobs manually for batch processing, your jobs might not have the same weight that they would have if you submit them by using the script.

    You can use the script to submit a job and include the option -k to keep a copy of the submit file. Then, you can edit and reuse this submit file to make sure that your jobs have the same weight when you submit them manually that they have when you submit them by using the script.

    Example submit file

    An example submit file with the minimum required arguments is as follows:

    > cat hosttest1.submit

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = out.$(PROCESS)

    input = <program>.R
    output = out.$(Process)
    error = error.$(Process)
    Log = log.$(Process)
    Queue 10

    This file instructs Condor to execute ten R jobs using one input program (<program>.R) and to write unique output files to the current directory.

    Working with submit files

    This section describes some common submit file attributes that you might find useful when submitting your batch processes. In particular, this section describes useful attributes for performing iterative executions of a program.

    Note: This section describes how to use input and output files for iterative jobs in a batch process. The same information applies to log and error files.

    Macros and Directories

    Macros are generic attributes that are replaced with specific values during execution of batch processing. Two useful predefined macros are $(Process) and $(Cluster), which return the process number or cluster number of a job.

    For example, you can use the $(Process) macro to submit batch processing that executes the same job numerous times and uses individual input files for each execution. You create unique input files before you submit your batch processing, and use a consistent naming convention for each file that includes the full range of process IDs for your iterations. This enables Condor to match the process ID with the name of a file (or directory).

    Another good use of the $(Process) macro combines the macro with the use of the initialdir attribute to perform iterative executions from within unique directories. The initialdir attribute gives individual job executions a directory for file input and output use. If you specify a path for this attribute, it is relative to the directory in which you execute the script or the condor_submit command. Note that the path to the executable is not related to the value of initialdir.

    Another macro enables you to use the dollar sign ($) as a literal character. For example, to include a dollar sign in a file name, use the macro $(DOLLAR) before the symbol in the file name.
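    As a hypothetical illustration (the file name is invented for this example), a submit file line that needs a literal dollar sign in an output name would use the macro like this:

```
output = report_$(DOLLAR)cost.$(Process)
```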

    Specifying multiple executions

    When you write a submit file, you define the execution parameters for your Condor cluster processes, and then you specify the Queue command. This command instructs the Condor software to create a job cluster and place that cluster in the queue for execution.

    To instruct Condor to repeat the execution of a process, include the number of times you choose to repeat execution (the number of iterations) after the Queue command. Syntax for this command is Queue <integer>.

    For example, to execute the program RanVal.R 10 times use the following attributes:

    Executable = RanVal.R
    Queue 10

    Executing in unique directories

    You can instruct Condor to read input files from and write output files to more than one directory.

    To direct Condor to use individual directories for reading and writing files in an iterative process, first create the directories. Use a consistent naming convention for each directory, and include in the names the full range of process numbers that you plan to execute.

    For example, to execute a program four times and use individual directories for each of the four executions, create four directories. Use a naming convention that includes the full range of process IDs for four executions, 0 - 3. You can use any naming convention that you choose. In this example, you might name your directories dir0 - dir3.

    The submit file for this example contains attributes and commands that instruct the Condor system to perform one executable four times. The following attributes direct Condor to perform each of the program executions within individual directories:

    InitialDir = dir0 # This directory is used for job <cluster ID>.0
    Queue
    InitialDir = dir1 # This directory is used for job <cluster ID>.1
    Queue
    InitialDir = dir2 # This directory is used for job <cluster ID>.2
    Queue
    InitialDir = dir3 # This directory is used for job <cluster ID>.3
    Queue

    A shorter way to do the same thing is to use the following:

    InitialDir = dir$(Process)
    Queue 4

    You then can place an input file for individual executions within each directory.
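    A shell sketch of the directory setup that this submit file assumes (the input file name infile is only an illustration):

```shell
# Create dir0 - dir3, one per process, each holding a placeholder input file.
for i in 0 1 2 3; do
  mkdir -p "dir$i"
  : > "dir$i/infile"   # hypothetical input file name
done
```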

    Combining Unique Input and Output Files

    You can instruct the Condor software to read an individual input file for each iteration of a process, and then to write an individual output file for each iteration. You can organize these input and output files by placing them within one directory, or you can place them within individual directories for each iteration. For example, to execute an R program (named RanVal.R) 15 times and use individual directories, input files, and output files for each execution:

    1. Create fifteen directories. Use a naming convention that includes the full range of process IDs for 15 executions, 0 - 14. For this example, name your directories dir_0 - dir_14.

    2. Create individual input files for each execution that you plan to perform, using a naming convention that includes the full range of process IDs. For this example, name your files in.0 - in.14. Place each input file in the associated directory. That is, place in.0 in the directory dir_0, place in.1 in the directory dir_1, and so on. Your input files look like this:

    /<working directory>/dir_0/
    in.0
    /<working directory>/dir_1/
    in.1
    ...
    /<working directory>/dir_14/
    in.14

    3. Instruct the Condor system to write unique output files for each iteration of the program. For this example, use the output file name out.$(PROCESS).

    4. Instruct the Condor system to execute your program 15 times.

    The attributes in your submit file look like this:

    Executable = RanVal.R
    InitialDir = dir_$(PROCESS)
    input = in.$(PROCESS)
    output = out.$(PROCESS)
    Queue 15

    The results of execution of your batch process are as follows:

    /<working directory>/dir_0/
    in.0
    out.0
    /<working directory>/dir_1/
    in.1
    out.1
    ...
    /<working directory>/dir_14/
    in.14
    out.14

    Defining R Component Paths

    In R, /usr/lib64/R/library is the location of installed libraries and packages. However, if R packages and libraries are installed manually by using the command install.packages() or R CMD INSTALL source.tar.gz, the paths for these newly installed components are not known to Condor unless specified.

    The following are common R utilities and commands used to find and specify absolute and relative paths for unique components:

    • To identify the default directory from which you read input and to which you write output, use the command getwd(). For example:

      > getwd()
      [1] "/nfs/home/S/sspade"

      You also can write to a specific directory using the command sink(<path and file name>).

    • To set the default or working directory from which you read input and to which you write output, use the command setwd(). Insert the full path between the parentheses. For example:

      > setwd("/nfs/home/S/sspade")
    • When installing packages or libraries from sources other than HMDC's Comprehensive R Archive Network (CRAN) repository, you must specify an absolute path for new components if they do not reside in the default working directory.

      For example, to load a library installed in your home directory type the following:

      > library(experiment, lib.loc="/nfs/home/S/sspade/.R/library-x86_64")
    • To save results to your home directory, use the command save.image(<path and file name>). For example:

      > save.image("/nfs/home/S/sspade/condor-temp/condorprac.Rdata")

    Directing Output to Unique Files

    You can instruct Condor to write unique output files for iterative processes, or to write output files in more than one directory, or both.

    To direct Condor to write output from your batch processing to specific directories, first create the directories. Use a consistent naming convention for each directory, and include in the names the full range of process numbers that you plan to execute. Then specify an output file name for each process.

    For example, you first create fifteen directories named dir_0 - dir_14. You instruct the Condor system to execute your program 15 times (using the Queue 15 command). You instruct Condor to create individual output files for each iteration of the executable and name those files out.<process ID>. The Condor system then places those files in the directory that is assigned the name that includes the same <process ID>.

    For this example, your submit file includes the following attributes:

    InitialDir = dir_$(PROCESS)
    output = out.$(PROCESS)
    Queue 15

    Your results look like this:

    /<working directory>/dir_0/
    out.0
    /<working directory>/dir_1/
    out.1
    ...
    /<working directory>/dir_14/
    out.14

    Executing from unique files

    You can instruct Condor to read unique input files for each execution in an iterative process. To direct Condor to read unique input files for individual iterations within your cluster, use unique input file names, or unique directory names, or both.

    Use a consistent naming convention for your files or directories, and include in the names the full range of process numbers that you plan to execute.

    Retrieving One File from Unique Directories

    To read an input file from individual directories for each execution during batch processing, you can use the same input file name for every execution. Because the input files are located in unique directories, you can use the same file name but include unique content within the file.

    For this example, you might name your input file infile. Create 500 directories named run_0 - run_499, place one copy of infile in each directory, and then edit each copy to contain the unique inputs for that iteration. Your input files now look like this:

    /<working directory>/run_0/
    infile
    /<working directory>/run_1/
    infile
    ...
    /<working directory>/run_499/
    infile
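    The copy-and-edit setup above can be sketched in the shell, assuming you start from one master copy of infile in the working directory:

```shell
# Clone the master input file into run_0 .. run_499, one copy per
# process ID; each copy can then be edited with iteration-specific inputs.
echo "edit me: inputs for this iteration" > infile
for i in $(seq 0 499); do
  mkdir -p "run_$i"
  cp infile "run_$i/infile"
done
```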

    Using individual directories and one file name for each iteration, the attributes in your submit file for this example look like this:

    Executable = RanVal.R
    InitialDir = run_$(PROCESS)
    Input = infile
    Queue 500


    Retrieving Unique Files from Unique Directories

    To read unique input files from individual directories for each execution, create one directory for each execution that you plan to perform and use a consistent naming convention that includes the full range of process IDs. Then, create one input file for each execution and use a consistent naming convention that includes the full range of process IDs. Place each input file in the associated directory.

    For this example, create 600 input files named in.0 - in.599 and 600 directories named run_0 - run_599. Place in.0 in the directory run_0, place in.1 in the directory run_1, and so on. Your input files look like this:

    /<working directory>/run_0/
    in.0
    /<working directory>/run_1/
    in.1
    ...
    /<working directory>/run_599/
    in.599
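    The layout above can be produced with a loop like the following sketch. Here the in.<process ID> files are generated as placeholders; in practice they would already contain your real inputs:

```shell
# Create run_0 .. run_599 and place the matching input file in each.
for i in $(seq 0 599); do
  echo "placeholder inputs for process $i" > "in.$i"   # stand-in content
  mkdir -p "run_$i"
  mv "in.$i" "run_$i/"
done
```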

    Using unique directories and file names for each iteration, the attributes in your submit file for this example look like this:

    Executable = RanVal.R
    InitialDir = run_$(PROCESS)
    Input = in.$(PROCESS)
    Queue 600

    Retrieving from one directory

    You can direct Condor to read unique input files for iterative processes from one directory. Create the input files, and use a consistent naming convention that includes the full range of process numbers that you plan to execute.

    For example, to run the RanVal.R process 600 times and use unique input files for each execution, you might name your input files in.<process ID>, where <process ID> ranges from 0 - 599. The attributes and command for this example are as follows:

    Executable = RanVal.R
    Input = in.$(PROCESS)
    Queue 600

    This example uses a single directory for all iterations of the process.

    Checking Your Process Status

    Once you have submitted your job(s) to the queue, you have several ways to check whether they are processing without error, including email notification of job completion and command-line access to both your jobs' status and the current state of the cluster.

    This section covers:

    • Automated e-mail notification via condor_watch
    • Checking status of jobs with condor_q
    • Checking status of cluster with condor_status

    Managing Job Status

    You can monitor progress of your batch processing using the condor_status and condor_q commands. This section describes how to check the status of your processes at any time, and how to remove a process from the Condor queue.

    After you submit a cluster for processing, you can check the status of the Condor pool machines and verify that machines are available on which your jobs can execute.

    To check the status of the Condor pool, type the command condor_status. This command returns information about the pool resources. Output lists the number of virtual machines (VMs) available in the pool and whether they are in use. If there are no idle VMs, your batch processing is queued when it is submitted.

    For example:

    > condor_status

    Name          OpSys Arch   State     Activity LoadAv Mem  ActvtyTime

    vm1@mc-1-1.hm LINUX X86_64 Claimed   Busy     1.060  1975 0+17:43:50
    vm2@mc-1-1.hm LINUX X86_64 Claimed   Busy     1.060  1975 0+17:43:48
    vm1@mc-1-2.hm LINUX X86_64 Claimed   Busy     1.000  1975 0+17:44:43
    vm2@mc-1-2.hm LINUX X86_64 Claimed   Busy     1.000  1975 0+17:44:36
    vm1@mc-1-3.hm LINUX X86_64 Unclaimed Idle     0.010  1975 0+00:03:57
    vm2@mc-1-3.hm LINUX X86_64 Unclaimed Idle     0.000  1975 0+00:00:04
    vm1@mc-1-4.hm LINUX X86_64 Unclaimed Idle     0.000  1975 0+00:00:04

    Total Owner Claimed Unclaimed Matched Preempting Backfill

    X86_64/LINUX 7 0 4 3 0 0 0
    Total 7 0 4 3 0 0 0

    To check the cumulative use of resources within the Condor pool, include the option -submitter with the command condor_status. This command returns information about each user in the Condor pool. Output lists each user's name, the machine in use, and the current number of jobs per machine. Use this command to help determine how many resources Condor has available to run your jobs. An example is shown here:

    > condor_status -submitter

    Name Machine RunningJobs IdleJobs HeldJobs

    mkellerm@hmdc.harvar w4.hmdc.ha 2 0 0
    jgreiner@hmdc.harvar x1.hmdc.ha 9 0 0
    jgreiner@hmdc.harvar x3.hmdc.ha 40 0 0
    kquinn@hmdc.harvard. x5.hmdc.ha 32 0 0

    RunningJobs IdleJobs HeldJobs

    jgreiner@hmdc.harvar 49 0 0
    kquinn@hmdc.harvard. 32 0 0
    mkellerm@hmdc.harvar 2 0 0

    Total 83 0 0

    Receiving email notifications

    If you accepted the HMDC submit script's default option to receive notification when your batch processing completes, you receive two emails from the Condor system. The sender of these emails is condor_watch.

    You receive one email to notify you that the Condor system is watching your cluster of jobs. For example:

    Date: Wed Oct 4 10:20:01 2006 -0400
    From: condor_watch@hmdc.harvard.edu
    To: sspade@hmdc.harvard.edu
    Subject: Condor Watch Greeting

    Hello,

    You've requested that I watch your jobs running on cluster 7.

    When these jobs complete, I will send you another message.

    -Condor Watch on vnc.hmdc.harvard.edu

    You receive a second email to notify you that your cluster processing is complete. For example:

    Date: Wed Oct 4 10:25:02 2006 -0400
    From: condor_watch@hmdc.harvard.edu
    To: sspade@hmdc.harvard.edu
    Subject: Condor Watch - Job(s) Complete

    Hello,

    Your 7 job(s) on cluster 7 running dwarves.pl -- are complete.

    Thank you for using Condor Watch.

    -Condor Watch on vnc.hmdc.harvard.edu

    Removing your job

    To remove a process from the queue, type the command condor_rm <cluster ID>.<process ID>. For example:

    > condor_rm 9.9
    Job 9.9 marked for removal

    To remove all jobs affiliated with a cluster, type the command condor_rm <cluster ID>. For example, the command condor_rm 4 removes all jobs assigned to cluster 4.

    To remove all of your clusters' jobs from the Condor queue, type condor_rm -a. For example:

    > condor_rm -a
    All jobs marked for removal.

    Note:

    Jobs must be removed from the host from which they were submitted.

    For example, the following command shows that your jobs were submitted from w2.hmdc.harvard.edu:

    > condor_q `whoami` | grep Schedd
    -- Schedd: HMDC.batch@w2.hmdc.harvard.edu : <10.0.0.31:56822>

    Other Batch Examples

    We created the condor_submit_util script to automate the process of writing a submit file and submitting a cluster of jobs to the Condor queue. When you execute this script, you can include all arguments on the command line. Or, you can execute the script in interactive mode and be prompted for your submit file attributes.

    The default settings for the Automated Condor Submission script support creation of submit files for programs that are written in the R language. To submit another type of program to the Condor queue, such as an Octave program, specify the full path and program for the executable (in this example, Octave). You then define your program file as the input to the executable.
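    For instance, a submit file for an Octave program might look like the sketch below, modeled on the R submit files shown later in this section. The Octave path and the program name myscript.m are illustrative assumptions; confirm the executable path on your system:

```
Universe = vanilla
Executable = /usr/bin/octave
input = myscript.m
output = out.$(Process)
error = err.$(Process)
Log = log.$(Process)
Queue 1
```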

    Note: To use the condor_submit_util script, you must have an RCE account.  See for more information.

    The following are example uses of the condor_submit_util script and options to submit batch processing in the RCE. A complete description of options is provided in .

    Example Using Multiple Input Files

    Start with an executable program (named foo) that uses a set of input data files (named data.0 - data.4) and performs some analysis.

    To save the submit file and receive notification when processing is done, type the following command:

    > condor_submit_util -x foo -i "data" -k -N

    The submit file for this batch looks like this:

    Universe = vanilla
    Executable = /usr/bin/foo
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = out.$(Process)
    Notification = Complete

    input = data.$(Process)
    output = out.$(Process)
    error = err.$(Process)
    Log = log.$(Process)
    Queue 5

    Example Using Multiple Iterations of One Executable Program

    An R program (named random.R) produces random output.

    To execute this program eight times and place the output of each execution in separate files in your default working directory, type the following command:

    > condor_submit_util -i random.R -n 8 -o "outrun"

    Following is the submit file for this batch:

    Universe = vanilla
    Executable = /usr/bin/R
    Arguments = --no-save --vanilla
    when_to_transfer_output = ON_EXIT_OR_EVICT
    transfer_output_files = outrun.$(Process)

    input = random.R
    output = outrun.$(Process)
    error = error.$(Process)
    Log = log.$(Process)
    Queue 8

    Example Checking Process Status

    To check the status of the Condor queue after submitting your program for processing, type:

    > condor_q

    -- Submitter: x1.hmdc.harvard.edu : <10.0.0.47:60603> : x1.hmdc.harvard.edu
    ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
    24.4 mcox 8/18 16:35 0+00:00:01 R 0 0.0 R --no-save --vani
    24.5 mcox 8/18 16:35 0+00:00:00 R 0 0.0 R --no-save --vani
    24.6 mcox 8/18 16:35 0+00:00:00 R 0 0.0 R --no-save --vani
    24.7 mcox 8/18 16:35 0+00:00:00 I 0 0.0 R --no-save --vani
    24.8 mcox 8/18 16:35 0+00:00:00 I 0 0.0 R --no-save --vani
    24.9 mcox 8/18 16:35 0+00:00:00 I 0 0.0 R --no-save --vani

    6 jobs; 3 idle, 3 running, 0 held

    The column ID lists the process IDs for your jobs. The column ST lists the status of each job in the Condor queue. A value of R indicates that the job is running. Valid status values are listed in .

    Troubleshooting Problems

    The Condor central manager stops (evicts or preempts) a process for several reasons, including the following:

    • Another job or another user's job in the queue has a higher priority and preempts or evicts your job.

    • The pool machine on which your process is executed encounters an issue with the machine state or the machine policy.

    • Your submit file specifies attributes that prevent the job from processing without error.

    Refer to the Condor project's Frequently Asked Questions (FAQ) website at the following URL for detailed information about submission, job status, and processing errors:

    http://www.cs.wisc.edu/condor/manual/v6.8/7_Frequently_Asked.html

    Note: A simple action can help you to diagnose problems if you submit multiple jobs to Condor. Be sure to specify unique file names for each job's output, history, error, and log files. If you do not specify unique file names for each submission, Condor overwrites existing files that have the same names. This can prevent you from locating information about problems that might occur.
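    One way to guarantee unique names across submissions is to include both the cluster and process IDs in each file name, using Condor's $(Cluster) macro alongside $(Process). A sketch of the relevant submit file attributes:

```
output = out.$(Cluster).$(Process)
error = err.$(Cluster).$(Process)
Log = log.$(Cluster).$(Process)
```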

    Priorities and Preemption

    Job priorities enable you to assign a priority level to each submitted Condor job. Job priorities, however, do not impact user priorities.

    User priorities govern the allocation of Condor resources among users. A lower numerical value for user priority means higher priority, so a user with priority 5 is allocated more resources than a user with priority 50. You can view user priorities by using the condor_userprio command. For example:

    > condor_userprio -allusers

    Condor continuously calculates each user's share of the available machines. For example, a user with a priority of 10 is allocated twice as many machines as a user with a priority of 20. New users begin with a priority of 0.5; as their usage increases, their priority rating rises proportionately in relation to other users. Condor enforces this policy so that each user gets a fair share of machines according to user priority and historical volume. For example, if a low-priority user is using all available machines and a higher-priority user submits a job, Condor immediately performs a checkpoint and vacates the jobs that belong to the lower-priority user, except for that user's last job.

    User priority rating decreases over time and returns to a baseline of 0.5 as jobs are completed and idle time is realized relative to other users.

    Process Tracking

    To track progress of your processes:

    • Type condor_q to view the status of your process IDs.

    • Check your output directory for the time stamps of your output, log, and error files.

      If the output file and log file for a submitted process are more current than the error file, your process probably is running without error.
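      The timestamp comparison can be done with ls -t, which lists the newest files first. The snippet below fabricates stand-in files to illustrate; with real Condor output you would list your actual out.*, log.*, and err.* files:

```shell
# Stand-in files: err.0 created first, then out.0 and log.0 a moment
# later, mimicking a job whose output is newer than its last error.
touch err.0
sleep 1
touch out.0 log.0
# Newest first: out.0 and log.0 listed above err.0 suggests the
# process is still producing output after the last recorded error.
ls -t out.0 log.0 err.0
```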

    Process Queue

    To view detailed information about your processes, including the ClassAd requirements for your jobs, type the command condor_q -analyze.

    Refer to the Condor Version 6.8.0 Manual for a description of the value that represents why a process was placed on hold or evicted. Go to the following URL for section 2.5, "Submitting a Job," and search for the text JobStatus under the heading "ClassAd Job Attributes":

    http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html

    For example:

    > condor_q -analyze
    Run analysis summary. Of 43 machines,
    43 are rejected by your job's requirements
    0 are available to run your job
    WARNING: Be advised:
    No resources matched request's constraints
    Check the Requirements expression below:
    Requirements = ((Memory > 8192)) && (Disk >= DiskUsage)

    Error Log

    An error file includes information about any errors that occurred while your batch processing executed.

    Refer to the Condor Version 6.8.0 Manual for information about entries in the error file. Go to the following URL:

    http://www.cs.wisc.edu/condor/manual/v6.8.0/ref.html

    To view the error file for a process and determine where an error occurred, use the cat command. For example:

    > cat errorfile
    Error in readChar(con, 5) : cannot open the connection
    In addition: Warning message:
    cannot open compressed file 'Utilization1.RData'
    Execution halted
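    When a cluster produces many error files, you can narrow your attention to the non-empty ones. The snippet below fabricates two demo files to illustrate, reusing the error message from the example above:

```shell
# Demo: err.0 is empty (clean run); err.1 contains an error message.
: > err.0
echo "Error in readChar(con, 5) : cannot open the connection" > err.1
# List only the error files that contain anything; those runs need attention.
find . -maxdepth 1 -name 'err.*' -size +0c
```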

    History File

    When batch processing completes, Condor removes the cluster from the queue and records information about the processes in the history file. History is displayed for each process on a single line. Information provided includes the following:

    • ID - The cluster and process IDs of the job

    • OWNER - The owner of the job

    • SUBMITTED - The month, day, hour, and minute at which the job was submitted to the queue

    • RUN_TIME - Remote wall-clock time accumulated by the job to date, in days, hours, minutes, and seconds

    • ST - Completion status of the job, where C is completed and X is removed

    • COMPLETED - Time at which the job was completed

    • CMD - Name of the executable

    To view information about processes that you executed on the Condor system, type the command condor_history. For example:

    > condor_history
    ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
    1.0 arose 9/26 11:45 0+00:00:00 C 9/26 11:45 /usr/bin/R --no
    2.0 arose 9/26 11:48 0+00:00:01 C 9/26 11:48 /usr/bin/R --no
    3.0 arose 9/26 11:49 0+00:00:00 C 9/26 11:50 /usr/bin/R --no
    3.1 arose 9/26 11:49 0+00:00:01 C 9/26 11:50 /usr/bin/R --no
    6.0 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.1 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.2 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.5 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.3 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.4 arose 10/3 15:52 0+00:00:00 C 10/3 15:52 /nfs/fs1/home/A
    6.6 arose 10/3 15:52 0+00:00:01 C 10/3 15:52 /nfs/fs1/home/A
    9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
    9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
    9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A

    Search through the history file for your process and cluster IDs to locate information about your jobs.

    To view information about all completed processes in a cluster, type the command condor_history <cluster ID>. To view information about one process, type the command condor_history <cluster ID>.<process ID>. For example:

    > condor_history 9
    ID OWNER SUBMITTED RUN_TIME ST COMPLETED CMD
    9.0 arose 10/4 11:02 0+00:00:00 C 10/4 11:02 /nfs/fs1/home/A
    9.1 arose 10/4 11:02 0+00:00:01 C 10/4 11:02 /nfs/fs1/home/A
    9.2 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.3 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.5 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.6 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A
    9.4 arose 10/4 11:02 0+00:00:00 X ??? /nfs/fs1/home/A

    Process Log File

    A log file includes information about everything that occurred during your cluster processing: when it was submitted, when execution began and ended, when a process was restarted, and whether there were any issues. When processing finishes, the exit conditions for that process are noted in the log file.

    Refer to the Condor Version 6.8.0 Manual for a description of the entries in the process log file. Go to the following URL for section 2.6, "Managing a Job," and go to subsection 2.6.6, "In the log file":

    http://www.cs.wisc.edu/condor/manual/v6.8.0/2_6Managing_Job.html

    To view the log file for a process and determine where an error occurred, use the cat command. For example, the following log file indicates that the process completed normally:

    > cat log.1
    000 (012.001.000) 10/04 12:14:51 Job submitted from host: <10.0.0.47:60603>
    ...
    001 (012.001.000) 10/04 12:15:00 Job executing on host: <10.0.0.61:37097>
    ...
    005 (012.001.000) 10/04 12:15:00 Job terminated.
    (1) Normal termination (return value 0)
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
    Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    7 - Run Bytes Sent By Job
    163 - Run Bytes Received By Job
    7 - Total Bytes Sent By Job
    163 - Total Bytes Received By Job
    ...

    Following is an example log file for a process that did not complete execution:

    > cat log.4
    000 (09.000.000) 09/20 14:47:31 Job submitted from host:
    <x1.hmdc.harvard.edu>
    ...
    007 (09.000.000) 09/20 15:02:10 Shadow exception!
    Error from starter on x1.hmdc.harvard.edu: Failed
    to open 'scratch.1/frieda/workspace/v67/condor-
    test/test3/run_0/b.input' as standard input: No such
    file or directory (errno 2)
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    ...
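    To find which jobs in a cluster failed to finish cleanly, you can search the log files for the "Normal termination" line shown in the first example. The snippet below fabricates two minimal logs to illustrate:

```shell
# Demo logs: log.1 records a normal exit; log.4 records a shadow exception.
printf '005 (012.001.000) Job terminated.\n(1) Normal termination (return value 0)\n' > log.1
printf '007 (009.000.000) Shadow exception!\n' > log.4
# grep -L prints the files that do NOT contain the pattern, i.e. the
# logs of jobs that never reached a normal termination.
grep -L "Normal termination" log.1 log.4
```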

    Held Process

    To view information about processes that Condor placed on hold, type condor_q -hold. For example:

    > condor_q -hold

    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> : vnc.hmdc.harvard.edu
    ID   OWNER HELD_SINCE HOLD_REASON
    17.0 arose 10/5 12:53 via condor_hold (by user arose)
    17.1 arose 10/5 12:53 via condor_hold (by user arose)
    17.2 arose 10/5 12:53 via condor_hold (by user arose)
    17.3 arose 10/5 12:53 via condor_hold (by user arose)
    17.4 arose 10/5 12:53 via condor_hold (by user arose)
    17.5 arose 10/5 12:53 via condor_hold (by user arose)
    17.6 arose 10/5 12:53 via condor_hold (by user arose)
    17.7 arose 10/5 12:53 via condor_hold (by user arose)
    17.9 arose 10/5 12:53 via condor_hold (by user arose)

    9 jobs; 0 idle, 0 running, 9 held

    Refer to the Condor Version 6.8.0 Manual for a description of the value that represents why a process was placed on hold. Go to the following URL for section 2.5, "Submitting a Job," and look for subsection 2.5.2.2, "ClassAd Job Attributes." Look for the entry HoldReasonCode:

    http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html

    To place a process on hold, type the command condor_hold <cluster ID>.<process ID>. For example:

    > condor_hold 8.33
    Job 8.33 held

    To place on hold any processes not completed in a full cluster, type condor_hold <cluster ID>. For example:

    > condor_hold 8
    Cluster 8 held.

    The status of those uncompleted processes in cluster 8 is now H (on hold):

    > condor_q

    -- Submitter: vnc.hmdc.harvard.edu : <10.0.0.47:60603> vnc.hmdc.harvard.edu
    ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
    8.2 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
    8.5 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl
    8.6 sspade 10/4 11:19 0+00:00:00 H 0 9.8 dwarves.pl

    3 jobs; 0 idle, 0 running, 3 held

    To release a process from hold, type the command condor_release <cluster ID>.<process ID>. For example:

    > condor_release 8.33
    Job 8.33 released.

    To release the full cluster from hold, type the command condor_release <cluster ID>. For example:

    > condor_release 8
    Cluster 8 released.

    You can instruct the Condor system to place your batch processing on hold if it spends a specified amount of time suspended (that is, not processing). For example, include the following attribute in your submit file to place your jobs on hold if they spend more than 50 percent of their time suspended:

    Periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime /2.0)

    Running Interactive Jobs

    Interactive Servers are intended for large processes that are memory intensive. If you have a group of dozens or hundreds of jobs, or jobs that will run for hours, days, or longer, please use the Batch Servers.

    Working with RCE Powered Statistical Applications

    The Applications menu in the RCE includes the same statistical applications on two submenus: Mathematics and RCE Powered Applications. Tools in the Mathematics menu are executed locally. Tools in the RCE Powered Applications menu are executed on RCE servers dedicated to handling interactive applications, also known as compute on demand (COD) servers.

    22launchingr_1.png


    Launching an RCE Powered Application

    To launch an RCE powered version of a statistical application:

    1. Click the Applications menu, select the RCE Powered Applications submenu, and then select the statistical tool that you choose to launch.

    appmenu_rce_matlab.png

    2. You will be prompted to enter the amount of memory, in GB, to reserve for your job.

     Determine the Amount of RAM Reserved for Your Job

    Please Note: To ensure a fair distribution of resources we ask that you be conservative in your memory reservation requests.

    3. The system analyzes resources and determines if there are resources available on the primary flock for your process.

    4. If there are resources available for your process on the primary flock:

    4.1. A Please Wait message is displayed indicating that your application is allocated to a primary flock.

    rce_submit_primary.png

    Note: You can click Cancel to stop this action and not open the application. A Warning message is displayed to notify you that your application launch is canceled. Click OK to acknowledge this warning.

    4.2. When the application is ready to launch, a Please Note window prompts you to complete the launch action. You must click OK within two minutes or the application times out and you must start again from the beginning. If you do not respond to the Launch Application prompt within the allotted time, your application does not launch and the prompt window closes.
    Please Note: This notice also indicates the specific server on which your job is running. In this example, the host is cod6-1.clus.hmdc.harvard.edu. If you need to contact support when you use this tool, supply the server name identified here.

    rce_wrapper_notice.png

    4.3. Click OK to launch your application.

    If you launched an application with a command-line interface (CLI), the tool opens in a new window. 
    If you launched an application with a GUI, the GUI application will open as normal.

    5. If there are no resources available in the primary flock, but resources available in the backup flock:

    5.1. A Please Note message is displayed indicating that there are no primary flock resources available for your application, but there are backup flock resources available.

    rce_noprimary_backup.png

    5.2. Click OK to acknowledge this message.

    5.3. A message window prompts you to launch your application on the backup flock.

    rce_backup_prompt_1.png

        • Click Yes to launch your application on a backup flock server.
        • Click No to cancel the launch of your application.

    5.4. A Please Wait message is displayed indicating that your application is allocated to a backup flock server.

    rce_submit_backup_1.png

    Note: You can click Cancel to stop this action and not open the application. A Warning message is displayed to notify you that your application launch is canceled. Click OK to acknowledge this warning.

    5.5. When the application is ready to launch, a Please Note window prompts you to complete the launch action. You must click OK within 30 seconds or the application times out and you must start again from the beginning. If you do not respond to the Launch Application prompt within the allotted time, your application does not launch and the prompt window closes.

    5.6. Click OK to launch your application. 

    If you launched an application with a CLI, the tool opens in a new window. The banner on this window indicates the specific server on which you are working. If you need to contact support when you use this tool, supply this server name.

    6. If there are no resources available in the primary flock, and no resources available in the backup flock:

    6.1. You will see a notice indicating that there are no resources available and asking you to lower your requirements.

    rce_noprimary_nobackup.png

    6.2. Click OK to exit and try again.

    7. Once your process has started, you will see a small green clock icon and a notification about your job's run time.

    r_running.png


    Extending an RCE Powered Application

    When first started, RCE Powered Applications have a run time of 120 hours (5 days). After 50% of your run time has expired, you can request an extension using the procedure below.

    Note: There are two types of extension requests: automatic and manual. Users are highly encouraged to use the automatic method, if possible, as it greatly streamlines the process.

    Note: Please make sure the e-mail address on file with your RCE account is up to date. This is how HMDC staff will contact you if there are issues with your request. If you are unsure which email address is associated with your account, you can use the Account Self Service interface to confirm or update it.

    1. If you have more than one RCE Powered Application running, find the job you'd like to extend by hovering over each clock icon and reading the tooltip that pops up.

    r_days_remaining.png

    2. Click on the clock icon for your job to bring up a selection menu:

    req_extension.png

    3. Click "Request Extension"

    4. If your job is eligible for an extension, you will either be granted an automatic extension (if you are using 4 CPUs or fewer) or you will be asked if you would like to request a manual extension (which may take 1 business day and is not guaranteed). You will then be prompted for the number of extra hours you are requesting.

     rce_extension_granted.jpg

    Note: Reasons for being ineligible for an extension include not letting your job run long enough (50% of your run time must have expired), having already requested a manual extension, and run time exceeding 45 days.

    5. Click "OK" after entering a number of hours. Click "Cancel" to cancel your extension request.

    6. You will see a popup notification indicating your submission has been processed. Either the new remaining time will be indicated (in the case of an automatic extension) or a message will be displayed letting you know a manual extension request has been made.

    7. If your manual extension request is granted, you will see a popup notification indicating so.


    Statistical Applications

    The following statistical applications are available on the RCE:

    For in-depth statistical and application-related questions, including workshop schedules and links to statistical research resources, please contact:

    For help using the Harvard Dataverse network, please see the Dataverse Network Guide

    Developing Software

    The RCE provides many ways to develop and test your own code, using common languages, editors, and source code utilities.

    Creating R Modules

    Building R modules in the RCE

    The IQSS Data Science team is putting the finishing touches on a new R package build system using the Jenkins CI platform and GitHub. Check back for more information on the new Rbuild platform.

    Programming Languages

    Common programming languages available in the RCE include:

    • R
    • Python (versions 2.6, 2.7, and 3.3)
    • Java
    • Perl
    • Ruby
    • C/C++
    • Shell

    Programming Tools and Utilities

    Code/text editors available:

    • Emacs
    • Eclipse
    • Gedit
    • Bluefish
    • Kwrite
    • Vim

    Tools to interact with a number of well-known source code repositories:

    Outage Notification

    Outage Notification - Overview

    We strive to provide advance notice whenever we must schedule an interruption of service; in addition, we post announcements on our web site in the event of an unplanned service outage. We recommend that you use at least one of the channels of communication described in this section so that you are not caught unaware by a service outage.

    Maintenance Windows

    Some systems maintenance tasks require temporary interruptions of our managed services. To minimize the disruption, we reserve the first and third Wednesday of every month for regular systems maintenance.

    When you receive notification that an interruption of service will take place, please be sure to save all work and disconnect from all of our login servers at least 30 minutes before the start of the scheduled outage window. The outage notification specifies which managed services will be affected.

    You can sign up to receive outage announcements by email.  Outage notifications also are posted on our support web site at http://projects.iq.harvard.edu/rce/calendar and in the RCE itself through the built-in Outage Notifier.

    If you have questions concerning an interruption in service, please contact us.