Always On: Persistent Interactive Jobs on the RCE

Do you run interactive jobs on the Research Computing Environment (RCE)? You probably do, but do you know what differentiates interactive jobs from non-interactive, or batch, RCE jobs?

It’s simple: interactive jobs expect user input and are often GUI applications like XStata, Matlab, or Mathematica. Batch jobs expect no user input and run in parallel without intervention.

How RCE Powered Applications work now

 

Anytime you run an application from the RCE Powered Applications menu on the RCE Desktop, you’re executing an interactive job. All applications listed in the RCE Powered Applications menu provide GUI functionality. All RCE GUI-based applications have the same requirements:

  • An RCE Powered Application must have access to a desktop in order to display a GUI. This desktop is your NoMachine NX4 session.

  • If you terminate your NoMachine NX4 desktop, your RCE Powered Application disappears too, because the running application can no longer draw GUI objects on your desktop.

Most of our users prefer interactive jobs. Why?

  • Interactive jobs provide GUI functionality, making them easier to manage.

  • Interactive jobs are easier to create; no knowledge of submit files is necessary.

  • Interactive jobs have far looser memory constraints: you can allocate upwards of one hundred gigabytes of memory to a slot. In batch mode, you can allocate only up to four gigabytes of memory per slot, although you can allocate many more slots.

For memory-intensive computations or basic computations, interactive jobs appear easier to deal with.

Can you see the problem? What happens if your NoMachine NX4 desktop crashes? What if we need to perform emergency maintenance on the RCE login nodes? In either case, your interactive jobs will terminate.

 

Our solution: providing job stability in the RCE.

 

We’re demoing a new solution: we have rewritten our RCE Powered utilities to allow for persistent interactive jobs, which do not terminate when NoMachine NX4 crashes or when we perform routine maintenance on the RCE login nodes. Our new RCE Powered utilities allow you to:

  • Submit interactive jobs, perform tasks, forcibly cancel your NoMachine NX4 session, log back in, reconnect to your interactive jobs, and access them just as you left them.

  • Submit interactive jobs over SSH. If you’re an RCE power user or a remote user overseas, you may have wanted to access RCE Powered R without GUI overhead. Now, you can! RCE Powered Applications such as Matlab and R, which do not require a GUI, can now be accessed over SSH without the need to create a NoMachine NX4 desktop.

Want to know how this works? Like technical details? Want to try the demo? Read on!

 

When you log in to NoMachine NX4, you create a display. Each display is referenced by an integer in the $DISPLAY variable of your environment. If you launch the Terminal application from the System Tools sub-menu and execute:

echo $DISPLAY

 

you will see a string like the following:

 

:1005.0

 

This means your NoMachine NX4 desktop is display number 1005 (the trailing .0 selects screen 0). Your display is remotely accessible by applications in the RCE. When you run an RCE Powered Application like Matlab, Matlab reads the value of your $DISPLAY variable, connects to your display, and draws its interface there. If your display becomes inaccessible, because NoMachine NX4 terminated prematurely or because of routine maintenance, Matlab can no longer draw its interface and terminates, leaving you jobless and lost.
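To make the structure of that string concrete, here is a minimal Python sketch that splits a DISPLAY value into its parts. It assumes the conventional X11 form [host]:display[.screen]; the function name parse_display is hypothetical and is not part of the RCE utilities.

```python
import re

# Conventional X11 form: [host]:display[.screen]
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value):
    """Split a DISPLAY string into (host, display number, screen number)."""
    match = _DISPLAY_RE.match(value)
    if match is None:
        raise ValueError("not a recognizable DISPLAY value: %r" % value)
    host = match.group("host")                 # empty string means the local machine
    display = int(match.group("display"))
    screen = int(match.group("screen") or 0)   # screen defaults to 0 when omitted
    return host, display, screen


print(parse_display(":1005.0"))  # prints ('', 1005, 0)
```

An empty host means the display lives on the machine you are logged in to, which is why an application started elsewhere needs some other route to reach it.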

Enter Xpra

 

Xpra is a display server, much like NoMachine NX4. Xpra gives GUI-based applications a place to display their user interfaces. Unlike NoMachine NX4, Xpra can run on the compute nodes, so Matlab, XStata, and other applications can use Xpra as a display server rather than NX4. When you submit an interactive job using the new RCE utilities, the compute node spawns your application underneath an Xpra process, giving your application a persistent display that is unaffected by routine maintenance or unexpected NoMachine NX4 display errors.
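As a rough illustration of what running an application underneath Xpra can look like, the Python sketch below builds the kind of xpra command lines involved. It is an assumption about the general approach, not the actual implementation inside the RCE utilities: the display number 100, the host name compute-node, and both function names are hypothetical, and the older ssh:host:display attach syntax is used.

```python
import shlex


def xpra_start_command(display, app_command):
    """Build an `xpra start` command that runs app_command on its own Xpra display."""
    return [
        "xpra", "start", ":%d" % display,
        "--start-child=%s" % app_command,  # launch the application inside the Xpra display
        "--exit-with-children",            # stop the Xpra server once the application exits
    ]


def xpra_attach_command(display, host=None):
    """Build an `xpra attach` command for a local or SSH-reachable Xpra server."""
    if host is None:
        target = ":%d" % display
    else:
        target = "ssh:%s:%d" % (host, display)
    return ["xpra", "attach", target]


# Hypothetical values, for illustration only.
for command in (xpra_start_command(100, "xstata-mp"),
                xpra_attach_command(100, host="compute-node")):
    print(" ".join(shlex.quote(part) for part in command))
```

The key point is the split between the two commands: the server and the application live on the compute node, while the client that draws the window on your desktop can come and go.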

Try it out

 

Soon, HMDC DevOps will deploy the new RCE interactive tools to the cluster and you’ll be able to demo these new features. Here’s a sneak peek at how these utilities work.

Establish a NoMachine NX4 desktop session, launch Terminal from the System Tools sub-menu, and type:

rce_submit.py -l

 

When executed, you will see the following output:

 

[Screenshot: output of rce_submit.py -l]

This output shows you the RCE application and version pairs you can run on the RCE. If you want to run xstata-mp 13.1, the default xstata-mp version, type:

rce_submit.py -r -a xstata-mp

 

However, if you want to run Matlab R2014a, type:

rce_submit.py -r -a matlab -v R2014a
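If you launch the same applications often, the two commands above are easy to wrap. The following Python sketch only assembles the argument list from the flags shown in this post (-r, -a, and -v); the helper name submit_command is hypothetical, and nothing here is part of rce_submit.py itself.

```python
def submit_command(application, version=None):
    """Build the rce_submit.py argument list for launching an application."""
    if not application:
        raise ValueError("an application name is required")
    command = ["rce_submit.py", "-r", "-a", application]
    if version is not None:
        command += ["-v", version]  # omit -v to get the default version
    return command


print(" ".join(submit_command("xstata-mp")))         # prints rce_submit.py -r -a xstata-mp
print(" ".join(submit_command("matlab", "R2014a")))  # prints rce_submit.py -r -a matlab -v R2014a
```

To actually start the job, you could pass the returned list to subprocess.call from a Terminal inside your RCE session.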

 

When you run the xstata-mp application, xstata-mp will display on your screen as usual, but there are two differences:

[Screenshot: xstata-mp running, with the Xpra client icon in the system tray]

 

Do you see that little X in the upper right-hand corner of the screen, in the system tray? This represents the Xpra client. In this example, xstata-mp is running under the Xpra display server and you are accessing the Xpra display server using the Xpra client.

If you close the Stata/MP window, Stata/MP will terminate along with Xpra. However, if you right-click the Xpra icon in the system tray and click ‘Disconnect’, only your Xpra client disconnects: Stata/MP keeps running and you can reconnect to it. This is the magic that allows interactive jobs to persist across RCE login node maintenance and NoMachine NX4 errors.

Re-attaching a job

 

Whoops! You lost your NoMachine NX4 desktop session and you need to create a new one. You were working on xstata-mp last and you hope all your hard work still exists. (It does.) Open the Terminal and type:

rce_submit.py --list-jobs

 

rce_submit.py will output a list of the interactive jobs you are running, including each job’s JobID, application name, and application version. Find the xstata-mp job you were just working on and note its JobID. Then type:

rce_submit.py --attach JobID

 

When executed, you will see your xstata-mp application displayed on your desktop, just as you left it.
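The two re-attach steps can be strung together in a small script. This Python sketch is an illustration under assumptions: it uses only the --list-jobs and --attach flags shown above, treats the JobID as an opaque string because its format is not described here, and the function names are hypothetical.

```python
import subprocess


def list_jobs_command():
    """Argument list that asks rce_submit.py for your running interactive jobs."""
    return ["rce_submit.py", "--list-jobs"]


def attach_command(job_id):
    """Argument list that re-attaches to the job with the given JobID."""
    job_id = str(job_id).strip()
    if not job_id:
        raise ValueError("a JobID is required")
    return ["rce_submit.py", "--attach", job_id]


def reattach(job_id, runner=subprocess.call):
    """Show the running jobs, then attach to the chosen one.

    runner defaults to subprocess.call; pass a stand-in to try this out
    without rce_submit.py installed.
    """
    runner(list_jobs_command())
    return runner(attach_command(job_id))
```

Calling reattach with the JobID you noted runs both commands in order and returns the exit status of the attach step.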

Next Steps

Neat, right? You can resume your interactive job even after your desktop crashes. But we can do even better.

Right now, rce_submit.py is a command-line-only utility, but we want to build GUI capability into rce_submit.py to improve usability for users who are less familiar with command-line utilities.

Additionally, we want to embed notifications into rce_submit.py that tell you if your job is reaching its memory limit or CPU limit, or appears to have crashed.

Have any ideas on how to make our new RCE utilities better? Feel free to shoot us an e-mail at support@hmdc.harvard.edu or e-mail the author, Evan Sarmiento.