Managing your batch job

Once you have submitted your jobs to the queue, you can check on them in several ways: e-mail notification of job completion, and command-line access to both the status of your jobs and the current state of the cluster.

Managing Job Status

You can monitor the progress of your batch processing with the condor_status and condor_q commands. This section describes how to check the status of your processes at any time and how to remove a process from the Condor queue.

After you submit a cluster for processing, you can check the status of the Condor pool machines and verify that machines are available on which your jobs can execute.

To check the status of the Condor pool, type the command condor_status. This command returns information about the pool resources. The output lists the virtual machines (VMs) in the pool and shows whether each one is in use. If there are no idle VMs, your batch processing is queued when it is submitted.

For example:

> condor_status

Name  OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime

      LINUX  X86_64  Claimed    Busy      1.060   1975  0+17:43:50
      LINUX  X86_64  Claimed    Busy      1.060   1975  0+17:43:48
      LINUX  X86_64  Claimed    Busy      1.000   1975  0+17:44:43
      LINUX  X86_64  Claimed    Busy      1.000   1975  0+17:44:36
      LINUX  X86_64  Unclaimed  Idle      0.010   1975  0+00:03:57
      LINUX  X86_64  Unclaimed  Idle      0.000   1975  0+00:00:04
      LINUX  X86_64  Unclaimed  Idle      0.000   1975  0+00:00:04

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

X86_64/LINUX      7      0        4          3        0           0         0
       Total      7      0        4          3        0           0         0
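If you want to check for free VMs from a script rather than by eye, the summary rows at the end of the condor_status output can be parsed. The following is a minimal Python sketch, not part of Condor itself: the function name parse_pool_totals is hypothetical, and it assumes the summary has the same columns as the example above.

```python
def parse_pool_totals(output: str) -> dict:
    """Return the counts from the final 'Total' row of condor_status output.

    Assumes the summary layout shown in the example: a header line that
    starts with 'Total Owner ...' followed later by a 'Total' data row.
    """
    header = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Total" and "Owner" in fields:
            # Column names: Total, Owner, Claimed, Unclaimed, ...
            header = fields
        elif fields[0] == "Total" and header is not None:
            # Row label 'Total' followed by one count per header column.
            return dict(zip(header, (int(x) for x in fields[1:])))
    raise ValueError("no Total row found in condor_status output")


# Sample text copied from the example output above.
SAMPLE = """\
              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

X86_64/LINUX      7      0        4          3        0           0         0
       Total      7      0        4          3        0           0         0
"""

totals = parse_pool_totals(SAMPLE)
print(totals["Unclaimed"], "of", totals["Total"], "VMs are unclaimed")
```

In practice the text would come from running condor_status on a machine in the pool; the sample string is used here so that the sketch runs anywhere.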

To check the cumulative use of resources within the Condor pool, include the option -submitter with the command condor_status. This command returns information about each user in the Condor pool. The output lists each user's name, the machine in use, and the current number of jobs per machine. Use this command to help determine how many resources Condor has available to run your jobs. An example is shown here:

> condor_status -submitter

Name                  Machine     Running  IdleJobs  HeldJobs

mkellerm@hmdc.harvar  w4.hmdc.ha        2         0         0
jgreiner@hmdc.harvar  x1.hmdc.ha        9         0         0
jgreiner@hmdc.harvar  x3.hmdc.ha       40         0         0
kquinn@hmdc.harvard.  x5.hmdc.ha       32         0         0

                      RunningJobs  IdleJobs  HeldJobs

jgreiner@hmdc.harvar           49         0         0
kquinn@hmdc.harvard.           32         0         0
mkellerm@hmdc.harvar            2         0         0

Total                          83         0         0
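The per-user totals in the second half of this output are the per-machine rows added up. As an illustration only, the Python sketch below recomputes them from the per-machine rows. The function name running_jobs_by_user is hypothetical, and the parsing assumes the five-column layout shown in the example (name, machine, and three integer counts).

```python
def running_jobs_by_user(output: str) -> dict:
    """Sum the Running column per submitter from the per-machine rows."""
    totals = {}
    for line in output.splitlines():
        fields = line.split()
        # Per-machine rows have exactly five fields; summary rows have four.
        if len(fields) != 5:
            continue
        name, _machine, *counts = fields
        # Skip the header line, whose last three fields are not numbers.
        if not all(c.isdigit() for c in counts):
            continue
        totals[name] = totals.get(name, 0) + int(counts[0])
    return totals


# Sample text copied from the example output above.
SAMPLE = """\
Name                  Machine     Running  IdleJobs  HeldJobs

mkellerm@hmdc.harvar  w4.hmdc.ha        2         0         0
jgreiner@hmdc.harvar  x1.hmdc.ha        9         0         0
jgreiner@hmdc.harvar  x3.hmdc.ha       40         0         0
kquinn@hmdc.harvard.  x5.hmdc.ha       32         0         0
"""

by_user = running_jobs_by_user(SAMPLE)
for user, running in sorted(by_user.items()):
    print(user, running)
```

The two jgreiner rows (9 and 40) combine into the 49 running jobs reported in the summary, and the values together add up to the pool-wide total of 83.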

Removing your job

To remove a process from the queue, type the command condor_rm <cluster ID>.<process ID>. For example:

> condor_rm 9.9
Job 9.9 marked for removal

To list your jobs and find their cluster and process IDs, type:

> condor_q <username>

To remove all jobs affiliated with a cluster, type the command condor_rm <cluster ID>. For example, the command condor_rm 4 removes all jobs assigned to cluster 4.
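The two forms above differ only in whether a process ID follows the cluster ID. If you remove jobs from a script, a small helper can build the right argument. This is a hedged Python sketch: condor_rm_args is a hypothetical name, not a Condor API, and the sketch only constructs the command without running it.

```python
from typing import List, Optional


def condor_rm_args(cluster: int, process: Optional[int] = None) -> List[str]:
    """Build the argument list for condor_rm.

    With only a cluster ID, every job in that cluster is targeted;
    with a process ID as well, a single job (<cluster>.<process>) is.
    """
    if cluster < 0 or (process is not None and process < 0):
        raise ValueError("cluster and process IDs must be non-negative")
    target = str(cluster) if process is None else f"{cluster}.{process}"
    return ["condor_rm", target]


print(condor_rm_args(9, 9))  # one job: cluster 9, process 9
print(condor_rm_args(4))     # every job in cluster 4
```

The resulting list could be passed to subprocess.run on the host from which the jobs were submitted.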

To remove all of your clusters' jobs from the Condor queue, type condor_rm -a. For example:

> condor_rm -a
All jobs marked for removal.

Jobs must be removed from the same host from which they were submitted.