#acl Known:read,write,revert All:read
[English|[[ClusterSystemTips|Japanese]]]

= Cluster System Usage Tips =

Share any tips that you have discovered on using the Cluster System. You can edit this article after logging in.

== Log of the information sharing mailing list for research users ==

The log of the information sharing mailing list for research users is available at:

http://lists.imc.tut.ac.jp/pipermail/research-users/

== Measuring resource usage (e.g. memory usage) ==

The resources (e.g. memory) used by a process can be viewed with the GNU '''time''' command (/usr/bin/time). Specifying the '''-v''' option displays details of the resources used by the process, as in the following example ([[http://linuxjm.osdn.jp/html/LDP_man-pages/man1/time.1.html|see the time command man page]]).

{{{
-bash-4.1$ /usr/bin/time --version
GNU time 1.7
-bash-4.1$ /usr/bin/time -v whoami
my016
	Command being timed: "whoami"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3088
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 243
	Voluntary context switches: 3
	Involuntary context switches: 1
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
}}}

'''Maximum resident set size''' shows the memory usage. Note that GNU '''time''' version 1.7 reports this figure as four times the actual memory usage; this is a known bug ([[http://qiita.com/guicho271828/items/2ad3df13e915ecbb9cac|see Massive bug in GNU time (Japanese)]]).

You can also run the GNU '''time''' command on a calculation node; note, however, that its report is written to standard error.
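When measuring resources inside a batch job, the report from GNU '''time''' can be kept separate from the job's other output by writing it to a file with the '''-o''' option. The job script below is a minimal sketch of this approach: the PBS directives follow the qsub example later on this page, and '''./my_program''' and '''time.log''' are placeholder names.

{{{#!highlight sh
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -q wLrchq

cd $PBS_O_WORKDIR

# Write the resource report of GNU time to time.log instead of
# standard error (-o is a standard GNU time option).
/usr/bin/time -v -o time.log ./my_program
}}}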
== Batch Deletion of Submitted Jobs ==

End users are not permitted to use the '''-all''' option of the '''qdel''' command, and '''qdel -t''' does not function correctly. For this reason, use the following script to perform batch deletion of multiple submitted jobs.

{{{#!highlight sh
#!/bin/bash

if [ $# -ne 1 -a $# -ne 2 ]; then
    echo "Usage: bash $0 [ 'all' | firstId lastId | lastId ]"
    exit 1
fi

# Is the first argument a numerical value?
if expr "$1" : '[0-9]*' > /dev/null ; then
    # Delete the job IDs in the given range.
    ids=`seq $1 $2`
else
    # Extract the IDs of all submitted jobs from the qstat list.
    ids=`qstat | cut -d . -f 1 | tail -n +3 | column`
fi

echo qdel $ids
qdel $ids
}}}

Specifying '''all''' as the first argument deletes all of your submitted jobs (regardless of whether they are running or not). It is also possible to delete a range of jobs by specifying the starting job ID as the first argument and the ending job ID as the second argument. If the script is saved as '''qdel.sh''', it can be executed as follows:

{{{
-bash-4.1$ bash qdel.sh all
-bash-4.1$ bash qdel.sh 100 200
}}}

== Script to Submit Specified Command as a Job ==

The '''qsub''' command with the '''-v''' option can pass an environment variable to a script. The following script uses this feature to execute a specified command as a job on a calculation node. In this script, standard error is merged into standard output ('''-j oe'''), and the execution time and the node where the calculation took place are logged ('''date''' and '''hostname''').

{{{#!highlight sh
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -q wLrchq
#PBS -j oe

cd $PBS_O_WORKDIR

date
hostname

echo "$JOB_CMD"
eval "$JOB_CMD"

date
}}}

Save this script as '''qsub.sh''' and specify the command to execute as the value of '''JOB_CMD'''. Running '''qsub''' as follows submits the command as a job.

{{{
-bash-4.1$ qsub -v JOB_CMD="/usr/bin/time -v perl i_love_cats.pl" qsub.sh
-bash-4.1$ qsub -v JOB_CMD="perl catching_cats.pl | perl counting_cats.pl" qsub.sh
}}}

== List of All Users' Jobs in the Queue ==

A list of all jobs in the queue, including other users' jobs, can be displayed with the {{{/usr/local/maui/bin/showq}}} command.

== Displaying Job Status Repeatedly ==

The '''watch''' command is useful for monitoring the execution status of submitted jobs. The examples below run '''qstat''' every 5 seconds, displaying the latest job execution status. The '''-d''' option highlights the differences from the previous output. Note that an alias cannot be given as the command to be executed repeatedly.

{{{
-bash-4.1$ watch -n 5 qstat -a
-bash-4.1$ watch -n 5 qstat -Q
-bash-4.1$ watch -n 5 -d qstat -Q
}}}

== Limitation by ulimit -t (the CPU time) ==

On the development node, CPU time is limited as specified by the '''ulimit''' command. This limit appears to be applied even to CPU time that is not shown by the '''time''' command (hidden values). If a job aborts for no apparent reason, this limitation may be the cause. For example, synchronizing a large volume of data with the '''rsync''' command produces an error such as the following.

{{{
-bash-4.1$ rsync --progress -avh /tmp/source /destination/
sending incremental file list
...
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (96 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
}}}

The CPU time consumed by '''rsync''' appears to be determined by the volume of data transferred, regardless of whether '''--bwlimit''' is specified. Because remote synchronization encrypts its transport with '''ssh''', it consumes approximately twice as much CPU time as local synchronization.

== Job Scheduling Order ==

=== The Wide-Area Coordinated Cluster System for Education and Research ===

The job scheduler allocates jobs to the calculation nodes in descending order of host number: wsnd30, wsnd29, ..., wsnd00.

{{{#!wiki comment
=== The Computer Systems (Cluster) for the Next Generation Simulation Technology Education ===

The job scheduler allocates jobs to the calculation nodes, in principle, in ascending order of host number: csnd02, csnd03, ..., csnd27, csnd00, csnd01. Note that csnd00 and csnd01 are the nodes equipped with a GPGPU, and jobs are allocated to them last.
}}}
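To confirm this allocation order on a running system, the node assigned to each job can be listed with '''qstat'''. The '''-n''' option used below is a standard Torque option, though whether it is enabled on this cluster is an assumption here.

{{{
-bash-4.1$ qstat -n
-bash-4.1$ qstat -an
}}}

The allocated node names (e.g. wsnd30) appear under each job entry, so running this after submitting several jobs shows the order in which the scheduler fills the nodes.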