= Wide-Area Coordinated Cluster System for Education and Research =
'''Test operation commenced on August 1, 2014.'''

<<TableOfContents(maxdepth=5)>>

== Logging In ==
Log in to the development server using an SSH client such as Tera''''''Term or PuTTY.
The host name of the development server differs depending on whether you are an on-campus or off-campus user.
The development server is intended for compiling and development work; do not run large-scale computations on it.
Run computations as jobs on the cluster via [[../ClusterSystemUsage#A.2BMLgw5zDWW5.2BITGW5bNU-|Torque]].
Note that the development server actually consists of two load-balanced servers.

=== On-Campus Users ===
The host name of the development server is wdev.edu.tut.ac.jp.
For the user name and password, enter the account issued by the Information and Media Center.
{{{
$ ssh wdev.edu.tut.ac.jp
}}}

=== Off-Campus Users ===
The development server is lark.imc.tut.ac.jp.
Use the account (par followed by 7 digits) issued by the [[https://hpcportal.imc.tut.ac.jp|account registration system]].
Off-campus users are authenticated by public key authentication.
Register your public key at [[https://hpcportal.imc.tut.ac.jp/profile|Change Profile]].
For how to create a public key, see [[../SSHClient|How to Use SSH Clients]].
{{{
$ ssh lark.imc.tut.ac.jp
}}}

== Queue Configuration ==
The following is the tentative configuration.

||Queue name||No. of available nodes||Timeout/job||Max. processes/node||Max. memory/node||Notes||
||wSrchq||30 nodes||1 hour||20||100GB|| ||
||wLrchq||30 nodes||336 hours||20||100GB|| ||

== System Configuration ==
=== Hardware Configuration ===
||Category||Host name||Model||CPU||Main memory capacity||Computing performance||Accelerator||OS||
||Development processing server||wdev||HA8000-tc/HT210||Xeon E5-2680 v2 2.8GHz 10-core x 2||128GB||448GFLOPS||Xeon Phi||RHEL6.4||
||Calculation node (Xeon Phi processors installed)||wsnd00〜wsnd15||HA8000-tc/HT210||Xeon E5-2680 v2 2.8GHz 10-core x 2||128GB||448GFLOPS||Xeon Phi||RHEL6.4||
||Calculation node||wsnd16〜wsnd31||HA8000-tc/HT210||Xeon E5-2680 v2 2.8GHz 10-core x 2||128GB||448GFLOPS|| ||RHEL6.4||

 * It is not possible to execute jobs using wsnd17 and wsnd31.

=== File System Configuration ===
||Home area||{{{/home/numeric_characters/user_name/}}}||The same home area as on the education Windows terminals. It appears as {{{Z:\}}} on the education Windows terminals.||
||Work area||{{{/gpfs/work/user_name/}}}||Appears as {{{V:\}}} on the education Windows terminals.||
||Software area||{{{/common/}}}|| ||

The work area can also be accessed through {{{/work/user_name/}}}.
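Once logged in, the areas above can be checked directly from the development server. A minimal sketch, where user_name stands for your own account name:

{{{
$ ls -d /gpfs/work/user_name   # work area (V:\ on the education Windows terminals)
$ ls -d /work/user_name        # the same work area via the alternative path
$ df -h /gpfs/work             # free space in the work area
}}}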
=== Compiler ===
||Compiler||Version||Installed directory||
||Intel||14.0.0 Build 20130728||/common/intel-2013SP1/||
||PGI||14.3-0||/common/pgi-14.3/||
||GNU||4.4.7||/usr/bin/||

=== Message Passing Interface (MPI) ===
||Library||Version||Installed directory||
||Intel MPI||14.0.0 Build 20130728||/common/intel-2013SP1/||
||Open MPI||1.6.5||/common/openmpi-1.6.5/||
||MPICH 3||3.1||/common/mpich-3.1/||
||MPICH 1||1.2.7p1||/common/mpich-1.2.7p1/||
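For example, an ordinary (non-Phi) MPI program can be compiled with Intel MPI and submitted through Torque roughly as follows. This is a minimal sketch: run_mpi.sh is a hypothetical script name, the nodes=/ppn= values are only examples, and the details of job submission are described in [[en/ClusterSystemUsage|Using Cluster Systems]].

{{{
% mpiicc sample_c_mpi.c -o sample_c_mpi    # compile with the Intel MPI wrapper
% cat run_mpi.sh
#!/bin/sh
#PBS -q wSrchq
#PBS -l nodes=2:ppn=20
cd $PBS_O_WORKDIR
mpirun -machinefile ${PBS_NODEFILE} -n ${PBS_NP} ./sample_c_mpi
% qsub run_mpi.sh                          # submit as a job to the wSrchq queue
}}}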
=== Software Configuration ===
||Software name||Version||Description||Installed directory||
||||||||Structural analysis||
||ANSYS Multiphysics||14.5||Multiphysics analysis tool||/common/ansys14.5/||
||ANSYS CFX||14.5||General-purpose thermal-hydraulics software||/common/ansys14.5/||
||ANSYS Fluent||14.5||General-purpose thermal-hydraulics software||/common/ansys14.5/||
||ANSYS LS-DYNA||14.5||Crash analysis tool||/common/ansys14.5/||
||ANSYS HFSS||15.0.3||High frequency 3D electromagnetic field analysis software||/common/ansys_hfss-15.0.3/||
||ABAQUS||6.12||General-purpose non-linear finite element analysis program||/common/abaqus-6.12-3/||
||Patran||2013||Integrated CAE environment pre/post-processing software||/common/patran-2013/||
||DEFORM-3D||10.2||FEA-based 3D formation process simulation system||/common/deform-3d-10.2/||
||COMSOL||4.4||FEA-based general-purpose physics simulation system||/common/comsol44/||
||||||||Computational Materials Science||
||PHASE (Serial version)||2014.01||First-principles pseudopotentials calculation software (Serial version)||/common/phase0-2014.01-serial/||
||PHASE (Parallel version)||2014.01||First-principles pseudopotentials calculation software (Parallel version)||/common/phase0-2014.01-parallel/||
||PHASE-Viewer||3.2.0||Integrated GUI environment software||/common/phase-viewer-v320/||
||UVSOR (Serial version)||3.42||First-principles pseudopotentials dielectric-response analysis software (Serial version)||/common/uvsor-v342-serial/||
||UVSOR (Parallel version)||3.42||First-principles pseudopotentials dielectric-response analysis software (Parallel version)||/common/uvsor-v342-parallel/||
||OpenMX (Serial version)||3.7||Bottom-up first-principles quantum simulator based on density functional theory (Serial version)||/common/openmx-3.7/||
||OpenMX (Parallel version)||3.7||Bottom-up first-principles quantum simulator based on density functional theory (Parallel version)||/common/openmx-3.7/||
||||||||Computational chemistry||
||Gaussian||09 Rev.C.01||Electronic structure program||/common/gaussian09-C.01/||
||NWChem (Serial version)||6.3.2||A comprehensive and scalable open-source solution for large-scale molecular simulations (Serial version)||/common/nwchem-6.3.2-serial/||
||NWChem (Parallel version)||6.3.2||A comprehensive and scalable open-source solution for large-scale molecular simulations (Parallel version)||/common/nwchem-6.3.2-parallel/||
||GAMESS (Serial version)||2013.R1||A general ab initio quantum chemistry package (Serial version)||/common/gamess-2013.r1-serial/||
||GAMESS (Parallel version)||2013.R1||A general ab initio quantum chemistry package (Parallel version)||/common/gamess-2013.r1-parallel/||
||MPQC||3.0-alpha||Massively Parallel Quantum Chemistry Program||/common/mpqc-3.0.0a-2014.03.20/||
||Amber (Serial version)||12||Molecular dynamics package (Serial version)||/common/amber12-serial/||
||Amber (Parallel version)||12||Molecular dynamics package (Parallel version)||/common/amber12-parallel/||
||Amber''''''Tools (Serial version)||12||Set of several independently developed packages that work well by themselves and with Amber itself (Serial version)||/common/amber12-serial/AmberTools/||
||Amber''''''Tools (Parallel version)||12||Set of several independently developed packages that work well by themselves and with Amber itself (Parallel version)||/common/amber12-parallel/AmberTools/||
||CONFLEX (Serial version)||7||General-purpose molecular dynamics computation software (Serial version)||/common/conflex7/||
||CONFLEX (Parallel version)||7||General-purpose molecular dynamics computation software (Parallel version)||/common/conflex7/||
||CHEMKIN-PRO||15112||Detailed chemical reaction analysis support software||/common/chemkin-15112/||
||||||||Technical processing||
||MATLAB||R2013a||Numerical computing language||/common/matlab-R2013a/||

 * You must be a Type A user to use ANSYS, ABAQUS, Patran, DEFORM-3D, COMSOL, GAUSSIAN, CHEMKIN-PRO, and MATLAB.
 * To apply for registration as a Type A user, see http://imc.tut.ac.jp/research/form

== Using the Software ==
For using the software, see [[en/ClusterSystemUsage|Using Cluster Systems]].

== Using the Xeon Phi Processor ==
Jobs that use the Xeon Phi processor can run in native mode or offload mode.
In native mode, the Xeon Phi processor is used as a single calculation node, and MPI programs can be used as they are.
In offload mode, a specific section of the source code is offloaded to the Xeon Phi processor and executed there, so the Xeon Phi processor can be used in much the same manner as a GPGPU through OpenACC.

=== Native Execution ===
==== Sample Source Program ====
sample_phi.c
{{{
#include <stdio.h>
#include <unistd.h>   /* for gethostname() */
#include <mpi.h>

int main(int argc, char **argv)
{
    int myid, nprocs;
    char hname[128] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    gethostname(hname, sizeof(hname));
    if (myid == 0)
        printf("NUMBER OF PROCESSES: %3d\n", nprocs);
    printf("HELLO WORLD! (HOST NAME: %10s, MYID: %3d)\n", hname, myid);
    MPI_Finalize();
    return 0;
}
}}}
 * The source is the same as the sample source program for MPI (sample_c_mpi.c); only the file name has been changed.

==== Compiling ====
You need to create one execution file for the Xeon CPU and another for the Xeon Phi coprocessor. Use the Intel compiler.

'''Creating the execution file for the Xeon CPU'''
{{{
% mpiicc sample_phi.c -o sample_phi
}}}
'''Creating the execution file for the Xeon Phi coprocessor'''
{{{
% mpiicc -mmic sample_phi.c -o sample_phi.mic
}}}
'''Notes'''

Append ".mic" to the name of the execution file for the Xeon Phi coprocessor. Apart from ".mic", the names of the execution files for the Xeon CPU and the Xeon Phi coprocessor must be identical.
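Before submitting a native job, it can help to confirm that the two executables really target different architectures. This is an optional sanity check, and the exact strings printed by file differ between systems:

{{{
% file sample_phi       # expect an ordinary x86-64 ELF executable
% file sample_phi.mic   # expect an ELF executable for the Xeon Phi (k1om) architecture
}}}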
==== Sample Script to Submit Jobs ====
phi_native.sh
{{{
### sample
#!/bin/sh
#PBS -q wSrchq
#PBS -l nodes=3:ppn=2:Phi
MIC0_PROCS=3
MIC1_PROCS=1
source /common/torque/MIC/mkmachinefile.sh
cd $PBS_O_WORKDIR
mpirun -machinefile ${MACHINEFILE} -n ${PBS_NP} ./sample_phi
}}}
'''Note on #PBS -q'''

A job can be submitted to either the wSrchq or wLrchq queue.

'''Note on #PBS -l'''

Always specify :Phi in #PBS -l. This setting selects calculation nodes equipped with Xeon Phi processors.

'''Note on MIC0_PROCS and MIC1_PROCS'''

Each node in the system has two Xeon Phi processors. '''MIC0_PROCS=''' and '''MIC1_PROCS=''' specify the number of processes to invoke on each of the two Xeon Phi processors of a calculation node. In the script file shown above, three calculation nodes are used by specifying nodes=3. On each node, one of the Xeon Phi processors invokes 3 processes while the other invokes 1 process. Each Xeon Phi processor has 60 cores/240 threads, so the value specified in '''MIC0_PROCS=''' and '''MIC1_PROCS=''' must be 240 or less.

'''Note on the number of parallel processes'''

The script file shown above invokes 18 processes in total and executes them in parallel. Assume that wsnd00, wsnd02, and wsnd03 are selected by '''nodes=3'''. To distinguish the two Xeon Phi processors on a calculation node, call them '''Xeon Phi0''' and '''Xeon Phi1'''. A total of 18 processes are invoked in this example: 2 processes on wsnd00, 3 processes on Xeon Phi0 of wsnd00, 1 process on Xeon Phi1 of wsnd00, 2 processes on wsnd02, 3 processes on Xeon Phi0 of wsnd02, 1 process on Xeon Phi1 of wsnd02, 2 processes on wsnd03, 3 processes on Xeon Phi0 of wsnd03, and 1 process on Xeon Phi1 of wsnd03.

'''Others'''

Change the following values as appropriate: the queue to submit the job to, the values of nodes=, ppn=, MIC0_PROCS=, and MIC1_PROCS=, and the execution file name ('''sample_phi''' in the above script file).

=== Offload Execution ===
==== Sample Source Program ====
tbo_sort.c

Location: /common/intel-2013SP1/composer_xe_2013_sp1.0.080/Samples/en_US/C++/mic_samples/LEO_tutorial/tbo_sort.c

==== Compiling ====
When compiling, always specify the '''-openmp''' option. Use the Intel compiler.
{{{
% icc -openmp tbo_sort.c -o tbo_sort
}}}
==== Sample Script to Submit Jobs ====
phi_offload.sh
{{{
### sample
#!/bin/sh
#PBS -q wSrchq
#PBS -l nodes=1:ppn=20:Phi
cd $PBS_O_WORKDIR
./tbo_sort
}}}
'''Note on #PBS -q'''

A job can be submitted to either the '''wSrchq''' or '''wLrchq''' queue.

'''Note on #PBS -l'''

There is no need to change '''#PBS -l nodes=1:ppn=20:Phi'''. This line means that a single Xeon Phi calculation node is used exclusively.

'''Others'''

Change the queue to submit the job to and the execution file name ('''tbo_sort''' in the above script file) as appropriate.

=== Limitations ===
 * Specify the IP address of the Xeon Phi when using '''mpirun -machinefile'''. Usually, the value for '''-machinefile''' is generated automatically by mkmachinefile.sh.
 * Only tcp is available for MPI communication. tcp is used by default in mkmachinefile.sh.
 * As of October 2015, native execution is available on wsnd00 to wsnd09, wsnd14, and wsnd15. The job scheduler automatically chooses the nodes unless specified explicitly.
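Because the set of nodes available for native execution changes over time, it may help to check the node status before submitting a job. A sketch using standard Torque commands; the exact output format depends on the installation:

{{{
% pbsnodes -l        # list calculation nodes that are currently down or offline
% pbsnodes wsnd00    # show the detailed state of a specific Xeon Phi node
}}}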