HPC Project of the University of Parma and INFN Parma
User Guide (SLURM version)
[[https://www.hpc.unipr.it/dokuwiki/doku.php?id=calcoloscientifico:guidautente/slurm|{{ :calcoloscientifico:italian_flag.png?nolink&33 |Clicca qua per la versione italiana}}]]

** [[ calcoloscientifico:progetto | Project Description ]] ** (it)

===== Access / Login =====

In order to access the resources, you must be included in the LDAP database of the HPC management server. Requests for access or general assistance must be sent to: es_calcolo@unipr.it.

Once enabled, log in through SSH on the login host:

  ssh <username>@login.hpc.unipr.it

Password access is allowed only from within the University network (160.78.0.0/16). Outside this network it is necessary to use the University [[http://noc.unipr.it/public/vpn/home|VPN]] or to authenticate with a public key.

==== Password-less access between nodes ====

**In order to use the cluster it is necessary to eliminate the password prompt between nodes**, using public key authentication. Generate the key pair on login.hpc.unipr.it, without a passphrase, and add the public key to the authorization file (authorized_keys).

Key generation (accept the defaults by pressing Enter):

  ssh-keygen -t rsa

Copy the public key into authorized_keys:

  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

==== External access with public key authentication ====

The key pair must be generated with your SSH client. The private key should be protected by an appropriate passphrase (not mandatory, but recommended). The public key must be added to your authorized_keys file on the login host.

If you use the Windows SSH client **PuTTY** (http://www.putty.org), generate the public and private key pair with PuTTYgen and save them to a file. The private key must be set in the PuTTY (or WinSCP) configuration panel:

  Configuration -> Connection -> SSH -> Auth -> Private key file for authentication

The public key must be added to the .ssh/authorized_keys file on login.hpc.unipr.it.

Useful links for SSH client configuration: [[ https://kb.iu.edu/d/aews | Linux, MacOS X, PuTTY ]], [[ https://kb.iu.edu/d/amzx | Windows SSH Secure Shell ]]

The public key of the client (for example ''client_id_rsa.pub'') must be appended to the file ''~/.ssh/authorized_keys'' on the login host:

  cat client_id_rsa.pub >> ~/.ssh/authorized_keys

==== File transfer ====

SSH is the only protocol allowed for external communication and can also be used for file transfer. From a Unix-like client (Linux, MacOS X) you can use the commands scp or sftp. On Windows systems, the most commonly used tool is WinSCP (https://winscp.net/eng/docs/introduction). During the installation of WinSCP it is possible to import PuTTY profiles.

SSH can also be used to mount a remote file system using SshFS (see http://www.fis.unipr.it/dokuwiki/doku.php?id=calcoloscientifico:guidautente_slurm_en#sshfs).
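For example, files can be copied to and from the cluster with scp (the file names below are only examples):

  # copy a local file to the home directory on the cluster
  scp input.dat <username>@login.hpc.unipr.it:~/
  # copy a result file from the cluster to the current local directory
  scp <username>@login.hpc.unipr.it:~/results.dat .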
===== Hardware =====

The current cluster is composed of the following computing nodes.

{{:calcoloscientifico:hpcschema2018.png?200|}}

**Computing nodes**

  * Cluster1 ([[ https://ark.intel.com/products/91766/Intel-Xeon-Processor-E5-2683-v4-40M-Cache-2_10-GHz | BDW]])
    * 8 nodes with 2 Intel Xeon E5-2683v4 (2x16 cores, 2.1 GHz, 40 MB smart cache), 128 GB RAM (E4)
    * 9 nodes with 2 Intel Xeon E5-2680v4 (2x14 cores, 2.4 GHz, 35 MB smart cache), 128 GB RAM (DELL R730)
    * 1 node with 2 Intel Xeon E5-2683v4 (2x16 cores, 2.1 GHz, 40 MB smart cache), 1024 GB RAM (E4 - FAT MEM)
    * 1 node with 4 Intel Xeon E7-8880v4 (4x22 cores, 2.2 GHz, 55 MB smart cache), 512 GB RAM (HP - FAT CORES)
  * Cluster2 ([[http://www.nvidia.com/object/tesla-p100.html | GPU]])
    * 2 nodes with 2 Intel Xeon E5-2683v4 (2x16 cores, 2.1 GHz), 128 GB RAM, 7 NVIDIA P100-PCIE-12GB GPUs (Pascal architecture)
  * Cluster3 ([[https://ark.intel.com/it/products/94035/Intel-Xeon-Phi-Processor-7250-16GB-1_40-GHz-68-core | KNL]])
    * 4 nodes with 1 Intel Xeon Phi 7250 (1x68 cores, 1.4 GHz, 16 GB MCDRAM), 192 GB RAM

Node details: [[ calcoloscientifico:nodicalcolo | Node list ]] - [[ http://cm01.hpc.unipr.it/ganglia/ | Usage ]] (intranet only)

Peak performance (double precision):

  * 1 BDW node -> 2x16 (cores) x 2.1 (GHz) x 16 (AVX2) = 1 TFlops, max memory bandwidth = 76.8 GB/s
  * 1 P100 GPU -> 4.7 TFlops
  * 1 KNL node -> 68 (cores) x 1.4 (GHz) x 32 (AVX512) = 3 TFlops, max memory bandwidth = 115.2 GB/s

** Interconnection with [[https://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-edge-switches-100-series.html | Intel OmniPath ]] **

Peak performance: bandwidth 100 Gb/s, latency 100 ns.

**[[ calcoloscientifico:benchmarks | Benchmarks: IMB, NBODY, HPL]]**

===== Software =====

The operating system on all node types is CentOS 7.x.

Environment software (libraries, compilers and tools): [[ http://www.hpc.unipr.it/dokuwiki/doku.php?id=calcoloscientifico:softwareapplicativo | List ]]

Some software components must be loaded before they can be used. To list the available modules:

  module avail

To load / unload a module (example: intel):

  module load intel
  module unload intel

To list the loaded modules:

  module list

===== Storage =====

The login node and the computing nodes share the following storage areas:

^ Mount Point ^ Env. Var. ^ Backup ^ Quota ^ Note ^ Support ^
^ /hpc/home | $HOME | yes | 50 GB | Programs and data | SAN nearline |
^ /hpc/group (/hpc/account ?) | $GROUP | yes | 100 GB | Programs and data | SAN nearline |
^ /hpc/share | | | | Application software and databases | SAN nearline |
^ /hpc/scratch | $SCRATCH | no | 1? TB, max 1 month | Run-time data | SAN |
^ /hpc/archive | $ARCHIVE | no | | Archive | NAS/tape/cloud (1) |

(1) Archive: foreseen in 2019

[[ calcoloscientifico:priv:cluster:storage | Private Area ]]
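As a sketch of how these areas are typically combined (the directory names below are only examples), large run-time files can be kept in $SCRATCH and only the final results copied back to $HOME:

  # create a personal work directory in the scratch area (example layout)
  mkdir -p "$SCRATCH/$USER/run01"
  cd "$SCRATCH/$USER/run01"
  # ... run the application here, writing temporary and output files locally ...
  # keep only the final results in the home directory
  cp results.dat "$HOME/"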
===== Acknowledgement =====

Remember to mention the project in the acknowledgements of your publications:

**// This research benefits from the HPC (High Performance Computing) facility of the University of Parma, Italy //**

Old sentence, do not use: //Part of this research is conducted using the High Performance Computing (HPC) facility of the University of Parma.//

Authors are requested to communicate the references of their publications, which will be listed on the site.

===== Job Submission with Slurm =====

The queues are scheduled with the [[ https://slurm.schedmd.com | Slurm ]] Workload Manager.

==== Slurm Partitions ====

Work in progress

^ Cluster ^ Partition ^ Job resources ^ TIMELIMIT ^ Max running per user ^
| BDW | bdw | 2-256 cores | 10-00:00:00 | |
| KNL | knl | 2- cores | 10-00:00:00 | |
| GPU | gpu | 1-10 GPU ?? | 0-24:00:00 | 6 |
| | vrt | 1 core | 10-00:00:00 | |

Global configurations:

  * Global max running jobs per user: ??
  * ..
  * Other partitions can be defined for special needs (heterogeneous jobs, dedicated resources, ..)

Private area: [[calcoloscientifico:priv:cluster:pbspro| PBSpro]] - [[calcoloscientifico:priv:cluster:slurm| Slurm]]

==== Useful commands ====

Display the status of the partitions in a synthetic way:

  sinfo

Display the status of the individual partitions in detail:

  scontrol show partition

List of nodes and their status:

  sinfo --all

Job submission and monitoring:

  srun              # interactive mode
  sbatch script.sh  # batch mode
  squeue            # display the jobs in the queue
  sprio             # show the dynamic priority

=== Main options ===

**-p <partition>** selects the partition (queue) to use (the default partition is bdw ??).

Other options:

  * **-N x**: where x is the number of chunks (groups of cores on the same node)
  * **-n y**: where y is the number of cores for each node (default 1)
  * **--gres=gpu:tesla:X**: where X is the number of GPUs for each node (consumable resources)
  * **--mem=<size>**: requested memory per node
  * **--ntasks=Y**: where Y is the number of MPI processes for each node
  * **--cpus-per-task=Z**: where Z is the number of OpenMP threads for each process
  * **--exclusive**: allocate hosts exclusively (not shared with other jobs)

Example of resource selection:

  -p bdw -N1 -n2

**-t <time>**: maximum execution time of the job. This value contributes to the selection of the queue to be used. (Default: 0-00:72:00, to be verified)

Example:

  -t 0-00:30:00

**-A <account>**, **--account=<account>**: specifies the account to be charged for the use of resources (mandatory ??). See the [[ calcoloscientifico:guidautente_slurm_en#teamwork | Teamwork paragraph ]].

Example:

  -A T_HPC17A

**-o <file>**, **-e <file>**: redirect standard output and standard error to the given files. By default Slurm writes both streams to a single file named ''slurm-<jobid>.out'' in the submission directory.

**--mail-user=<address>**: indicates one or more e-mail addresses, separated by commas, that will receive the notifications of the queue manager. If the option is not specified, the queue system sends notifications to the user's university email address. In the case of guests, notifications are sent to the user's personal e-mail address.

**--mail-type=<events>**: indicates the events that trigger a notification:

  * **FAIL**: notification in case the job is interrupted
  * **BEGIN**: notification when the job starts
  * **END**: notification when the job ends
  * **NONE**: no notification
  * **ALL**: all notifications

If the option is not specified, the queue system sends a notification only in case the job is interrupted.

Example:

  --mail-user=john.smith@unipr.it --mail-type=BEGIN,END

==== Priority ====

The priority (from queue to execution) is defined dynamically by three parameters:

  * Timelimit
  * Aging (waiting time in the partition)
  * Fair share (amount of resources used in the last 14 days)

==== Advance reservation ====

It is possible to define an advance reservation for teaching activities or special requests.

Advance reservation policy: ToDo

To make a request, send an e-mail to es_calcolo@unipr.it.

==== Accounting ====

Reporting examples:

  accbilling.sh -a <account> -s 2018-01-01 -e 2018-04-10
  accbilling.sh -u <user> -s 2018-01-01 -e 2018-04-10
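In addition to the local accbilling.sh script, the standard Slurm command ''sacct'' can be used to inspect your own jobs over a period (a minimal sketch; the date range and format fields are only examples):

  # jobs of the current user in a given period, with elapsed time and CPU time
  sacct -u $USER -S 2018-01-01 -E 2018-04-10 \
        --format=JobID,JobName,Partition,Account,Elapsed,CPUTime,State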
===== Interactive jobs =====

To check the list of allocated resources you can use interactive submission with ''srun''. Once in interactive mode, the command **echo $SLURM_JOB_NODELIST** displays the list of allocated nodes. The command **squeue -al** lists more details about the allocated resources.

  srun -N <nodes> -n <tasks> -q <qos> -C <constraint> -t <time> -L <license> --pty bash
  echo $SLURM_JOB_NODELIST
  scontrol show job <jobid>
  exit

Examples:

  # 1 group (chunk) of 2 CPUs of type BDW and the Scratch file system
  srun -N1 -n2 -p bdw -L SCRATCH --pty bash

  # 2 chunks of 2 CPUs of type KNL and the Scratch file system
  srun -N2 -n2 -p knl -L SCRATCH --pty bash

  # The tasks must be distributed on two different nodes
  srun -N2 --ntasks-per-node=1 -p knl --pty bash

  # 1 chunk with 2 GPUs on the GPU cluster
  srun -N1 -p gpu --gres=gpu:2 -L SCRATCH --pty bash

  # 2 chunks, each with 2 GPUs, on different nodes
  srun -N2 --gres=gpu:2 -p gpu --pty bash

  # --ntasks-per-node=Y defines how many MPI processes are activated on each node
  srun -N2 --ntasks-per-node=1 -p bdw --pty bash

===== Batch job =====

A shell script must be created that includes the SLURM options and the commands to be executed on the nodes. To submit the job and charge the related resources to an account:

  sbatch -A <account> scriptname.sh

Each job is assigned a unique numeric identifier (job ID). The files containing stdout and stderr are created in the directory from which the job was submitted. By default both streams are written to a single file named ''slurm-<jobid>.out''; the -o and -e options can be used to write them to separate files.

==== Serial jobs, GNU compiler ====

Compilation of the example mm.cpp for the computation of the product of two matrices:

  cp /hpc/share/samples/serial/mm.* .
  g++ mm.cpp -o mm

Script ''mm.bash'' for the submission of the serial executable ''mm'':

  #!/bin/bash
  #< Request a chunk with 1 CPU
  #SBATCH -p bdw -N1 -n1
  #< Declare that the job will last at most 30 minutes (days-hours:minutes:seconds)
  #SBATCH --time 0-00:30:00
  #< Charge resources to your own account
  #SBATCH --account=<account>
  #< Print the names of the assigned nodes
  echo $SLURM_JOB_NODELIST
  #< Enter the directory that contains the script
  cd "$SLURM_SUBMIT_DIR"
  #< Execute the program
  ./mm

Submission:

  sbatch mm.bash

Check the job state:

  squeue

To cancel a job in progress:

  scancel <jobid>

==== Serial jobs, Intel compiler ====

Compilation of the cpi_mc.c example for the computation of Pi:

  cp /hpc/share/samples/serial/cpi/cpi_mc.c .
  module load intel
  icc cpi_mc.c -o cpi_mc_int

Script ''cpi_mc.bash'' for the submission of the serial executable ''cpi_mc_int'':

  #!/bin/bash
  #< Charge resources to your own account
  #SBATCH --account=<account>
  #< Print the names of the assigned nodes
  echo $SLURM_JOB_NODELIST
  #< Load the Intel compiler module
  module load intel
  #< Enter the directory that contains the script
  cd "$SLURM_SUBMIT_DIR"
  #< Execute the program
  N=10000000
  ./cpi_mc_int -n $N

Submission:

  sbatch cpi_mc.bash

==== Serial jobs, PGI compiler ====

Compilation of the cpi_sqrt.c example for the computation of Pi:

  cp /hpc/share/samples/serial/cpi/cpi_sqrt.c .
  module load pgi
  pgcc cpi_sqrt.c -o cpi_sqrt_pgi

Script ''cpi_sqrt_pgi.bash'' for the submission of the serial executable ''cpi_sqrt_pgi'':

  #!/bin/bash
  #< Default SLURM options; they can be omitted
  #SBATCH -p bdw -N1 -n32
  #SBATCH --time 0-00:30:00
  #< Charge resources to your own account
  #SBATCH --account=<account>
  #< Print the names of the assigned nodes
  echo $SLURM_JOB_NODELIST
  module load pgi
  #< Enter the directory that contains the script
  cd "$SLURM_SUBMIT_DIR"
  N=10000000
  ./cpi_sqrt_pgi -n $N

Submission:

  sbatch cpi_sqrt_pgi.bash
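If separate, per-job output files are preferred to the default ''slurm-<jobid>.out'', the -o and -e options accept filename patterns (a minimal sketch based on the serial example above; %x is replaced by the job name and %j by the job ID):

  #!/bin/bash
  #SBATCH -p bdw -N1 -n1
  #SBATCH --time 0-00:30:00
  # write stdout and stderr to separate files named after the job
  #SBATCH -o %x.o%j
  #SBATCH -e %x.e%j
  cd "$SLURM_SUBMIT_DIR"
  ./mm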
==== Job OpenMP with GNU 4.8 ====

  cp /hpc/share/samples/omp/omp_hello.c .
  gcc -fopenmp omp_hello.c -o omp_hello

Script ''omp_hello.bash'' requesting 32 CPUs in exclusive use:

  #!/bin/bash
  #SBATCH -p bdw -N1 -n32
  #SBATCH --exclusive
  #SBATCH -t 0-00:30:00
  #SBATCH --account=<account>
  #< stdout and stderr are merged by default into slurm-<jobid>.out
  echo $SLURM_JOB_NODELIST
  echo OMP_NUM_THREADS : $OMP_NUM_THREADS
  cd "$SLURM_SUBMIT_DIR"
  ./omp_hello

==== Job OpenMP with Intel ====

  module load intel
  cp /hpc/share/samples/omp/mm/omp_mm.cpp .

Script ''mm_omp.bash'' requesting 1 whole node with at least 32 cores:

  #!/bin/bash
  #SBATCH -p bdw_debug -N1 -n32
  #SBATCH --time 0-00:30:00
  #SBATCH --account=<account>
  echo $SLURM_JOB_NODELIST
  cd "$SLURM_SUBMIT_DIR"
  module load intel
  icpc -qopenmp omp_mm.cpp -o omp_mm
  # To change the number of threads:
  export OMP_NUM_THREADS=8
  echo OMP_NUM_THREADS : $OMP_NUM_THREADS
  ./omp_mm

==== Job OpenMP with PGI ====

  cp /hpc/share/samples/omp/mm/omp_mm.cpp .

Script ''omp_mm_pgi.bash''. The BDW cluster consists of nodes with 32 cores. The OMP_NUM_THREADS variable defaults to the number of cores; if a different number of threads is wanted, it can be set with --cpus-per-task:

  #!/bin/sh
  #SBATCH -p bdw_debug -N1
  #SBATCH --cpus-per-task=4
  #SBATCH --time 0-00:30:00
  echo $SLURM_JOB_NODELIST
  cd "$SLURM_SUBMIT_DIR"
  module load pgi
  pgc++ -mp omp_mm.cpp -o omp_mm_pgi
  echo OMP_NUM_THREADS : $OMP_NUM_THREADS
  ./omp_mm_pgi

Submission:

  sbatch -A <account> omp_mm_pgi.bash

==== Job OpenMP with GNU 5.4 ====

  cp /hpc/share/samples/omp/cpi/* .
  sbatch -A <account> cpi2_omp.bash
  python cpi2_omp.py

==== Job MPI with GNU OpenMPI ====

  module load gnu openmpi
  cp /hpc/share/samples/mpi/mpi_hello.c .
  mpicc mpi_hello.c -o mpi_hello

Script ''mpi_hello.sh'' using GNU OpenMPI:

  #!/bin/bash
  # 4 chunks of 16 CPUs each: one MPI process for each CPU
  #SBATCH -p bdw_debug -N4
  #SBATCH --ntasks-per-node=16
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load gnu openmpi
  cd "$SLURM_SUBMIT_DIR"
  mpirun mpi_hello

Submission:

  sbatch -A <account> mpi_hello.sh

==== Job MPI with Intel MPI ====

  module load intel intelmpi
  which mpicc
  cp /hpc/share/samples/mpi/mpi_mm.c .
  mpicc mpi_mm.c -o mpi_mm_int

Script ''mpi_mm_int.sh'' using Intel MPI:

  #!/bin/sh
  # 4 chunks of 16 CPUs each: one MPI process for each CPU
  #SBATCH -p bdw_debug -N4
  #SBATCH --ntasks-per-node=16
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load intel intelmpi
  cd "$SLURM_SUBMIT_DIR"
  mpirun mpi_mm_int

==== Job MPI with PGI ====

  module load pgi openmpi
  which mpicc
  cp /hpc/share/samples/mpi/mpi_hello.c .
  mpicc mpi_hello.c -o mpi_hello_pgi

Script ''mpi_hello_pgi.sh'' using the PGI build of OpenMPI:

  #!/bin/sh
  #SBATCH -p bdw_debug -N4 -n16
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load pgi openmpi
  cd "$SLURM_SUBMIT_DIR"
  mpirun mpi_hello_pgi
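In the hybrid MPI + OpenMP examples that follow, the number of OpenMP threads can be taken from the --cpus-per-task request instead of being hard-coded (a minimal sketch to be added to the job script after the #SBATCH directives):

  # SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
  echo OMP_NUM_THREADS : $OMP_NUM_THREADS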
==== Job MPI + OpenMP with GNU OpenMPI ====

  module load gnu openmpi
  cp -p /hpc/share/samples/mpi+omp/mpiomp_hello.c .
  mpicc -fopenmp mpiomp_hello.c -o mpiomp_hello_gnu

Script ''mpiomp_hello_gnu.sh'' using GNU OpenMPI:

  #!/bin/sh
  # 4 chunks of 16 CPUs each: 1 MPI process per chunk, 16 OpenMP threads per process
  #SBATCH -p bdw_debug -N4
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=16     # Number of OpenMP threads for each MPI process
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load gnu openmpi
  cd "$SLURM_SUBMIT_DIR"
  mpirun mpiomp_hello_gnu

==== Job MPI + OpenMP with Intel MPI ====

  module load intel intelmpi
  cp /hpc/share/samples/mpi+omp/mpiomp_hello.c .
  mpicc -qopenmp mpiomp_hello.c -o mpiomp_hello_int

Script ''mpiomp_hello_int.sh'' using Intel MPI:

  #!/bin/sh
  # 4 chunks of 16 CPUs each: 1 MPI process per chunk, 16 OpenMP threads per process
  #SBATCH -p bdw_debug -N4
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=16     # Number of OpenMP threads for each MPI process
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load intel intelmpi
  cd "$SLURM_SUBMIT_DIR"
  mpirun mpiomp_hello_int

===== Use of the KNL cluster =====

The compiler to use is Intel. The KNL cluster is selected by specifying a KNL partition (''-p knl'', or ''knl_debug'' in the example below) among the requested resources. The maximum number of physical cores (ncpus) selectable per node is 68. Each physical core provides 4 virtual cores through hyperthreading, for a total of 272 per node.

  #!/bin/sh
  # 4 whole nodes: one MPI process for each node and 128 OpenMP threads per process
  #SBATCH -p knl_debug -N4
  #SBATCH --ntasks-per-node=1
  #SBATCH --cpus-per-task=128    # Number of OpenMP threads for each MPI process
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load intel intelmpi
  cd "$SLURM_SUBMIT_DIR"
  cp /hpc/share/samples/mpi+omp/mpiomp_hello.c .
  mpicc -qopenmp mpiomp_hello.c -o mpiomp_hello_knl
  mpirun mpiomp_hello_knl

===== Use of the GPU cluster =====

The GPU cluster consists of 2 machines with 7 GPUs each. The GPUs of a single machine are identified by an integer ID ranging from 0 to 6.

The compiler to use is nvcc: [[http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html|NVIDIA CUDA Compiler]]

Compilation example:

  cp /hpc/share/samples/cuda/hello_cuda.cu .
  module load cuda
  nvcc hello_cuda.cu -o hello_cuda

The GPU cluster is selected by specifying a GPU partition (''-p gpu'', or ''gpu_debug'' in the examples below) and ''--gres=gpu:<1-7>'' among the requested resources.
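Inside a GPU job it can be useful to verify which GPUs Slurm has actually assigned before launching the application (a minimal sketch; it assumes the nvidia-smi utility is available on the GPU nodes):

  echo CUDA_VISIBLE_DEVICES : $CUDA_VISIBLE_DEVICES
  nvidia-smi     # list the GPUs visible to the job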
Example of submission on 1 of the 7 GPUs available on a single node of the GPU cluster:

  #!/bin/sh
  # 1 node with 1 GPU
  #SBATCH -p gpu_debug -N1
  #SBATCH --gres=gpu:tesla:1
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load cuda
  cd "$SLURM_SUBMIT_DIR"
  ./hello_cuda

Example of submission of the N-BODY benchmark on all 7 GPUs available in a single node of the GPU cluster:

  #!/bin/sh
  # 1 node with 7 GPUs
  #SBATCH -p gpu_debug -N1
  #SBATCH --gres=gpu:tesla:7
  #SBATCH --time 0-00:30:00
  echo "### SLURM_JOB_NODELIST ###"
  echo $SLURM_JOB_NODELIST
  echo "####################"
  module load cuda
  cd "$SLURM_SUBMIT_DIR"
  /hpc/share/tools/cuda-9.0.176/samples/5_Simulations/nbody/nbody -benchmark -numbodies 1024000 -numdevices=5

In the case of N-BODY, the number of GPUs to be used is specified with the -numdevices option (the specified value must not exceed the number of GPUs requested with the --gres option). In general, the IDs of the GPUs to be used are derived from the value of the CUDA_VISIBLE_DEVICES environment variable. In the case of the last example we have:

  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6

===== Teamwork =====

To share files among the members of a group it is necessary to distinguish the type of activity.

In **interactive mode** on the login node, the command ''newgrp'' changes the primary group and therefore the group ownership of newly created files:

  newgrp <group>

On the HPC cluster the newgrp command also automatically enters the group directory (/hpc/group/<group>):

  cd "$GROUP"

In **batch mode** you must indicate the group account to be used with the following directive:

  #SBATCH --account=<account>

===== Scaling test =====

To sequentially launch a series of runs in the same job, for example to check the scaling of an algorithm:

  cp /hpc/share/samples/serial/cpi/cpi_mc.c .
  gcc cpi_mc.c -o cpi_mc

Script ''launch_single.sh'':

  #!/bin/bash
  cd "$SLURM_SUBMIT_DIR"
  for N in $(seq 1000000 1000000 10000000)
  do
    CMD="./cpi_mc -n $N"
    echo "# $CMD"
    eval $CMD >> cpi_mc_scaling.dat
  done

Submission:

  sbatch -A <account> launch_single.sh

The outputs of the different runs are appended to the cpi_mc_scaling.dat file. To generate a scaling plot you can use the Python matplotlib library:

  cp /hpc/share/samples/serial/cpi/cpi_mc_scaling.py .
  python cpi_mc_scaling.py

===== Job Array =====

With a single SLURM script it is possible to submit a set of jobs, which can be executed in parallel, specifying a different numerical parameter for each submitted job. The --array (-a) option specifies the numeric sequence of parameters. At each run the value of the parameter is contained in the $SLURM_ARRAY_TASK_ID variable.

Example: start N jobs for the computation of Pi with a number of intervals increasing from 100000 to 900000 in steps of 10000:

  cp /hpc/share/samples/serial/cpi/cpi_mc.c .
  gcc cpi_mc.c -o cpi_mc

Script ''slurm_launch_parallel.sh'':

  #!/bin/sh
  #SBATCH --array=100000-900000:10000
  cd "$SLURM_SUBMIT_DIR"
  CMD="./cpi_mc -n ${SLURM_ARRAY_TASK_ID}"
  echo "# $CMD"
  eval $CMD

Submission:

  sbatch -A <account> slurm_launch_parallel.sh

Gather the outputs:

  grep -vh '^#' slurm-*_*.out
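If only a few array tasks should run at the same time, the array specification accepts a concurrency limit after a % sign (a sketch based on the range used above):

  # at most 5 array tasks running simultaneously
  #SBATCH --array=100000-900000:10000%5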
===== Job MATLAB =====

[[https://www.mir.wustl.edu/Portals/0/Documents/Uploads/CHPC/PCT_Masterclass.pdf|Parallel Computing with MATLAB]]

== Execution of a serial MATLAB program ==

  cp /hpc/share/samples/matlab/pi_greco.m .

Script ''matlab.sh'':

  #!/bin/sh
  #SBATCH -p bdw_debug -N1 -n1
  #SBATCH --time 0-00:30:00
  cd "$SLURM_SUBMIT_DIR"
  module load matlab
  matlab -nodisplay -r pi_greco

Submission:

  sbatch -A <account> matlab.sh

== Execution of a parallel MATLAB job ==

  cp /hpc/share/samples/matlab/pi_greco_parallel.m .

Script ''matlab_parallel.sh''. //**The MATLAB version installed on the cluster allows using at most the cores of a single node.**// The number of usable cores still has to be specified; 4 cores are used for the time being:

  #!/bin/sh
  #SBATCH -p bdw_debug -N1 -n4
  #SBATCH --time 0-00:30:00
  cd "$SLURM_SUBMIT_DIR"
  module load matlab
  matlab -nodisplay -r pi_greco_parallel

Submission:

  sbatch -A <account> matlab_parallel.sh

== Execution of a MATLAB program on GPU ==

  cp /hpc/share/samples/matlab/matlabGPU.m .

Script ''matlabGPU.sh'' (still to be completed):

  #!/bin/bash
  #SBATCH -p gpu_debug -N1 -n1
  #SBATCH --gres=gpu:1
  #SBATCH --time 0-00:30:00
  cd "$SLURM_SUBMIT_DIR"
  module load matlab cuda
  matlab -nodisplay -r matlabGPU

Submission:

  sbatch -A <account> matlabGPU.sh
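If the MATLAB script does not call exit at its end, the interpreter may remain open and the job may keep its allocation until the time limit; a common pattern (a sketch based on the serial example above) is to append an explicit exit to the -r command:

  matlab -nodisplay -r "pi_greco; exit"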
===== Job MPI Crystal14 =====

Script ''crystal14.sh'' for submitting the MPI version of Crystal14. It requests 4 nodes with 8 cores each and starts 8 MPI processes per node:

  #!/bin/sh
  #SBATCH --job-name="crystal14"   # Job name
  #SBATCH -p bdw_debug -N4         # Resource request
  #SBATCH --ntasks-per-node=8
  #SBATCH --time 0-168:00:00
  # input files directory
  CRY14_INP_DIR='input'
  # output files directory
  CRY14_OUT_DIR='output'
  # input files prefix
  CRY14_INP_PREFIX='test'
  # input wave function file prefix
  CRY14_F9_PREFIX='test'
  source /hpc/share/applications/crystal14

We recommend creating a folder for each simulation. Each folder must contain a copy of the ''crystal14.sh'' script. The script defines four variables:

  * **CRY14_INP_DIR**: the input file(s) must be in the 'input' subfolder of the current directory. To use the current directory, comment out the line defining CRY14_INP_DIR. To change the subfolder, change the value of CRY14_INP_DIR.
  * **CRY14_OUT_DIR**: the output files will be created in the 'output' subfolder of the current directory. To use the current directory, comment out the line defining CRY14_OUT_DIR. To change the subfolder, change the value of CRY14_OUT_DIR.
  * **CRY14_INP_PREFIX**: the input file(s) have a prefix that must coincide with the value of CRY14_INP_PREFIX. The string 'test' is purely indicative and does not correspond to a real case.
  * **CRY14_F9_PREFIX**: the input file with extension 'F9' is the result of a previous run and its prefix must coincide with the value of CRY14_F9_PREFIX. The string 'test' is purely indicative and does not correspond to a real case.

The ''crystal14.sh'' script includes, in turn, the system script ''/hpc/software/bin/hpc-pbs-crystal14''. The latter cannot be modified by the user.

== Submission of the shell script ==

Move to the folder containing ''crystal14.sh'' and run the following command to submit the script to the job scheduler:

  sbatch ./crystal14.sh

== Analysis of the files produced by Crystal14 during job execution ==

During execution of the job, a temporary ''tmp'' folder is created containing two files:

  nodes.par
  machines.LINUX

The ''nodes.par'' file contains the names of the nodes that participate in the parallel computation. The ''machines.LINUX'' file contains the names of the nodes that participate in the parallel computation, each repeated as many times as the number of MPI processes started on that node.

To locate the temporary folders produced by Crystal14 during the execution of the job, run the following command directly from the login node:

  eval ls -d1 /hpc/node/wn{$(seq -s, 81 95)}/$USER/crystal/* 2>/dev/null

Be careful: the previous command contains the names of the currently available computing nodes. This list, and therefore the command, may change in the future.

To check the contents of the files produced by Crystal14 during the execution of the job, move into one of the folders listed by the previous command. At the end of the job, the two files ''machines.LINUX'' and ''nodes.par'' are deleted, and the temporary folder ''tmp'' is deleted only if it is empty. It is therefore not necessary to log in with SSH to the nodes participating in the computation to check the contents of the files produced by Crystal14.

===== Job Gromacs =====

To define the GMXLIB environment variable, add the following lines to the file ''$HOME/.bash_profile'':

  GMXLIB=$HOME/gromacs/top
  export GMXLIB

The path ''$HOME/gromacs/top'' is purely indicative; modify it according to your preferences.

==== Job Gromacs OpenMP ====

Script ''mdrun_omp.sh'' to exclusively request a node with 32 cores and start 16 OpenMP threads:

  #!/bin/sh
  #SBATCH -p bdw_debug -N1 -n1
  #SBATCH --cpus-per-task=16     # Number of OpenMP threads
  #SBATCH --exclusive
  #SBATCH --time 0-24:00:00
  # run only inside a Slurm job
  test -n "$SLURM_JOB_ID" || exit
  cd "$SLURM_SUBMIT_DIR"
  module load gnu openmpi
  source '/hpc/share/applications/gromacs/5.1.4/mpi_bdw/bin/GMXRC'
  gmx mdrun -s topology.tpr -pin on

This starts a single MPI process and will result in suboptimal performance.

==== Job Gromacs MPI and OpenMP ====

Script ''mdrun_mpi_omp.sh'' to exclusively request a node with 32 cores and start 8 MPI processes (the number of OpenMP threads is calculated automatically):

  #!/bin/sh
  #SBATCH -p bdw_debug -N1
  #SBATCH --ntasks=8
  #SBATCH --exclusive
  #SBATCH --time 0-24:00:00
  # run only inside a Slurm job
  test -n "$SLURM_JOB_ID" || exit
  cd "$SLURM_SUBMIT_DIR"
  module load gnu openmpi
  source '/hpc/share/applications/gromacs/5.1.4/mpi_bdw/bin/GMXRC'
  NNODES=$SLURM_JOB_NUM_NODES
  NPROCS=$SLURM_NTASKS
  # threads per MPI process = cores per node / MPI processes per node
  export OMP_NUM_THREADS=$((SLURM_CPUS_ON_NODE/(NPROCS/NNODES)))
  mpirun gmx mdrun -s topology.tpr -pin on

This starts multiple MPI processes and will achieve optimal performance.
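As a sketch of an alternative to computing the thread count inside the script, the request itself can express ranks and threads and ''gmx mdrun'' can take the thread count from the allocation (assuming the same GROMACS installation; the 8x4 decomposition is only an example):

  #!/bin/sh
  #SBATCH -p bdw_debug -N1
  #SBATCH --ntasks-per-node=8
  #SBATCH --cpus-per-task=4
  #SBATCH --exclusive
  #SBATCH --time 0-24:00:00
  cd "$SLURM_SUBMIT_DIR"
  module load gnu openmpi
  source '/hpc/share/applications/gromacs/5.1.4/mpi_bdw/bin/GMXRC'
  # one MPI rank per task, -ntomp OpenMP threads per rank from the request
  mpirun gmx mdrun -ntomp $SLURM_CPUS_PER_TASK -s topology.tpr -pin on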
===== Job Abaqus =====

==== Job Abaqus MPI ====

Example script ''abaqus.sh'' to run Abaqus on 1 node, 32 cores, 0 GPUs:

  #!/bin/bash
  # --time : estimated execution time, max 240 hours (better to overestimate than underestimate)
  #SBATCH -p bdw_debug -N1 -n32
  #SBATCH --time 0-240:00:00
  echo $SLURM_JOB_NODELIST
  # Modules necessary for the execution of Abaqus
  module load gnu intel openmpi
  cd "$SLURM_SUBMIT_DIR"
  abaqus j=testverita cpus=32    # j=<filename>.inp

==== Job Abaqus MPI with GPU ====

Example script ''abaqus-gpu.sh'' to run Abaqus on 1 node, 6 cores, 1 GPU:

  #!/bin/bash
  # --time : estimated execution time, max 240 hours (better to overestimate than underestimate)
  #SBATCH -p gpu_debug -N1 -n6
  #SBATCH --gres=gpu:1
  #SBATCH --time 0-00:30:00
  echo $SLURM_JOB_NODELIST
  # Modules necessary for the execution of Abaqus
  module load gnu intel openmpi cuda
  cd "$SLURM_SUBMIT_DIR"
  abaqus j=testverita cpus=6 gpus=1    # j=<filename>.inp

===== SSHFS =====

Work in progress

To exchange data with a remote machine running an SSH server, you can use [[ https://wiki.archlinux.org/index.php/Sshfs | SSHFS ]]. SSHFS is a file system for Unix-like operating systems (MacOS X, Linux, BSD) that allows you to locally mount a folder located on a host running an SSH server. It is based on the FUSE kernel module. Currently it is installed only on login.pr.infn.it. Alternatively, it can be installed on the remote Linux machine to access its data from the cluster.

To use it:

  mkdir remote                          # create the mount point
  sshfs <user>@<host>:<path> remote     # mount the remote file system
  df -h                                 # list the mounted file systems
  ls remote/
  fusermount -u remote                  # unmount the file system

===== VTune =====

VTune is a performance profiler from Intel and is available on the HPC cluster.

General information from Intel: https://software.intel.com/en-us/get-started-with-vtune-linux-os

Local guide: [[vtune]] (work in progress)

===== CINECA guides =====

  * {{:calcoloscientifico:hpc-access.pdf| Access to HPC resources in Italy and Europe }}
  * [[https://wiki.u-gov.it/confluence/display/SCAIUS/UG3.1%3A+MARCONI+UserGuide | Marconi UserGuide ]]
  * [[https://wiki.u-gov.it/confluence/display/SCAIUS/UG2.4%3A+Data+storage+and+FileSystems | Marconi storage]]
  * [[https://wiki.u-gov.it/confluence/display/SCAIUS/Migration+from+PBS+to+SLURM+@+Cineca | Migration from PBS to SLURM @ Cineca ]]

===== Other resources =====

[[ https://hpc.llnl.gov/training/tutorials | LLNL HPC tutorials ]]