HPC Kickstart

This page contains a virtual high-performance computing (HPC; more precisely, cluster computing) kickstart course. It is not part of the main Hands-on Scientific Computing flow, but is an expanded version of the “D” level material.

This page currently contains an online course from Aalto University (Aalto Scientific Computing), so the exact examples may not work on other clusters, but the theory and concepts will transfer; you will need to combine this outline with documentation from your own site.

In the future, this page will be adjusted to present the best topics, in the best order, from all courses combined. This means material may be mixed and matched so that the transitions are not perfect, but the result should still be the most effective overall.

Introductory material

These can be used in whatever order suits you, or you can watch the intro and then go on.

  • Day 1 introduction (Video, Lecture)
  • HPC theory crash course: some background about high-performance and cluster computing, not strictly necessary to move on to the other material (and could even be watched at the end) (Video, Slides)
  • How to ask for help with supercomputers (Video, Slides)
  • Your future in scientific computing (Video, Outline)

Main tutorials

“How to connect and use software/data” track:

  • Connecting to the cluster (Video, Reading, `Q&A <>`__)
    • Accounts, ssh, ssh keys, different operating systems, Jupyter, remote desktop environments
  • Data storage (Video, Reading, `Q&A <>`__)
    • About storage, different storage locations and properties, quotas, access on other computers, remote access
  • Applications on the cluster (Video, Reading, `Q&A <>`__)
    • How to use other software, common applications, Singularity containers, requesting new software
  • Software modules (Video, Reading, `Q&A <>`__)
    • The module command, searching for modules, loading modules, module versions, module collections.
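As a concrete illustration of this track, the sketch below shows a typical first session: setting up an ssh key, logging in, and exploring the software stack with environment modules. The host name, user name, and module name are placeholders, not real Aalto values; substitute whatever your own cluster's documentation gives you. These commands only make sense against a real cluster, so treat this as a site-dependent fragment rather than something to run locally.

```shell
# One-time setup on your own machine (host and user are placeholders):
ssh-keygen -t ed25519                   # create an ssh key pair
ssh-copy-id user@cluster.example.edu    # install the public key on the cluster

# Log in, then explore the software stack with environment modules:
ssh user@cluster.example.edu
module avail          # list the software your site provides
module load gcc       # load a module (append /version to pin one)
module list           # show what is currently loaded
```

Loading modules rather than installing software yourself is the key design point: the site curates versions that work on the cluster, and `module load` only adjusts your environment, so different users (and different jobs) can use different versions side by side.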

“How to actually run stuff” track. This goes into detail about the batch system and accessing resources:

  • Interactive jobs (Video, Reading, `Q&A <>`__)
    • Scheduling systems, Slurm, requesting resources, running jobs you can see directly.
  • Serial jobs (Video, Reading, `Q&A <>`__)
    • Jobs that run without your interaction, scripting jobs, checking output, viewing history, cancelling jobs.
  • Monitoring jobs (Video, Reading, `Q&A <>`__)
    • Checking actual resource usage of jobs (CPU/memory/GPU) while running and after finished, adjusting resource requirements, reducing resource wastage.
  • Parallel jobs (Video, Reading, `Q&A <>`__)
    • Types of parallelism, shared memory (OpenMP), message passing (MPI), multiprocessing, how to run each of them, monitoring performance (does not cover writing new parallel programs).
  • Array jobs (Video, Reading, `Q&A <>`__)
    • What is an array job, doing the same thing many times, serial job → array job, various tips and examples.
  • GPU jobs (Video, Reading, `Q&A <>`__)
    • GPU programs, machine learning frameworks, compiling CUDA code, requesting a GPU, monitoring efficiency, common efficiency traps.
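To make this track concrete, here is a minimal Slurm batch script of the kind the serial-jobs tutorial builds up to. The time, memory, module, and script names are placeholders; check your own cluster's documentation for valid values. Since a batch script is a scheduler configuration fragment plus site-specific commands, it is shown as a sketch rather than something runnable outside a cluster.

```shell
#!/bin/bash
#SBATCH --time=00:10:00        # run-time limit (hh:mm:ss)
#SBATCH --mem=1G               # memory for the whole job
#SBATCH --output=hello_%j.out  # %j expands to the Slurm job ID

# Everything below runs on a compute node once the job starts.
module load python             # module name is an example only
python3 my_script.py           # your actual workload

# Submit with:   sbatch run.sh
# Check queue:   squeue -u $USER
# View history:  sacct
# Cancel:        scancel JOBID
```

An array job adds a directive such as `#SBATCH --array=0-9`, and a GPU job typically adds `#SBATCH --gres=gpu:1`; both are covered in the corresponding tutorials above.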
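Array jobs rely on the `SLURM_ARRAY_TASK_ID` environment variable, which Slurm sets to a different value in each task of the array. A common pattern is to map that number onto an input file name, so one script processes many inputs. The snippet below simulates the pattern by setting the variable by hand; the `data_N.txt` naming scheme is only an example.

```shell
# Slurm normally sets this per task; we set it manually to show the pattern.
SLURM_ARRAY_TASK_ID=3

# Each array task derives its own input file from its task ID.
input="data_${SLURM_ARRAY_TASK_ID}.txt"
echo "Task ${SLURM_ARRAY_TASK_ID} would process ${input}"
# prints: Task 3 would process data_3.txt
```

This is what "serial job → array job" means in practice: take a working serial script, replace the hard-coded input with a value derived from `SLURM_ARRAY_TASK_ID`, and submit once with `--array` instead of submitting many copies.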

Special topics

These special topics can be used in whatever order suits you, if they are relevant to your interests.

  • Scientific computing workflows: different ways of actually using computing resources. Recommended to put the cluster into perspective with other types of needs. (Video, Reading, `Q&A <>`__)
  • Currently available resources at CSC, Finland: The above material is mostly about what you can find on one university cluster (though even bigger clusters use the same interface). This talks about other resources available at a national computing center (other countries will be somewhat similar). (Video, Reading, `Q&A <>`__)
  • Cluster etiquette: We have learned what you can do, but what should you do so that you don't annoy others on the cluster? See more in Research Software Hour (Video)
  • “How to tame the cluster”, mostly the same material as this whole course, compressed into one hour, with a complete example worked out. (Video)