Hands-on Scientific Computing

Hands-on SciComp is a self-study course and reference material for the practical side of data science and scientific computing: what you need to know but don’t learn in your studies. It focuses on what you need as you move from academic studies to real projects or data-based/computational research, including student projects.

What you find here isn’t unique: you can find similar materials from Software Carpentry, CodeRefinery, and more, but those are all designed roughly as single, guided courses. Hands-on SciComp links to all of these materials and organizes them for self-study, so that you can determine where you need to start and where you need to end up.

Our material is divided into different levels, so that you can focus on what’s relevant to you. First, check how demanding your work is so that you can choose the right courses to focus on.

To earn credits or a certificate from completing this course, you can visit the course exercise page here.

Level dependencies
Level | For who? | Covers what?
A: Basics: What computing and how? | Mini-level for everyone who’s doing science with their computer or may need to rely on computing resources later. | What types of resources are available, when you’d use them, and how to get help. How to set up your computer to do scientific work. What comes next.
B: Related science skills | Everyone publishing in a somewhat computational field. | Making figures, papers, posters, and so on the way it’s done in computational fields.
C: Scientific computing (Linux and shell) | Everyone who’s doing more than pointing and clicking single applications on their own computer or needs more computing power. | In this level, you learn how to extend your power beyond your own computer or existing applications. Includes data management, scripting, Linux, and servers. Linux and the shell are a major point here: this is the de facto (and only) good way to increase power. Equal to the B level.
D: Clusters and high-performance computing | Those who need more power than their own computer and need to move to a cluster, whether or not it’s highly parallelized. | Computing on clusters and remote servers, more advanced Linux, more scripting, batch systems, HPC data management.
E: Scientific coding | When you start writing your own software to do your research. | Version control, how to manage code, software, and data even more. We don’t cover programming itself, just the untaught parts about how to use it as a researcher. Equal to the D level.
F: Advanced high performance computing | Those who are programming the most demanding parallel scientific applications. | MPI (message passing interface, a parallel programming framework), OpenMP (another one), GPU programming, etc. And anything more advanced.

We have material for different learning styles:

Video intro: A short video introduction and demo to get you started and show you real examples.
Material: Reference material to read, which covers anything you may need.
Local info: If your university has specific local information to supplement the general info, it can be found here.
Exercises: Practice exercises. In the future, doing these will allow you to get study credits.

A: Basics

  About Questions Video Intro Reading Aalto
A01 Introduction to scientific computing Get started with common scientific computing guidelines. Good computing practices for everyone regardless of skills Welcome, Researchers
A10 Configuring Linux for scientific work Linux is great for scientific work. This goes over some key things to install to get going. Software Carpentry set up material
A11 Configuring Mac for scientific work Get your Mac computer set up for scientific computing tasks. Software Carpentry tutorial for Shell, Git and Nano installations on a Mac. Software Carpentry set up material
A12 Configuring Windows for scientific work Get your Windows computer set up for scientific computing tasks. Software Carpentry Git Bash tutorial for Windows. Software Carpentry set up material

See the full list for more.

C: Linux and shell

  About Questions Video Intro Reading Aalto
C10 Basic shell Let’s face it: the Linux command line is the basis of most data science if you are doing more than running other people’s programs. >How does the shell work? >When to use a CLI instead of a GUI? Get started with basic shell commands Software Carpentry shell-novice sections 1-4. The first part of our shell course is good too.
C23 Text editors and IDEs Your best friend is a good text editor - sometimes you just need to edit things quickly on some remote system. >Which tools to use for code development and editing? Get to know VS Code tutorial series Software Carpentry shell-novice, “Create a text file” part of section 3 and IDE tutorial by CodeRefinery.
C20 Shell scripting If you can do it on the Linux shell, you can automate it. >How to make use of shell scripting tools in repetitive task automation? From the shell scripting tutorials, see for example: 3-4, 7-10, 21, 23-25, 27 and 35-40 (these topics are also introduced in the reading material). Continue with the Science-IT Linux shell tutorial part 2.
C21 Version control for you Version control lets you track changes, go back in time, and collaborate on code and papers: an absolute requirement for scientific computing. >What is Git? >How to initialize a Git repository? Why bother with version control and an introduction for beginners to managing code on remote GitHub repositories CodeRefinery Introduction to version control
C22 SSH and remote access A short but important course: how to do work remotely. Different expert tips for making ssh better, too. >What does SSH mean and when to use it? Introduction to secure shell by Software Carpentry SSH for working on a remote machine. How to make ssh work better by Aalto Scicomp
C23 Make Automate the repetitive stuff with Make. >How can a Makefile be useful in your large project? Episodes on Make by Software Carpentry Short introduction to what a Makefile is and its basic operations. For more information on Makefiles, see the GNU Make Manual.

See the full list for more.

D: Clusters and High Performance Computing

  About Questions Video Intro Reading Aalto
D01 What is HPC? Before you can use larger resources, you need to understand how they differ from your own computer. HPC Intro
D20 Modules and software Using and installing software on a cluster is different from your own computer, because hundreds of people are sharing it. Modules are the solution. Triton tutorials: modules applications
D21 Batch systems On a cluster, you have to share resources with others. Slurm is one batch queuing system that makes it possible. Triton tutorials: interactive, serial, array
D22 HPC Storage Storage turns out to be just as important as computing power. There are different places available, each with different advantages. Triton tutorials: storage basics. More advanced: lustre, local storage, small files
D23 Parallel computing The point of a cluster is to run things in parallel. How does this work? Triton tutorials: parallel.
D24 Advanced shell scripting and automation Hands-on shell scripting, putting everything together to automate large computations on the cluster. Various courses; finishing the Linux shell tutorial is a good start. The Advanced bash scripting guide is a classic.

See the full list for more.

E: Scientific coding

  About Questions Video Intro Reading Aalto
E60 Modular code development Break your large programs into small problems by separating aspects of desired functionality to different sub-modules. >How to divide code into independent modules? >What are pure functions like? Python example of breaking code into small components CodeRefinery lesson on Modular code development
E61 Software testing It is important to ensure that your program performs effectively and without failures. Adding tests for your software can save a lot of your time later. >How to test code on different levels? >What kind of testing tools are there? Software testing fundamentals by Software Carpentry CodeRefinery lesson
E62 Profiling Code efficiency is critical especially in HPC. Learn to measure the performance of your programs. >What is profiling used for? Profiling Python code with cProfile Profiling tools for Linux and profiling for C and Python Triton profiling guide
E63 Debugging Detect, investigate and resolve bugs. >How to debug different types of errors? Debugging strategies For now see Triton’s debugging guide
E02 Software Licensing Sharing your work can be very beneficial. Take a look at social coding and software licensing. >What is free software? >Why should you share your code? Brief introduction to differences between open and closed source software CodeRefinery lesson
E04 Documentation Document your project so other people can easily use the code and even contribute to it. >What should be included in documentation? Documentation with Sphinx CodeRefinery lesson on documentation
E03 Reproducible research How different tools can improve reproducibility. >Which tools can help with reproducibility? What is reproducible research Lesson by CodeRefinery

See the full list for more.
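
As a rough illustration of how the modular code, testing, and profiling topics above fit together, here is a minimal Python sketch. It is not taken from any of the linked lessons: the function names (mean, variance, profile_variance) and the sample data are hypothetical, chosen only for the example. It shows two small pure functions, plain assert-based tests for them, and a short cProfile run that returns its report as text.

```python
import cProfile
import io
import pstats


def mean(values):
    """Pure function: the result depends only on the argument."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def variance(values):
    """Pure function built on mean(); returns the population variance."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def test_mean():
    # Small, hand-checkable input: (1 + 2 + 3) / 3 = 2
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_variance():
    # Identical values have zero spread.
    assert variance([1.0, 1.0, 1.0]) == 0.0
    # Mean is 1, squared deviations are 1 and 1, so the variance is 1.
    assert variance([0.0, 2.0]) == 1.0


def profile_variance(n=100_000):
    """Run variance() under cProfile and return the report as text."""
    data = [float(i) for i in range(n)]  # hypothetical sample data
    profiler = cProfile.Profile()
    profiler.enable()
    variance(data)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return buffer.getvalue()


if __name__ == "__main__":
    test_mean()
    test_variance()
    print(profile_variance())
```

Because the functions have no side effects, each can be tested in isolation with tiny inputs. The test functions use the test_ prefix, so a test runner such as pytest could also collect them.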

F: Advanced high performance computing

  About Questions Video Intro Reading Aalto
Fxx Programming Parallel Computers This is an academic course taught in the CS department. It mainly covers OpenMP and CUDA. Usually taught in 5th period (Apr-May), search MyCourses/Oodi for CS-E4580.
Fxx GPU Programming This was an advanced guest course, useful if you want to know how to program GPU applications. Materials here.
Fxx MPI Programming This was an advanced guest course, useful if you want to know internals of MPI or program MPI applications. Materials here.
Fxx HTCondor Condor allows you to use many workstations as a high throughput cluster, ideal for mid-range embarrassingly parallel problems. Materials here.

See the full list for more.