Introduction to MPI

Overview

Questions

What is MPI?
Why should I run my simulations in parallel?
How can I execute scripts in parallel?

Objectives

Describe MPI.
Explain how MPI can provide faster performance on HPC systems.
Show how to write a single program that can execute in serial or parallel.
Demonstrate how to execute that program with mpirun.
Explain how strong scaling enables higher performance at the cost of some efficiency.

Introduction

MPI (Message Passing Interface) is a standard, implemented by libraries such as Open MPI and MPICH, that enables programs to execute in parallel as multiple cooperating processes. MPI is commonly available on HPC (high performance computing) clusters. HOOMD-blue uses MPI to execute on many CPUs and/or GPUs. Using more resources in parallel can provide higher performance than the same simulation run on one CPU core or one GPU.

The Simulation Script

This tutorial executes the Lennard-Jones particle simulation from a previous tutorial. See Introducing Molecular Dynamics for a complete description of this code.

You can also run HPMC and other types of simulations in HOOMD-blue using MPI. All operations in HOOMD-blue support MPI unless otherwise noted in their documentation.

[1]:

%pycat lj_performance.py

import hoomd

# Initialize the simulation.
device = hoomd.device.CPU()
sim = hoomd.Simulation(device=device)
sim.create_state_from_gsd(filename='random.gsd')

# Set the operations for a Lennard-Jones particle simulation.
integrator = hoomd.md.Integrator(dt=0.005)
cell = hoomd.md.nlist.Cell()
lj = hoomd.md.pair.LJ(nlist=cell)
lj.params[('A', 'A')] = dict(epsilon=1, sigma=1)
lj.r_cut[('A', 'A')] = 2.5
integrator.forces.append(lj)
nvt = hoomd.md.methods.NVT(kT=1.5, filter=hoomd.filter.All(), tau=1.0)
integrator.methods.append(nvt)
sim.operations.integrator = integrator

# Run a short time before measuring performance.
sim.run(100)

# Run the simulation and print the performance.
sim.run(1000)
print(sim.tps)

lj_performance.py is a file in the same directory as this notebook. Due to the way MPI launches parallel jobs, the code must be in a file instead of a notebook cell. %pycat is an IPython magic command that displays the contents of a Python file with syntax highlighting.

Compare this script to the one used in Introducing Molecular Dynamics. The only difference is the addition of print(sim.tps), which prints the performance in time steps per second. The same script can be run in serial or in parallel on different numbers of CPU cores.

Run the simulation with MPI

Use the MPI launcher mpirun to execute this script in parallel. The -n option sets the number of CPU cores to use.

Your HPC cluster may use a different launcher that may take different arguments, such as mpiexec, ibrun, or jsrun. See your cluster’s documentation to find the right launcher to use.

In Jupyter, the “!” magic command is equivalent to typing the given command in a shell.

[2]:

!mpirun -n 1 python3 lj_performance.py

270.2858299680495

[3]:

!mpirun -n 2 python3 lj_performance.py

notice(2): Using domain decomposition: n_x = 1 n_y = 1 n_z = 2.
445.91139383875304
445.91139383875304

[4]:

!mpirun -n 4 python3 lj_performance.py

notice(2): Using domain decomposition: n_x = 1 n_y = 2 n_z = 2.
761.6418866478846
761.6418866478846
761.6418866478846
761.6418866478846

The simulation runs faster on more CPU cores. This sequence of simulations demonstrates strong scaling, where a simulation of a fixed number of particles executes in less time on more parallel resources. You can compute the efficiency of the scaling by dividing the performance on n cores by the performance on one core, then dividing the result by n. Strong scaling rarely nears 100% efficiency, so a parallel run uses more total resources than the same run executed with -n 1. However, a -n 1 simulation may take months to complete while a -n 64 one takes days, making your research much more productive even if it uses a moderately larger amount of HPC resources.
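As a minimal sketch of the efficiency calculation described above, the following Python snippet applies the formula to the performance values printed by the three runs in this section. The names measured_tps and strong_scaling_efficiency are chosen here for illustration and are not part of HOOMD-blue. Your own timings will differ from run to run and from machine to machine.

```python
# Performance in time steps per second (TPS), copied from the outputs of
# the -n 1, -n 2, and -n 4 runs above. The dictionary name is illustrative.
measured_tps = {
    1: 270.2858299680495,
    2: 445.91139383875304,
    4: 761.6418866478846,
}


def strong_scaling_efficiency(tps_n, tps_1, n):
    """Return (performance on n cores / performance on 1 core) / n."""
    return tps_n / tps_1 / n


tps_1 = measured_tps[1]
for n, tps_n in measured_tps.items():
    speedup = tps_n / tps_1
    efficiency = strong_scaling_efficiency(tps_n, tps_1, n)
    print(f"-n {n}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With the numbers shown in this section, the efficiency works out to roughly 82% on 2 cores and 70% on 4 cores: each added core helps, but by less than a full core's worth.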

The script prints the performance once per MPI process, so the number of lines printed equals the value given to -n. The next section of this tutorial explains this in more detail.

Summary

In this section, you have executed a HOOMD-blue simulation script on 1, 2, and 4 CPU cores using MPI and observed the performance. The next section of this tutorial explains how HOOMD-blue splits the domain of the simulation and how you should structure your scripts.