Improving Performance

Overview

Questions

How can I write custom actions to be as efficient as possible?

Objectives

Mention common means for improving performance.
Demonstrate using the local snapshot API for increased performance.

Boilerplate Code

[1]:

from numbers import Number

import hoomd
import hoomd.md as md
import numpy as np

cpu = hoomd.device.CPU()
sim = hoomd.Simulation(cpu, seed=1)

# Create a simple cubic configuration of particles
N = 12  # particles per box direction
box_L = 50  # box dimension

snap = hoomd.Snapshot(cpu.communicator)
snap.configuration.box = [box_L] * 3 + [0, 0, 0]
snap.particles.N = N**3
x, y, z = np.meshgrid(*(np.linspace(-box_L / 2, box_L / 2, N, endpoint=False),)
                      * 3)
positions = np.array((x.ravel(), y.ravel(), z.ravel())).T
snap.particles.position[:] = positions
snap.particles.types = ['A']
snap.particles.typeid[:] = 0

sim.create_state_from_snapshot(snap)

sim.state.thermalize_particle_momenta(hoomd.filter.All(), 1.)

lj = md.pair.LJ(nlist=md.nlist.Cell(buffer=0.4))
lj.params[('A', 'A')] = {'epsilon': 1., 'sigma': 1.}
lj.r_cut[('A', 'A')] = 2.5
integrator = md.Integrator(methods=[md.methods.NVE(hoomd.filter.All())],
                           forces=[lj],
                           dt=0.005)

sim.operations += integrator


class GaussianVariant(hoomd.variant.Variant):

    def __init__(self, mean, std):
        hoomd.variant.Variant.__init__(self)
        self.mean = mean
        self.std = std

    def __call__(self, timestep):
        return rng.normal(self.mean, self.std)


energy = GaussianVariant(0.1, 0.001)
sim.run(0)
rng = np.random.default_rng(1245)

General Guidelines

When trying to improve the performance of custom actions, the first step is always to profile the class. Python comes with profiling tools that can be used to determine bottlenecks in a custom action’s performance. In addition, there are many external visualization and profiling tools available. However, after profiling here are some tips that should help improve performance.

State.get_snapshot aggregates data across MPI ranks and is \(O(n)\) to construct and setting the state to a new snapshot \(O(n)\) as well. However, hoomd.State.cpu_local_snaphshot or hoomd.State.gpu_local_snapshot are on order \(O(1)\) to construct and modifying data in a local snapshot is \(O(1)\) as well.
HOOMD-blue makes use of properties heavily. Since users can change the system state in Python at any point, we must recompute many of these quantities every time they are queried. If you are using something like hoomd.md.pair.LJ.energies multiple times, it will be more performant to first store the values and then use that copy.
Avoid for loops for numerical calculation. Try to utilize NumPy broadcasting or existing functions in NumPy or Scipy on the CPU or CuPy on the GPU.

Improve InsertEnergyUpdater

As an example, we will improve the performance of the InsertEnergyUpdater. Specifically we will change to use the cpu_local_snapshot to update particle velocity. We will use the %%timeit magic function for timing the simulation’s run time before and after our optimization. To highlight the differnce, we will run the updater every timestep.

[2]:

class InsertEnergyUpdater(hoomd.custom.Action):

    def __init__(self, energy):
        self._energy = energy

    @property
    def energy(self):
        return self._energy

    @energy.setter
    def energy(self, new_energy):
        if isinstance(new_energy, Number):
            self._energy = hoomd.variant.Constant(new_energy)
        elif isinstance(new_energy, hoomd.variant.Variant):
            self._energy = new_energy
        else:
            raise ValueError("energy must be a variant or real number.")

    def act(self, timestep):
        snap = self._state.get_snapshot()
        if snap.communicator.rank == 0:
            particle_i = rng.integers(snap.particles.N)
            mass = snap.particles.mass[particle_i]
            direction = self._get_direction()
            magnitude = np.sqrt(2 * self.energy(timestep) / mass)
            velocity = direction * magnitude
            old_velocity = snap.particles.velocity[particle_i]
            new_velocity = old_velocity + velocity
            snap.particles.velocity[particle_i] = velocity
        self._state.set_snapshot(snap)

    @staticmethod
    def _get_direction():
        theta, z = rng.random(2)
        theta *= 2 * np.pi
        z = 2 * (z - 0.5)
        return np.array([
            np.sqrt(1 - (z * z)) * np.cos(theta),
            np.sqrt(1 - (z * z)) * np.sin(theta), z
        ])

[3]:

energy_action = InsertEnergyUpdater(energy)
energy_operation = hoomd.update.CustomUpdater(action=energy_action, trigger=1)
sim.operations.updaters.append(energy_operation)

[4]:

%%timeit
sim.run(100)

84.7 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

We now show the profile for the optimized code which uses the cpu_local_snapshot for updating velocities.

[5]:

class InsertEnergyUpdater(hoomd.custom.Action):

    def __init__(self, energy):
        self._energy = energy

    @property
    def energy(self):
        return self._energy

    @energy.setter
    def energy(self, new_energy):
        if isinstance(new_energy, Number):
            self._energy = hoomd.variant.Constant(new_energy)
        elif isinstance(new_energy, hoomd.variant.Variant):
            self._energy = new_energy
        else:
            raise ValueError("energy must be a variant or real number.")

    def attach(self, simulation):
        self._state = simulation.state
        self._comm = simulation.device.communicator

    def detach(self):
        del self._state
        del self._comm

    def act(self, timestep):
        part_tag = rng.integers(self._state.N_particles)
        direction = self._get_direction()
        energy = self.energy(timestep)
        with self._state.cpu_local_snapshot as snap:
            # We restrict the computation to the MPI
            # rank containing the particle if applicable.
            # By checking if multiple MPI ranks exist first
            # we can avoid for checking inclusion of a tag id
            # in an array.
            if (self._comm.num_ranks <= 1 or part_tag in snap.particles.tag):
                i = snap.particles.rtag[part_tag]
                mass = snap.particles.mass[i]
                magnitude = np.sqrt(2 * energy / mass)
                velocity = direction * magnitude
                old_velocity = snap.particles.velocity[i]
                new_velocity = old_velocity + velocity
                snap.particles.velocity[i] = new_velocity

    @staticmethod
    def _get_direction():
        theta, z = rng.random(2)
        theta *= 2 * np.pi
        z = 2 * (z - 0.5)
        return np.array([
            np.sqrt(1 - (z * z)) * np.cos(theta),
            np.sqrt(1 - (z * z)) * np.sin(theta), z
        ])

[6]:

# Create and add our modified custom updater
sim.operations -= energy_operation
energy_action = InsertEnergyUpdater(energy)
energy_operation = hoomd.update.CustomUpdater(action=energy_action, trigger=1)
sim.operations.updaters.append(energy_operation)

[7]:

%%timeit
sim.run(100)

19.9 ms ± 800 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As can be seen the simulation with the new EnergyInsertUpdater about an order of magnitude faster with a system size of \(12^3 = 1728\), by virtue of the local snapshot modification having \(O(1)\) time complexity. At larger system sizes this change will grow to be even more substantial.

This concludes the tutorial on custom actions in Python. For more information see the API documentation.