Improving Performance¶
Overview¶
Questions¶
How can I write custom actions to be as efficient as possible?
Objectives¶
Mention common means for improving performance.
Demonstrate using the local snapshot API for increased performance.
Boilerplate Code¶
[1]:
import numbers
import hoomd
import numpy
cpu = hoomd.device.CPU()
simulation = hoomd.Simulation(cpu, seed=1)
# Create a simple cubic configuration of particles
N = 12 # particles per box direction
box_L = 50 # box dimension
snap = hoomd.Snapshot(cpu.communicator)
snap.configuration.box = [box_L] * 3 + [0, 0, 0]
snap.particles.N = N**3
x, y, z = numpy.meshgrid(
*(numpy.linspace(-box_L / 2, box_L / 2, N, endpoint=False),) * 3
)
positions = numpy.array((x.ravel(), y.ravel(), z.ravel())).T
snap.particles.position[:] = positions
snap.particles.types = ["A"]
snap.particles.typeid[:] = 0
simulation.create_state_from_snapshot(snap)
simulation.state.thermalize_particle_momenta(hoomd.filter.All(), 1.0)
lj = hoomd.md.pair.LJ(nlist=hoomd.md.nlist.Cell(buffer=0.4))
lj.params[("A", "A")] = {"epsilon": 1.0, "sigma": 1.0}
lj.r_cut[("A", "A")] = 2.5
integrator = hoomd.md.Integrator(
methods=[hoomd.md.methods.ConstantVolume(hoomd.filter.All())], forces=[lj], dt=0.005
)
simulation.operations += integrator
class GaussianVariant(hoomd.variant.Variant):
def __init__(self, mean, std):
hoomd.variant.Variant.__init__(self)
self.mean = mean
self.std = std
def __call__(self, timestep):
return rng.normal(self.mean, self.std)
energy = GaussianVariant(0.1, 0.001)
simulation.run(0)
rng = numpy.random.default_rng(1245)
General Guidelines¶
When trying to improve the performance of custom actions, the first step is always to profile the class. Python comes with profiling tools that can be used to determine bottlenecks in a custom action’s performance. In addition, there are many external visualization and profiling tools available. However, after profiling here are some tips that should help improve performance.
State.get_snapshot
aggregates data across MPI ranks and is \(O(n)\) to construct, and setting the state to a new snapshot is \(O(n)\) as well. However,hoomd.State.cpu_local_snaphshot
orhoomd.State.gpu_local_snapshot
are on order \(O(1)\) to construct and modifying data in a local snapshot is \(O(1)\) as well.HOOMD-blue makes use of properties heavily. Since users can change the system state in Python at any point, we must recompute many of these quantities every time they are queried. If you are using something like
hoomd.md.pair.LJ.energies
multiple times, it will be more performant to first store the values and then use that copy.Avoid for loops for numerical calculation. Try to utilize NumPy broadcasting or existing functions in NumPy or Scipy on the CPU or CuPy on the GPU.
Improve InsertEnergyUpdater¶
As an example, we will improve the performance of the InsertEnergyUpdater
. Specifically we will change to use the cpu_local_snapshot
to update particle velocity. We will use the %%timeit
magic function for timing the simulation’s run time before and after our optimization. To highlight the difference, we will run the updater every timestep.
[2]:
class InsertEnergyUpdater(hoomd.custom.Action):
def __init__(self, energy):
self._energy = energy
@property
def energy(self):
return self._energy
@energy.setter
def energy(self, new_energy):
if isinstance(new_energy, numbers.Number):
self._energy = hoomd.variant.Constant(new_energy)
elif isinstance(new_energy, hoomd.variant.Variant):
self._energy = new_energy
else:
message = "energy must be a variant or real number."
raise ValueError(message)
def act(self, timestep):
snap = self._state.get_snapshot()
if snap.communicator.rank == 0:
particle_i = rng.integers(snap.particles.N)
mass = snap.particles.mass[particle_i]
direction = self._get_direction()
magnitude = numpy.sqrt(2 * self.energy(timestep) / mass)
velocity = direction * magnitude
old_velocity = snap.particles.velocity[particle_i]
new_velocity = old_velocity + velocity
snap.particles.velocity[particle_i] = new_velocity
self._state.set_snapshot(snap)
@staticmethod
def _get_direction():
theta, z = rng.random(2)
theta *= 2 * numpy.pi
z = 2 * (z - 0.5)
return numpy.array(
[
numpy.sqrt(1 - (z * z)) * numpy.cos(theta),
numpy.sqrt(1 - (z * z)) * numpy.sin(theta),
z,
]
)
[3]:
energy_action = InsertEnergyUpdater(energy)
energy_operation = hoomd.update.CustomUpdater(action=energy_action, trigger=1)
simulation.operations.updaters.append(energy_operation)
[4]:
%%timeit
simulation.run(100)
56.6 ms ± 918 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
We now show the profile for the optimized code which uses the cpu_local_snapshot
for updating velocities.
[ ]:
class InsertEnergyUpdater(hoomd.custom.Action):
def __init__(self, energy):
self._energy = energy
@property
def energy(self):
return self._energy
@energy.setter
def energy(self, new_energy):
if isinstance(new_energy, numbers.Number):
self._energy = hoomd.variant.Constant(new_energy)
elif isinstance(new_energy, hoomd.variant.Variant):
self._energy = new_energy
else:
message = "energy must be a variant or real number."
raise ValueError(message)
def attach(self, simulation):
self._state = simulation.state
self._comm = simulation.device.communicator
def detach(self):
del self._state
del self._comm
def act(self, timestep):
part_tag = rng.integers(self._state.N_particles)
direction = self._get_direction()
energy = self.energy(timestep)
with self._state.cpu_local_snapshot as snap:
# We restrict the computation to the MPI
# rank containing the particle if applicable.
# By checking if multiple MPI ranks exist first
# we can avoid checking the inclusion of a tag id
# in an array.
if self._comm.num_ranks <= 1 or part_tag in snap.particles.tag:
i = snap.particles.rtag[part_tag]
mass = snap.particles.mass[i]
magnitude = numpy.sqrt(2 * energy / mass)
velocity = direction * magnitude
old_velocity = snap.particles.velocity[i]
new_velocity = old_velocity + velocity
snap.particles.velocity[i] = new_velocity
@staticmethod
def _get_direction():
theta, z = rng.random(2)
theta *= 2 * numpy.pi
z = 2 * (z - 0.5)
return numpy.array(
[
numpy.sqrt(1 - (z * z)) * numpy.cos(theta),
numpy.sqrt(1 - (z * z)) * numpy.sin(theta),
z,
]
)
[6]:
# Create and add our modified custom updater
simulation.operations -= energy_operation
energy_action = InsertEnergyUpdater(energy)
energy_operation = hoomd.update.CustomUpdater(action=energy_action, trigger=1)
simulation.operations.updaters.append(energy_operation)
[7]:
%%timeit
simulation.run(100)
17.5 ms ± 590 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As can be seen the simulation with the new EnergyInsertUpdater is about an order of magnitude faster with a system size of \(12^3 = 1728\), by virtue of the local snapshot modification having \(O(1)\) time complexity. At larger system sizes this change will grow to be even more substantial.
This concludes the tutorial on custom actions in Python. For more information see the API documentation.