Organizing Data



  • How can I organize data from many simulations?

  • How do I relate the parameters of the simulation to the data?


  • Define a data space that organizes simulation output into directories based on state point parameters.

  • Demonstrate how to use signac to create a data space.

  • Initialize a data space with hard particle Monte Carlo simulations at selected volume fractions.

  • Show how to store computed results in the job document.

Boilerplate code

import itertools
import math

import gsd.hoomd
import hoomd
import numpy

Research question

The Introducing HOOMD-blue tutorial shows how to execute a single simulation of hard octahedra and how they self-assemble into a crystal structure. You might want to answer the question “At what volume fraction is the phase transition from fluid to crystal?”. One way to find out is to execute simulations at many volume fractions and examine the resulting equilibrium structures. When performing such a study, you may want to explore simulations at different system sizes, repeat the simulation with different random number seeds, or examine the effects of changing other parameters.

The unique set of parameters for each simulation is a state point which you can represent in a Python dictionary:

statepoint = dict(N_particles=128, volume_fraction=0.6, seed=20)
statepoint
{'N_particles': 128, 'volume_fraction': 0.6, 'seed': 20}
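To explore the parameters described above (system sizes, random number seeds, and volume fractions), you can enumerate every combination as a list of state points. The sketch below uses itertools.product; the particular values chosen for N_particles and seed are illustrative assumptions, not values used elsewhere in this tutorial.

```python
import itertools

# Illustrative parameter values (assumptions for this sketch only).
n_particles_values = [128, 256]
volume_fractions = [0.4, 0.5, 0.6]
seeds = [20, 21]

# One state point dictionary per unique combination of parameters.
statepoints = [
    dict(N_particles=n, volume_fraction=phi, seed=seed)
    for n, phi, seed in itertools.product(n_particles_values, volume_fractions, seeds)
]

print(len(statepoints))  # 2 * 3 * 2 = 12
print(statepoints[0])
```

Each dictionary in the list is a candidate state point that you could pass to signac in the same way as the single statepoint above.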

In your own research, you will execute different types of simulations with different parameters. Follow the example provided in this tutorial and apply the same concepts to organize and execute the simulations for your work.

Data space

Each simulation you execute will generate several output files. Store these in a directory uniquely assigned to each state point. The collection of directories is a data space. Use signac to automatically name and create the directories.

import signac

A signac project represents the entire data space stored on disk, together with its associated metadata. The function init_project creates a signac project in the current working directory by placing a signac.rc file with the project metadata and a workspace directory to hold the directories of the data space. The name argument is required with signac 1.x, but its value is used only to populate signac.rc.

Create the project:

project = signac.init_project(name="octahedra-assembly-project")
!cat signac.rc
project = octahedra-assembly-project
schema_version = 1

A signac job is a container that holds the state point, assigned directory, and a job document.

job = project.open_job(statepoint)
job.statepoint
{'N_particles': 128, 'volume_fraction': 0.6, 'seed': 20}

The job document is a persistent dictionary where you can record the job’s status.
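Conceptually, a persistent dictionary writes every change to disk so that the stored values survive after Python exits. The toy class below is not signac's implementation; it is a hypothetical PersistentDict that mirrors its contents to a JSON file, similar in spirit to the signac_job_document.json file that signac keeps in each job directory.

```python
import json
import os
import tempfile


class PersistentDict:
    """Hypothetical toy: a dict-like object mirrored to a JSON file."""

    def __init__(self, filename):
        self.filename = filename
        # Load existing contents if the file is already present.
        if os.path.exists(filename):
            with open(filename) as f:
                self._data = json.load(f)
        else:
            self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        # Write the whole dictionary back to disk after every change.
        with open(self.filename, "w") as f:
            json.dump(self._data, f)

    def get(self, key, default=None):
        return self._data.get(key, default)


directory = tempfile.mkdtemp()
filename = os.path.join(directory, "document.json")

document = PersistentDict(filename)
document["initialized"] = True

# A new object reading the same file sees the stored value.
reopened = PersistentDict(filename)
print(reopened["initialized"])  # True
```

Rewriting the whole file on each assignment keeps the toy simple; it is only meant to show why a value set in one Python session is still available in the next.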


The first file for each simulation is the initial condition. Here is the initialization code from the Introducing HOOMD-blue tutorial, encapsulated in a function that takes a signac job as an argument:

def init(job):
    # Place a number of particles as indicated by the signac job's state point.
    K = math.ceil(job.statepoint.N_particles**(1 / 3))
    spacing = 1.2
    L = K * spacing
    x = numpy.linspace(-L / 2, L / 2, K, endpoint=False)
    position = list(itertools.product(x, repeat=3))
    position = position[0:job.statepoint.N_particles]
    orientation = [(1, 0, 0, 0)] * job.statepoint.N_particles

    snapshot = gsd.hoomd.Snapshot()
    snapshot.particles.N = job.statepoint.N_particles
    snapshot.particles.position = position
    snapshot.particles.orientation = orientation
    snapshot.particles.typeid = [0] * job.statepoint.N_particles
    snapshot.particles.types = ['octahedron']
    snapshot.configuration.box = [L, L, L, 0, 0, 0]

    # Write `lattice.gsd` to the signac job's directory.
    with gsd.hoomd.open(name=job.fn('lattice.gsd'), mode='xb') as f:
        f.append(snapshot)

    # Set the 'initialized' item in the job document.
    job.document['initialized'] = True

The init function uses job.statepoint.N_particles to access the state point parameter and job.fn to construct a filename in the assigned directory. init also sets the 'initialized' item in the job document to True, which the next section of the tutorial uses.
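To see what init computes for N_particles=128, the stdlib-only sketch below repeats its lattice arithmetic without HOOMD or gsd. Here K = ceil(128^(1/3)) = 6, so the 6 x 6 x 6 lattice has 216 sites and only the first 128 are kept. The function name lattice_positions is hypothetical, and the list comprehension stands in for numpy.linspace(-L / 2, L / 2, K, endpoint=False), which produces values spaced L / K = spacing apart.

```python
import itertools
import math


def lattice_positions(n_particles, spacing=1.2):
    """Hypothetical helper mirroring the lattice placement in init."""
    # Number of lattice sites per side so that K**3 >= n_particles.
    K = math.ceil(n_particles ** (1 / 3))
    L = K * spacing
    # Same values as numpy.linspace(-L / 2, L / 2, K, endpoint=False), up to rounding.
    x = [-L / 2 + i * spacing for i in range(K)]
    position = list(itertools.product(x, repeat=3))
    # Keep only the first n_particles lattice sites.
    return position[:n_particles], K, L


position, K, L = lattice_positions(128)
print(K, len(position))  # 6 128
```

Because the lattice is truncated, the box length L = 7.2 is set by K rather than by N_particles directly.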

Call init to initialize signac jobs at various volume fractions in the data space:

for volume_fraction in [0.4, 0.5, 0.6]:
    statepoint = dict(N_particles=128, volume_fraction=volume_fraction, seed=20)
    job = project.open_job(statepoint)
    init(job)

This tutorial initializes only three jobs in the data space to keep the execution time and output short. In your own research, signac can help you organize and execute as many jobs as you need.

signac places the data space in a directory named workspace. Here are the files the loop generated:

!ls workspace/*
lattice.gsd              signac_job_document.json signac_statepoint.json

lattice.gsd              signac_job_document.json signac_statepoint.json

lattice.gsd              signac_job_document.json signac_statepoint.json

Each directory now contains the lattice.gsd file created by init as well as the signac_statepoint.json and signac_job_document.json files created by signac. signac generates the name of the directory assigned to each job automatically as a hash of the job's state point.
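The idea behind hash-named directories can be sketched with the standard library: serialize the state point to JSON with sorted keys, so that key order does not matter, and hash the result. signac job ids are 32-character hexadecimal strings; this sketch assumes an MD5 hash of the JSON encoding, but the function name statepoint_hash is hypothetical and its output is not guaranteed to match signac's directory names.

```python
import hashlib
import json


def statepoint_hash(statepoint):
    """Hypothetical helper: deterministic directory name for a state point."""
    # sort_keys=True makes the encoding independent of key order.
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode()).hexdigest()


a = statepoint_hash(dict(N_particles=128, volume_fraction=0.6, seed=20))
b = statepoint_hash(dict(seed=20, volume_fraction=0.6, N_particles=128))
c = statepoint_hash(dict(N_particles=128, volume_fraction=0.5, seed=20))

print(a == b)  # True: same parameters, different key order
print(a == c)  # False: a different volume fraction gives a different name
```

Deriving the name from the parameters means the same state point always maps to the same directory, so you never have to choose directory names by hand.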


In this section of the tutorial, you created a data space with directories to store the simulation results for a number of state points. So far, the directory for each simulation contains only the initial configuration file lattice.gsd.

The remaining sections in this tutorial show you how to execute a workflow on this data space that randomizes, compresses, and equilibrates each simulation.

This tutorial only teaches the basics of signac. Read the signac documentation to learn how to loop through all signac jobs, search, filter, and much more.
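signac provides its own methods for iterating over and searching jobs, described in its documentation. To show why the on-disk layout makes such searches possible, the hypothetical helper find_job_dirs below walks a workspace directory, reads each signac_statepoint.json, and yields the directories whose state points match the given parameters. It relies only on the layout shown in the ls output above, uses made-up directory names, and is not a replacement for signac's API.

```python
import json
import os
import tempfile


def find_job_dirs(workspace, **criteria):
    """Hypothetical helper: yield (directory, state point) pairs matching criteria."""
    for name in sorted(os.listdir(workspace)):
        path = os.path.join(workspace, name, "signac_statepoint.json")
        if not os.path.isfile(path):
            continue  # Skip anything that is not a job directory.
        with open(path) as f:
            statepoint = json.load(f)
        # Keep the job only if every requested parameter matches.
        if all(statepoint.get(key) == value for key, value in criteria.items()):
            yield os.path.join(workspace, name), statepoint


# Build a throwaway workspace with made-up directory names for demonstration.
workspace = tempfile.mkdtemp()
for i, volume_fraction in enumerate([0.4, 0.5, 0.6]):
    job_dir = os.path.join(workspace, f"job{i}")
    os.makedirs(job_dir)
    with open(os.path.join(job_dir, "signac_statepoint.json"), "w") as f:
        json.dump(dict(N_particles=128, volume_fraction=volume_fraction, seed=20), f)

matches = list(find_job_dirs(workspace, volume_fraction=0.6))
print(len(matches))  # 1
```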