Organizing Data
Overview
Questions
How can I organize data from many simulations?
How do I relate the parameters of the simulation to the data?
Objectives
Define a data space that organizes simulation output into directories based on state point parameters.
Demonstrate how to use signac to create a data space.
Initialize a data space with hard particle Monte Carlo simulations at selected volume fractions.
Show how to store computed results in the job document.
Boilerplate Code
[1]:
import itertools
import math
import gsd.hoomd
import numpy
Research Question
The Introducing HOOMD-blue tutorial shows how to execute a single simulation of hard octahedra and how they self-assemble into a crystal structure. You might want to answer the question “At what volume fraction is the phase transition from fluid to crystal?” One way to find out is to execute simulations at many volume fractions and examine the resulting equilibrium structures. When performing such a study, you may also want to explore different system sizes, repeat the simulations with different random number seeds, or examine the effects of changing other parameters.
The unique set of parameters for each simulation is a state point, which you can represent as a Python dictionary:
[3]:
statepoint = dict(N_particles=128, volume_fraction=0.6, seed=20)
statepoint
[3]:
{'N_particles': 128, 'volume_fraction': 0.6, 'seed': 20}
In your own research, you will execute different types of simulations with different parameters. Follow the example provided in this tutorial and apply the same concepts to organize and execute the simulations for your work.
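For example, a study that varies system size, volume fraction, and random seed can enumerate its state points as the Cartesian product of the parameter values. This is a minimal sketch; the parameter values below are illustrative assumptions, not a recommendation for this tutorial.

```python
import itertools

# Illustrative parameter values (assumptions, not prescribed by this tutorial).
parameters = {
    'N_particles': [128, 256],
    'volume_fraction': [0.4, 0.5, 0.6],
    'seed': [20, 21],
}

# Build one state point dictionary for every combination of parameter values.
names = list(parameters)
statepoints = [
    dict(zip(names, values))
    for values in itertools.product(*parameters.values())
]

print(len(statepoints))  # 12 (2 sizes * 3 volume fractions * 2 seeds)
print(statepoints[0])  # {'N_particles': 128, 'volume_fraction': 0.4, 'seed': 20}
```

Each of these dictionaries is a state point in the same form as the one above, so the same data space concepts apply no matter how many parameters you vary.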
Data Space
Each simulation you execute will generate several output files. Store these in a directory uniquely assigned to each state point. The collection of directories is a data space. Use signac to automatically name and create the directories.
[4]:
import signac
A signac project represents the entire data space stored on disk, together with its metadata. The function init_project creates a signac project in the current working directory by placing a .signac/config file with the project metadata and a workspace directory to hold the directories of the data space.
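To make the on-disk layout concrete, here is a minimal sketch that builds the same skeleton with only the standard library. The helper make_project_skeleton is hypothetical and is not how signac is implemented; it only mirrors the config contents and the empty workspace that the following cells display. In practice, call signac.init_project.

```python
import pathlib
import tempfile


def make_project_skeleton(root):
    """Hypothetical helper: mimic the layout that init_project leaves on disk."""
    root = pathlib.Path(root)
    # Project metadata lives in .signac/config.
    config = root / '.signac' / 'config'
    config.parent.mkdir(parents=True, exist_ok=True)
    config.write_text('schema_version = 2\n')
    # The directories of the data space will live under workspace/.
    workspace = root / 'workspace'
    workspace.mkdir(exist_ok=True)
    return config, workspace


with tempfile.TemporaryDirectory() as tmp:
    config, workspace = make_project_skeleton(tmp)
    print(config.read_text().strip())  # schema_version = 2
    print(sorted(workspace.iterdir()))  # [] (the workspace starts empty)
```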
Create the project:
[5]:
project = signac.init_project()
[6]:
!cat .signac/config
schema_version = 2
[7]:
!ls -l workspace
total 0
A signac job is a container that holds the state point, assigned directory, and a job document.
[8]:
job = project.open_job(statepoint)
[9]:
job.statepoint
[9]:
{'N_particles': 128, 'volume_fraction': 0.6, 'seed': 20}
The job document is a persistent dictionary, stored in the job's directory, where you can record the job’s status and computed results.
[10]:
job.document
[10]:
{}
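signac saves the job document to a signac_job_document.json file in the job's directory, as the directory listing later in this section shows. The sketch below illustrates the general idea of a persistent dictionary: every assignment is written to a JSON file immediately, so the values survive restarting Python. The JsonDocument class is hypothetical, a simplified stand-in rather than signac's implementation.

```python
import json
import pathlib
import tempfile


class JsonDocument:
    """Hypothetical persistent dict: each assignment is saved to a JSON file."""

    def __init__(self, path):
        self.path = pathlib.Path(path)

    def _load(self):
        # A missing file means the document is still empty.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def __getitem__(self, key):
        return self._load()[key]

    def __setitem__(self, key, value):
        data = self._load()
        data[key] = value
        # Write the whole document back so the change persists on disk.
        self.path.write_text(json.dumps(data))

    def get(self, key, default=None):
        return self._load().get(key, default)

    def to_dict(self):
        return self._load()


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / 'signac_job_document.json'
    JsonDocument(path)['initialized'] = True
    # A new object reading the same file sees the stored value.
    print(JsonDocument(path).to_dict())  # {'initialized': True}
```

Reloading the file on every access keeps the sketch simple and always consistent with disk, at the cost of extra reads.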
The first file for each simulation is the initial condition. Here is the initialization code from the Introducing HOOMD-blue tutorial, encapsulated in a function that takes a signac job as an argument:
[11]:
def init(job):
    # Place a number of particles as indicated by the signac job's state point.
    K = math.ceil(job.statepoint.N_particles ** (1 / 3))
    spacing = 1.2
    L = K * spacing
    x = numpy.linspace(-L / 2, L / 2, K, endpoint=False)
    position = list(itertools.product(x, repeat=3))
    position = position[0 : job.statepoint.N_particles]
    orientation = [(1, 0, 0, 0)] * job.statepoint.N_particles
    frame = gsd.hoomd.Frame()
    frame.particles.N = job.statepoint.N_particles
    frame.particles.position = position
    frame.particles.orientation = orientation
    frame.particles.typeid = [0] * job.statepoint.N_particles
    frame.particles.types = ['octahedron']
    frame.configuration.box = [L, L, L, 0, 0, 0]
    # Write `lattice.gsd` to the signac job's directory.
    with gsd.hoomd.open(name=job.fn('lattice.gsd'), mode='x') as f:
        f.append(frame)
    # Set the 'initialized' item in the job document.
    job.document['initialized'] = True
The init function uses job.statepoint.N_particles to access the state point parameter and job.fn to construct a filename in the job's assigned directory. init also sets the 'initialized' item in the job document to True, which the next section of the tutorial uses.
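The placement step in init puts the particles on a simple cubic lattice: K = ceil(N_particles^(1/3)) sites per side, spaced 1.2 apart, keeping the first N_particles of the K³ sites. The sketch below repeats that arithmetic with only the standard library so you can check it without gsd. lattice_positions is a hypothetical helper, and its list comprehension stands in for numpy.linspace(-L / 2, L / 2, K, endpoint=False).

```python
import itertools
import math


def lattice_positions(n_particles, spacing=1.2):
    """Hypothetical helper: the cubic lattice placement used by init, without gsd."""
    # Smallest number of sites per side whose cube holds all particles.
    K = math.ceil(n_particles ** (1 / 3))
    L = K * spacing
    # Same values as numpy.linspace(-L / 2, L / 2, K, endpoint=False), up to rounding.
    x = [-L / 2 + i * L / K for i in range(K)]
    # Keep the first n_particles of the K**3 lattice sites.
    positions = list(itertools.product(x, repeat=3))[:n_particles]
    return positions, L


positions, L = lattice_positions(128)
print(math.ceil(128 ** (1 / 3)), len(positions))  # 6 128
```

For 128 particles the lattice has 6³ = 216 sites, so the last 88 sites stay empty.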
Call init to initialize signac jobs at various volume fractions in the data space:
[12]:
for volume_fraction in [0.4, 0.5, 0.6]:
statepoint = dict(N_particles=128, volume_fraction=volume_fraction, seed=20)
job = project.open_job(statepoint)
job.init()
init(job)
This tutorial initializes only three jobs in the data space to keep the execution time and output short. In your own research, signac can help you organize and execute as many jobs as you need.
signac places the data space in a directory named workspace. Here are the files the loop generated:
[13]:
!ls workspace/*
workspace/59363805e6f46a715bc154b38dffc4e4:
lattice.gsd signac_job_document.json signac_statepoint.json
workspace/972b10bd6b308f65f0bc3a06db58cf9d:
lattice.gsd signac_job_document.json signac_statepoint.json
workspace/c1a59a95a0e8b4526b28cf12aa0a689e:
lattice.gsd signac_job_document.json signac_statepoint.json
Each directory now contains the lattice.gsd file created by init, as well as the signac_statepoint.json and signac_job_document.json files created by signac. signac automatically names the directory assigned to each job with a hash of its state point.
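The idea behind these directory names can be sketched as follows: serialize the state point to JSON with sorted keys, so that key order does not matter, then hash the result into a fixed-length hexadecimal name. This illustrates the concept only. The function statepoint_hash is hypothetical and its exact serialization is an assumption, so the names it prints are not guaranteed to match the directories signac created above; use job.id to get the real value.

```python
import hashlib
import json


def statepoint_hash(statepoint):
    """Hypothetical: map a state point dict to a fixed-length directory name."""
    # Sorting the keys makes the name independent of dictionary insertion order.
    blob = json.dumps(statepoint, sort_keys=True)
    # MD5 is used here only as a naming hash, not for security.
    return hashlib.md5(blob.encode()).hexdigest()


for volume_fraction in [0.4, 0.5, 0.6]:
    statepoint = dict(N_particles=128, volume_fraction=volume_fraction, seed=20)
    print(statepoint_hash(statepoint))
```

An MD5 digest is 32 hexadecimal characters, the same length as the directory names listed above. Because the name depends only on the state point, opening the same state point again always leads back to the same directory.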
Summary
In this section of the tutorial, you created a data space with directories to store the simulation results for a number of state points. So far, apart from the files that signac manages, the directory for each simulation contains only the initial configuration file lattice.gsd.
The remaining sections in this tutorial show you how to execute a workflow on this data space that randomizes, compresses, and equilibrates each simulation.
This tutorial only teaches the basics of signac. Read the signac documentation to learn how to loop through all signac jobs, search, filter, and much more.