HelixMC Tutorial¶

Release: 0.9 May 13, 2016

This tutorial demonstrates how to install and run simple calculations with HelixMC, and breifly summarizes available examples and bp-step paramerter database. For details on the classes and functions availble, please see HelixMC Reference.

Install¶

Currently HelixMC has only been tested on Linux. It should run on other Unix-like system (e.g. Mac OS X). The following packages are also required by HelixMC. The versions we used are given in parathesis.

• Python (2.7.3)
• Numpy (1.6.1)
• Matplotlib (1.1.0)
• A working C/C++ compiler. Here we used GCC (4.6.3)

The easiest way to setup the python environment is to use latest Enthought Python Distribution (http://www.enthought.com/).

The easiest way to install is to use pip install:

$pip install helixmc  Alternatively, one can download the source code from the latest GitHub repository. Simply run: $ git clone https://github.com/fcchou/HelixMC.git


After this, you can instal HelixMC using setup.py:

$python setup.py build$ sudo python setup.py install


Instead of installing using setup.py, you can just add your HelixMC folder into the system’s $PATH and $PYTHONPATH. In bash this can be done by adding the following lines to your ~/.bashrc:

export PATH=$PATH:<HelixMC Path> export PYTHONPATH=$PYTHONPATH:<HelixMC Path>


Then build the Cython extension. Under the helixmc/ folder, run:

$python _cython_build.py build_ext --inplace  Note that this requires you to have Cython installed. Otherwise you can choose to build the c source file, then you do not need Cython: $ python _c_build.py build_ext --inplace


Now you should be all set. To test the install, simply run:

$helixmc-run --help  This should output the help information for helixmc-run application. Run HelixMC¶ The helixmc-run application wraps the classes and functions of HelixMC to allow simple MC job submissions from command line. A detailed help for all options of helixmc-run can be obtained by running: $ helixmc-run --help


Now we demonstrate a simple example for helixmc-run.

First, run the following command to kick out a MC run:

\$ helixmc-run -params DNA_default.npz -n_bp 100 -n_step 10 -seq GCCG \


Here, -params give the input database file that contains bp-step parameters curated from PDB (by default it searches the helixmc database folder if it does not find the input file). -n_bp is the total number of bp in the helix. -n_step is the number of MC steps. -seq gives the sequence of the nucleic acids (ATCG for DNA and AUCG for RNA). -force is the applied z-direction stretching force. -compute_fuller_link tells HelixMC to compute and store the linking number using Fuller’s approximation [R1]. -out_frame option will make HelixMC save the final frame to disk as test_run.npz in this case.

Depending on your machine it will take a few seconds to run. After completion you should see something like:

Total time = 1.941772
Accept rate = 0.787879


It is advisable to check the Accept rate of each run, and make sure it is not too low. As a rule of thumb, if Accept rate < 0.1, this means most MC moves are rejected, and you will need use a higher-than-normal number of MC steps to acheive the same level of sampling.

Now we can analyze the output data. Open a Python session and enters the following:

>>> import numpy as np
>>> from helixmc.pose import HelixPose


The observables for each frame are stored in MC_data.npz. Normally the coordinates and reference frames of the last bp are recorded. If -compute_fuller_link or -compute_exact_link is used, the twist and writhe of the helix will also be stored (note that link = twist + writhe).

For example I can compute the average z-extension and the average link as follows:

>>> data = np.load('MC_data.npz')
>>> data.files
['coord_terminal', 'twist', 'writhe', 'frame_terminal']
>>> data['coord_terminal'][:,2]  # 2 for the z-elements
array([ 309.06198311,  317.92717085,  320.17158221,  304.42561971,
319.07461907,  306.94162915,  314.7566295 ,  319.04106375,
322.42125203,  325.72718993])
>>> np.average(data['coord_terminal'][:,2])  # avg. z-extension in Å
315.95487393228649
60.648749666780688


Remember we stored the final frame of the simulation to test_run.npz. We will now plot the helix using that:

>>> pose = HelixPose('test_run.npz')
>>> pose.plot_centerline()  # plot the centerline
>>> pose.plot_helix()  # plot the entire helix


You should see something similar to the following

This is the end of the example. For more examples, check the examples/ folder in HelixMC, which is briefly summarized below.

Other Examples¶

Here is a list of examples in the examples/ folder.

force_ext: This is just the example above. This is for link-contrained simulation, similar to the torsioal-trap single-molecule experiment [R2]. Simulation of Z-DNA using helixmc-run. Check the if the Fuller’s approximation is correct in certain criteria. How to use helixmc.fitfxn to fit simulation or experiment data to simple analytical models. More examples for plotting the helices. How to perform alternative evaluation of bending persistence length using the method suggested by Olson et al. [R3]. Examples on curating bp-step parameters from PDB.

Base-pair Step Parameters Database¶

In the helixmc/data/ folder, several different bp-step parameter sets are given. These datasets were all extracted from structures in Protein Data Bank (PDB, http://www.pdb.org/), with different selection and filtering. The list below summarizes these data.

DNA_default: B-DNA data from structures with resolution (Rs) <= 2.8 Å, excluding protein-binding models. A-DNA + B-DNA, Rs <= 2.8 Å, including protein-binding models. B-DNA, Rs <= 2.0 Å, excluding protein-binding models. RNA, Rs <= 2.8 Å, excluding protein-binding models. RNA, Rs <= 2.8 Å, including protein-binding models. RNA, Rs <= 2.0 Å, excluding protein-binding models. Z-DNA, Rs <= 2.8 Å, including protein-binding models. Unfiltered datasets (no filtering of histogram outliers). Single 6D Gaussian built from DNA_default. Single 6D Gaussian built from RNA_default. Chimera dataset with mean from DNA_gau and covariance from RNA_gau. Chimera dataset with mean from RNA_gau and covariance from DNA_gau. Manually refitted datasets to match experimental measurements. Multi-bp datasets derived from the 2.8_all pdb lists.

Note that Gaussian dataset (*gau*.npy) must be loaded with -gaussian_params tag in helixmc-run command line (instead of -params). Also Gaussian dataset does not support sequence specific simulations.

The corresponding lists of PDB models being used are given in the helixmc/data/pdb_list/ folder.

These datasets are in npy/npz format (Numpy array/archive). For the npz files, the data for different bp-steps of different sequences were separated into different arrays in the file. For B-DNA and RNA, parameter sets with Rise >= 5.5 Å or Twist <= 5° were thrown away as outliers. Then, parameter sets with values beyond 4 standard deviations away from the mean for any of the 6 bp-step parameters were also removed. For B-DNA (except DNA_2.8_all, where the protein binding makes A-DNA and B-DNA unseparable), we further clustered the data using k-means algorithm to separate the A-DNA and B-DNA data. Note that these filtering steps are skipped in the unfiltered datasets.

For Z-DNA, we only considered two types of bp-steps: CG and GC. We used the following selection criteria: Twist <= -30° for GC, and -30° < Twist <= 5° for CG. For CG bp-steps, we further filtered the data by only keeping parameter sets with 4.5 Å <= Rise < 6.3 Å. Parameter sets with values beyond 4 standard deviation away from the mean were then removed, similar to the above cases.

See also examples/bp_database/ for a detailed example for the curation of DNA_2.0_noprot.npz.

References¶

 [R1] Fuller FB (1978) Decomposition of the linking number of a closed ribbon: A problem from molecular biology. PNAS 75: 3557-3561.
 [R2] Lipfert J, Kerssemakers JWJ, Jager T, Dekker NH (2010) Magnetic torque tweezers: measuring torsional stiffness in DNA and RecA-DNA filaments. Nature Methods 7: 977–980.
 [R3] Olson WK, Colasanti AV, Czapla L, Zheng G (2008) Insights into the Sequence-Dependent Macromolecular Properties of DNA from Base-Pair Level Modeling. In: Voth GA, editor. Coarse-Graining of Condensed Phase and Biomolecular Systems: CRC Press. pp. 205-223.