Minimise computational costs

By explaining where RELION's computational costs arise, this page aims to provide users with the information needed to minimise the computational costs of their MAP refinements.

Understanding the algorithm

Expectation

The expectation step is the "alignment" step: here each experimental image is compared to projections of the reference map in all orientations. Consequently, this is the most expensive step in terms of CPU. CPU costs increase with increasingly fine angular or translational sampling rates (linearly with the number of orientations sampled; see NrHiddenVariableSamplingPoints in the stdout file). If classifying, the CPU costs also increase linearly with the number of references used. In terms of memory (RAM), the expectation step may also be quite costly, in particular if large images are used. The scaling behaviour is somewhat complicated, as more data is kept in memory as the resolution increases. For Niko's recoated rotavirus data, we used 2x downsized images of 400x400 pixels, which still fitted into our 8x2Gb machines. Using the original 800x800 pixel images did not.
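To see how fine your current sampling actually is, you can pull the reported number of sampling points from the standard output. A minimal example, assuming you redirected the standard output of your run to a file called run1.out (the file name is just a placeholder):

grep NrHiddenVariableSamplingPoints run1.out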

If you ever get the following error in your stderr file: Allocate: No space left, then you know you have run out of memory. In that case, first check whether you are running the intended number of MPI jobs on each node. You can monitor memory usage and the number of MPI jobs on your nodes by logging into them and using the "top" command. Often, it takes some work to set up your job submission system for handling hybrid parallelization, i.e. jobs that use both MPI and threads. See the installation page for more details on how to do this.
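For example, assuming one of your nodes is called node01 and is reachable through ssh (both are placeholders for your own setup), you could check it with:

ssh node01
top -u $USER

The RES column in top shows the resident memory of each of your processes, and the number of relion entries tells you how many MPI processes ended up on that node.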

The total wallclock time needed to run the expectation step may be greatly reduced using parallel computing. This has been implemented at two different levels: MPI (message passing interface) is used to communicate between different computing nodes (separate computers that are connected to each other over a network), while so-called threads are used to parallelize tasks among the multiple cores of modern multi-core computers. Threads have the advantage of sharing the memory of one computer (so that memory does not need to be replicated for each thread). MPI has the advantage of scalability: one can always buy more computers and link them together in a larger cluster, while there is a maximum to the number of cores in a single computer. The recommended way to run RELION (in particular for 3D refinements, where memory requirements are larger than in 2D) is to use as many threads as there are cores on your nodes, and to run one MPI process on each node. For the 3D auto-refine option, be aware that the two independent half data sets are refined on two half-sets of the slaves, while a single master node directs everything. Therefore, it is most efficient to use an odd number of nodes, and the minimum number of nodes to use is 3.
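As an illustration, this is a sketch of how one might start a 3D auto-refinement on three 8-core nodes, i.e. one MPI process per node with eight threads each. The hostfile and all file names are placeholders, and most refinement options have been left out for brevity:

mpirun -n 3 --hostfile my_nodes relion_refine_mpi --i particles.star --ref my_ref.mrc --o Refine3D/run1 --auto_refine --split_random_halves --j 8

Here --j sets the number of threads per MPI process, and -n 3 provides the minimum of one master plus one slave for each of the two independent half-sets.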

Maximization

The maximization step is the "reconstruction" step. This step is typically much faster than the expectation step. However, it is not parallelized very well: the only parallelization implemented is that multiple reconstructions (e.g. in the case of classification, or the two independent reconstructions for gold-standard FSCs) are performed in parallel. Threading inside the FFTW library yielded limited speed-ups in release 1.1, but this implementation was removed from release 1.2 due to instabilities.

Although not very slow, the maximization step does take quite a bit of memory, scaling cubically with the image size. It may be that you have no memory problems in the expectation step, but that you run out of memory in the maximization step. The only solution to this is to use smaller (downscaled) images. You can downscale your images in the RELION Preprocessing procedure.
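To get a feeling for the cubic scaling: halving the box size reduces the memory footprint of each reconstruction roughly eight-fold. As a rough, illustrative calculation (the true footprint also depends on padding and on how many arrays are kept in memory, so treat these as ballpark figures only):

800 x 800 x 800 voxels x 8 bytes ≈ 4.1 Gb per double-precision volume
400 x 400 x 400 voxels x 8 bytes ≈ 0.5 Gb per double-precision volume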

Tips to increase speed

Use smaller particles

We often use relatively high magnifications in the microscope and end up with large boxed particles that require a lot of time to read from disc and to pass around in MPI messages. Therefore, things can be sped up significantly by downscaling your particles, especially for the initial 2D and 3D classification runs, where one often only needs lower resolutions. Remember that the maximum attainable resolution from any run is two times the pixel size, so a down-sampled pixel size of 4.0 Angstrom can still give you 8.0 Angstrom reconstructions/2D class averages. This is usually more than enough to get a good separation of suitable particles from junk particles, and often even to separate out the main conformational variability.

You can downsample your images using the 'Rescale particles?' option on the 'extract' tab of the 'Particle extraction' job-type. After you have done your initial 2D and 3D classifications, you may decide to go back to the original-scale (or less down-sampled) particle boxes. First, you'll need to re-extract your particles with the larger boxes (using a different 'Extract rootname' on the 'I/O' tab of the 'Particle extraction' job-type). Second, you will probably want to use only a subset of the classified particles, so you will need to modify the STAR file with these selected down-scaled particles. There are a few things to remember here:

  1. The rlnDetectorPixelSize is modified in the re-scaling procedure. If the original pixel size on the detector was 14 micron, and you downscaled the data 3x, then the down-scaled particle STAR files will have a value of 42 micron for rlnDetectorPixelSize. If you go back to the original-sized particles, you will need to replace the column with these values.
  2. The rlnOriginX and rlnOriginY values are in pixels, not in Angstroms. Therefore, when going back to the original data from the 3x down-scaled particles, you will need to multiply all origin offsets by 3.
  3. The particle stacks now have a new rootname, which you will need to change in the STAR file. The command below assumes that the number of particles in each stack hasn't changed!

Assuming you have saved the selected downscaled particles in a file called selected_down3.star; the original pixel size of the detector was 14 micron; you downscaled by a factor of 3; rlnDetectorPixelSize is the 12th column and rlnOriginX and rlnOriginY are the 18th and 19th columns in selected_down3.star; and the downsampled and original particles are stored in stacks with names ending in particles_down3.mrcs and particles_down1.mrcs, respectively, you could modify the STAR file with the following command:

cat selected_down3.star | awk '{if (NF<3) {print} else {$12=14.0; $18=3*$18; $19=3*$19; print} }' | sed 's|particles_down3.mrcs|particles_down1.mrcs|g' > selected_down1.star
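To quickly check that the command did what you intended, you can print the three modified columns of the first data line in the new file (using the same column numbers as assumed above):

awk 'NF>3 {print $12, $18, $19; exit}' selected_down1.star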

You could then use those particles for further refinement. Also, do not forget to change the pixel size (in Angstrom) on the 'General' job-type in the main GUI.