Calculate 2D class averages

From Relion

I/O tab

CTF tab

  • The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters (unless you have rescaled your data afterwards, in which case the same scale factor should be applied)
  • If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the Prepare input files page.
  • Tell the program whether the data have been phase flipped or not.

Optimisation tab

  • The number of classes is the most important parameter of the 2D classification procedure. Often one performs multiple calculations with different values.
  • Often 25-50 iterations are necessary before the refinement converges to a stable solution. Note there is currently no convergence criterion implemented, so the user is responsible for monitoring the convergence.
  • The regularisation parameter determines the relative weight between the experimental data and the prior. Bayes' law dictates it should be 1, but sometimes better results are obtained with slightly higher values. Whereas 3D refinements (in cryo-EM) may require values of 2-4, 2D classifications seem to work better with values of 1-2.
  • The particle diameter (in Angstroms) serves to define a soft spherical mask that will be applied to the reference images to reduce their background noise.
  • Optionally, one may also apply a soft mask with zeros in the solvent area to the experimental particles. This reduces noise, so that classifications may be made more reliably. However, masking the experimental data also introduces correlations between the Fourier components, which are not described in the statistical model. Using zeroed particle masks in classification runs often yields better results, although it may somewhat hamper resolution.
  • Because 2D (and 3D) classification may suffer from overfitting (especially for very weak data), there is an option to limit the resolution in the E-step (i.e. the alignment). If one sees signs of overfitting (i.e. hairy features extending into the solvent around a particle), we typically set this to values in the range of 20-5 Å, often starting with lower resolutions (larger values) in initial runs and progressively becoming more permissive in subsequent runs as the data set becomes cleaner.
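The Optimisation-tab settings above correspond to command-line arguments of relion_refine. The sketch below assembles such a command in Python as an illustration; the flag names (--K, --iter, --tau2_fudge, --particle_diameter, --zero_mask, --strict_highres_exp) follow RELION-2.x conventions, and the input/output names are examples only, so verify everything against `relion_refine --help` for your version.

```python
def build_class2d_command(n_classes=50, n_iter=25, tau2_fudge=2,
                          particle_diameter=200, zero_mask=True,
                          highres_limit=None):
    """Return a relion_refine argument list for a 2D classification run."""
    args = ["relion_refine",
            "--i", "particles.star",        # input STAR file (example name)
            "--o", "Class2D/run1",          # output rootname (example name)
            "--K", str(n_classes),          # number of classes
            "--iter", str(n_iter),          # 25-50 iterations are typical
            "--tau2_fudge", str(tau2_fudge),  # regularisation T (1-2 for 2D)
            "--particle_diameter", str(particle_diameter)]  # soft mask, in Angstrom
    if zero_mask:
        args.append("--zero_mask")          # zero out solvent in the particles
    if highres_limit is not None:
        # limit the resolution used in the E-step to combat overfitting
        args += ["--strict_highres_exp", str(highres_limit)]
    return args

print(" ".join(build_class2d_command(highres_limit=15)))
```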

Sampling tab

  • An in-plane angular sampling rate of 5 degrees is sufficient for most applications.
  • Translational search ranges may depend on how well-centered the particles were picked, but often 6 pixels will do the job. (Translational searches in subsequent iterations are centered at the optimal translation from the previous one, so particles may "move" much more than the original search range over the course of an entire refinement.) Note that pre-centering particles prior to RELION refinement is neither necessary nor recommended, as it often distorts the Gaussian distribution of origin offsets.
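The re-centering behaviour described above can be sketched as follows; this is an illustrative calculation, not RELION code. Because each iteration's search window is centered on the previous iteration's optimal offset, the cumulative distance a particle can drift grows with the number of iterations, even though each individual search only covers the stated range.

```python
def max_reach(offset_range_px, n_iterations):
    """Maximum cumulative shift along one axis, in pixels, when each
    iteration's search window is re-centered on the previous optimum."""
    return offset_range_px * n_iterations

# With a 6-pixel search range and 25 iterations, a particle could in
# principle end up as much as 150 pixels from its starting position.
print(max_reach(6, 25))
```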

Compute tab

As of RELION-2.0, there are more computation-options accessible from the GUI. These are:

  • Combine iterations through disc? This option was implemented as a workaround for buggy network cards on our cluster, where large MPI messages often failed: instead of communicating over the network, the MPI nodes exchange data by writing large files to disc. If you have reasonably fast and reliable network connections, it is better to set this option to "No", as communicating through disc is quite slow (although that depends on the speed of your disc access).
  • Use parallel disc I/O? If set to Yes, all MPI slaves will read their own images from disc. Otherwise, only the master will read images and send them through the network to the slaves. Parallel file systems like gluster or fhgfs are good at parallel disc I/O. NFS may break with many slaves reading in parallel.
  • Number of pooled particles: Particles are processed in individual batches by MPI slaves. During each batch, a stack of particle images is only opened and closed once to improve disk access times. All particle images of a single batch are read into memory together. The size of these batches is at least one particle per thread used. The nr_pooled_particles parameter controls how many particles are read together for each thread. If it is set to 3 and one uses 8 threads, batches of 3x8=24 particles will be read together. This may improve performance on systems where disk access, and particularly metadata handling of disk access, is a problem. It has a modest cost of increased RAM usage.
  • Pre-read all particles into RAM? If set to Yes, all particle images will be read into computer memory, which will greatly speed up calculations on systems with slow disk access. However, one should of course be careful with the amount of RAM available. Because particles are read in double precision, it will take ( N * box_size * box_size * 8 / (1024 * 1024 * 1024) ) gigabytes to read N particles into RAM. For 100 thousand 200x200 images, that is about 30 GB, or about 120 GB for the same number of 400x400 particles. Remember that if you run a single MPI slave on each node with as many threads as there are cores, that slave has access to all the RAM on the node.
  • Use GPU acceleration? If set to Yes, the program will run CUDA code to accelerate computations on NVIDIA graphics cards. Note that only cards with CUDA compute capability 3.5 or higher are supported. It is typically recommended to run 1 MPI process on each card. The option 'Which GPUs to use' can be used to specify which MPI process is run on which card, e.g. "0:1:2:3" means that slaves 1-4 will run on cards 0-3. Note that the master (mpi-rank 0) will not use any GPU: it will merely orchestrate the calculations. You can run multiple threads within each MPI process to further accelerate the calculations. We often use for example 4 threads.
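The arithmetic in the bullets above (batch sizes, pre-read RAM, and the GPU string) can be sketched as follows. This is illustrative Python, not RELION code; in particular, the GPU-string parser only mirrors the "0:1:2:3" example from the text, it is not RELION's actual parser.

```python
def batch_size(nr_pooled_particles, nr_threads):
    """Particles read per batch: nr_pooled_particles images per thread."""
    return nr_pooled_particles * nr_threads

def preread_ram_gb(n_particles, box_size):
    """Gigabytes needed to pre-read particles into RAM
    (double precision: 8 bytes per pixel)."""
    return n_particles * box_size * box_size * 8 / (1024 ** 3)

def slave_to_gpu(gpu_string):
    """Map MPI slave ranks to GPU ids for a string like '0:1:2:3'
    (rank 0, the master, uses no GPU)."""
    return {rank: int(gpu)
            for rank, gpu in enumerate(gpu_string.split(":"), start=1)}

print(batch_size(3, 8))                        # 24 particles per batch
print(round(preread_ram_gb(100_000, 200), 1))  # ~29.8 GB
print(round(preread_ram_gb(100_000, 400), 1))  # ~119.2 GB
print(slave_to_gpu("0:1:2:3"))                 # slaves 1-4 on cards 0-3
```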

Running tab

  • It is unlikely one needs many threads for 2D class averaging, as this typically takes only modest amounts of memory. Still, depending on the computer cluster setup, it may be worth using some threads in order to keep the total number of MPI processes from becoming too high (which would cause a lot of message passing).
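The thread/MPI trade-off above amounts to simple arithmetic: for a fixed number of cores, more threads per MPI process means fewer MPI processes and hence less message passing. The numbers below are illustrative, not a recommendation for any particular cluster.

```python
def n_mpi_processes(total_cores, threads_per_process):
    """MPI processes that fit on a fixed core budget at a given thread count."""
    return total_cores // threads_per_process

print(n_mpi_processes(64, 1))  # 64 MPI processes: lots of message passing
print(n_mpi_processes(64, 4))  # 16 MPI processes: far less communication
```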