Filling in the GUI

For 3D classifications, select the run-type of 3D classification from the drop-down menu at the top of the GUI.

I/O tab

Apart from the initial reference map, the number of classes is the second most important parameter of this procedure. Often one performs multiple calculations with different values.

If the reference was not reconstructed from the input images in either XMIPP or RELION, you may assume it is not on the absolute greyscale. However, as also mentioned in the important note for 3D classification in the Prepare input files section, it is highly recommended to use a consensus model that comes from the data themselves in order to classify structural heterogeneity.

Provide the correct symmetry point group. Note that there are various settings for icosahedral symmetry; see also the Conventions. Make sure your input map is in the same symmetry setting as the one you provide here.

The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters (unless you rescaled the images afterwards, in which case the same scale factor should be applied to the pixel size).
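
As a small worked example of the scale-factor rule above, the sketch below computes the pixel size after rescaling. The function name and the example numbers are illustrative assumptions; the only premise is that rescaling changes the box size while the imaged area stays the same.

```python
def rescaled_pixel_size(original_pixel_size, original_box, rescaled_box):
    """Return the pixel size (Angstrom) after rescaling the images.

    Assumes the images were rescaled from original_box to rescaled_box
    pixels along each edge without cropping, so the pixel size grows by
    the same factor by which the box shrinks.
    """
    if original_box <= 0 or rescaled_box <= 0:
        raise ValueError("box sizes must be positive")
    scale = original_box / rescaled_box
    return original_pixel_size * scale


# Hypothetical example: CTFs estimated at 1.2 A/pixel on 400-pixel boxes,
# particles later downscaled to 200-pixel boxes.
print(rescaled_pixel_size(1.2, 400, 200))
```

With these example numbers the images were downscaled by a factor of two, so the pixel size to enter is twice the one used for CTF estimation.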

If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the Prepare input files page.

Some data sets have very low-resolution features that are not accounted for in the linear CTF model (with ~10% amplitude contrast). This sometimes leads to overly strong low-resolution features in the reconstructed maps. Separation based on these very low-resolution features may then hamper separation of distinct structural states. Therefore, it may be useful to ignore the CTFs (i.e. set them to one) until their first maximum. In several cases, this has led to successful classification of structurally heterogeneous data that could not be classified using the full CTF-correction. If desired, full CTF-correction can then be applied subsequently during separate refinements of the distinct classes.
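
To illustrate what ignoring the CTF until its first maximum means, here is a minimal sketch using a simplified one-dimensional CTF. The formula is an assumption made for illustration (no spherical aberration, no astigmatism, no envelope, and a sign convention in which the CTF is positive at low resolution); it is not taken from the RELION source code, and all function names are hypothetical.

```python
import math


def ctf_phase(s, defocus, wavelength, amp_contrast=0.1):
    """Phase of the simplified CTF at spatial frequency s (1/Angstrom).

    defocus and wavelength are in Angstrom; amp_contrast is the fraction
    of amplitude contrast (about 10% as mentioned in the text).
    """
    return math.pi * wavelength * defocus * s * s + math.asin(amp_contrast)


def ctf_value(s, defocus, wavelength, amp_contrast=0.1):
    """Simplified CTF: sin(phase). It first reaches 1 where phase = pi/2."""
    return math.sin(ctf_phase(s, defocus, wavelength, amp_contrast))


def ctf_intact_until_first_peak(s, defocus, wavelength, amp_contrast=0.1):
    """Same CTF, but set to one for all frequencies below the first maximum."""
    phase = ctf_phase(s, defocus, wavelength, amp_contrast)
    if phase < math.pi / 2:
        return 1.0
    return math.sin(phase)


# Example values (assumptions): 2 micron defocus, ~300 kV electrons.
for s in (0.0, 0.01, 0.02, 0.05):
    print(s, ctf_value(s, 20000.0, 0.0197),
          ctf_intact_until_first_peak(s, 20000.0, 0.0197))
```

In this simplified model the two curves are identical beyond the first maximum, so only the lowest frequencies are treated differently.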

Successful classification often requires starting from a very strongly low-pass filtered map. If your input map is not low-pass filtered, it may be filtered internally using the Initial low-pass filter option. Typically, one filters as much as possible, i.e. stopping just short of the point where the reference becomes a featureless blob that can no longer be refined. For example, we use 60 Angstroms for ribosomes and GroEL.

Often 25-50 iterations are necessary before the refinement converges to a stable solution. Note that no convergence criterion is currently implemented, so the user is responsible for monitoring convergence. A job that converges before reaching its maximum number of iterations may be killed; conversely, a run that has not yet converged when it finishes may be continued.
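
Since monitoring is left to the user, one possible way to judge stability is to track what fraction of particles changes class between consecutive iterations. The sketch below is only an illustration of that idea: the function names, the 1% threshold and the three-iteration window are arbitrary assumptions, and the class assignments are supplied as plain Python lists rather than read from any RELION output file.

```python
def fraction_changed(prev_classes, curr_classes):
    """Fraction of particles whose class differs between two iterations."""
    if len(prev_classes) != len(curr_classes):
        raise ValueError("both iterations must list the same particles")
    if not prev_classes:
        return 0.0
    changed = sum(1 for a, b in zip(prev_classes, curr_classes) if a != b)
    return changed / len(prev_classes)


def looks_converged(history, threshold=0.01, window=3):
    """True if each of the last `window` transitions changed < threshold.

    history is a list with one entry per iteration; each entry is the list
    of class numbers assigned to the particles in that iteration.
    """
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(fraction_changed(a, b) < threshold
               for a, b in zip(recent, recent[1:]))


# Toy history for four particles over five iterations.
history = [
    [1, 2, 1, 2],
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [1, 1, 1, 2],
    [1, 1, 1, 2],
]
print(fraction_changed(history[0], history[1]))
print(looks_converged(history))
```

Stable class assignments are only one indicator; one would normally also inspect the maps and the estimated resolution per class before stopping a run.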

The regularisation parameter determines the relative weight between the experimental data and the prior. Bayes' law dictates it should be 1, but better results are often obtained using slightly higher values (e.g. 2-4), especially when dealing with cryo-data. (For negative stain data we often observe that lower values, e.g. 1-2, are better.)

The particle diameter (in Angstroms) serves to define a soft spherical mask that will be applied to the references to reduce their background noise. Note that a (preferably soft) user-provided mask (1=protein, 0=solvent) may also be used for highly non-spherical particles. Be careful though not to mask away any unexpected signal and always use a soft mask, i.e. one with values between 0 and 1 at the protein/solvent boundary.
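
To make the notion of a soft mask concrete, the sketch below computes the mask value as a function of the distance from the box centre, using a raised-cosine falloff at the boundary. The raised-cosine shape, the edge width and the function name are assumptions chosen for illustration; they are not necessarily what the program uses internally.

```python
import math


def soft_mask_value(r, radius, edge_width):
    """Mask value at distance r (pixels) from the centre.

    1 inside `radius`, 0 beyond `radius + edge_width`, and a raised-cosine
    transition with values between 0 and 1 in the soft edge.
    """
    if r <= radius:
        return 1.0
    if r >= radius + edge_width:
        return 0.0
    return 0.5 + 0.5 * math.cos(math.pi * (r - radius) / edge_width)


# Hypothetical example: 200 Angstrom particle diameter at 2 Angstrom/pixel
# gives a mask radius of 50 pixels; a 5-pixel soft edge is assumed.
radius_pixels = 200.0 / (2 * 2.0)
for r in (40, 50, 52.5, 55, 60):
    print(r, soft_mask_value(r, radius_pixels, 5.0))
```

A hard mask would jump from 1 to 0 at the radius; the soft edge avoids that sharp step at the protein/solvent boundary.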

Optionally, one may also apply a soft mask with zeros in the solvent area to the experimental particles. This reduces noise so that classifications may be made more reliably. However, masking the experimental data also introduces correlations between the Fourier components, which are not described in the statistical model. Often using zero-particle-masks in classification runs yields better results, although it may hamper resolution somewhat.

Because 2D (and 3D) classification may suffer from overfitting (especially for very weak data), there is an option to limit the resolution in the E-step (i.e. the alignment). If one sees signs of overfitting (i.e. hairy features extending into the solvent around a particle), we typically set this to values in the range of 20-5 Angstroms, often starting with lower resolutions in initial runs and then progressively becoming more permissive in subsequent runs as the data set becomes cleaner.

CPU requirement will increase rapidly with increased angular samplings (but in contrast to ML3D implementations memory requirements will not!). Therefore, 3D classification is often performed at relatively coarse angular sampling, e.g. 7.5 degrees for ribosomes. Ultimately this will however depend on the nature of the heterogeneity one wants to classify.
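
A back-of-the-envelope estimate shows why the cost grows so quickly. The sketch below approximates the number of orientations to be searched as the number of viewing directions on the sphere (sphere area divided by the step squared) times the number of in-plane rotations. This is a rough approximation that ignores symmetry and the actual sampling grid used by the program; the function name is hypothetical.

```python
import math


def approx_orientation_count(step_deg):
    """Rough number of 3D orientations for an angular step in degrees."""
    # Whole-sphere area in square degrees (about 41253).
    sphere_area_deg2 = 4 * math.pi * (180.0 / math.pi) ** 2
    directions = sphere_area_deg2 / step_deg ** 2
    inplane_rotations = 360.0 / step_deg
    return directions * inplane_rotations


for step in (15.0, 7.5, 3.75):
    print(step, round(approx_orientation_count(step)))
```

Under this approximation the count scales with the inverse cube of the step, so halving the angular step multiplies the number of orientations by about eight.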

If fine angular sampling is required, one could run using coarse samplings initially and then restart a previous run (see the Running_RELION page) with a finer angular sampling combined with local angular searches.

Translational search ranges depend on how well-centered the particles were picked, but often 6 pixels will do the job. Translational searches in subsequent iterations are centered at the optimal translation from the previous one, so that particles may "move" much more than the original search range during the course of an entire refinement. Note that pre-centering prior to RELION refinement is not necessary, and also not recommended (it often messes up the Gaussian distribution of origin offsets).
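
The effect of re-centering the search can be seen in a toy one-dimensional model. The sketch assumes an idealised search that always finds the position within the current window that is closest to the true offset; real alignments are noisier, and the function name and numbers are illustrative only.

```python
def recentred_search(true_offset, search_range, iterations):
    """Track the search centre over iterations in an idealised 1D search."""
    centre = 0.0
    track = []
    for _ in range(iterations):
        lo = centre - search_range
        hi = centre + search_range
        # Best reachable position this iteration: the true offset clamped
        # to the current search window.
        centre = min(max(true_offset, lo), hi)
        track.append(centre)
    return track


# A particle picked 15 pixels off-centre with a 6-pixel search range.
print(recentred_search(15, 6, 4))
```

In this toy model a particle that is 15 pixels off-centre is recovered within three iterations, although each individual search covers only 6 pixels.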

If one uses multi-core nodes, the use of multiple threads (as many threads as cores on a machine) is recommended because the shared-memory parallelisation increases the amount of memory available per process. MPI is typically used for more scalable parallelisation over the different nodes. (In terms of CPU usage, MPI parallelisation is a bit more efficient than threads.)
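
The memory argument can be made explicit with a little arithmetic. The sketch below assumes that every core on a node is used, that each MPI process runs a fixed number of threads, and that the node memory is shared equally between the processes on that node; the function name and the example node are hypothetical.

```python
def memory_per_process_gb(node_memory_gb, cores_per_node, threads_per_process):
    """Memory available to each MPI process on one node, in GB."""
    if threads_per_process <= 0 or cores_per_node % threads_per_process != 0:
        raise ValueError("threads_per_process must evenly divide cores_per_node")
    processes_per_node = cores_per_node // threads_per_process
    return node_memory_gb / processes_per_node


# Hypothetical node with 16 cores and 64 GB of memory.
for threads in (1, 4, 16):
    print(threads, memory_per_process_gb(64, 16, threads))
```

On this example node, running one single-threaded process per core leaves 4 GB per process, whereas a single process with 16 threads has the full 64 GB at its disposal.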

Please see the Classification example for a full example of how to perform 3D classification in RELION.