Classify 3D structural heterogeneity: Difference between revisions

From Relion
Line 1: Line 1:
= Filling in the GUI =
= Filling in the GUI =
For 3D refinements, select the run-type of <code>3D reconstruction</code> from the drop-down menu at the top of the GUI.
For 3D classifications, select the run-type of <code>3D classification</code> from the drop-down menu at the top of the GUI.


== I/O tab ==
== I/O tab ==


* See the [[Prepare input files]] page on how to prepare your input.  
* See the [[Prepare input files]] page on how to prepare your input data.  


* The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters.
* Also see the notes of the reference map on the [[Prepare input files]] page.


* See the notes of the reference map on the [[Prepare input files]] page.
* Apart from the initial reference map, the number of classes is the second most important parameter of this procedure. Often one performs multiple calculations with different values.


* If the reference was not reconstructed from the input images in either XMIPP or RELION, you may assume it is not on the absolute greyscale. However, as also mentioned in the [[Prepare_input_files#Important_note_for_3D_classification|important note for 3D classification on the Prepare input files]] section, it is highly recommended to use a consensus model that comes from the data themselves in order to classify structural heterogeneity.  
* If the reference was not reconstructed from the input images in either XMIPP or RELION, you may assume it is not on the absolute greyscale. However, as also mentioned in the [[Prepare_input_files#Important_note_for_3D_classification|important note for 3D classification on the Prepare input files]] section, it is highly recommended to use a consensus model that comes from the data themselves in order to classify structural heterogeneity.  


* Note there are various settings for icosahedral symmetry, also see the [[Conventions]]. Make sure your input map is in the one you provide here.
* Provide the correct symmetry point group. Note there are various settings for icosahedral symmetry, also see the [[Conventions]]. Make sure your input map is in the one you provide here.


== CTF tab ==
== CTF tab ==
* The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters (unless you rescaled the images afterwards, in which case the same scale factor should be applied to the pixel size).


* If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the [[Prepare input files]] page.
* If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the [[Prepare input files]] page.
* If the particles have been phase flipped, tell the program about this.


* Some data sets have very-low resolution features that are not accounted for in the linear CTF model (with ~10% amplitude contrast). This will sometimes lead to too strong low-resolution features in the reconstructed maps. Separation based on these very low-resolution features may then hamper separation of distinct structural states. Therefore, it may be useful to ignore the CTFs (i.e. set them to one) until their first maximum. In several cases, this has led to successful classification of structurally heterogeneous data that could not be classified using the full CTF-correction. If desired, full CTF-correction can then be applied subsequently during separate refinements of the distinct classes.
* Some data sets have very-low resolution features that are not accounted for in the linear CTF model (with ~10% amplitude contrast). This will sometimes lead to too strong low-resolution features in the reconstructed maps. Separation based on these very low-resolution features may then hamper separation of distinct structural states. Therefore, it may be useful to ignore the CTFs (i.e. set them to one) until their first maximum. In several cases, this has led to successful classification of structurally heterogeneous data that could not be classified using the full CTF-correction. If desired, full CTF-correction can then be applied subsequently during separate refinements of the distinct classes.
* Intensity correction corrects for distinct grey-scale intensities among the signal in the data, e.g. due to distinct SNRs among the micrographs. This option is only effective if the data is provided in a STAR file that contains multiple unique strings for the rlnMicrographName label (see the [[Prepare input files]] page).


== Optimisation tab ==
== Optimisation tab ==


* Successful classification often requires starting from a very strongly low-pass filtered map. If your input map is not low-pass filtered, it may be filtered internally using the <code>Initial low-pass filter</code> option. Typically, one filters ''as much as possible'', i.e. before the reference becomes a feature-less blob that can no longer be refined.
* Successful classification often requires starting from a very strongly low-pass filtered map. If your input map is not low-pass filtered, it may be filtered internally using the <code>Initial low-pass filter</code> option. Typically, one filters ''as much as possible'', i.e. before the reference becomes a feature-less blob that can no longer be refined. For example, we use 80 Angstroms for ribosomes and 60 Angstroms for GroEL.  


* Often 25-50 iterations are necessary before the refinement converges to a stable solution. Note there is currently no convergence criterion implemented, so the user is responsible for monitoring the convergence. Jobs may be killed if they converge before their maximum number of iterations has been reached, or if the opposite happens a [[Running RELION#Continuing an old run | previous run may be continued]].
* Often 25-50 iterations are necessary before the refinement converges to a stable solution. Note there is currently no convergence criterion implemented, so the user is responsible for monitoring the convergence. Jobs may be killed if they converge before their maximum number of iterations has been reached, or if the opposite happens a [[Running RELION#Continuing an old run | previous run may be continued]].
   
   
* '''The number of classes is the most important parameter'''. Often one performs multiple calculations with different values.
* The regularisation parameter determines the relative weight between the experimental data and the prior. Bayes' law dictates it should be 1, but better results are often obtained using slightly higher values (e.g. 2-4), especially when dealing with cryo-data.
* The regularisation parameter determines the relative weight between the experimental data and the prior. Bayes' law dictates it should be 1, but better results are often obtained using slightly higher values (e.g. 2-4), especially when dealing with cryo-data.


* The particle diameter (in Angstroms) serves to define a soft spherical mask that will be applied to the experimental images to reduce their background noise. If solvent flattening is set to Yes, then also the references will be masked using the same spherical mask (or using a user-provided one under the <code>solvent mask</code> option).
* The particle diameter (in Angstroms) serves to define a soft spherical mask that will be applied to the references to reduce their background noise. Note that a (preferably soft) user-provided mask (1=protein, 0=solvent) may also be used for highly non-spherical particles. Be careful though not to mask away any unexpected signal and always use a soft mask, i.e. one with values between 0 and 1 at the protein/solvent boundary.


== Sampling tab ==
== Sampling tab ==


* CPU requirement will increase rapidly with increased angular samplings (but in contrast to ML3D/MLF3D implementations memory requirements will not!). Therefore, 3D classification is often performed at relatively coarse angular sampling, e.g. 7.5 degrees. Ultimately this will however depend on the nature of the heterogeneity one wants to classify.
* CPU requirement will increase rapidly with increased angular samplings (but in contrast to ML3D implementations memory requirements will not!). Therefore, 3D classification is often performed at relatively coarse angular sampling, e.g. 7.5 degrees for ribosomes. Ultimately this will however depend on the nature of the heterogeneity one wants to classify.


* For 3D classification with relatively coarse angular one typically does not perform local angular searches.
* If fine angular sampling is required, one could run using coarse samplings initially and then restart a previous run (see the [[Running_RELION]] page) with a finer angular sampling combined with local angular searches.


* Translational search ranges may depend on how well-centered the particles were picked, but often 10 pixels will do the job (translational searches in subsequent iterations are centered at the optimal translation in the previous one, so that particles may "move" much more than the original search range during the course of an entire refinement). Note that pre-centering prior to RELION refinement is not necessary, and also not recommended (it often messes up the Gaussian distribution of origin offsets).
* Translational search ranges may depend on how well-centered the particles were picked, but often 6 pixels will do the job (translational searches in subsequent iterations are centered at the optimal translation in the previous one, so that particles may "move" much more than the original search range during the course of an entire refinement). Note that pre-centering prior to RELION refinement is not necessary, and also not recommended (it often messes up the Gaussian distribution of origin offsets).


== Running tab ==
== Running tab ==

Revision as of 11:44, 19 July 2012

Filling in the GUI

For 3D classifications, select the run-type of 3D classification from the drop-down menu at the top of the GUI.

I/O tab

  • After the initial reference map, the number of classes is the most important parameter of this procedure. One often runs several calculations with different numbers of classes.
  • If the reference was not reconstructed from the input images in either XMIPP or RELION, you may assume it is not on the absolute greyscale. However, as also mentioned in the important note for 3D classification on the Prepare input files page, it is highly recommended to use a consensus model that comes from the data themselves in order to classify structural heterogeneity.
  • Provide the correct symmetry point group. Note that there are various settings for icosahedral symmetry; also see the Conventions. Make sure your input map is in the same setting as the one you provide here.

CTF tab

  • The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters (unless you rescaled the images afterwards, in which case the same scale factor should be applied to the pixel size).
  • If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the Prepare input files page.
  • If the particles have been phase flipped, tell the program about this.
  • Some data sets have very low-resolution features that are not accounted for in the linear CTF model (with ~10% amplitude contrast). This will sometimes lead to overly strong low-resolution features in the reconstructed maps. Separation based on these very low-resolution features may then hamper separation of distinct structural states. Therefore, it may be useful to ignore the CTFs (i.e. set them to one) until their first maximum. In several cases, this has led to successful classification of structurally heterogeneous data that could not be classified using the full CTF-correction. If desired, full CTF-correction can then be applied subsequently during separate refinements of the distinct classes.
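
To illustrate what "ignoring the CTF until its first maximum" means, here is a minimal sketch in Python. It is not RELION's implementation. The function names, the toy CTF model (no spherical aberration, no envelope, sign chosen so that the first peak is positive) and the microscope values are all assumptions made for illustration.

```python
import math

def toy_ctf(freq, defocus=20000.0, wavelength=0.0197, amp_contrast=0.1):
    """Toy 1D CTF; freq in 1/Angstrom, defocus and wavelength in Angstrom.

    Assumed model: sin(chi + asin(A)) with chi = pi * lambda * defocus * freq^2.
    """
    chi = math.pi * wavelength * defocus * freq ** 2
    return math.sin(chi + math.asin(amp_contrast))

def first_peak_index(ctf):
    """Index of the first local maximum of |CTF| along increasing frequency."""
    for i in range(1, len(ctf) - 1):
        if abs(ctf[i]) >= abs(ctf[i - 1]) and abs(ctf[i]) > abs(ctf[i + 1]):
            return i
    return len(ctf) - 1  # no peak inside the sampled range

def ignore_until_first_peak(ctf):
    """Return a copy in which the CTF is set to one below its first peak."""
    peak = first_peak_index(ctf)
    return [1.0] * peak + list(ctf[peak:])

freqs = [i * 0.001 for i in range(101)]  # 0 to 0.1 1/Angstrom
ctf = [toy_ctf(s) for s in freqs]
peak = first_peak_index(ctf)
flat = ignore_until_first_peak(ctf)
print(f"CTF set to one for resolutions lower than ~{1.0 / freqs[peak]:.1f} Angstrom")
```

Frequencies below the first peak are no longer down-weighted by the small CTF values there, while everything from the first peak onwards is left untouched.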

Optimisation tab

  • Successful classification often requires starting from a very strongly low-pass filtered map. If your input map is not low-pass filtered, it may be filtered internally using the Initial low-pass filter option. Typically, one filters as much as possible, i.e. before the reference becomes a feature-less blob that can no longer be refined. For example, we use 80 Angstroms for ribosomes and 60 Angstroms for GroEL.
  • Often 25-50 iterations are necessary before the refinement converges to a stable solution. Note there is currently no convergence criterion implemented, so the user is responsible for monitoring the convergence. A job may be killed if it converges before its maximum number of iterations has been reached; conversely, if it has not yet converged by then, a previous run may be continued.
  • The regularisation parameter determines the relative weight between the experimental data and the prior. Bayes' law dictates it should be 1, but better results are often obtained using slightly higher values (e.g. 2-4), especially when dealing with cryo-data.
  • The particle diameter (in Angstroms) serves to define a soft spherical mask that will be applied to the references to reduce their background noise. Note that a (preferably soft) user-provided mask (1=protein, 0=solvent) may also be used for highly non-spherical particles. Be careful though not to mask away any unexpected signal and always use a soft mask, i.e. one with values between 0 and 1 at the protein/solvent boundary.
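
As an illustration of what a soft mask looks like, the sketch below computes the radial profile of a spherical mask with a raised-cosine edge. The raised-cosine shape, the 5-pixel edge width, and the function names are assumptions chosen for this example; they are not taken from the RELION source.

```python
import math

def soft_mask_value(r, radius, edge_width):
    """Mask value at distance r (pixels) from the box centre.

    1 inside the particle radius, 0 beyond radius + edge_width,
    and a raised-cosine fall-off in between (assumed edge shape).
    """
    if edge_width <= 0:
        raise ValueError("edge_width must be positive for a soft mask")
    if r <= radius:
        return 1.0
    if r >= radius + edge_width:
        return 0.0
    return 0.5 + 0.5 * math.cos(math.pi * (r - radius) / edge_width)

def radial_profile(diameter_angstrom, pixel_size, edge_pixels, box_size):
    """Mask values at integer radii from the centre to the box edge."""
    radius = 0.5 * diameter_angstrom / pixel_size  # Angstrom -> pixels
    return [soft_mask_value(float(r), radius, edge_pixels)
            for r in range(box_size // 2)]

# Hypothetical particle: 200 Angstrom diameter, 2 Angstrom/pixel, 128-pixel box.
profile = radial_profile(200.0, 2.0, 5.0, 128)
print([round(v, 3) for v in profile[48:58]])
```

The values between 0 and 1 at the boundary are what make the mask "soft"; a binary mask would jump directly from 1 to 0.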

Sampling tab

  • CPU requirements will increase rapidly with finer angular sampling (but, in contrast to ML3D implementations, memory requirements will not!). Therefore, 3D classification is often performed at relatively coarse angular sampling, e.g. 7.5 degrees for ribosomes. Ultimately, however, this will depend on the nature of the heterogeneity one wants to classify.
  • If fine angular sampling is required, one could run using coarse samplings initially and then restart a previous run (see the Running_RELION page) with a finer angular sampling combined with local angular searches.
  • Translational search ranges may depend on how well-centered the particles were picked, but often 6 pixels will do the job. Translational searches in subsequent iterations are centered at the optimal translation from the previous one, so particles may "move" much further than the original search range during the course of an entire refinement. Note that pre-centering prior to RELION refinement is not necessary, and also not recommended (it often messes up the Gaussian distribution of origin offsets).
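
To get a feeling for why CPU cost rises so quickly with finer angular sampling, the following rough estimate counts orientations for a given angular step. It is a back-of-the-envelope geometric approximation (view directions on a sphere times in-plane rotations), not the actual sampling grid used by RELION, and it ignores symmetry; the function name is hypothetical.

```python
import math

def approx_orientations(step_deg):
    """Rough number of orientations for an angular step in degrees."""
    step = math.radians(step_deg)
    directions = 4.0 * math.pi / step ** 2  # view directions on the sphere
    in_plane = 2.0 * math.pi / step         # in-plane rotations
    return directions * in_plane

for step_deg in (15.0, 7.5, 3.75):
    count = approx_orientations(step_deg)
    print(f"{step_deg:5.2f} degrees: ~{count:,.0f} orientations")
```

Because the count scales with the inverse cube of the step, each halving of the angular step gives roughly eight times more orientations to evaluate per particle and per class.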

Running tab

  • If one uses multi-core nodes, the use of multiple threads (as many threads as cores on a machine) is recommended because the shared-memory parallelisation increases the amount of memory available per process. MPI is typically used for more scalable parallelisation over the different nodes. (In terms of CPU usage, MPI parallelisation is a bit more efficient than threads.)
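
The memory argument can be made concrete with a little arithmetic. The sketch below assumes that all cores of a node are used and that the node's memory is shared evenly among the MPI processes running on it; the node size (8 cores, 32 GB) is a hypothetical example.

```python
def memory_per_process_gb(node_memory_gb, cores_per_node, threads_per_process):
    """Memory available to each MPI process when every core on the node is used."""
    if threads_per_process < 1 or cores_per_node % threads_per_process != 0:
        raise ValueError("threads_per_process must evenly divide cores_per_node")
    processes_per_node = cores_per_node // threads_per_process
    return node_memory_gb / processes_per_node

# Hypothetical node with 8 cores and 32 GB of memory.
for threads in (1, 2, 4, 8):
    mem = memory_per_process_gb(32.0, 8, threads)
    print(f"{8 // threads} MPI process(es) x {threads} thread(s): {mem:.0f} GB per process")
```

With one thread per process, each of the eight MPI processes gets only an eighth of the node's memory; with eight threads in a single process, that process can use all of it.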

An example: 10k ribosome test data set

Please see the Classification example for a full example of how to perform 3D classification in RELION.