Recommended procedures

From Relion
The following is what we typically do for each new data set for which we have a decent initial model. If you don't have an initial model: perform RCT, tomography+sub-tomogram averaging, or common-lines/stochastic procedures in a different program. For examples of how we used RELION, see [http://www2.mrc-lmb.cam.ac.uk/groups/scheres/structures.html our list of structures].




== Getting organised ==


First of all, make sure you have access to a computing cluster. RELION may yield excellent results, but it does take some serious computing power. As of version 2.0, RELION has also been GPU-accelerated. This means you can do without a computer cluster, provided you have a workstation with (preferably 2 or more) suitable GPUs. RELION also compiles on a Mac.


Save all your micrographs in one or more subdirectories of the project directory (from where you'll launch the RELION GUI). We like to call these directories "Micrographs/" if all micrographs are in one directory, or "Micrographs/15jan13/" and "Micrographs/23jan13/" if they are in different directories (e.g. because they were collected on different dates). If you for some reason do not want to place your micrographs inside the RELION project directory, then inside the project directory you can also make a symbolic link to the directory where your micrographs are stored.
If you have recorded any movies (e.g. from your direct-electron detector), then store each movie next to a single-frame micrograph that is the average of that movie. You will do your CTF estimation, particle picking and initial refinements and classifications using the average micrograph, and only use the actual movies in the later stages (see below). The naming convention is very important: call the average micrograph whatever you like, but with a .mrc extension (e.g. mic001.mrc), and then call your movie by the same name, PLUS an underscore, PLUS a movie-identifier that you always keep the same, PLUS a .mrcs extension (e.g. mic001_movie.mrcs). See [[Process_movies | processing movies]] for more details.
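The naming convention above can be sketched in shell; all filenames below are hypothetical examples.

```shell
# Sketch of the movie-naming convention described above.
# The movie identifier must stay the same for the whole data set.
movie_id="movie"

# A symbolic link can stand in for micrographs stored outside the
# project directory, e.g.:  ln -s /data/raw/15jan13 Micrographs/15jan13
mic="Micrographs/15jan13/mic001.mrc"   # average micrograph (.mrc)

# Its movie: same rootname + underscore + identifier + .mrcs extension
movie="${mic%.mrc}_${movie_id}.mrcs"
echo "$movie"   # prints Micrographs/15jan13/mic001_movie.mrcs
```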


== Particle selection & preprocessing==


We typically start by estimating the CTFs for all micrographs from the corresponding tab in the GUI. Note that RELION uses Niko Grigorieff's [http://emlab.rose2.brandeis.edu/ctf ctffind3] (or ctffind4 or gctf in relion-2.0) to do this.
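Behind that GUI tab, RELION runs a ctffind wrapper. As a rough, hedged sketch only: the flag names below follow the relion_run_ctffind command-line interface and may differ between versions (check `relion_run_ctffind --help`); the microscope numbers are example settings, not recommendations. The command is echoed rather than executed.

```shell
# Hedged sketch of command-line CTF estimation; flags and values are
# assumptions to be checked against your own installation.
cmd="relion_run_ctffind --i all_micrographs.star --o all_micrographs_ctf.star"
cmd="$cmd --ctffind_exe /usr/local/bin/ctffind3.exe"
cmd="$cmd --CS 2.0 --HT 300 --AmpCnst 0.1 --XMAG 60000 --DStep 14.0"
cmd="$cmd --Box 512 --ResMin 50 --ResMax 7.1"
cmd="$cmd --dFMin 10000 --dFMax 50000 --FStep 250"
echo "$cmd"   # print only; remove the echo pattern to really run
```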
 
Manual particle picking may be performed using the "Micrograph inspection" (v1.4) or "Manual picking" (v2.0) job-types from the GUI. Before relion-1.3, our favourite programs for particle picking were [http://www2.mrc-lmb.cam.ac.uk/research/locally-developed-software/image-processing-software/ Ximdisp] and [http://blake.bcm.edu/emanwiki/EMAN2/Programs/e2boxer e2boxer.py]. The coordinate files output by both can still be read directly by RELION. Be careful at this stage: you are probably better at getting rid of bad/junk particles than any of the classification procedures below! So spend a decent amount of time on selecting good particles, be it manually or (semi-)automatically. BTW: if you cannot see your particles, this probably means they are not there. In that case: don't bother to use RELION, or any other single-particle reconstruction program. You'll be better off spending your time on improving your sample.


Also from version 1.3, RELION implements reference-based automated particle picking. Typically, one first manually picks a subset of the available micrographs to obtain several hundreds to a few thousand particles. With these particles, one then performs an initial 2D classification run (see below). From the resulting class averages one selects the best and most representative views to be used as references in the autopicking program. Note that the reference-based auto-picking will work best when the class averages are on the same intensity-scale as the signal in your data: therefore it's best to generate the references from the data themselves, or at least from a similar data set. E.g. do not use negative-stain class averages to pick a cryo-EM data set: this will not work very well. For a 4kx4k micrograph and say ~10 references, the auto-picking will take approximately half an hour per micrograph. There are two parameters to be optimised: a threshold (higher value means fewer, better particles) and a minimum inter-particle distance. Because re-running half-an-hour calculations for every trial of these parameters would be too time-consuming, you may write out intermediate figure-of-merit (FOM) maps for each reference. After these have been written out, one can re-calculate new coordinate files in several seconds with different threshold and inter-particle distance parameters. However, because the FOM maps are many large files, one cannot run the autopicking program in parallel when writing out FOM maps (it could bring your file system down). Therefore, it is recommended to: 1) write the FOM maps only for a few (good and bad) micrographs in an initial, sequential run; 2) re-read those FOM maps in subsequent (very fast) runs to find the best threshold and inter-particle distance for those micrographs; 3) delete the FOM maps; and 4) run the autopicking in parallel for all micrographs using the optimised parameters (but without reading/writing FOM maps).  
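The four-step FOM workflow above can be sketched as shell commands. The flag names (--i, --ref, --threshold, --min_distance, --write_fom_maps, --read_fom_maps) follow the relion_autopick interface but may differ per version; check `relion_autopick --help`. RUN=echo prints the commands instead of executing them; set RUN to empty to really run.

```shell
RUN=echo          # print commands only; set RUN= to execute for real
threshold=0.4     # higher -> fewer, better particles
min_dist=120      # minimum inter-particle distance, in pixels

# 1) Sequential run on a few good and bad micrographs, writing FOM maps:
$RUN relion_autopick --i selected_micrographs.star --ref references.star \
     --threshold 0.25 --min_distance "$min_dist" --write_fom_maps

# 2) Fast re-runs that re-read the FOM maps while tuning parameters:
$RUN relion_autopick --i selected_micrographs.star --ref references.star \
     --threshold "$threshold" --min_distance "$min_dist" --read_fom_maps

# 3) Delete the large FOM maps (exact filenames depend on your version).

# 4) Parallel run on all micrographs with the optimised parameters:
$RUN mpirun -n 8 relion_autopick_mpi --i all_micrographs.star \
     --ref references.star --threshold "$threshold" --min_distance "$min_dist"
```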


After picking the particles, we extract them, normalize them and invert their contrast (if necessary to get white particles). When extracting autopicked particles, from version-1.3 one can also perform a sorting based on the remaining density after subtraction of the reference image from the extracted particles. The sorting program will write an additional column with a particle Z-score to the particles STAR file. The relion_display program can then be used to display all the particles in this STAR file sorted on the Z-score value in that column, and to select only the good particles and write out a new STAR file with those.


If you have few particles per micrograph, at this stage you may also want to consider grouping them, as indicated on this [[FAQs#How_can_I_group_micrographs_in_order_to_have_more_particles_per_group.3F | FAQ]].
== 2D class averaging ==


We like to '''[[Calculate 2D class averages]]''' to get rid of bad/junk particles in the data set. Apart from choosing a suitable particle diameter (make sure you don't cut off any real signal, but try to minimise the noise around your particle as well), the most important parameters are the number of classes (K) and the regularization parameter T. For cryo-EM we typically have at least 100-200 particles per class, so with 3,000 particles we would not use more than K=30 classes. Also, to limit computational costs, we rarely use more than say 250 classes even for large data sets. For negative stain, one can use fewer particles per class, say at least 25-50. For cryo-EM, we typically use T=2; while for negative stain we use values of T=1-2. We typically do not touch the default sampling parameters, perhaps with the exception of large icosahedral viruses where we sometimes use finer angular samplings.
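The choice of K can be done on the back of an envelope, and the resulting run corresponds to a relion_refine command along the lines below. The flags follow the standard relion_refine interface (verify against `relion_refine --help`); the diameter and filenames are hypothetical examples, and the command is echoed rather than executed.

```shell
# Aim for ~100-200 particles per class, capped at ~250 classes:
nr_particles=3000
K=$(( nr_particles / 150 ))        # -> K=20 for 3,000 particles
if [ "$K" -gt 250 ]; then K=250; fi
if [ "$K" -lt 1 ]; then K=1; fi

# Echoed 2D-classification command (remove the echo to really run):
echo mpirun -n 8 relion_refine_mpi --i particles.star --o Class2D/run1 \
     --K "$K" --tau2_fudge 2 --iter 25 --ctf --zero_mask \
     --particle_diameter 200
```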


Most 2D class averaging runs yield some classes that are highly populated (look for the data_model_classes table in the model.star files for class occupancies) and these classes typically show nice, relatively high-resolution views of your complex in different orientations. Besides these good classes, there are often also many bad classes: these typically contain bad/junk particles or particles with very close neighbours that prevent alignment. Because bad particles do not average well together there are often few particles in each bad class, and the resolution of the corresponding class average is thus very low. These classes will look very ugly! You can select the good classes and discard the bad ones using the 'Subset selection' job-type.


A much faster way to select particles (admittedly less powerful than, but often complementary to, 2D classification) is to use 'Particle sorting'. The sorting program will add an additional column with a Z-score for each particle to the output data.star file. Low values mean clean difference images between particles and references; high values mean unclean differences. The 'Subset selection' functionality can then be used to display the images sorted on this Z-score, so that junk particles remaining in the data set can be discarded at this point.
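For scripted selections, the Z-score column can also be filtered with awk. The snippet below is a hypothetical illustration on a toy STAR file: here rlnParticleSelectZScore happens to be column 2, but in a real data.star file you must read its column number from the header.

```shell
# Toy STAR file with a Z-score column (hypothetical data):
cat > toy_data.star <<'EOF'
data_
loop_
_rlnImageName #1
_rlnParticleSelectZScore #2
000001@stack.mrcs 0.35
000002@stack.mrcs 1.80
000003@stack.mrcs 0.10
EOF

# Keep all header lines, plus data rows with a Z-score below 1.0:
kept=$(awk '/^data_/ || /^loop_/ || /^_/ || $2+0 < 1.0' toy_data.star)
echo "$kept"   # the 1.80 particle is discarded
```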


Depending on how clean our data is, we sometimes repeat the process of 2D class averaging (possibly combined with sorting) to select good particles 2 or 3 times. Having a clean data set is an important factor in getting good 3D reconstructions.


== 3D classification ==


Once we're happy with our data cleaning in 2D, we almost always '''[[Classify 3D structural heterogeneity]]'''. Remember: ALL data sets are heterogeneous! It is therefore always worth checking to what extent this is the case in your data set. At this stage we use our initial model for the first time. Remember, if it is not reconstructed from the same data set in RELION or XMIPP, it is probably NOT on the correct grey scale. Also, if it is not reconstructed with CTF correction in RELION or it is not made from a PDB file, then one should probably also set "Has reference been CTF corrected?" to No.  We prefer to start from relatively harsh initial low-pass filters (often 40-60 Angstrom), and typically perform 25 iterations with a regularization factor T=4 for cryo-EM; and T=2-4 for negative stain. (But remember: often ''classifying stain is a pain'', due to variations in staining.) For cryo-EM, we prefer to have at least (on average) 5,000-10,000 particles per class. For negative stain, fewer particles per class may be used. We typically do not touch the default sampling parameters, except perhaps for icosahedral viruses where we use finer angular samplings.
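Those settings (a 40-60 Angstrom initial low-pass filter, 25 iterations, T=4) map onto a relion_refine command roughly as below. The flags follow the relion_refine interface (check `--help`); K, the diameter and the filenames are hypothetical, and the command is echoed rather than executed.

```shell
ini_lowpass=60   # harsh initial low-pass filter, in Angstrom
T=4              # regularization parameter for cryo-EM
K=4              # number of 3D classes: a hypothetical example

# Echoed 3D-classification command (remove the echo to really run):
echo mpirun -n 8 relion_refine_mpi --i particles.star --o Class3D/run1 \
     --ref initial_model.mrc --ini_high "$ini_lowpass" --ctf \
     --iter 25 --tau2_fudge "$T" --K "$K" --sym C1 --zero_mask \
     --particle_diameter 200
```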
 
After classification, we select particles for each structural state of interest. Similarly-looking classes may be considered as one structural state at this point. Difference maps (after alignment of the maps in for example Chimera) are a useful tool to decide whether two maps are similar or not. In some cases, most often with large data sets, one may choose to further classify separate classes in an additional classification run.


An overview of our favourite ways of classifying structurally heterogeneous data sets is available [ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relionreview2016.pdf here].


== 3D refinement ==


Each of the 3D classes of interest may be refined separately using the '''[[Refine_a_structure_to_high-resolution | 3D-auto-refine procedure]]'''. We often use the refined map of the corresponding class as the initial model (or sometimes the original initial model) and we start refinement again from a rather harsh initial low-pass filter, often 40-60 Angstroms. We typically do not touch the default sampling parameters, except for icosahedral viruses where we may start from 3.7 degrees angular sampling and we perform local searches from 0.9 degrees onwards. After 3D refinement, we sharpen the map and calculate solvent-mask corrected resolution estimates using 'Post-processing', as explained on the [[Analyse_results#Getting_higher_resolution_and_map_sharpening | Analyse results]] page. '''You will probably like this map and resolution estimate much better than the one that comes straight out of the refinement, so don't skip this step...'''
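The post-processing step can be sketched as below. In recent RELION versions relion_postprocess takes one of the two unfiltered half-maps via --i; the half-map name follows the usual 3D auto-refine output naming, the mask filename is hypothetical, and flag names may differ per version (check `relion_postprocess --help`). The command is echoed rather than executed.

```shell
# One of the two unfiltered half-maps written by 3D auto-refine:
half1=Refine3D/run1_half1_class001_unfil.mrc

# Echoed sharpening command with automated B-factor estimation
# (remove the echo to really run):
echo relion_postprocess --i "$half1" --mask mymask.mrc \
     --auto_bfac --autob_lowres 10
```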


Alternatively, we first combine all suitable particles in a single "consensus" 3D auto-refine procedure. We then use the result to perform movie-processing (see below) on this large data set, and perform extensive classification with the resulting shiny particles afterwards. This procedure only works if the different structures in the data are not too dissimilar.


== Movie refinement ==


We now collect all our data as movies on our fast direct-electron camera, and boost the resolution of our final map by '''[[Process_movies | processing movies]]'''.  
 
This movie-processing procedure was originally introduced for the refinement of relatively large particles (e.g. ribosomes or icosahedral viruses), for which orientations may still be determined more-or-less accurately in running averages of only a few (e.g. 3, 5 or 7) movie frames. As of the 1.3 release, RELION implements a movie-processing approach that is also suitable for smaller particles (say sub-MDa). It is called '''[http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Process_movies#Particle_polishing particle polishing]''', and it builds on a run of the original movie-processing procedure, as described [[Process_movies|here]]. However, this movie-processing step may be performed much faster by omitting the rotational searches from the movie-particle alignments, as may be selected from the Movie-tab on the 3D auto-refine job-type. The particle polishing program outputs a STAR file and new stacks of particles in which the movie frames have been aligned and a resolution-dependent radiation damage model has been applied. These ''polished'' or ''shiny'' particles have increased signal-to-noise ratios compared to the original averaged particles, which means they can probably be classified and refined better than the original particles. Therefore, whereas the original movie-processing approach (e.g. for ribosomes) was often performed at the very end of the image processing (when a single homogeneous subset of the data had been identified), particle polishing is often performed earlier and on larger data sets (when junk particles have been discarded, but possibly BEFORE a single conformation has been classified out), which are subsequently classified using the polished particles. In any case, after particle polishing, one should ALWAYS re-run the 3D auto-refinement in order to get the highest possible resolution.


== Local-resolution estimation ==


Global resolution is often not so relevant, as unresolved structural heterogeneity or limited orientational accuracies may blur the maps in some regions more than in others. To assess local resolution variations, as of version 1.3 RELION has a wrapper to the [http://resmap.sourceforge.net ResMap] program from Alp Kucukelbir. You will need to install ResMap yourself, but you will then be able to launch it through the RELION GUI, which will use the two unfil.mrc half-maps from a 3D auto-refine run to estimate the local resolution. The results from the ResMap program may then be visualised using the Volume Data -> Surface Color option in [http://www.cgl.ucsf.edu/chimera/ UCSF Chimera].

Latest revision as of 15:01, 27 September 2016

The following is what we typically do for each new data set for which we have a decent initial model. If you don't have an initial model: perform RCT, tomography+sub-tomogram averaging, or common-lines/stochastic procedures in a different program. For examples of how we used RELION, see our list of structures.


Getting organised

First of all, make sure you have access to a computing cluster. RELION may yield excellent results, but it does take some serious computing power. As of version 2.0, RELION has also been GPU-accelerated. This means you can do without a computer cluster, provided you have a workstation with (preferably 2 or more) suitable GPUs. RELION also compiles on a Mac.

Save all your micrographs in one or more subdirectories of the project directory (from where you'll launch the RELION GUI). We like to call these directories "Micrographs/" if all micrographs are in one directory, or "Micrographs/15jan13/" and "Micrographs/23jan13/" if they are in different directories (e.g. because they were collected on different dates). If you for some reason do not want to place your micrographs inside the RELION project directory, then inside the project directory you can also make a symbolic link to the directory where your micrographs are stored.

If you have recorded any movies (e.g. from your direct-electron detector), then store each movie next to a single-frame micrograph that is the average of that movie. You will do your CTF estimation, particle picking and initial refinements and classifications using the average micrograph, and only use the actual movies in the later stages (see below). The naming convention is very important: strictly called the average micrograph with whatever name you like, but with a .mrc extension (e.g. mic001.mrc), and then call you movie with the same name; PLUS and underscore; PLUS a movie-identifier that you always keep the same; PLUS a .mrcs extension (e.g. mic001_movie.mrcs). See processing movies for more details.

Particle selection & preprocessing

We typically start by estimating the CTFs for all micrographs from the corresponding Tab in the GUI. Note that RELION uses Niko Grigorieff's ctffind3] (or ctffind4 or gctf in relion-2.0) to do this.

Manual particle picking may be performed using the "Micrograph inspection (v1.4) or "Manual picking (v2.0) job-types from the GUI. Before relion-1.3, our favourite programs for particle picking were Ximdisp and e2boxer.py. Both output coordinates files can still be read directly by relion. Be careful at this stage: you are probably better at getting rid of bad/junk particles than any of the classification procedures below! So spend a decent amount of time on selecting good particles, be it manually or (semi-)automatically. BTW: if you cannot see your particles this probably means they are not there. In that case: don't bother to use RELION, or any other single-particle reconstruction program. You'll be better off spending your time on improving your sample.

As of version 1.3, RELION implements reference-based automated particle picking. Typically, one first manually picks a subset of the available micrographs to obtain several hundreds to a few thousand particles. With these particles, one then performs an initial 2D classification run (see below). From the resulting class averages one selects the best and most representative views to be used as references in the autopicking program. Note that the reference-based auto-picking will work best when the class averages are on the same intensity scale as the signal in your data: it is therefore best to generate the references from the data themselves, or at least from a similar data set. E.g. do not use negative-stain class averages to pick a cryo-EM data set: this will not work very well. For a 4kx4k micrograph and say ~10 references, the auto-picking will take approximately half an hour per micrograph. There are two parameters to be optimised: a threshold (a higher value means fewer, better particles) and a minimum inter-particle distance. Because re-running half-hour calculations for every trial of these parameters would be too time-consuming, you may write out intermediate figure-of-merit (FOM) maps for each reference. Once these have been written out, new coordinate files can be re-calculated in a few seconds with different threshold and inter-particle distance parameters. However, because the FOM maps are many large files, one cannot run the autopicking program in parallel when writing out FOM maps (it could bring your file system down). Therefore, it is recommended to: 1) write the FOM maps only for a few (good and bad) micrographs in an initial, sequential run; 2) re-read those FOM maps in subsequent (very fast) runs to find the best threshold and inter-particle distance for those micrographs; 3) delete the FOM maps; and 4) run the autopicking in parallel for all micrographs using the optimised parameters (but without reading/writing FOM maps).
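The kind of re-selection that the FOM maps make cheap can be sketched as a greedy peak filter: keep picks above the threshold, and discard any pick too close to an already accepted, higher-FOM pick. This is an illustration of the idea; the autopicking program's actual criteria may differ in detail:

```python
def select_picks(picks, threshold, min_distance):
    """Greedy re-selection of autopicked coordinates.
    picks: list of (x, y, fom) tuples; threshold: minimum FOM to keep;
    min_distance: minimum allowed inter-particle distance in pixels."""
    kept = []
    for x, y, fom in sorted(picks, key=lambda p: -p[2]):  # best FOM first
        if fom < threshold:
            break  # all remaining picks are below threshold
        # reject picks closer than min_distance to an accepted pick
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_distance ** 2
               for kx, ky, _ in kept):
            kept.append((x, y, fom))
    return kept
```

Raising the threshold or the minimum distance both reduce the number of accepted picks, which is exactly the trade-off one tunes on a few micrographs before the parallel run.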

After picking the particles, we extract them, normalise them, and invert their contrast (if necessary to obtain white particles). When extracting autopicked particles, from version 1.3 one can also perform a sorting based on the density that remains after subtracting the reference image from each extracted particle. The sorting program will write an additional column with a particle Z-score to the particles STAR file. The above-mentioned display program can then be used to display all the particles in this STAR file sorted on the Z-score value in that column. The same display program may then be used to select only the good particles and write out a new STAR file with those.

If you have few particles per micrograph, at this stage you may also want to consider grouping them, as explained in this FAQ.
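The idea behind such grouping can be sketched as merging micrographs with few particles into combined groups of some minimum size. This is a simple greedy illustration under assumed inputs, not a reproduction of RELION's own regrouping:

```python
def regroup(counts, min_group_size=20):
    """Merge micrographs with few particles into combined groups of at
    least min_group_size particles, so that per-group scale/normalisation
    estimates remain stable. counts: dict micrograph -> particle count."""
    groups, current, size = [], [], 0
    for mic, n in sorted(counts.items(), key=lambda kv: kv[1]):
        current.append(mic)
        size += n
        if size >= min_group_size:
            groups.append(current)
            current, size = [], 0
    if current:  # attach any leftover small group to the last full one
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups
```

The minimum group size of 20 is an arbitrary placeholder; choose it based on how noisy your per-group estimates are.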

If you experience any type of problem with RELION when using particles that were extracted (and/or preprocessed) by another program, then before reporting problems to us, PLEASE first try running the entire CTF estimation and particle extraction procedures through the RELION GUI. Re-doing your preprocessing inside RELION is very fast (it is fully parallelised); it is the most convenient way to prepare the correct STAR input files; and it prepares the images in the way that is best for RELION (which may differ from what is best for another program). It is therefore very likely your least painful route to the best structure you could get out of RELION...

== 2D class averaging ==

We like to calculate 2D class averages to get rid of bad/junk particles in the data set. Apart from choosing a suitable particle diameter (make sure you don't cut off any real signal, but try to minimise the noise around your particle as well), the most important parameters are the number of classes (K) and the regularisation parameter T. For cryo-EM we typically aim for at least 100-200 particles per class, so with 3,000 particles we would not use more than K=30 classes. Also, to limit computational costs, we rarely use more than say 250 classes, even for large data sets. For negative stain, one can use fewer particles per class, say at least 25-50. For cryo-EM we typically use T=2, while for negative stain we use values of T=1-2. We typically do not touch the default sampling parameters, perhaps with the exception of large icosahedral viruses, where we sometimes use finer angular samplings.
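The rule of thumb above for choosing K can be written out explicitly. The function name and the per-class value of 150 (a middle value of the 100-200 range) are illustrative assumptions:

```python
def suggest_classes(n_particles, per_class=150, max_classes=250):
    """Suggest a number of 2D classes K: at least ~100-200 particles per
    class for cryo-EM (per_class=150 as a middle value), capped at ~250
    classes to limit computational cost on very large data sets."""
    return max(1, min(max_classes, n_particles // per_class))

print(suggest_classes(3000))    # 20
print(suggest_classes(100000))  # 250
```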

Most 2D class averaging runs yield some classes that are highly populated (look for the data_model_classes table in the model.star files for class occupancies), and these classes typically show nice, relatively high-resolution views of your complex in different orientations. Besides these good classes, there are often also many bad classes: these typically contain bad/junk particles, or particles with very close neighbours that prevent alignment. Because bad particles do not average well together, there are often few particles in each bad class, and the resolution of the corresponding class average is thus very low. These classes will look very ugly! You can select the good classes and discard the bad ones using the 'Subset selection' job-type.
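If you want to inspect class occupancies programmatically rather than by eye, the data_model_classes table can be read with a few lines of Python. This is a minimal sketch for this one table only (rlnClassDistribution holds the fraction of particles per class); real model.star files may warrant a full STAR parser:

```python
def class_occupancies(star_text):
    """Extract per-class occupancies (rlnClassDistribution) from the
    data_model_classes table of a _model.star file given as a string."""
    lines = iter(star_text.splitlines())
    for line in lines:
        if line.strip() == "data_model_classes":
            break  # found the table of interest
    cols, rows = [], []
    for line in lines:
        s = line.strip()
        if s.startswith("_rln"):
            cols.append(s.split()[0].lstrip("_"))   # column label
        elif cols and s and not s.startswith(("loop_", "data_")):
            rows.append(s.split())                  # data row
        elif cols and (not s or s.startswith("data_")):
            break                                   # end of table
    i = cols.index("rlnClassDistribution")
    return [float(r[i]) for r in rows]
```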

A much faster (admittedly less powerful than, but often complementary to, 2D classification) way to select particles is to use 'Particle sorting'. The sorting program will add an additional column with a Z-score for each particle to the output data.star file. Low values mean clean difference images between particles and references; high values mean not-so-clean differences. The 'Subset selection' functionality can then be used to display the images sorted on this Z-score, so that junk particles that were still in the data set can be discarded at this point.
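The selection one would make interactively in 'Subset selection' amounts to sorting on the Z-score and keeping everything below a cut-off. A sketch, with the cut-off value an assumption to be tuned by eye:

```python
def select_by_zscore(particles, max_z):
    """Keep particles whose sorting Z-score is at most max_z, returned in
    order of increasing Z-score (cleanest difference images first).
    particles: list of (name, zscore) tuples."""
    return [name
            for name, z in sorted(particles, key=lambda p: p[1])
            if z <= max_z]

print(select_by_zscore([("a", 2.0), ("b", 0.5), ("c", 9.0)], 3.0))
# ['b', 'a']
```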

Depending on how clean our data is, we sometimes repeat the process of 2D class averaging (possibly combined with sorting) to select good particles two or three times. Having a clean data set is an important factor in getting good 3D reconstructions.

== 3D classification ==

Once we're happy with our data cleaning in 2D, we almost always classify 3D structural heterogeneity. Remember: ALL data sets are heterogeneous! It is therefore always worth checking to what extent this is the case in your data set. At this stage we use our initial model for the first time. Remember: if it was not reconstructed from the same data set in RELION or XMIPP, it is probably NOT on the correct grey scale. Also, if it was neither reconstructed with CTF correction in RELION nor made from a PDB file, then one should probably set "Has reference been CTF corrected?" to No. We prefer to start from relatively harsh initial low-pass filters (often 40-60 Angstrom), and typically perform 25 iterations with a regularisation factor T=4 for cryo-EM, and T=2-4 for negative stain. (But remember: classifying stain is often a pain, due to variations in staining.) For cryo-EM, we prefer to have on average at least 5,000-10,000 particles per class. For negative stain, fewer particles per class may be used. We typically do not touch the default sampling parameters, except perhaps for icosahedral viruses, where we use finer angular samplings.
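To make the effect of a 40-60 Angstrom initial low-pass filter concrete, here is a simplified sketch of low-pass filtering a cubic 3D map with numpy. RELION applies its own filter internally; this hard spherical cut-off (RELION's has a soft edge) is purely an illustration:

```python
import numpy as np

def lowpass_3d(vol, angpix, resolution):
    """Hard spherical low-pass filter of a cubic 3D map.
    vol: cubic numpy array; angpix: pixel size in Angstrom/pixel;
    resolution: cut-off in Angstrom (e.g. 40-60 for a harsh start)."""
    n = vol.shape[0]
    freq = np.fft.fftfreq(n, d=angpix)            # spatial frequency in 1/A
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    # keep only Fourier components below the cut-off frequency 1/resolution
    mask = (fx**2 + fy**2 + fz**2) <= (1.0 / resolution) ** 2
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * mask))
```

A harsher (lower-resolution) cut-off removes more high-frequency detail from the reference, which reduces model bias in the first iterations.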

After classification, we select the particles for each structural state of interest. Similarly-looking classes may be considered as one structural state at this point. Difference maps (after alignment of the maps in, for example, Chimera) are a useful tool to decide whether two maps are similar or not. In some cases, most often with large data sets, one may choose to further classify separate classes in an additional classification run.

An overview of our favourite ways of classifying structurally heterogeneous data sets is available here.

== 3D refinement ==

Each of the 3D classes of interest may be refined separately using the 3D-auto-refine procedure. We often use the refined map of the corresponding class as the initial model (or sometimes the original initial model) and we start refinement again from a rather harsh initial low-pass filter, often 40-60 Angstroms. We typically do not touch the default sampling parameters, except for icosahedral viruses where we may start from 3.7 degrees angular sampling and we perform local searches from 0.9 degrees onwards. After 3D refinement, we sharpen the map and calculate solvent-mask corrected resolution estimates using 'Post-processing', as explained on the Analyse results page. You will probably like this map and resolution estimate much better than the one that comes straight out of the refinement, so don't skip this step...
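The core operation of map sharpening in post-processing is applying a negative B-factor to the radially averaged structure-factor amplitudes. A simplified sketch: the real program also estimates the B-factor automatically and applies an FSC-based figure-of-merit weighting, neither of which is reproduced here:

```python
import numpy as np

def sharpen_1d(amplitudes, freqs, bfactor):
    """Apply a B-factor weighting exp(-B s^2 / 4) to radially averaged
    amplitudes. freqs: spatial frequencies s in 1/Angstrom; a negative
    bfactor boosts high-resolution amplitudes (sharpening)."""
    return amplitudes * np.exp(-bfactor * freqs**2 / 4.0)
```

With bfactor = -100 (A^2), the amplitude at s = 0.2/A (5 Angstrom) is boosted by a factor exp(1), while the DC term is unchanged; this is why sharpened maps show much crisper high-resolution features.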

Alternatively, we first combine all suitable particles in a single "consensus" 3D auto-refine procedure. We then use the result to perform movie-processing (see below) on this large data set, and perform extensive classification with the resulting shiny particles afterwards. This procedure only works if the different structures in the data are not too dissimilar.

== Movie refinement ==

We now collect all our data as movies on our fast direct-electron camera, and boost the resolution of our final map by processing movies.

This movie-processing procedure was originally introduced for the refinement of relatively large particles (e.g. ribosomes or icosahedral viruses), for which orientations may still be determined more-or-less accurately in running averages of only a few (e.g. 3, 5 or 7) movie frames. As of the 1.3 release, RELION implements a movie-processing approach that is also suitable for smaller particles (say sub-MDa). It is called particle polishing, and it builds on a run of the original movie-processing procedure, as described here. This movie-processing step may, however, be performed much faster by omitting the rotational searches from the movie-particle alignments, as may be selected from the Movie-tab of the 3D auto-refine job-type. The particle polishing program outputs a STAR file and new particle stacks in which the movie frames have been aligned and a resolution-dependent radiation damage model has been applied. These polished, or shiny, particles have increased signal-to-noise ratios compared to the original averaged particles, which means they can probably be re-classified and refined better than the original particles. Therefore, whereas the original movie-processing approach (e.g. for ribosomes) was often performed at the very end of the image processing (when a single homogeneous subset of the data had been identified), particle polishing is often performed earlier and on larger data sets (when junk particles have been discarded, but possibly BEFORE a single conformation has been classified out), which are subsequently classified using the polished particles. In any case, after particle polishing one should ALWAYS re-run the 3D auto-refinement in order to get the highest possible resolution.
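The idea of a resolution-dependent radiation damage model can be sketched as per-frame, per-frequency weights of the form exp(-B_f s^2 / 4), normalised over frames. The per-frame B-factors are assumed given here; particle polishing estimates them from the data itself:

```python
import numpy as np

def frame_weights(bfactors, freqs):
    """Relative weight of each movie frame at each spatial frequency.
    bfactors: per-frame B-factors in A^2 (larger = more damaged frame);
    freqs: spatial frequencies s in 1/Angstrom.
    Returns an array of shape (n_frames, n_freqs) whose columns sum to 1."""
    w = np.exp(-np.outer(bfactors, freqs**2) / 4.0)
    return w / w.sum(axis=0)
```

Frames with large B-factors (heavily damaged, typically late frames) get down-weighted most strongly at high resolution, while all frames still contribute at low resolution.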

== Local-resolution estimation ==

Global resolution is often not so relevant, as unresolved structural heterogeneity or limited orientational accuracies may blur the maps in some regions more than in others. To assess local resolution variations, as of version 1.3 RELION has a wrapper to the ResMap program from Alp Kucukelbir. You will need to install ResMap yourself, but you will then be able to launch it through the RELION GUI, which will use the two unfil.mrc half-maps from a 3D auto-refine run to estimate the local resolution. The results from the ResMap program may then be visualised using the Volume Data -> Surface Color option in UCSF Chimera.