PreProcessing

From Relion

Revision as of 12:22, 27 November 2013

As of version 1.1, the RELION GUI implements a semi-automated data preprocessing procedure. This procedure consists of three steps, each of which may be performed independently. Its advantages are ease of use (the output STAR files and images may be used directly for refinement) and speed (all time-consuming steps are MPI-parallelized).

If you have already performed your preprocessing in a different package (and you don't want to repeat it, despite the ease and speed of the Preprocessing procedure outlined below), then please see the Prepare_input_files page for more information.

Filling in the GUI

To run the semi-automated preprocessing, select the Preprocessing run-type from the drop-down menu at the top of the GUI.

I/O tab

Give a rootname for your output files and the general parameters of your microscope and detector setup. From version 1.3 on, also give a particle diameter (in Angstroms!) that comfortably contains your particle (you do not want to cut out any signal here).

CTFFIND tab

The relion_preprocess program will launch the CTFFIND3 executable as indicated on the GUI. See Niko Grigorieff's web site for details on CTFFIND3. Select the micrographs to work on, and then just provide all usual CTFFIND3 parameters. From version 1.3 it is recommended to use a STAR file to indicate which micrographs to work on; a new jiffy called relion_star_all_micrographs may be used to generate such a STAR file from a Linux wildcard that matches the micrograph filenames. In earlier versions you could only use wildcards (filenames with * or ?) to select multiple micrographs. In both cases, all micrographs should be in MRC format!

Note that the micrographs should be in a subdirectory (e.g. called Micrographs/) of the project directory, i.e. the directory from where you are launching the GUI. If this is not the case, make a symbolic link to the directory where your micrographs are.

A micrographname_ctffind3.com CTFFIND3 script will be written and executed for each micrograph. The CTFFIND3 results are written to micrographname_ctffind3.log files, and the typical MRC-format images with the Thon rings of the model and the micrograph are written as micrographname.ctf files. Inspect all of these in your favorite image viewer to confirm that the CTF estimation has gone OK (from version 1.3 you can use the relion_display program to do this, and even sort these images on the FOM reported by CTFFIND3 or on the defocus values). For those micrographs where CTFFIND3 did not find the correct CTF parameters, one could re-run CTFFIND3 with different parameters, either from the GUI (using a STAR file with only those micrographs that had gone wrong) or by editing and executing the .com files of the individual micrographs.

To write the final STAR file, the procedure searches each micrographname_ctffind3.log file for the line containing Final Values. One may therefore also edit these numbers to provide user-informed values for rlnDefocusU, rlnDefocusV and rlnDefocusAngle if CTFFIND3 is somehow not capable of finding the right solution.
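As an illustration of what happens to the Final Values line, the Python sketch below extracts the defocus values from a CTFFIND3 log and writes them into a minimal micrograph STAR table. It assumes that the Final Values line lists DFMID1, DFMID2 and ANGAST (followed by the cross-correlation) before the words Final Values. The function names are hypothetical, and real RELION STAR files contain more columns than the four shown here.

```python
import fnmatch


def select_micrographs(filenames, pattern="Micrographs/*.mrc"):
    """Mimic a wildcard selection of MRC micrographs (hypothetical helper)."""
    return fnmatch.filter(filenames, pattern)


def parse_final_values(log_text):
    """Return (DefocusU, DefocusV, DefocusAngle) from a CTFFIND3 log.

    Assumes a line of the form '<DFMID1> <DFMID2> <ANGAST> <CC>  Final Values'.
    """
    for line in log_text.splitlines():
        if "Final Values" in line:
            fields = line.split("Final Values")[0].split()
            defocus_u, defocus_v, angle = (float(f) for f in fields[:3])
            return defocus_u, defocus_v, angle
    raise ValueError("no 'Final Values' line found in log")


def micrograph_star(rows):
    """Format (name, DefocusU, DefocusV, DefocusAngle) tuples as a STAR loop."""
    lines = [
        "data_",
        "",
        "loop_",
        "_rlnMicrographName #1",
        "_rlnDefocusU #2",
        "_rlnDefocusV #3",
        "_rlnDefocusAngle #4",
    ]
    for name, du, dv, angle in rows:
        lines.append(f"{name} {du:.2f} {dv:.2f} {angle:.2f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    log = "   13458.18   13156.24    -32.62    0.19245  Final Values\n"
    values = parse_final_values(log)
    print(micrograph_star([("Micrographs/mic001.mrc", *values)]))
```

Editing the three numbers in front of Final Values by hand would, under the same assumption, change what ends up in the corresponding STAR columns.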

In some cases there are parts of the micrograph that are unsuitable for CTF estimation (e.g. labels on film). If that is the case, one may use the Estimate CTF on window size (pix) option to select the size of a square window, placed at the center of the micrograph, that will be used for CTF estimation. Note that the windowed micrographs are automatically deleted after CTF estimation to save disk space.
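The sketch below shows what "placed at the center of the micrograph" means for a square window, using a micrograph stored as a plain list of rows. It is an illustration only: the function name is hypothetical, and RELION may round the window origin differently when the size difference is odd.

```python
def central_window(micrograph, size):
    """Cut a size x size square out of the centre of a 2D image (list of rows)."""
    ny, nx = len(micrograph), len(micrograph[0])
    if size > min(nx, ny):
        raise ValueError("window is larger than the micrograph")
    y0 = (ny - size) // 2  # first row of the window
    x0 = (nx - size) // 2  # first column of the window
    return [row[x0:x0 + size] for row in micrograph[y0:y0 + size]]


# A 6 x 6 test image whose pixel value encodes its (row, column) position
image = [[10 * r + c for c in range(6)] for r in range(6)]
print(central_window(image, 2))  # [[22, 23], [32, 33]]
```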

autopick tab

As of version 1.3, RELION implements reference-based automated particle picking. The autopicking uses procedures that are inspired by Alan Roseman's findEM. However, unlike findEM, it is based on probabilities (with Gaussian pdfs and thus squared-difference terms), which makes the auto-picking in RELION very sensitive to the intensity scale of the references used for picking. Therefore, one would optimally search the micrographs with 2D class averages that were generated from the same (or a similar quality) data set. For the same reason, it is not a good idea to use projections from an atomic model, or negative-stain class averages, to search for particles in your cryo-EM micrographs. We therefore typically pick particles manually (from a few hundred to several thousand) from a subset of the recorded micrographs, use these to calculate 2D class averages in a preliminary 2D classification run, and then use the best classes from that run to autopick in all micrographs.
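To see why the intensity scale of the references matters, compare a squared-difference score with a normalized cross-correlation on a toy one-dimensional "image". This is not RELION's actual target function, only an illustration of the general point: multiplying the reference by a constant changes the squared difference, but leaves the normalized cross-correlation unchanged.

```python
import math


def squared_difference(image, reference):
    """Sum of squared pixel differences: depends on the reference's intensity scale."""
    return sum((a - b) ** 2 for a, b in zip(image, reference))


def normalized_cross_correlation(image, reference):
    """Pearson-style correlation: invariant to scaling and offset of the reference."""
    n = len(image)
    mean_i = sum(image) / n
    mean_r = sum(reference) / n
    num = sum((a - mean_i) * (b - mean_r) for a, b in zip(image, reference))
    den = math.sqrt(sum((a - mean_i) ** 2 for a in image)
                    * sum((b - mean_r) ** 2 for b in reference))
    return num / den


image = [0.0, 1.0, 2.0, 1.0]
reference = list(image)
brighter = [3.0 * v for v in reference]  # same shape, three times the intensity

print(squared_difference(image, reference), squared_difference(image, brighter))
print(normalized_cross_correlation(image, reference),
      normalized_cross_correlation(image, brighter))
```

A reference with the right shape but the wrong intensity scale is thus penalized by the squared-difference score, which is why class averages from the same data set are preferred.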

Both the selection of micrographs on which to pick and the classes to use as references should be given as input STAR files. If the classes were calculated using CTF correction, then the input STAR file with the micrographs should also contain CTF information for each of the micrographs, and the same CTF settings should be used on this tab as were used in the 2D classification run. By limiting the resolution of the autopicking, for example to 25 Angstroms, one can prevent the pitfalls of "Einstein from noise". In general: if you cannot see the particles, then they are probably NOT there. In that case you'd better spend your time on making better samples and/or grids rather than on running RELION or any other single-particle analysis program. For most cases an in-plane rotational sampling of 5 degrees does the job.
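For orientation, the arithmetic behind these settings is sketched below. A resolution limit of R Angstrom corresponds to a Fourier radius of box size × pixel size / R (in Fourier pixels), and an in-plane sampling of 5 degrees means 360/5 = 72 rotations per reference. The box and pixel sizes in the example are made-up values, and the function names are hypothetical.

```python
def fourier_radius(box_size, pixel_size, resolution):
    """Fourier-space radius (in pixels) of a resolution limit given in Angstrom."""
    if resolution < 2 * pixel_size:
        raise ValueError("resolution limit is beyond Nyquist")
    return box_size * pixel_size / resolution


def n_inplane_rotations(step_degrees):
    """Number of in-plane angles sampled for one reference."""
    return round(360.0 / step_degrees)


def n_searches(n_references, step_degrees):
    """(reference, rotation) combinations compared against every micrograph position."""
    return n_references * n_inplane_rotations(step_degrees)


# Example: 256-pixel box, 1.5 A pixels, limit of 25 A, 10 references at 5 degrees
print(fourier_radius(256, 1.5, 25.0))  # about 15.4 Fourier pixels
print(n_searches(10, 5.0))             # 720
```

Doubling the number of references, or halving the angular step, doubles n_searches, which matches the linear scaling of the computational cost described below.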

There are two important parameters to optimise: a "picking threshold" and a "minimum inter-particle distance" (in Angstroms). The threshold ranges from 0 (pick everything as a particle) to 1 (pick very few particles). The inter-particle distance is often set to 50-75% of the particle diameter (but this may depend on the shape of your particles). Because it is very hard to predict the optimal values for these parameters (especially for the threshold), they typically need some tweaking.

The computational cost of the program scales linearly with the number of references and with the number of in-plane rotations sampled. For a typical 4k x 4k micrograph, a 5-degree sampling (which is enough for most data sets) and ~10 references, we see computation times of around half an hour per micrograph. That may be too slow to allow for a lot of tweaking of the parameters. Therefore, the option exists to write so-called figure-of-merit (FOM) maps to disk. These are intermediate results from which the actual particle positions are picked. One can write out these maps in a first run, and then re-read the same maps in a series of subsequent runs (each of which would only take a few seconds) in order to find the best parameters. There is one drawback: the FOM maps are very large files and there are many of them. Because your file system may become overloaded, the option to write FOM maps is not available in the parallel version of the autopicking program. The recommended way of using this procedure therefore becomes:

  • Select a few (2-5?) micrographs: e.g. including at least one with a higher defocus and one with a lower defocus.
  • Make a STAR file with only these micrographs. You can use the relion_display program to do this: e.g. display the STAR file with all micrographs, sort on defocus, select the first and the last one, write out a new STAR file with the selected micrographs (right-mouse click and Save selected classes).
  • Run the autopicker with this subset STAR file as input in sequential mode (1 MPI proc. on the Running tab) with "Write FOM maps? : Yes "
  • Use the display program to load the micrographs you selected for picking (tick Pick particles.. on the display window, and use the same picking rootname). Decide on better values for the picking threshold and inter-particle distance for each of the selected micrographs, and try to find a compromise for all of them.
  • Re-run the autopicking program, now with "Write FOM maps? : No " and "Read FOM maps? : Yes ". This will be very quick. If you left the display program of the micrographs open, you can just right-mouse click and select Reload coordinates.
  • Repeat the previous step until you have found values for the threshold and inter-particle distance that are a good compromise for all selected micrographs.
  • Delete the temporary FOM maps (rm Micrographs/*.spi) to free disk space.
  • Finally, run the autopicking program on a STAR file containing all micrographs (plus their CTF information if necessary) with "Write FOM maps? : No " and "Read FOM maps? : No ", using as many MPI processes as deemed necessary.
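The two parameters tuned in this loop can be understood with the simplified picker below, which works on a FOM map stored as a list of rows. Positions at or above the threshold are visited from the highest FOM downwards, and a position is kept only if it lies at least the minimum inter-particle distance away from every particle already picked. This greedy scheme is an assumption for illustration, not RELION's actual code; it also assumes FOM values between 0 and 1 and a distance already converted from Angstroms to pixels. The fom_map_bytes helper gives a rough idea of why FOM maps fill a disk quickly, assuming 32-bit floating-point pixels (the number of maps written per micrograph is not specified here). All function names are hypothetical.

```python
import math


def fom_map_bytes(nx, ny, bytes_per_pixel=4):
    """Approximate size of one FOM map, assuming 32-bit float pixels."""
    return nx * ny * bytes_per_pixel


def distance_in_pixels(diameter_angstrom, pixel_size, fraction=0.6):
    """Minimum inter-particle distance as a fraction (0.5-0.75) of the diameter."""
    return fraction * diameter_angstrom / pixel_size


def pick_particles(fom, threshold, min_distance):
    """Greedy peak picking on a 2D FOM map given as a list of rows."""
    candidates = [
        (value, x, y)
        for y, row in enumerate(fom)
        for x, value in enumerate(row)
        if value >= threshold
    ]
    candidates.sort(reverse=True)  # highest figure-of-merit first
    picked = []
    for _value, x, y in candidates:
        if all(math.hypot(x - px, y - py) >= min_distance for px, py in picked):
            picked.append((x, y))
    return picked


fom = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.0],
    [0.1, 0.3, 0.8, 0.0, 0.7],
]
print(pick_particles(fom, threshold=0.5, min_distance=2))  # [(1, 1), (4, 2)]
print(fom_map_bytes(4096, 4096) / 2**20, "MiB per map")     # 64.0 MiB per map
```

Raising the threshold removes weaker peaks, while increasing the distance suppresses neighbours of strong peaks; re-running only this cheap step on stored maps is what makes the parameter search fast.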

extract tab

Again use wildcards (filenames with * or ?) to select all coordinate files for particle extraction, and then just provide a particle box size. The names of the coordinate files have to start with the same rootname as the micrographs, and may then optionally have additional characters, e.g. mic1023_sideviews.coord for micrograph mic1023.mrc. Accepted formats for the coordinate files are XMIPP (2.4), EMAN BOXER, and XIMDISP. As of version 1.3, the preferred (and native) format for coordinate files is STAR.
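The naming rule can be made concrete with the small matcher below: a coordinate file belongs to the micrograph whose rootname (filename without directory and extension) starts the coordinate filename. Preferring the longest matching rootname, so that mic1023_sideviews.coord is not assigned to a micrograph called mic10.mrc, is an assumption of this sketch, and the function name is hypothetical.

```python
import os


def micrograph_for_coords(coord_file, micrographs):
    """Return the micrograph whose rootname starts the coordinate file name, or None."""
    coord_base = os.path.basename(coord_file)
    best_root, best_mic = "", None
    for mic in micrographs:
        root = os.path.splitext(os.path.basename(mic))[0]
        # keep the longest rootname that is a prefix of the coordinate file name
        if coord_base.startswith(root) and len(root) > len(best_root):
            best_root, best_mic = root, mic
    return best_mic


mics = ["Micrographs/mic1023.mrc", "Micrographs/mic10.mrc", "Micrographs/mic2000.mrc"]
print(micrograph_for_coords("Micrographs/mic1023_sideviews.coord", mics))
```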

Also as of version 1.3, RELION implements a particle sorting routine. It calculates difference images between the extracted particles and their aligned (and CTF-convoluted) references, and bases Z-scores on the characteristics of these difference images (such as mean, standard deviation, skewness, excess kurtosis and rotational symmetry). The sorting program adds an extra column with the resulting Z-score to the particle STAR file. The relion_display program may then be used to display all particles, sorted on this column of the STAR file. By doing so, "good" particles tend to end up at the top of the display, while "bad" particles tend to end up at the bottom. The display program can then be used to select only the good ones and write these out in a new STAR file (right-mouse click and Save selected images).
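A much simplified version of this sorting idea is sketched below. For each (flattened) difference image it computes the mean, standard deviation, skewness and excess kurtosis, converts every statistic into a Z-score over all particles, and ranks particles by their largest absolute Z-score, so that unusual particles end up last. Combining the features through the maximum is an assumption of this sketch, the rotational-symmetry criterion is left out, and RELION's actual scoring may differ. The function names are hypothetical.

```python
import statistics


def difference_statistics(diff):
    """Mean, standard deviation, skewness and excess kurtosis of a flattened image."""
    mean = statistics.fmean(diff)
    sd = statistics.pstdev(diff)
    if sd == 0.0:
        return (mean, 0.0, 0.0, 0.0)
    n = len(diff)
    skewness = sum(((v - mean) / sd) ** 3 for v in diff) / n
    excess_kurtosis = sum(((v - mean) / sd) ** 4 for v in diff) / n - 3.0
    return (mean, sd, skewness, excess_kurtosis)


def sort_by_zscore(difference_images):
    """Return particle indices ordered from most typical to most unusual, plus scores."""
    features = [difference_statistics(d) for d in difference_images]
    scores = [0.0] * len(features)
    for column in zip(*features):  # one statistic across all particles
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column)
        for i, value in enumerate(column):
            z = 0.0 if sigma == 0.0 else abs(value - mu) / sigma
            scores[i] = max(scores[i], z)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order, scores
```

Displaying particles in the returned order corresponds to sorting the STAR file on the Z-score column: the last entries are the candidates for removal.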

operate tab

The extracted particles may be re-scaled, re-windowed, normalized, and have their contrast inverted (in that order). Keep the re-scaled and re-windowed image sizes at even numbers of pixels. Always normalize your particles, and use a reasonable radius for the circle around your particles outside of which the average and standard deviation of the noise are calculated. If there are white or black artefacts on the micrographs (e.g. caused by dust or hot/dead pixels), these may be removed by using a positive value for the dust removal options. All white (or black) pixels whose values lie more than the given parameter times the standard deviation of the noise above (or below) the mean are replaced by random values from a Gaussian distribution. For cryo-EM data, values around 3.5-5 are often useful. Make sure you do not erase part of the true signal.
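The normalization, dust-removal and inversion steps for an already re-scaled and re-windowed image can be sketched as follows: the noise mean and standard deviation are estimated from the pixels outside a circle, the image is normalized with them, outlier pixels are replaced by Gaussian noise, and the contrast is optionally inverted. The parameter names, the circle centre at (n/2, n/2) and the choice to apply the dust thresholds after normalization (so that they are expressed in units of the noise standard deviation) are assumptions of this sketch, not RELION's implementation.

```python
import math
import random


def normalize_particle(image, bg_radius, white_dust=-1.0, black_dust=-1.0,
                       invert=False, seed=0):
    """Normalize a square particle image (list of rows) against its background.

    Dust removal is switched off for non-positive thresholds, which are given
    in units of the background noise standard deviation.
    """
    n = len(image)
    centre = n // 2
    background = [image[y][x] for y in range(n) for x in range(n)
                  if math.hypot(x - centre, y - centre) > bg_radius]
    if not background:
        raise ValueError("bg_radius leaves no background pixels")
    mean = sum(background) / len(background)
    sd = math.sqrt(sum((v - mean) ** 2 for v in background) / len(background))
    if sd == 0.0:
        raise ValueError("background has zero variance")
    rng = random.Random(seed)  # reproducible replacement noise
    result = []
    for row in image:
        new_row = []
        for value in row:
            z = (value - mean) / sd
            if (white_dust > 0 and z > white_dust) or (black_dust > 0 and z < -black_dust):
                z = rng.gauss(0.0, 1.0)  # normalized noise has mean 0 and sd 1
            new_row.append(-z if invert else z)
        result.append(new_row)
    return result
```

With a white-dust value of 5, for example, a single hot pixel many standard deviations above the background would be replaced by ordinary noise, while the rest of the image is only normalized.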

Running tab

The relion_preprocess program is MPI-parallelized: each MPI node will estimate CTF and extract particles for a subset of the selected micrographs/coordinate files.
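One simple way to give each MPI process its own subset is to split the list of micrographs (or coordinate files) into contiguous blocks of nearly equal size, as sketched below. Whether relion_preprocess uses contiguous blocks, round-robin assignment or some other scheme is not stated here, so treat this as an illustration of the idea only; the function name is hypothetical.

```python
def split_for_ranks(items, n_ranks):
    """Divide items into n_ranks contiguous blocks whose sizes differ by at most one."""
    if n_ranks < 1:
        raise ValueError("need at least one MPI process")
    base, extra = divmod(len(items), n_ranks)
    blocks, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # first 'extra' ranks get one more
        blocks.append(items[start:start + size])
        start += size
    return blocks


print(split_for_ranks([f"mic{i:03d}.mrc" for i in range(10)], 3))
```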

Dividing your data into groups

If you use the above-explained semi-automated preprocessing procedure, your data will be divided into as many groups as there are micrographs. During refinement, a separate noise spectrum and signal scale factor are estimated for each group. To get robust noise and signal estimates, make sure each group contains at least ~10-20 particles. If you have very few particles per micrograph, then you may want to combine multiple micrographs into one group (i.e. use the same rlnMicrographName for particles coming from multiple micrographs). If you do so, make sure you join micrographs with similar apparent signal-to-noise ratios. Often this means with similar defocus values, but do note that each particle may still have its own defocus values (CTF corrections are done per particle, not per group).
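A possible way to form such groups is sketched below: micrographs are sorted on defocus (as a stand-in for apparent signal-to-noise ratio), and consecutive micrographs are merged until each group holds at least a minimum number of particles; an undersized remainder is merged into the last group. The greedy scheme, the input tuple layout and the function name are assumptions of this sketch. The particles from all micrographs in one group would then be given the same rlnMicrographName, as described above.

```python
def group_micrographs(mics, min_particles=20):
    """Group micrographs of similar defocus so that each group has enough particles.

    mics: list of (name, defocus, n_particles) tuples (hypothetical layout).
    Returns a list of groups, each a list of micrograph names.
    """
    ordered = sorted(mics, key=lambda m: m[1])  # neighbours have similar defocus
    groups, current, count = [], [], 0
    for name, _defocus, n_particles in ordered:
        current.append(name)
        count += n_particles
        if count >= min_particles:
            groups.append(current)
            current, count = [], 0
    if current:
        if groups:
            groups[-1].extend(current)  # merge an undersized tail into the last group
        else:
            groups.append(current)
    return groups


mics = [("a", 10000, 8), ("b", 25000, 15), ("c", 12000, 7),
        ("d", 20000, 6), ("e", 30000, 30)]
print(group_micrographs(mics, min_particles=20))  # [['a', 'c', 'd'], ['b', 'e']]
```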

As of version 1.3, particles may also be regrouped in a convenient manner using the relion_display program after an initial 2D classification run with the original groups. This works by displaying a model.star file in the display program, and ticking the "Regroup selected particles in number of groups: X" option, and providing the desired number of groups.