PreProcessing

From Relion

As of version 1.1, the RELION GUI implements a semi-automated data preprocessing procedure. This procedure consists of three steps, each of which may be performed independently. The procedure has two advantages. It is easy to use, because the output STAR files and images may be used directly for refinement. It is also fast, because all time-consuming steps are MPI-parallelized.

If you have already performed your preprocessing in a different package (and you don't want to repeat it, despite the ease and speed of the Preprocessing procedure outlined below), then please see the Prepare_input_files page for more information.

Filling in the GUI

To run the semi-automated preprocessing, select the Preprocessing run-type from the drop-down menu at the top of the GUI.

I/O tab

Give a rootname for your output files and the general parameters of your microscope and detector setup. From version 1.3 on, also specify a particle diameter (in Angstroms!) that comfortably contains your particle. You do not want to cut off any signal here.

CTFFIND tab

The relion_preprocess program launches the CTFFIND3 executable specified in the GUI. See Niko Grigorieff's web site for details on CTFFIND3. Use wildcards (filenames with * or ?) to select multiple micrographs (in MRC format!) to work on, and then provide the usual CTFFIND3 parameters. The micrographs should be in a subdirectory (e.g. called Micrographs/) of the project directory, which is the directory from which you launch the GUI. If this is not the case, make a symbolic link to the directory that contains your micrographs.

For each micrograph, a micrographname.com CTFFIND3 script is written and executed. The CTFFIND3 results are written to micrographname.log files. The usual MRC-format images showing the Thon rings of the model and the micrograph are written as micrographname.ctf files. Inspect all of these in your favorite image viewer to confirm that CTF estimation has gone well. If it has not, you can rerun CTFFIND3 from the RELION GUI with different parameters, on all micrographs, on one, or on a selection. Alternatively, edit the micrographname.com script and rerun it for individual micrographs.

When the final STAR file is written, the procedure reads the defocus values from the line containing Final Values in each micrographname.log file. You may therefore also edit the numbers on that line to supply arbitrary values for rlnDefocusU, rlnDefocusV and rlnDefocusAngle.
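The STAR file takes its defocus values from the Final Values line. After editing that line by hand, it can therefore help to check what will be read from it. The sketch below is a hypothetical helper; parse_ctffind3_final_values is not part of RELION. It assumes that the first three numbers on the line carrying the Final Values label are DefocusU, DefocusV and the astigmatism angle, in that order. Verify this against your own CTFFIND3 logs before relying on it.

```python
import re
from typing import Optional, Tuple

# Matches plain and exponent-style numbers such as 15220.53 or 1.5e4.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")


def parse_ctffind3_final_values(log_text: str) -> Optional[Tuple[float, float, float]]:
    """Return (DefocusU, DefocusV, DefocusAngle) from CTFFIND3 log text.

    Assumption: the first three numbers on the line containing
    'Final Values' are the two defocus values and the astigmatism angle.
    Returns None if no such line (with at least three numbers) exists.
    """
    for line in log_text.splitlines():
        if "Final Values" in line:
            numbers = _NUMBER.findall(line)
            if len(numbers) >= 3:
                defocus_u, defocus_v, angle = (float(n) for n in numbers[:3])
                return defocus_u, defocus_v, angle
    return None
```

If you run it over every micrographname.log file, any log that returns None is one where no usable Final Values line was found.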

In some cases, parts of the micrograph are unsuitable for CTF estimation (e.g. labels on film). If so, use the Estimate CTF on window size (pix) option to set the size of a square window, placed at the center of the micrograph, that will be used for CTF estimation. The windowed micrographs are deleted automatically after CTF estimation to save disk space.
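The Estimate CTF on window size (pix) option amounts to cropping a centred square from the micrograph. Below is a minimal pure-Python sketch of that crop, assuming the micrograph is held as a list of rows. The name central_window is hypothetical. When the margin is odd, this sketch rounds the offset down; RELION may handle that case differently.

```python
from typing import List

Image = List[List[float]]  # micrograph stored as a list of rows


def central_window(micrograph: Image, size: int) -> Image:
    """Cut a size x size square from the centre of a 2D micrograph."""
    ny, nx = len(micrograph), len(micrograph[0])
    if size > min(ny, nx):
        raise ValueError("window is larger than the micrograph")
    # Offsets of the window's top-left corner (rounded down for odd margins).
    y0 = (ny - size) // 2
    x0 = (nx - size) // 2
    return [row[x0:x0 + size] for row in micrograph[y0:y0 + size]]
```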

extract tab

Again, use wildcards (filenames with * or ?) to select all coordinate files for particle extraction, and then provide a particle box size. The name of each coordinate file must start with the name of its micrograph (without the extension), optionally followed by additional characters. For example, mic1023_sideviews.coord is a valid name for micrograph mic1023.mrc. Accepted formats for the coordinate files are XMIPP (2.4), EMAN BOXER, and XIMDISP.
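The naming rule for coordinate files can be expressed as a prefix match on the micrograph rootname. The sketch below is hypothetical; micrograph_for_coords is not a RELION function. Sometimes several rootnames are prefixes of the same coordinate file, as with mic102 and mic1023. This sketch assumes the longest one is meant. That tie-breaking rule is a choice made here for illustration, not documented RELION behaviour.

```python
import os
from typing import Iterable, Optional


def micrograph_for_coords(coord_file: str, micrographs: Iterable[str]) -> Optional[str]:
    """Return the micrograph whose rootname is the longest prefix of coord_file."""
    coord_base = os.path.basename(coord_file)
    best_root, best_mic = "", None
    for mic in micrographs:
        # Rootname = file name without directory and without extension.
        root = os.path.splitext(os.path.basename(mic))[0]
        if coord_base.startswith(root) and len(root) > len(best_root):
            best_root, best_mic = root, mic
    return best_mic
```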

operate tab

The extracted particles may be re-scaled, re-windowed, normalized, and have their contrast inverted (in that order). Keep re-scaled and re-windowed image sizes even. Always normalize your particles. Use a reasonable radius for the circle around your particles, because the average and standard deviation of the noise are calculated from the pixels outside this circle. If there are white or black artefacts on the micrographs (e.g. caused by dust or hot/dead pixels), remove them by setting a positive value for the dust removal options. Any white (or black) pixel whose value lies more than the given parameter times the noise standard deviation above (or below) the noise mean is replaced by a random value drawn from a Gaussian distribution. For cryo-EM data, values around 3.5-5 are often useful. Make sure you do not erase part of the true signal.
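The normalization and dust-removal steps can be sketched in pure Python as follows. This illustrates the idea and is not RELION's implementation. The function names are hypothetical. The box is assumed square, with its centre at pixel (N/2, N/2). The dust thresholds are applied after normalization, in units of the noise standard deviation; at that point the background has mean 0 and standard deviation 1.

```python
import math
import random
from typing import List

Image = List[List[float]]  # square particle box stored as a list of rows


def background_pixels(img: Image, bg_radius: float) -> List[float]:
    """Pixels outside a circle of bg_radius around the assumed centre (N/2, N/2)."""
    n = len(img)
    c = n // 2
    return [img[y][x] for y in range(n) for x in range(n)
            if (y - c) ** 2 + (x - c) ** 2 > bg_radius ** 2]


def normalise(img: Image, bg_radius: float) -> Image:
    """Shift and scale so the background has mean 0 and standard deviation 1."""
    bg = background_pixels(img, bg_radius)
    mean = sum(bg) / len(bg)
    std = math.sqrt(sum((v - mean) ** 2 for v in bg) / len(bg))
    return [[(v - mean) / std for v in row] for row in img]


def remove_dust(img: Image, white_sigma: float, black_sigma: float,
                rng: random.Random) -> Image:
    """Replace outliers in a normalised image with Gaussian noise (mean 0, sd 1).

    A threshold <= 0 disables that side of the test.
    """
    out = []
    for row in img:
        new_row = []
        for v in row:
            too_white = white_sigma > 0 and v > white_sigma
            too_black = black_sigma > 0 and v < -black_sigma
            new_row.append(rng.gauss(0.0, 1.0) if (too_white or too_black) else v)
        out.append(new_row)
    return out


def invert_contrast(img: Image) -> Image:
    """Flip the sign of every pixel."""
    return [[-v for v in row] for row in img]
```

In this sketch, dust removal is applied to the normalised image before contrast inversion; re-scaling and re-windowing are omitted. A Gaussian replacement value can, rarely, itself exceed the threshold, and this sketch does not resample it.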

Running tab

The relion_preprocess program is MPI-parallelized: each MPI process estimates the CTF and extracts particles for its own subset of the selected micrographs/coordinate files.
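One simple way to give each MPI process a subset of the micrographs is to split the file list into contiguous, near-equal chunks. The sketch below illustrates that idea only. split_among_ranks is a hypothetical name, and RELION's own division of work may differ.

```python
from typing import List, Sequence


def split_among_ranks(items: Sequence[str], n_ranks: int) -> List[List[str]]:
    """Divide items into n_ranks contiguous chunks whose sizes differ by at most one."""
    if n_ranks < 1:
        raise ValueError("need at least one rank")
    base, extra = divmod(len(items), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks
```

With mpi4py, each process would then work on chunks[MPI.COMM_WORLD.Get_rank()], using MPI.COMM_WORLD.Get_size() as the number of ranks.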

Dividing your data into groups

If you use the semi-automated preprocessing procedure explained above, your data will be divided into as many groups as there are micrographs. During refinement, a separate noise spectrum and signal scale factor are estimated for each group. To get robust noise and signal estimates, make sure each group contains at least ~10-20 particles. If you have very few particles per micrograph, you may want to combine multiple micrographs into one group (i.e. use the same rlnMicrographName for particles coming from multiple micrographs). If you do so, combine micrographs with similar apparent signal-to-noise ratios. Often this means micrographs with similar defocus values. Each particle may still have its own defocus values, because CTF corrections are done per particle, not per group.
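If you decide to join micrographs into groups, one simple heuristic is to sort them by defocus and accumulate them until each group holds enough particles. The sketch below is hypothetical: group_by_defocus and the group### names are illustrative, not RELION conventions. You would write the returned group name as the rlnMicrographName of every particle from the micrographs in that group. Defocus is used here only as a proxy for apparent signal-to-noise ratio.

```python
from typing import Dict, List, Tuple

# (micrograph name, mean defocus in Angstrom, number of particles)
Micrograph = Tuple[str, float, int]


def group_by_defocus(mics: List[Micrograph], min_particles: int) -> Dict[str, str]:
    """Map each micrograph name to a group name so every group has >= min_particles."""
    ordered = sorted(mics, key=lambda m: m[1])  # neighbours have similar defocus
    groups: List[List[Micrograph]] = []
    current: List[Micrograph] = []
    count = 0
    for mic in ordered:
        current.append(mic)
        count += mic[2]
        if count >= min_particles:
            groups.append(current)
            current, count = [], 0
    if current:
        # Leftover micrographs join the last group, which has the closest defocus.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return {name: f"group{i:03d}"
            for i, group in enumerate(groups, start=1)
            for name, _, _ in group}
```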