Prepare input files

Revision as of 10:53, 19 July 2012

Experimental images

RELION will work best if your data are:

  • Even-sized (odd dimensions are not implemented)
  • Free of false particles (no images are discarded during refinement)
    • Xmipp provides an image-sorting utility, xmipp_sort_by_statistics, that is very handy for cleaning a data set.
  • Unmasked (masking is performed internally)
  • Non-interpolated (avoid any prior rotations/translations; use the originally scanned pixel values)
    • If downscaling is necessary because of memory limitations, use a windowing operation in Fourier space, not a convolution in real space (e.g. with a rectangle or B-spline kernel).
  • Not corrected for the CTF (CTF correction is done internally)
    • If your data have previously been phase-flipped, that's OK: just tell RELION about it.
    • In fact, if you are not planning to correct for the CTF inside RELION (e.g. for negative-stain data), phase-flipping is recommended.
    • If your data have previously been Wiener-filtered or multiplied by their CTF, that is bad practice: go back to the original data.
  • Normalised: make sure the average density in the background area is (approximately) zero, and that the standard deviation of the noise is (approximately) one.
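To make the normalisation requirement concrete, here is a minimal Python sketch (standard library only) for a single image stored as a list of rows. It assumes the background is every pixel outside a circle of radius bg_radius (a hypothetical parameter) centred on the box, and it uses the population standard deviation. It illustrates the goal, not RELION's own implementation; in practice relion_preprocess normalises the images for you.

```python
import math


def normalise_background(image, bg_radius):
    """Shift and scale a square, even-sized image so that the pixels outside
    a circle of radius bg_radius (in pixels, centred on the box) have mean 0
    and standard deviation 1. `image` is a list of rows of floats.
    normalise_background and bg_radius are hypothetical names."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("image must be square")
    if n % 2 != 0:
        raise ValueError("image must be even-sized")
    centre = n // 2
    # Collect background pixels: those strictly outside the circle.
    background = [
        image[y][x]
        for y in range(n)
        for x in range(n)
        if (x - centre) ** 2 + (y - centre) ** 2 > bg_radius ** 2
    ]
    if len(background) < 2:
        raise ValueError("bg_radius leaves too few background pixels")
    mean = sum(background) / len(background)
    std = math.sqrt(sum((v - mean) ** 2 for v in background) / len(background))
    if std == 0:
        raise ValueError("background has zero variance")
    # Apply the same shift and scale to every pixel, particle included.
    return [[(v - mean) / std for v in row] for row in image]
```

For a whole stack one would apply the same function to every image. The checks at the top also enforce the square, even-sized requirement.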

And then, just like with any other refinement program, you might save yourself lots of trouble if your data have:

  • high signal-to-noise ratios (get the best possible structure by taking great care in sample preparation and data collection)

Recommended preprocessing procedure

  1. Get an MRC/SPIDER/IMAGIC stack of your clean, unfiltered, unmasked and non-CTF-corrected images (phase-flipping alone is OK). Direct extraction from the original micrographs is often the easiest and best option.
  2. If you want to correct for the CTF inside RELION (highly recommended for cryo-EM data, often less necessary for negative-stain data), create an input STAR file with the necessary CTF parameters for each particle.
  3. Use the relion_preprocess command-line program (available from version 1.1 onwards) to rescale and re-window your original stack to even-sized particles (if necessary) and to normalize them. The output MRC stack and STAR file from this program can be used directly for 2D or 3D refinement in RELION.

RELION reads the following image file formats:

  • MRC individual images (with extension .mrc)
  • MRC stacks (with extension .mrcs) (recommended)
  • SPIDER individual images (with extension .spi)
  • SPIDER stacks (with extension .spi)
  • IMAGIC stacks (with extensions .hed and .img)

Preparation of the images is explained on the Preprocess images page. Further note that images should be square (i.e. xdim=ydim).
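The format list and the square, even-sized requirement can be checked up front with a small helper like the hypothetical describe_input below. It is only a sketch: it looks at the file extension and at dimensions supplied by the caller, it does not read MRC, SPIDER or IMAGIC headers, and because .spi is used for both SPIDER images and SPIDER stacks it cannot tell those two apart.

```python
import os

# Extensions from the list above. ".spi" is used for both SPIDER single
# images and SPIDER stacks, so the extension alone cannot distinguish them.
FORMATS = {
    ".mrc": "MRC individual image",
    ".mrcs": "MRC stack",
    ".spi": "SPIDER image or stack",
    ".hed": "IMAGIC stack",
    ".img": "IMAGIC stack",
}


def describe_input(filename, xdim, ydim):
    """Return the image format implied by `filename` and check that the
    caller-supplied dimensions are square and even (hypothetical helper)."""
    ext = os.path.splitext(filename)[1]
    if ext not in FORMATS:
        raise ValueError(f"unsupported extension: {ext!r}")
    if xdim != ydim:
        raise ValueError(f"images must be square, got {xdim}x{ydim}")
    if xdim % 2 != 0:
        raise ValueError(f"images must be even-sized, got {xdim}")
    return FORMATS[ext]
```

For example, describe_input("particles.mrcs", 64, 64) returns "MRC stack", while a 65x65 box or a .tif file raises a ValueError.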

If no CTF correction is to be performed inside RELION, then a stack of images may be used directly as input (command-line option --i). In that case, it is recommended to phase-flip the images before refinement. If CTF correction is to be performed inside RELION (recommended for cryo-EM data), then metadata describing the CTF of each image must be provided in addition to the images themselves. In that case, the input to RELION is a STAR file (see below).

Metadata STAR files

The STAR file format is explained on the Conventions page. STAR files are easily readable plain-text files, for which shell utilities like awk are very convenient. However, because not all users are equally proficient in shell scripting, RELION comes with several shell scripts that provide some basic operations on STAR files. See the STAR file utilities page for a description of these utilities, of which relion_star_loopheader, relion_star_datablock_stack, relion_star_datablock_singlefiles and relion_star_datablock_ctfdat are used below.
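For readers who prefer a scripting language over awk, here is a minimal Python reader for STAR files of the kind generated on this page: one data block containing a single loop_ of whitespace-separated values. It is a sketch, not a full STAR parser; quoted strings, multiple data blocks and key-value pairs are not handled, and read_star_loop is a hypothetical name.

```python
def read_star_loop(text):
    """Return (labels, rows) for a STAR text containing a single loop_.
    Each row is a dict mapping label (without the leading "_") to the
    value as a string."""
    labels, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and the block/loop markers.
        if not line or line.startswith("#") or line.startswith("data_") or line == "loop_":
            continue
        if line.startswith("_"):
            # A label line may carry a trailing "#N" column index; keep the name only.
            labels.append(line.split()[0].lstrip("_"))
        else:
            values = line.split()
            if len(values) != len(labels):
                raise ValueError(f"expected {len(labels)} columns, got {len(values)}: {line}")
            rows.append(dict(zip(labels, values)))
    return labels, rows
```

With the rows as dictionaries, a per-micrograph particle count is one line, e.g. collections.Counter(r["rlnMicrographName"] for r in rows).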

Generate STAR files from separate stacks for each micrograph

If the input images are in a separate stack for each micrograph, then one could use the following commands to generate the input STAR file:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
relion_star_datablock_stack 4 mic1.mrcs mic1.mrcs 10000 10500 30 200 2 0.1  >> my_images.star
relion_star_datablock_stack 3 mic2.mrcs mic2.mrcs 21000 20500 25 200 2 0.1  >> my_images.star
relion_star_datablock_stack 2 mic3.mrcs mic3.mrcs 16000 15000 35 200 2 0.1  >> my_images.star

(Here the three stacks contain 4, 3 and 2 images, respectively.) This results in a STAR file that can be used directly as input to RELION. Note the rlnMicrographName label and the repetition of each stack name on the datablock lines: the second argument becomes the rlnMicrographName of every image in that stack, so each micrograph gets a unique rlnMicrographName. As a result, a distinct noise spectrum will be estimated for each micrograph.
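To show what these commands produce, the sketch below builds the same rows in Python: one line per image, addressed as N@stack with N counting from 1, followed by the micrograph name and the CTF values in the column order of the loop header. The header layout (a data_ line, loop_, then one _label per line) and the unpadded image numbers are assumptions; the actual RELION scripts may format them slightly differently. stack_rows is a hypothetical helper.

```python
LABELS = ["rlnImageName", "rlnMicrographName", "rlnDefocusU", "rlnDefocusV",
          "rlnDefocusAngle", "rlnVoltage", "rlnSphericalAberration",
          "rlnAmplitudeContrast"]


def stack_rows(n_images, stack, micrograph, defocus_u, defocus_v,
               defocus_angle, voltage, cs, amp_contrast):
    """One data row per image in `stack`, all sharing the same CTF values."""
    ctf = f"{defocus_u} {defocus_v} {defocus_angle} {voltage} {cs} {amp_contrast}"
    # Image i (1-based) inside a stack is addressed as "i@stackname".
    return [f"{i}@{stack} {micrograph} {ctf}" for i in range(1, n_images + 1)]


# (number of images, stack, DefocusU, DefocusV, DefocusAngle), as in the commands above.
micrographs = [
    (4, "mic1.mrcs", 10000, 10500, 30),
    (3, "mic2.mrcs", 21000, 20500, 25),
    (2, "mic3.mrcs", 16000, 15000, 35),
]

lines = ["data_", "loop_"] + [f"_{label}" for label in LABELS]
for n, stack, du, dv, angle in micrographs:
    # The stack name doubles as the micrograph name, as on the command lines.
    lines += stack_rows(n, stack, stack, du, dv, angle, 200, 2, 0.1)
star_text = "\n".join(lines) + "\n"
```

Writing star_text to my_images.star gives nine data rows, the first being "1@mic1.mrcs mic1.mrcs 10000 10500 30 200 2 0.1".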

Generate STAR files from particles in single-file format

If the input images are in single-file format in distinct directories for each micrograph, then the commands would be:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
relion_star_datablock_singlefiles "mic1/*.spi" mic1 16000 15000 35 200 2 0.1  >> my_images.star
relion_star_datablock_singlefiles "mic2/*.spi" mic2 16000 15000 35 200 2 0.1  >> my_images.star
relion_star_datablock_singlefiles "mic3/*.spi" mic3 16000 15000 35 200 2 0.1  >> my_images.star

The result would be an equivalent STAR file, with one line per image file.
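The single-file case differs only in that each row names an image file instead of a position in a stack. A Python sketch of building those rows, assuming one directory per micrograph as above (singlefile_rows is a hypothetical helper, and sorting the matches is a choice made here for a reproducible order):

```python
import glob


def singlefile_rows(pattern, micrograph, defocus_u, defocus_v,
                    defocus_angle, voltage, cs, amp_contrast):
    """One STAR data row per file matching the glob `pattern`."""
    ctf = f"{defocus_u} {defocus_v} {defocus_angle} {voltage} {cs} {amp_contrast}"
    # Sort the matches so the row order does not depend on the file system.
    return [f"{path} {micrograph} {ctf}" for path in sorted(glob.glob(pattern))]
```

For example, singlefile_rows("mic1/*.spi", "mic1", 16000, 15000, 35, 200, 2, 0.1) yields rows such as "mic1/img001.spi mic1 16000 15000 35 200 2 0.1" for each .spi file in mic1.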

Generate STAR files from XMIPP-style CTFDAT files

To generate a STAR file from an XMIPP-style ctfdat file, one could use:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
relion_star_datablock_ctfdat all_images.ctfdat >> all_images.star

Generate STAR files from FREALIGN-style .par files

To generate a STAR file from a FREALIGN-style .par file, one could use:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
awk '{if ($1!="C") {print $1"@/my/abs/path/bigstack.mrcs", $8, $9, $10, $11, " 80 2.0 0.1"}  }' < frealign.par >> all_images.star

This assumes a voltage of 80 kV, a spherical aberration of 2.0 mm and an amplitude contrast of 0.1. It also assumes that all particles are in a single stack called /my/abs/path/bigstack.mrcs, and it uses the FREALIGN film number (column 8) as the micrograph name.
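The awk one-liner can also be written as a Python function, which may be easier to adapt. This sketch reads the same FREALIGN columns as the awk command (1: particle number, 8: film number, 9 and 10: defocus values, 11: astigmatism angle) and uses the same assumed voltage, spherical aberration and amplitude contrast. Unlike the awk command, it also skips blank lines. par_to_star_rows is a hypothetical name.

```python
def par_to_star_rows(par_lines, stack, voltage=80, cs=2.0, amp_contrast=0.1):
    """Convert FREALIGN .par lines to STAR data rows in the column order
    ImageName, MicrographName, DefocusU, DefocusV, DefocusAngle,
    Voltage, SphericalAberration, AmplitudeContrast."""
    rows = []
    for line in par_lines:
        fields = line.split()
        # Skip blank lines and comment lines, whose first field is "C".
        if not fields or fields[0] == "C":
            continue
        # Zero-based indices of awk's $1, $8, $9, $10 and $11.
        particle, film, df1, df2, angast = (fields[0], fields[7], fields[8],
                                            fields[9], fields[10])
        rows.append(f"{particle}@{stack} {film} {df1} {df2} {angast} "
                    f"{voltage} {cs} {amp_contrast}")
    return rows
```

Because it accepts any iterable of lines, an open file works directly: with open("frealign.par") as f: rows = par_to_star_rows(f, "/my/abs/path/bigstack.mrcs").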

Reference images

2D class averaging is typically performed in an unsupervised manner, i.e. without user-provided references. 3D classification or refinement does require a (single) 3D reference structure. This map should be provided in MRC or SPIDER format, and it should have the same dimensions as the input images. Take care that the pixel size (in Angstroms) matches that of the experimental images, as an internal magnification correction is currently not implemented. Because the Gaussian model used to calculate probabilities is based on the squared differences between the experimental images and projections of the reference, the absolute intensity scale (greyscale) of the reference map matters. RELION can, however, correct for the greyscale internally at a relatively small computational cost.
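If the reference was computed at a different pixel size, it has to be rescaled, and then padded or cropped, before use. The arithmetic is sketched below for a hypothetical helper match_reference: rescaling preserves the physical box size in Angstroms, the new box is rounded to an even number, and the small residual pixel-size mismatch caused by rounding is reported. Which program performs the actual resampling is left open here.

```python
def match_reference(ref_box, ref_apix, img_box, img_apix):
    """Box size and padding needed to bring a reference map (ref_box pixels
    at ref_apix Angstrom/pixel) onto the pixel size and box of the
    experimental images (img_box pixels at img_apix Angstrom/pixel)."""
    # Rescaling keeps the physical extent of the map (box * pixel size) fixed.
    exact_box = ref_box * ref_apix / img_apix
    # Round to the nearest even box so that padding (or cropping) to the
    # even image box splits equally over both sides.
    new_box = 2 * round(exact_box / 2)
    # Pixel size actually obtained after rounding, and its relative error.
    effective_apix = ref_box * ref_apix / new_box
    rel_mismatch = (effective_apix - img_apix) / img_apix
    # > 0: pad the rescaled map by this many pixels in total; < 0: crop.
    pad_total = img_box - new_box
    return new_box, effective_apix, pad_total, rel_mismatch
```

For a 100-pixel reference at 2.0 Angstrom/pixel and 160-pixel images at 1.5 Angstrom/pixel, match_reference(100, 2.0, 160, 1.5) gives a 134-pixel rescaled map (about 1.493 Angstrom/pixel, a mismatch of roughly 0.5%) that needs 26 pixels of padding, i.e. 13 on each side.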

To limit model bias it is generally recommended to strongly low-pass filter your initial reference model. The Optimisation tab in the GUI has an entry to set an initial low-pass filter.
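To relate an initial low-pass filter given as a resolution in Angstroms to Fourier space: shell k of a box of N pixels with pixel size p corresponds to a resolution of N*p/k Angstroms, so a filter at resolution R keeps the shells up to k = N*p/R. The sketch below computes that cutoff and a radial weight profile with a raised-cosine edge. The edge shape and its width are arbitrary choices for illustration, not necessarily what RELION uses, and both function names are hypothetical.

```python
import math


def lowpass_shell(box, apix, resolution):
    """Fourier-shell radius (in Fourier pixels) for `resolution` Angstrom."""
    return box * apix / resolution


def soft_lowpass_weights(box, apix, resolution, width=2.0):
    """Weight per Fourier shell k = 0 .. box // 2: 1 up to the cutoff,
    0 beyond cutoff + width, and a raised-cosine fall-off in between."""
    k_c = lowpass_shell(box, apix, resolution)
    weights = []
    for k in range(box // 2 + 1):
        if k <= k_c:
            weights.append(1.0)
        elif k >= k_c + width:
            weights.append(0.0)
        else:
            weights.append(0.5 * (1 + math.cos(math.pi * (k - k_c) / width)))
    return weights
```

For a 200-pixel box at 2.0 Angstrom/pixel, a 40 Angstrom filter corresponds to shell 10, so only the lowest ten Fourier shells (plus the soft edge) of the reference survive.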

Important note for 3D classification

Although an external reference may work well enough for 3D classification (as illustrated in the fully guided 3D classification example), one often gets better results by starting classification from a consensus model generated from the structurally heterogeneous data set itself. To that end, one may refine the external reference as a single class against the entire data set for multiple iterations. The resulting model may then be used to generate (low-pass filtered!) random seeds and classify the data using multiple classes.

Note that the GUI option "Ref. map is on absolute greyscale?" should then probably be set to No for the initial single-reference refinement, and to Yes for the actual classification run.