Prepare input files: Difference between revisions

From Relion
Jump to navigation Jump to search
No edit summary
 
(35 intermediate revisions by 2 users not shown)
Line 1: Line 1:
__TOC__
__TOC__
= Experimental images =
=== RELION will work best if your data are ===


= Recommended image preprocessing =
'''The procedures outlined below are suited if you have already done your preprocessing in another program package and you want to use the stack(s) of particles created by that package. If you are starting fresh from a set of raw frames, you may find the new (as of release-1.1) [[PreProcessing|preprocess run-type inside the RELION GUI]] more convenient.'''
== General comments on data quality ==
RELION will work best if your input images are:
* '''Even-sized''' (Odd dimensions are not implemented)
* '''Even-sized''' (Odd dimensions are not implemented)
* '''Clean''' from false particles (no images are discarded during refinement).  
* '''Square''' i.e. <code>xdim=ydim</code>
** Xmipp implements an image sorting utility called <code>xmipp_sort_by_statistics</code> that is ''very'' handy in the cleaning of a data set.  
* '''Clean''' from false particles (note that RELION does NOT discard any images during refinement).  
* '''Unmasked''' (masking is performed internally)
* '''Unmasked''' (masking is performed internally)
* '''Non-interpolated''' (prevent any prior rotations/translations: use the originally scanned pixel values)
* '''Non-interpolated''' (prevent any prior rotations/translations: use the originally scanned pixel values)
Line 15: Line 19:
* '''Normalised''' ''Make sure the average density in the background area is (approximately) zero!!''. Also, the standard deviation in the noise should be (approximately) one.
* '''Normalised''' ''Make sure the average density in the background area is (approximately) zero!!''. Also, the standard deviation in the noise should be (approximately) one.


And then, just like with any other refinement program, you might save yourself lots of trouble if your data have:
== Negative stain data ==
 
* '''high signal-to-noise''' ratios (get the best possible structure by taking great care in sample preparation and data collection)
 
=== Recommended preprocessing procedure ===


# Get a MRC/SPIDER/IMAGIC stack of your clean, unfiltered, unmasked and non-CTF-corrected images (phase-flipping alone is OK). Often direct extraction from the original micrographs is the easiest and best option.  
The use of CTF phase-flipped particles without further CTF correction inside RELION is the recommended procedure for negative stain data:
#  If you want to correct for the CTF inside RELION (highly recommended for cryo-EM data, often less necessary in case of negative stain data) then create an input STAR file with the necessary CTF parameters for each particle
# Get a MRC/SPIDER/IMAGIC stack of your clean, unfiltered, unmasked and phase-flipped particles. Often direct extraction from phase-flipped micrographs is the easiest and best option. Alternatively, extract the particles from the raw micrographs and perform phase-flipping on the extracted particles. Use your favourite package to do this, e.g. EMAN(2), XMIPP, IMAGIC, SPIDER, ''etc''. 
# Use the <code>relion_preprocess</code> command-line program (available from version 1.1 onwards) to re-scale and re-window your original stack to even-sized particles (if necessary) and to normalize them. The output MRC stack and STAR file from this program may directly be used for 2D or 3D refinement in RELION.
# Input this stack into the <code>relion_preprocess</code> command-line program (available from version 1.1 onwards) to re-scale and re-window your original particles to even-size (if necessary) and to normalize them. The output MRC stack (or STAR file) can be directly used as input for RELION refinement.


RELION reads the following image file formats:
== cryo-EM data ==


* MRC individual images (with extension <code>.mrc</code>)
For cryo-EM data, full CTF-correction (including amplitudes and phases) inside RELION is highly recommended. Therefore, the recommended procedure involves telling RELION about the CTF parameters of each particle:
* MRC stacks (with extension <code>.mrcs</code>) (''recommended'')
# Get one or multiple MRC/SPIDER/IMAGIC stack(s) of your clean, unfiltered, unmasked and non-CTF-corrected (only phase-flipping is also OK). Often direct extraction from the raw micrographs is the easiest and best option.  
* SPIDER individual images (with extension <code>.spi</code>)
# Create a STAR file that contains the necessary CTF parameters (rlnDefocusU, rlnDefocusV, rlnDefocusAngle, rlnVoltage, rlnSphericalAberration, rlnAmplitudeContrast and rlnMicrographName) for all particles. The next section explains how to do this.
* SPIDER stacks (with extension <code>.spi</code>)
# Input this STAR file into the <code>relion_preprocess</code> command-line program (available from version 1.1 onwards) to re-scale and re-window your original particles to even-size (if necessary) and to normalize them. The output STAR file can be directly used as input for RELION refinement.
* IMAGIC stacks (with extensions <code>.hed</code> and <code>.img</code>)
Preparation of the images is explained on the [[Preprocess images]] page. Further note that images should be square (i.e. <code>xdim=ydim</code>).


'''If no CTF-correction is to be performed inside RELION, then a stack of images may be used directly as input (command line option <code>--i</code>). In that case, it is recommended that the images are CTF-phase flipped before refinement. If CTF-correction is to be performed inside RELION (recommended for cryo-data), then besides the images themselves, also metadata regarding the CTFs needs to be provided. In that case, the input to RELION is done using a STAR file (see below).'''
== Dividing your data into groups ==
For refinement inside RELION you may divide your data into groups, so that a different noise spectrum and signal scale factor is estimated for each group independently. This is typically not necessary for refinements with negative stain data, but highly recommended when using cryo-EM data. Usually, each micrograph comprises one group. Therefore, the division into groups is based on the rlnMicrographName column in the input STAR file. However, to get robust noise and signal estimates, make sure each group contains at least ~10-20 particles. If you have very few particles per micrograph, then you may want to combine multiple micrographs into one group (i.e. use the same rlnMicrographName for particles coming from multiple micrographs). If you do so, make sure you join micrographs with similar apparent signal-to-noise ratios. Often this means with similar defocus values, but do note that each particle may still have its own defocus values (CTF corrections are done per-particle, not per group).


== Metadata STAR files ==
= Creating input STAR files =


The STAR file format is explained on the [[Conventions#The_STAR_format | Conventions]] page. STAR files are easily readable plain text files, for which shell utilities like <code>awk</code> are very convenient. However, because not all users will be equally proficient in shell scripting, RELION comprises several shell script implementations to provide some basic operations with STAR files. See the [[STAR file utilities]] page for a description of these utilities, of which <code>relion_star_loopheader</code>, <code>relion_star_datablock_stack</code> and <code>relion_star_datablock_singlefiles</code> are used below.
The STAR file format is explained on the [[Conventions#The_STAR_format | Conventions]] page. STAR files are easily readable plain text files, for which shell utilities like <code>awk</code> are very convenient. However, because not all users will be equally proficient in shell scripting, RELION comprises several shell script implementations to provide some basic operations with STAR files. See the [[STAR file utilities]] page for a description of these utilities, of which <code>relion_star_loopheader</code>, <code>relion_star_datablock_stack</code> and <code>relion_star_datablock_singlefiles</code> are used below.


=== Generate STAR files from separate stacks for each micrograph ===
== Generate STAR files from separate stacks for each micrograph ==
If the input images are in a separate stack for each micrograph, then one could use the following commands to generate the input STAR file:
If the input images are in a separate stack for each micrograph, then one could use the following commands to generate the input STAR file:


<code>
 
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
  relion_star_datablock_stack 4 mic1.mrcs mic1.mrcs 10000 10500 30 200 2 0.1  >> my_images.star
  relion_star_datablock_stack 4 mic1.mrcs mic1.mrcs 10000 10500 30 200 2 0.1  >> my_images.star
  relion_star_datablock_stack 3 mic2.mrcs mic2.mrcs 21000 20500 25 200 2 0.1  >> my_images.star
  relion_star_datablock_stack 3 mic2.mrcs mic2.mrcs 21000 20500 25 200 2 0.1  >> my_images.star
  relion_star_datablock_stack 2 mic3.mrcs mic3.mrcs 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_stack 2 mic3.mrcs mic3.mrcs 16000 15000 35 200 2 0.1  >> my_images.star
</code>
 


(Where the three stacks contain respectively 4, 3 and 2 images.) This would result in [[this STAR file]] that could be used directly as input into RELION. Note the rlnMicrographName label, and the repetition of the micrograph names on the datablock lines, which will lead to the inclusion of a unique rlnMicrographName for each micrograph. By doing so, distinct noise spectra will be estimated for each micrograph.
(Where the three stacks contain respectively 4, 3 and 2 images.) This would result in [[this STAR file]] that could be used directly as input into RELION. Note the rlnMicrographName label, and the repetition of the micrograph names on the datablock lines, which will lead to the inclusion of a unique rlnMicrographName for each micrograph. By doing so, distinct noise spectra will be estimated for each micrograph.


=== Generate STAR files from particles in single-file format ===
== Generate STAR files from particles in single-file format ==


If the input images are in single-file format in distinct directories for each micrograph, then the commands would be:
If the input images are in single-file format in distinct directories for each micrograph, then the commands would be:
<code>
 
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
  relion_star_datablock_singlefiles "mic1/*.spi" mic1 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_singlefiles "mic1/*.spi" mic1 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_singlefiles "mic2/*.spi" mic2 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_singlefiles "mic2/*.spi" mic2 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_singlefiles "mic3/*.spi" mic3 16000 15000 35 200 2 0.1  >> my_images.star
  relion_star_datablock_singlefiles "mic3/*.spi" mic3 16000 15000 35 200 2 0.1  >> my_images.star
</code>
 


And the result would be [[this equivalent STAR file]].
And the result would be [[this equivalent STAR file]].


=== Generate STAR files from XMIPP-style CTFDAT files ===
== Generate STAR files from XMIPP-style CTFDAT files ==


To generate a STAR file from an XMIPP-style <code>ctfdat</code> file, one could use:
To generate a STAR file from an XMIPP-style <code>ctfdat</code> file, one could use:
<code>
 
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
  relion_star_datablock_ctfdat all_images.ctfdat>>  all_images.star
  relion_star_datablock_ctfdat all_images.ctfdat>>  all_images.star
</code>


=== Generate STAR files from FREALIGN-style .par files ===
== Generate STAR files from FREALIGN-style .par files ==


To generate a STAR file from a FREALIGN-style .par file, one could use:
To generate a STAR file from a FREALIGN-style .par file, one could use:
<code>
 
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
  relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
  awk '{if ($1!="C") {print $1"@./my/abs/path/bigstack.mrcs", $8, $9, $10, $11, " 80 2.0 0.1"}  }' < frealign.par >> all_images.star
  awk '{if ($1!="C") {print $1"@./my/abs/path/bigstack.mrcs", $8, $9, $10, $11, " 80 2.0 0.1"}  }' < frealign.par >> all_images.star
</code>
 
Assuming the voltage is 80kV, the spherical aberration is 2.0 and the amplitude contrast is 0.1. Also, a single stack is assumed called: <code>/my/abs/path/bigstack.mrcs</code>.
Assuming the voltage is 80kV, the spherical aberration is 2.0 and the amplitude contrast is 0.1. Also, a single stack is assumed called: <code>/my/abs/path/bigstack.mrcs</code>.


== Reference images ==
= Preparing references =


2D class averaging is typically performed in an unsupervised manner, i.e. without user-provided references. 3D classification or refinement does require a (single) 3D reference structure. This map should be provided in MRC or SPIDER format, and it should have the same dimensions as the input images. Take care that the pixel size (in Angstroms) matches that of the experimental images, as currently an internal magnification correction is '''not''' implemented. Because the Gaussian model used to calculate probabilities is based on the squared differences between the experimental images and projections of the reference, the absolute intensity scale (or grey-scale) of the reference map is relevant. However, RELION may correct for the greyscale internally at relatively small computational costs.
2D class averaging is typically performed in an unsupervised manner, i.e. without user-provided references. 3D classification or refinement does require a (single) 3D reference structure. This map should be provided in MRC or SPIDER format, and it should have the same dimensions as the input images. Take care that the pixel size (in Angstroms) matches that of the experimental images, as currently an internal magnification correction is '''not''' implemented. Because the Gaussian model used to calculate probabilities is based on the squared differences between the experimental images and projections of the reference, the absolute intensity scale (or grey-scale) of the reference map is relevant. However, RELION may correct for the greyscale internally at relatively small computational costs (just set the <code>Ref. map is on absolute greyscale?</code> option in the GUI to <code>No</code>.)


To limit model bias it is generally recommended to strongly low-pass filter your initial reference model. The Optimisation tab in the GUI has an entry to set an initial low-pass filter.
To limit model bias '''it is generally recommended to strongly low-pass filter your initial reference'''. The Optimisation tab in the GUI has an entry to set an initial low-pass filter.


=== Important note for 3D classification ===
== Important note for 3D classification ==


''Although for 3D classification an external reference may work OK (as illustrated in the [[Classification_example|fully guided 3D classification example]]), '''one often gets better results by starting classification from a consensus model that was generated from the structurally heterogeneous data set itself'''. To that purpose, one may refine (for multiple iterations) the external reference (as a single class) against the entire data set. The resulting model may then be used to generate (low-pass filtered!) random seeds and classify the data using multiple classes.''  
''Although for 3D classification an external reference may work OK (as illustrated in the [[Classification_example|fully guided 3D classification example]]), '''one often gets better results by starting classification from a consensus model that was generated from the structurally heterogeneous data set itself'''. To that purpose, one may refine (for multiple iterations) the external reference (as a single class) against the entire data set. The resulting model may then be used to generate (low-pass filtered!) random seeds and classify the data using multiple classes.''  


Note that the option <code>Ref. map is on absolute greyscale?</code> in the GUI should then probably be set to <code>No</code> for the initial single-reference refinement, and then to <code>Yes</code> for the actual classification run.
Note that the option <code>Ref. map is on absolute greyscale?</code> in the GUI should then probably be set to <code>No</code> for the initial single-reference refinement, and then to <code>Yes</code> for the actual classification run.

Latest revision as of 13:24, 13 November 2017

Recommended image preprocessing

The procedures outlined below are suited if you have already done your preprocessing in another program package and you want to use the stack(s) of particles created by that package. If you are starting fresh from a set of raw frames, you may find the new (as of release-1.1) preprocess run-type inside the RELION GUI more convenient.

General comments on data quality

RELION will work best if your input images are:

  • Even-sized (Odd dimensions are not implemented)
  • Square i.e. xdim=ydim
  • Clean from false particles (note that RELION does NOT discard any images during refinement).
  • Unmasked (masking is performed internally)
  • Non-interpolated (prevent any prior rotations/translations: use the originally scanned pixel values)
    • If downscaling is necessary because of memory issues: use a window-operation in Fourier-space, not a convolution in real-space (e.g. with a rectangle/B-spline).
  • Uncorrected for CTF (this is done internally)
    • If your data have previously been phase-flipped, that's OK: just tell RELION about it
    • Actually, if you are not planning to correct for CTFs inside RELION (e.g. for negative stain data), phase-flipping is recommended.
    • If your data have previously been pre-Wiener filtered or pre-multiplied by their CTF, that's a bad thing to do: go back to the original data.
  • Normalised Make sure the average density in the background area is (approximately) zero!!. Also, the standard deviation in the noise should be (approximately) one.

Negative stain data

The use of CTF phase-flipped particles without further CTF correction inside RELION is the recommended procedure for negative stain data:

  1. Get a MRC/SPIDER/IMAGIC stack of your clean, unfiltered, unmasked and phase-flipped particles. Often direct extraction from phase-flipped micrographs is the easiest and best option. Alternatively, extract the particles from the raw micrographs and perform phase-flipping on the extracted particles. Use your favourite package to do this, e.g. EMAN(2), XMIPP, IMAGIC, SPIDER, etc.
  2. Input this stack into the relion_preprocess command-line program (available from version 1.1 onwards) to re-scale and re-window your original particles to even-size (if necessary) and to normalize them. The output MRC stack (or STAR file) can be directly used as input for RELION refinement.

cryo-EM data

For cryo-EM data, full CTF-correction (including amplitudes and phases) inside RELION is highly recommended. Therefore, the recommended procedure involves telling RELION about the CTF parameters of each particle:

  1. Get one or multiple MRC/SPIDER/IMAGIC stack(s) of your clean, unfiltered, unmasked and non-CTF-corrected (only phase-flipping is also OK). Often direct extraction from the raw micrographs is the easiest and best option.
  2. Create a STAR file that contains the necessary CTF parameters (rlnDefocusU, rlnDefocusV, rlnDefocusAngle, rlnVoltage, rlnSphericalAberration, rlnAmplitudeContrast and rlnMicrographName) for all particles. The next section explains how to do this.
  3. Input this STAR file into the relion_preprocess command-line program (available from version 1.1 onwards) to re-scale and re-window your original particles to even-size (if necessary) and to normalize them. The output STAR file can be directly used as input for RELION refinement.

Dividing your data into groups

For refinement inside RELION you may divide your data into groups, so that a different noise spectrum and signal scale factor is estimated for each group independently. This is typically not necessary for refinements with negative stain data, but highly recommended when using cryo-EM data. Usually, each micrograph comprises one group. Therefore, the division into groups is based on the rlnMicrographName column in the input STAR file. However, to get robust noise and signal estimates, make sure each group contains at least ~10-20 particles. If you have very few particles per micrograph, then you may want to combine multiple micrographs into one group (i.e. use the same rlnMicrographName for particles coming from multiple micrographs). If you do so, make sure you join micrographs with similar apparent signal-to-noise ratios. Often this means with similar defocus values, but do note that each particle may still have its own defocus values (CTF corrections are done per-particle, not per group).

Creating input STAR files

The STAR file format is explained on the Conventions page. STAR files are easily readable plain text files, for which shell utilities like awk are very convenient. However, because not all users will be equally proficient in shell scripting, RELION comprises several shell script implementations to provide some basic operations with STAR files. See the STAR file utilities page for a description of these utilities, of which relion_star_loopheader, relion_star_datablock_stack and relion_star_datablock_singlefiles are used below.

Generate STAR files from separate stacks for each micrograph

If the input images are in a separate stack for each micrograph, then one could use the following commands to generate the input STAR file:


relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
relion_star_datablock_stack 4 mic1.mrcs mic1.mrcs 10000 10500 30 200 2 0.1  >> my_images.star
relion_star_datablock_stack 3 mic2.mrcs mic2.mrcs 21000 20500 25 200 2 0.1  >> my_images.star
relion_star_datablock_stack 2 mic3.mrcs mic3.mrcs 16000 15000 35 200 2 0.1  >> my_images.star


(Where the three stacks contain respectively 4, 3 and 2 images.) This would result in this STAR file that could be used directly as input into RELION. Note the rlnMicrographName label, and the repetition of the micrograph names on the datablock lines, which will lead to the inclusion of a unique rlnMicrographName for each micrograph. By doing so, distinct noise spectra will be estimated for each micrograph.

Generate STAR files from particles in single-file format

If the input images are in single-file format in distinct directories for each micrograph, then the commands would be:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > my_images.star
relion_star_datablock_singlefiles "mic1/*.spi" mic1 16000 15000 35 200 2 0.1  >> my_images.star
relion_star_datablock_singlefiles "mic2/*.spi" mic2 16000 15000 35 200 2 0.1  >> my_images.star
relion_star_datablock_singlefiles "mic3/*.spi" mic3 16000 15000 35 200 2 0.1  >> my_images.star


And the result would be this equivalent STAR file.

Generate STAR files from XMIPP-style CTFDAT files

To generate a STAR file from an XMIPP-style ctfdat file, one could use:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
relion_star_datablock_ctfdat all_images.ctfdat>>  all_images.star

Generate STAR files from FREALIGN-style .par files

To generate a STAR file from a FREALIGN-style .par file, one could use:

relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnDefocusV rlnDefocusAngle rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
awk '{if ($1!="C") {print $1"@./my/abs/path/bigstack.mrcs", $8, $9, $10, $11, " 80 2.0 0.1"}  }' < frealign.par >> all_images.star

Assuming the voltage is 80kV, the spherical aberration is 2.0 and the amplitude contrast is 0.1. Also, a single stack is assumed called: /my/abs/path/bigstack.mrcs.

Preparing references

2D class averaging is typically performed in an unsupervised manner, i.e. without user-provided references. 3D classification or refinement does require a (single) 3D reference structure. This map should be provided in MRC or SPIDER format, and it should have the same dimensions as the input images. Take care that the pixel size (in Angstroms) matches that of the experimental images, as currently an internal magnification correction is not implemented. Because the Gaussian model used to calculate probabilities is based on the squared differences between the experimental images and projections of the reference, the absolute intensity scale (or grey-scale) of the reference map is relevant. However, RELION may correct for the greyscale internally at relatively small computational costs (just set the Ref. map is on absolute greyscale? option in the GUI to No.)

To limit model bias it is generally recommended to strongly low-pass filter your initial reference. The Optimisation tab in the GUI has an entry to set an initial low-pass filter.

Important note for 3D classification

Although for 3D classification an external reference may work OK (as illustrated in the fully guided 3D classification example), one often gets better results by starting classification from a consensus model that was generated from the structurally heterogeneous data set itself. To that purpose, one may refine (for multiple iterations) the external reference (as a single class) against the entire data set. The resulting model may then be used to generate (low-pass filtered!) random seeds and classify the data using multiple classes.

Note that the option Ref. map is on absolute greyscale? in the GUI should then probably be set to No for the initial single-reference refinement, and then to Yes for the actual classification run.