Helical processing

From Relion
Revision as of 00:32, 7 June 2016

Our results

A paper about helical processing in Relion 2.0 is still in preparation.

Getting organised - the sizes of boxes and masks

One significant difference between helices and single particles is that helices can extend to arbitrary lengths along their helical axes.

The sizes of particle boxes and masks should be given careful consideration before the start of a project. In RELION, particle boxes are always 2D squares or 3D cubes. Very large box sizes should be avoided in 3D jobs since they may cause memory problems.

You may want to measure the width of filaments before thinking about the box size for 3D reconstruction. Usually the width can be observed from raw micrographs, or you can perform 2D classification with a large tube diameter (but still smaller than the size of the circular mask) and measure the widths afterwards from classes with clear contours. For solving high resolution 3D structures, the tube diameters in auto-picking, particle extraction, 2D classification and the outer diameters in 3D classification and refinement should be slightly larger than the actual width of filaments since the tube diameters are used for normalisation and/or masking (especially in 3D reconstructions, otherwise signals at the periphery of the helix will be masked out).

Box sizes of >= 1.5 times the tube diameter are commonly adopted for 3D helical reconstruction, while bigger boxes are sometimes better for generating 2D classes for template-based auto-picking or Fourier-Bessel analysis. If the box size is too small compared to the actual width of the filaments, segments will fill the entire box, leading to inaccurate alignments or incomplete reconstructed maps. However, you may still want to use box sizes >= 200 pixels even if the helices are, for example, only ~50 pixels wide, to include more subunits in the reconstructed map and get more accurate alignments.

The circular/spherical (background) diameter must be smaller than the box size and larger than the tube diameter. In addition to normalisation and masking out noise, these soft-edged circular/spherical masks also reduce Fourier artefacts caused by the presence of helical structures near the box edges in image transformations. We usually set this diameter to <=90% of the box size. The default value for single particle analysis is 75%, but we think this might waste space for some helical data sets.
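The size constraints described above can be collected into a small sanity check. This is only a sketch: the function name and its arguments are hypothetical (not part of Relion), the 1.5x and 90% limits are the rules of thumb from the text rather than hard limits, and all three values are assumed to be in pixels even though the GUI mixes pixels and Angstroms.

```python
def check_box_and_masks(box_size, tube_diameter, background_diameter):
    """Return a list of warnings for a box / tube / background-mask combination.

    All values are assumed to be in pixels (convert Angstrom values first).
    This helper is illustrative and is not part of Relion.
    """
    warnings = []
    # Rule of thumb from the text: box >= 1.5 x tube diameter for 3D work.
    if box_size < 1.5 * tube_diameter:
        warnings.append("box size is smaller than 1.5 x the tube diameter")
    # Hard requirement from the text: tube < background mask < box.
    if not (tube_diameter < background_diameter < box_size):
        warnings.append("background diameter must lie between tube diameter and box size")
    # Rule of thumb from the text: background mask <= 90% of the box.
    if background_diameter > 0.9 * box_size:
        warnings.append("background diameter exceeds 90% of the box size")
    return warnings


if __name__ == "__main__":
    for message in check_box_and_masks(box_size=200, tube_diameter=100, background_diameter=180):
        print("WARNING:", message)
```

An empty list means that none of the three rules is violated; it does not guarantee that the chosen sizes are optimal for a given data set.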

The inner tube diameter in 3D reconstruction masks out the center of the helix and may improve the resolution in some cases during refinement. The value should be strictly smaller than the (outer) tube diameter, and the inner mask is not applied if the value is negative. Set it to a positive value only if you are certain that the structure is hollow in the center, or contains something which does not conform to the helical symmetry (e.g. DNA to which the subunits are bound). We suggest performing 3D refinement without the inner mask if the structure is not well known.

In 3D reconstruction, the box size, outer diameter and circular/spherical (background) diameter together determine the maximum value of the central Z length, which is sqrt(s^2 - d^2)/box as a fraction of the box size, where s is the circular/spherical (background) diameter, d is the outer tube diameter and box is the box size. The central part of the map is used for searching and applying helical symmetry in real space. Its length along the Z axis is usually left at the default of 30% of the box size. This length must also be larger than 2 times the upper limit of the helical rise. Our preliminary experiments suggest that, for helices with invariant symmetry, this value has little influence on the final resolution of the reconstruction. However, you might want to try other values for flexible helices with varying helical parameters, or greater values for large helical rises, to see whether the final reconstructions improve.
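The two limits on the central Z length can be checked numerically. The sketch below assumes that s in the formula is the circular/spherical (background) diameter and d is the outer tube diameter, and that all lengths, including the helical rise, have already been converted to pixels. The function names are hypothetical and not part of Relion.

```python
import math


def max_central_z_fraction(box_size, background_diameter, outer_diameter):
    """Upper limit of the central Z length as a fraction of the box size.

    Evaluates sqrt(s^2 - d^2) / box with s = background diameter and
    d = outer tube diameter (our reading of the formula), all in pixels.
    """
    if not (outer_diameter < background_diameter < box_size):
        raise ValueError("expected outer diameter < background diameter < box size")
    return math.sqrt(background_diameter ** 2 - outer_diameter ** 2) / box_size


def check_central_z(z_fraction, box_size, background_diameter, outer_diameter, max_rise_pixels):
    """Return a list of problems with the chosen central Z fraction."""
    problems = []
    limit = max_central_z_fraction(box_size, background_diameter, outer_diameter)
    if z_fraction > limit:
        problems.append("central Z length exceeds the geometric maximum")
    # The central Z length must be larger than twice the upper limit of the rise.
    if z_fraction * box_size <= 2.0 * max_rise_pixels:
        problems.append("central Z length is not larger than 2 x the maximum helical rise")
    return problems


if __name__ == "__main__":
    print(max_central_z_fraction(200, 180, 120))
    print(check_central_z(0.3, 200, 180, 120, max_rise_pixels=10))
```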

Note that in the GUI, some of the diameters mentioned above are in pixels while others are in Angstroms.

Import coordinates

Skip this part and refer to "Manual picking" or "Template-based semi-autopicking" if you decide to pick the helical tube/segment coordinates manually or automatically using the Relion GUI. Picking coordinates is a matter of personal preference, so Relion supports helical tube/segment coordinates in EMAN2 (*.box) and XIMDISP (*.coords) formats as well. Relion prefers tube coordinates to segment coordinates because the former give users the freedom to set the inter-box distance in particle extraction. Since the inter-box distance needs to be a multiple of the helical rise in 3D reconstruction, please provide tube coordinates whenever possible, especially if the helical symmetry is not precisely known before the start of the project.

Tube coordinate files in EMAN2 (*.box) format should contain exactly the following content (ignore comments after // symbol):

1463    3307    260     260     -1        // Tube 1 starting coordinates: x, y, box width, box width, -1 
851     2211    260     260     -2        // Tube 1 end coordinates: x, y, box width, box width, -2 
407     2039    260     260     -1        // Tube 2 
-45     1482    260     260     -2 
... 

EMAN2 box widths are ignored and overwritten by the box size specified in particle extraction step.
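Because import mistakes only surface at particle extraction, it can be useful to validate tube files beforehand. The following is a minimal sketch of a reader for the EMAN2 tube layout shown above; the function name is hypothetical, and the "//" comment stripping only exists so that the annotated example from this page can be pasted in directly (real files should not contain comments).

```python
def parse_eman2_tubes(text):
    """Parse start/end pairs from the text of an EMAN2 *.box tube file.

    Returns a list of ((x_start, y_start), (x_end, y_end)) tuples.
    The box widths in columns 3 and 4 are ignored, as they are in Relion.
    """
    tubes = []
    start = None
    for line_no, raw in enumerate(text.splitlines(), 1):
        line = raw.split("//")[0].strip()  # drop the illustrative comments
        if not line or line == "...":
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 columns, got {len(fields)}")
        x, y, tag = float(fields[0]), float(fields[1]), int(fields[4])
        if tag == -1:  # -1 marks the start of a tube
            if start is not None:
                raise ValueError(f"line {line_no}: two start points in a row")
            start = (x, y)
        elif tag == -2:  # -2 marks the end of a tube
            if start is None:
                raise ValueError(f"line {line_no}: end point without a start point")
            tubes.append((start, (x, y)))
            start = None
        else:
            raise ValueError(f"line {line_no}: last column must be -1 or -2")
    if start is not None:
        raise ValueError("file ends with an unmatched start point")
    return tubes
```

A file that parses without errors has well-formed start/end pairs, but this does not check that the coordinates fall inside the micrograph or along a filament.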

Tube coordinate files in XIMDISP (*.coords) format should contain exactly the following content (ignore comments after // symbol):

Box     1 
         750         670    // Top-left coordinates of rubberband box 1
        1245         375    // Top-right coordinates of rubberband box 1
        2980        3275    // Bottom-left coordinates of rubberband box 1
        2485        3570    // Bottom-right coordinates of rubberband box 1
         750         670    // Top-left coordinates of rubberband box 1
Box     2 
        2500        3575 
        1925        3560 
        1990         515 
        2565         530 
        2500        3575 
... 

Tube coordinate files in Relion (*.star) format should at least contain the x, y coordinates for each filament (ignore comments after // symbol):

data_ 

loop_ 
_rlnCoordinateX #1 
_rlnCoordinateY #2 
  110.000000  1080.000000   // Tube 1 starting coordinates x, y 
 1855.000000   585.000000   // Tube 1 end coordinates x, y 
  635.000000  1325.000000   // Tube 2 
  560.000000  2490.000000 
... 

Segment coordinate files in XIMDISP (*.coords) format should contain exactly the following content (ignore comments after // symbol):

  x    y    psi           // One-line header 
1043    3380    7.125     // Segment 1: x, y coordinates, in-plane rotation angle (in degrees) 
1019    3383    7.125     // Segment 2
995     3386    7.125 
970     3389    7.125 
946     3392    7.125 
...

In-plane rotation angles are transformed according to Relion convention in particle extraction.

Segment coordinate files in RELION (*.star) format should at least contain the following prior information for each segment:

data_ 

loop_ 
_rlnCoordinateX #1 
_rlnCoordinateY #2 
_rlnHelicalTubeID #3 
_rlnAngleTiltPrior #4 
_rlnAnglePsiPrior #5 
_rlnHelicalTrackLength #6 
_rlnAnglePsiFlipRatio #7 
 1822.915020   227.604136            1    90.000000   -58.642915    43.599998     0.500000 
 1845.603159   264.835953            1    90.000000   -58.642915    87.199997     0.500000 
 1868.291298   302.067770            1    90.000000   -58.642915   130.799995     0.500000 
 1890.979436   339.299588            1    90.000000   -58.642915   174.399994     0.500000 
 1913.667575   376.531405            1    90.000000   -58.642915   217.999992     0.500000 
...

The ways to import movies/micrographs and coordinate files are explained in the Relion 2.0 tutorial. Please note that mistakes made during import do not raise any errors until particle extraction, so make sure that the file contents and the filenames with wildcards (*, ?) are provided correctly. In addition, Relion requires that the movie/micrograph file exists if the coordinate file with the same rootname is imported.

Manual picking

Manual picking is applicable if there are not too many micrographs and most of the helical tubes are long and straight. It is also used when semi-automated picking needs templates generated from manually picked segments to start with. To pick helical tubes manually, run a manual picking job with ordinary settings and left-click the start and end points of filaments repeatedly on every micrograph. Helical segments will then be extracted along straight lines between the selected pairs of points in the particle extraction step, so you should only choose the rigid parts of the filaments.

Please make an effort to pick the coordinates as accurately as possible on the rescaled micrographs displayed, i.e. pick points along the helical axes with great care, especially if you plan to use these filaments for the final 3D refinement. For better picking, we suggest that the diameter of the picking circles be slightly larger than the helical widths so that the projections fill the picking circles. Intersections of filaments must be avoided as overlapping segments greatly affect the alignment in reconstructions. Moreover, a filament with intrinsic discontinuities should be picked as multiple filaments in manual picking (for example, disks, which break the helical symmetry, are clearly seen on long TMV tubes). This type of filament cannot be properly handled with template-based auto-picking.

Template-based semi-autopicking

Semi-automated segment picking saves a lot of time when a huge number of micrographs are to be processed. Based on our limited tests, for short or flexible filaments, it gives results at least as good as manual picking, as long as high-quality and diverse 2D classes (3-10 with different appearances) are used as templates along with carefully tuned parameters (especially the picking threshold).

Finer angular samplings (~3 degrees) slightly improve the results of helical segment picking. GPU acceleration is supported, but shrinking the micrographs or references for better performance is not recommended because it may cause inaccuracies in finding tracks along helical filaments. If shrinking cannot be avoided, please use values larger than 0.5 or the algorithm will miss a considerable number of filaments. The minimum inter-particle distance is overwritten by the inter-box distance, which equals the product of the number of asymmetrical units and the helical rise. Maximum curvature (kappa) accounts for the flexibility of helices. Kappa = 0.3 means that the curvature of the picked filaments should not be larger than 30% of the curvature of the circular mask. We recommend a value of ~0.05 for long and straight filaments (e.g. TMV and VipA/VipB filaments) and 0.2-0.4 for more flexible ones (e.g. MAVS-CARD and ParM filaments). You may also want to set the minimum length to exclude short filaments from the results, although we have not found this option helpful since the data sets sometimes get considerably smaller.

We tune the picking threshold for a new data set in the following way. First, we manually pick ~10 representative micrographs that differ in defocus, ice thickness, contamination features and filament density. Then we write out FOM maps with the default picking threshold. Strings of bright dots, where the templates correlate strongly with the filaments, can be observed on the "*_combinedCCF.spi" maps. The FOM values of the bright dots indicate possible ranges for the picking threshold. The "*_combinedPLOT.spi" maps serve as guidance for tuning the threshold. They show the filtered dots with FOM values above the threshold and the helical tracks found by the auto-picking algorithm. An optimal picking threshold generates tiny bright dots all the way along each filament, and the fitted tracks should be long and continuous along the filaments without interfering with each other. If the "*_combinedPLOT.spi" maps are empty, the threshold may be so high that it excludes all FOM values, or so low that the program fails to find helical tracks among an ocean of dots. The option maximum standard deviation of noise can be set to -1 since track finding is less affected by big contamination features than auto-picking of single particles.

Auto-picking needs to be followed by 2D classification, where bad classes and segments are discarded before the reconstruction.
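To get a feeling for what a given kappa allows, the definition above can be turned into a minimum radius of curvature. The sketch assumes that "the curvature of the circular mask" means 1/R for a mask of radius R; the function name is hypothetical and this is not the code used by the auto-picking program.

```python
def min_radius_of_curvature(mask_diameter, kappa):
    """Smallest filament bending radius permitted by a given kappa.

    Assumes the mask curvature is 1 / (mask_diameter / 2), so the largest
    allowed filament curvature is kappa times that value. The result is in
    the same units as mask_diameter.
    """
    if mask_diameter <= 0 or kappa <= 0:
        raise ValueError("mask_diameter and kappa must be positive")
    mask_radius = mask_diameter / 2.0
    return mask_radius / kappa


if __name__ == "__main__":
    for kappa in (0.05, 0.3):
        print(kappa, min_radius_of_curvature(200.0, kappa))
```

Under this reading, a smaller kappa demands a larger bending radius, which is why low values suit long, straight filaments.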

Particle extraction

Extraction of helical segments not only writes out particle stacks but also appends prior information (with labels _rlnHelicalTubeID, _rlnAngleTiltPrior, _rlnAnglePsiPrior, _rlnHelicalTrackLength and _rlnAnglePsiFlipRatio) to each segment.

Choices of box sizes for different purposes have been discussed above.

Bimodal angular priors (psi flip ratios of 0.5) are always needed unless the helices have D2 symmetry, in which case the segments lack polarity.

The option "Coordinates are start-end only" applies only to tube coordinates, so set it to No for segment coordinates (auto-picked segments, etc.). With tube coordinates, helical filaments are cut into overlapping segments with an inter-box distance equal to the number of asymmetrical units times the helical rise. For segment coordinates, the distance is predetermined and therefore cannot be changed here. Whichever coordinates are used, the distance must be a multiple of the helical rise if the segments are extracted for 3D reconstruction.
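The cutting of a tube into segments can be pictured with a short sketch. It places segment centres along the straight line from the start point to the end point, spaced by the number of asymmetrical units times the helical rise. Where exactly Relion places the first and last segment is not stated here, so starting at the picked start point is an assumption, as are the function name and the use of pixels for the rise (rise in Angstroms divided by the pixel size).

```python
import math


def cut_tube_into_segments(start, end, nr_asu, rise_pixels):
    """Return segment centres along the straight tube from start to end.

    The inter-box distance is nr_asu * rise_pixels. The first segment is
    placed at the start point (an assumption made for illustration).
    """
    step = nr_asu * rise_pixels
    if step <= 0:
        raise ValueError("inter-box distance must be positive")
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        raise ValueError("start and end points coincide")
    ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the tube
    n_steps = int(length // step)
    return [(x1 + i * step * ux, y1 + i * step * uy) for i in range(n_steps + 1)]


if __name__ == "__main__":
    for x, y in cut_tube_into_segments((0.0, 0.0), (30.0, 40.0), nr_asu=2, rise_pixels=5.0):
        print(f"{x:10.3f} {y:10.3f}")
```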

More segments are written out with smaller inter-box distances, which makes computation more expensive. Tiny intervals (<= 5 pixels) should be avoided since segment coordinates are rounded to integer values during extraction and the relative inaccuracies of positions could be larger. On the other hand, the maximum inter-box distance depends on the Z length parameter in 3D reconstruction. If the Z length is set to 30% (the default), the inter-box distance should not exceed (100%-30%)/2 = 35% of the box size. As a convention, please use ~10% of the box size as the inter-box distance, although in our experience a distance as large as 30% of the box size still gives the best reconstruction if the helices are rigid with fixed symmetry. If the helical parameters are completely unknown, set the inter-box distance to 5-10% of the box size just for 2D classification. In addition, if you don't want overlapping segments, set the option "cut helical tubes into segments" to No and ignore the helical parameters. Then only one segment with prior information is extracted for each picked filament. This could be useful if you have manually picked start-end coordinates of short segments, with lengths of about one helical repeat, in order to classify different types of filaments in the data set.
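The limits on the inter-box distance can be checked before extraction. This sketch converts the number of asymmetrical units and the helical rise into a distance in pixels and compares it with the 5-pixel lower bound and the (100% - Z length)/2 upper bound given above. The function name and the returned dictionary keys are hypothetical.

```python
def inter_box_distance_report(box_size, nr_asu, rise_angstrom, pixel_size, z_fraction=0.3):
    """Summarise an inter-box distance relative to the limits in the text.

    box_size is in pixels, rise_angstrom in Angstroms, pixel_size in
    Angstroms per pixel, z_fraction is the central Z length as a fraction
    of the box size (0.3 is the default mentioned in the text).
    """
    distance = nr_asu * rise_angstrom / pixel_size  # in pixels
    upper_limit = (1.0 - z_fraction) / 2.0 * box_size  # e.g. 35% of the box
    return {
        "distance_pixels": distance,
        "fraction_of_box": distance / box_size,
        "too_small": distance <= 5.0,  # tiny intervals suffer from rounding
        "too_large": distance > upper_limit,
    }


if __name__ == "__main__":
    print(inter_box_distance_report(box_size=200, nr_asu=5, rise_angstrom=4.0, pixel_size=1.0))
```

A fraction_of_box near 0.1 matches the convention suggested above; values flagged as too small or too large call for a different number of asymmetrical units.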

2D classification

Notes on 3D references

3D refinement

Mask creation

The top and bottom parts of the reconstructed helical map suffer from inaccuracies in the orientations. A mask which only covers the central part might improve the overall resolution. After finding the best values for the parameters on the "Mask" tab, we usually set the central Z length to 30% and increase it gradually in steps of 5% to find the largest value (<=80%, because parts further out are masked out anyway) which still gives the best estimated resolution. It also seems that wider soft edges (5-8 pixels) are useful for helices.

Particle polishing

Enable helical reconstruction and copy the number of asymmetrical units from the previous 3D refinement job. Copy the helical symmetry as well if local searches of helical symmetry have not been performed. Otherwise look into the file Refine3D/jobXXX/run_model.star for the averaged helical symmetry at the end of the refinement:

... 
data_model_classes 

loop_ 
_rlnReferenceImage #1 
_rlnClassDistribution #2 
_rlnAccuracyRotations #3 
_rlnAccuracyTranslations #4 
_rlnHelicalRise #5 
_rlnHelicalTwist #6 
Refine3D/jobXXX/run_class001.mrc     1.000000     0.186000     0.250000    21.775977    29.410791 
...
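Reading the averaged symmetry out of the model file can be scripted. The sketch below is a deliberately minimal reader for the first loop of one data block. It is not a full STAR parser, the function names are hypothetical, and it assumes the layout shown above, with one whitespace-separated row per class.

```python
def read_star_loop(text, block_name):
    """Return the rows of the first loop_ in the named data block as dicts."""
    labels, rows = [], []
    in_block = in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("data_"):
            if in_block:
                break  # the next data block begins
            in_block = (line == block_name)
            continue
        if not in_block:
            continue
        if line == "loop_":
            if rows:
                break  # keep only the first loop of the block
            in_loop = True
            continue
        if not in_loop:
            continue
        if not line:
            if rows:
                break  # a blank line after the data ends the loop
            continue
        if line.startswith("#"):
            continue
        if line.startswith("_"):
            labels.append(line.split()[0])  # drop the trailing "#N" index
        else:
            values = line.split()
            if len(values) == len(labels):  # skips placeholders such as "..."
                rows.append(dict(zip(labels, values)))
    return rows


def helical_symmetry(text):
    """Return a (rise, twist) tuple for every class in data_model_classes."""
    rows = read_star_loop(text, "data_model_classes")
    return [(float(r["_rlnHelicalRise"]), float(r["_rlnHelicalTwist"])) for r in rows]
```

Calling helical_symmetry on the contents of run_model.star returns one (rise, twist) pair per class, which can then be copied into the particle polishing job.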

Usage of relion_helix_toolbox

Relion_helix_toolbox is a standalone executable in Relion 2.0 and features many useful tools which we often use when processing helical data. The command relion_helix_toolbox (without any options) displays the full list of functions and parameters available in the error message. Type the command relion_helix_toolbox [function] --help for the usage of a function. We have implemented tools such as reference building (--cylinder, --pdb_helix, --simulate_segments), masking (--spherical_mask), imposition and local searches of helical twist and rise (--impose, --search), bad segment removal (--remove_bad_tilt, --remove_bad_psi), splitting and merging of STAR files (--divide, --merge), etc.

References

We gained experience with helical processing in Relion using the movies and micrographs from EMDB's EMPIAR database (accession numbers EMPIAR-10019, 10020, 10021, 10022 and 10031). The related research papers are listed below:

[1] Kudryashev, Mikhail, et al. "Structure of the type VI secretion system contractile sheath." Cell 160.5 (2015): 952-962.

[2] Fromm, Simon A., et al. "Seeing tobacco mosaic virus through direct electron detectors." Journal of structural biology 189.2 (2015): 87-97.

[3] Xu, Hui, et al. "Correction: Structural basis for the prion-like MAVS filaments in antiviral innate immunity." Elife 4 (2015): e07546.