Refine a structure to high-resolution


Filling in the GUI

For 3D refinements, select the 3D auto-refine run-type from the drop-down menu at the top of the GUI. (This is a new feature of version 1.1.) This procedure implements so-called gold-standard FSC calculations, in which two models are refined independently for two random halves of the data to prevent overfitting. This yields reliable resolution estimates and clean reconstructions without compromising reconstruction quality; see (Scheres & Chen, Nature Methods, in press) for more details. Note that for cyclic point-group symmetries (i.e. C<n>), the two half-reconstructions are averaged up to 40 Angstrom resolution to prevent diverging orientations.
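To make the gold-standard FSC idea concrete, here is a minimal numpy sketch of a Fourier Shell Correlation between two half-maps (an illustration only, not the code RELION runs; the function name and arguments are made up for this example):

    import numpy as np

    def gold_standard_fsc(half1, half2, pixel_size):
        """FSC between two independently refined half-maps.

        half1, half2 : 3D numpy arrays with the same cubic box size
        pixel_size   : Angstrom per voxel
        Returns (resolution per shell in Angstrom, FSC per shell).
        """
        f1 = np.fft.fftshift(np.fft.fftn(half1))
        f2 = np.fft.fftshift(np.fft.fftn(half2))

        box = half1.shape[0]
        grid = np.indices(half1.shape) - box // 2
        shell_index = np.sqrt((grid ** 2).sum(axis=0)).astype(int)

        shells = np.arange(1, box // 2)
        fsc = np.empty(len(shells))
        for i, s in enumerate(shells):
            sel = shell_index == s
            num = np.real(np.sum(f1[sel] * np.conj(f2[sel])))
            den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
            fsc[i] = num / den
        resolution = box * pixel_size / shells   # resolution of each shell in Angstrom
        return resolution, fsc

The gold-standard resolution is typically reported where this curve drops below 0.143.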

I/O tab

  • If the reference was not reconstructed from the input images in either XMIPP or RELION, it is safest to assume it is not on the absolute greyscale.
  • Provide the correct symmetry point group. Note that there are various settings for icosahedral symmetry (also see the Conventions page); make sure your input map follows the setting you provide here.

CTF tab

  • The pixel size (in Angstrom) should be the same as the one used to estimate the CTF parameters (unless you rescaled the images afterwards, in which case the same scale factor should be applied to the pixel size; see the worked example after this list).
  • If no CTF correction is to be performed, make sure you phase-flipped your data during preprocessing. See the Prepare input files page.
  • If the particles have been phase flipped, tell the program about this.
  • Some data sets have very-low-resolution features that are not accounted for in the linear CTF model (with ~10% amplitude contrast). This will sometimes lead to overly strong low-resolution features in the reconstructed maps. Alignment and classification driven by these very-low-resolution features may then hamper the separation of distinct structural states. Therefore, it may be useful to ignore the CTFs (i.e. set them to one) until their first maximum. In several cases, this has led to successful classification of structurally heterogeneous data that could not be classified using the full CTF correction. If desired, full CTF correction can then be applied subsequently during separate refinements of the distinct classes.
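As a worked example of the pixel-size scaling mentioned in the first bullet above (plain arithmetic with made-up values, not RELION code):

    # If particles were rescaled (e.g. Fourier-cropped) after CTF estimation,
    # the pixel size must be scaled by the same factor as the box size.
    original_pixel_size = 1.35   # Angstrom/pixel used during CTF estimation (example value)
    original_box = 400           # box size in pixels before rescaling (example value)
    rescaled_box = 200           # box size in pixels after rescaling (example value)

    scale_factor = original_box / rescaled_box
    rescaled_pixel_size = original_pixel_size * scale_factor
    print(rescaled_pixel_size)   # 2.7 -> the value to enter on the CTF tab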

Optimisation tab

  • To prevent model bias it is recommended to start refinement from a very strongly low-pass filtered map. If your input map is not low-pass filtered, it may be filtered internally using the Initial low-pass filter option. Typically, one filters as much as possible, i.e. up to just before the reference becomes a featureless blob that can no longer be refined. For example, we use 80 Angstroms for ribosomes and 60 Angstroms for GroEL.
  • The particle diameter (in Angstroms) defines a soft spherical mask that will be applied to the references to reduce their background noise. Note that a (preferably soft) user-provided mask (1=protein, 0=solvent) may also be used for highly non-spherical particles. Be careful not to mask away any unexpected signal, and always use a soft mask, i.e. one with values between 0 and 1 at the protein/solvent boundary (a minimal sketch of such a mask follows below).
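For illustration, here is a minimal numpy sketch of the kind of soft spherical mask described above, with a raised-cosine edge (not the exact function RELION applies internally; the function name and the edge width are made up for this example):

    import numpy as np

    def soft_spherical_mask(box_size, pixel_size, particle_diameter, edge_width=5):
        """Soft spherical mask: 1 inside the particle, 0 in the solvent,
        with a raised-cosine fall-off of edge_width pixels in between.

        box_size          : box size in pixels
        pixel_size        : Angstrom per pixel
        particle_diameter : particle diameter in Angstrom (the GUI value)
        """
        radius = 0.5 * particle_diameter / pixel_size        # particle radius in pixels
        grid = np.indices((box_size,) * 3) - box_size // 2
        r = np.sqrt((grid ** 2).sum(axis=0))                 # distance from the box centre

        mask = np.zeros((box_size,) * 3)
        mask[r <= radius] = 1.0
        edge = (r > radius) & (r < radius + edge_width)
        mask[edge] = 0.5 * (1.0 + np.cos(np.pi * (r[edge] - radius) / edge_width))
        return mask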

Sampling tab

  • The initial angular and translational sampling rates given here will be automatically increased to their optimal values by the auto-refine procedure. We tend to use 7.5 degrees angular sampling for non-icosahedral cases and 3.7 degrees for icosahedral viruses. Most of the time, 6 pixels is enough for the initial translational searches, although this ultimately depends on how well-centered the particles were picked. However, note that pre-centering prior to RELION refinement is not necessary, and also not recommended (it often messes up the Gaussian distribution of origin offsets).
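As a rough back-of-the-envelope for how the angular sampling affects the size of the orientational search (a generic estimate, not the HEALPix bookkeeping RELION actually uses):

    import numpy as np

    def approximate_orientation_count(angular_sampling_deg, symmetry_order=1):
        """Very rough number of (rot, tilt, psi) triplets searched at a given
        angular sampling; symmetry is accounted for by a simple division by
        the point-group order."""
        step = np.radians(angular_sampling_deg)
        directions = 4.0 * np.pi / step ** 2 / symmetry_order  # projection directions
        in_plane = 360.0 / angular_sampling_deg                # in-plane (psi) angles
        return directions * in_plane

    print(int(approximate_orientation_count(7.5)))       # ~35,000 for C1 at 7.5 degrees
    print(int(approximate_orientation_count(3.7, 60)))   # icosahedral (order 60) at 3.7 degrees

The division by the point-group order is why the much finer 3.7 degree sampling remains affordable for icosahedral viruses.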

Compute tab

As of RELION-2.0, additional computation options are accessible from the GUI. These are:

  • Combine iterations through disc? This option was implemented because some network cards on our cluster were buggy and large MPI messages often failed. Instead, large files are written to disc, and the MPI nodes communicate with each other that way. If you have reasonably fast and reliable network connections, it may be better to set this option to "No", as combining through disc can be quite slow (although that depends on the speed of your disc access).
  • Use parallel disc I/O? If set to Yes, all MPI slaves will read their own images from disc. Otherwise, only the master will read images and send them through the network to the slaves. Parallel file systems like gluster or fhgfs are good at parallel disc I/O. NFS may break with many slaves reading in parallel.
  • Number of pooled particles: Particles are processed in individual batches by MPI slaves. During each batch, a stack of particle images is only opened and closed once to improve disk access times. All particle images of a single batch are read into memory together. The size of these batches is at least one particle per thread used. The nr_pooled_particles parameter controls how many particles are read together for each thread. If it is set to 3 and one uses 8 threads, batches of 3x8=24 particles will be read together. This may improve performance on systems where disk access, and particularly metadata handling of disk access, is a problem. It has a modest cost of increased RAM usage.
  • Pre-read all particles into RAM? If set to Yes, all particle images will be read into computer memory, which will greatly speed up calculations on systems with slow disk access. However, one should of course be careful with the amount of RAM available. Because particles are read in double precision, it will take ( N * box_size * box_size * 8 / (1024 * 1024 * 1024) ) gigabytes to read N particles into RAM. For 100 thousand 200x200 images, that becomes 30 GB, or 120 GB for the same number of 400x400 particles (see the small sketch after this list). Remember that a single MPI slave per node, running as many threads as there are cores, has access to all the RAM on that node.
  • Use GPU acceleration? If set to Yes, the program will run CUDA code to accelerate computations on NVIDIA graphics cards. Note that only cards with CUDA compute capability 3.5 or higher are supported. It is typically recommended to run 1 MPI process on each card. The option 'Which GPUs to use' can be used to specify which MPI process runs on which card, e.g. "0:1:2:3" means that slaves 1-4 will run on cards 0-3. Note that the master (MPI rank 0) will not use any GPU: it merely orchestrates the calculations. You can run multiple threads within each MPI process to further accelerate the calculations; we often use 4 threads, for example.
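The RAM estimate from the pre-read bullet above, as a small sketch (plain arithmetic reproducing the formula quoted there, not RELION code):

    def preread_ram_gb(n_particles, box_size):
        """Gigabytes needed to pre-read n_particles images of box_size x box_size
        pixels into RAM, assuming double precision (8 bytes per pixel)."""
        return n_particles * box_size * box_size * 8 / (1024 ** 3)

    print(round(preread_ram_gb(100_000, 200)))   # ~30 GB
    print(round(preread_ram_gb(100_000, 400)))   # ~119 GB (the ~120 GB quoted above)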

Running tab

  • If one uses multi-core nodes, the use of multiple threads (as many threads as there are cores on a machine) is recommended, because the shared-memory parallelisation increases the amount of memory available per process. MPI is typically used for the more scalable parallelisation over different nodes. (In terms of CPU usage, MPI parallelisation is a bit more efficient than threads.)
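A small sketch of the memory trade-off described above, using made-up node figures (plain arithmetic, not RELION code):

    def ram_per_mpi_process(node_ram_gb, cores_per_node, threads_per_process):
        """RAM available to each MPI process on a node when every core runs either
        an MPI process or one of its threads (hypothetical example figures)."""
        processes_per_node = cores_per_node // threads_per_process
        return node_ram_gb / processes_per_node

    # Hypothetical node with 64 GB of RAM and 16 cores:
    print(ram_per_mpi_process(64, 16, 1))    # 4 GB per process with pure MPI
    print(ram_per_mpi_process(64, 16, 4))    # 16 GB per process with 4 threads each
    print(ram_per_mpi_process(64, 16, 16))   # 64 GB with one 16-thread process per node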

Analyzing results

The program will write out_it???_half?_model.star and out_it???_half?_class001.mrc files for each of the two independent data-set halves at every iteration. Only upon convergence will the program write a single out_model.star and out_class001.mrc file with the results from the joined halves of the data. These final files are the ones you will be most interested in! Note that, to prevent overfitting, the joined map should not be used for further refinement.

Also remember that your map will still need sharpening! Have a look at the [Analyse results] section for more details.