FAQs

Getting feedback

If you have a question, please check the FAQs and whether the topic has already been addressed on the CCP-EM mailing list. If not, use the CCP-EM list to send us your message. Please do not send us direct e-mails with general questions about RELION.

FAQs

General

Could I get code that is not yet in beta-testing?

No. Only once the code is deemed stable enough do we move from alpha-testing (strictly in-house only) to beta-testing (also external). If you would like to participate in beta-testing, just send us an e-mail as soon as the availability of the beta-release is announced on this Wiki.

How do you use RELION?

Read our Recommended procedures!

How can I ask a question that is not dealt with in the FAQs?

If you have a question, please first check whether the topic has been addressed already on the CCP-EM mailing list. If not, please use this list (and not a direct e-mail) to send us your message.

Preprocessing

How can I group micrographs in order to have more particles per group?

You can add a column called rlnGroupName to your particles.star file. Unique rlnGroupName values define unique groups. To make this easier, there is a semi-automated grouping procedure. Note that from version 1.3, RELION implements a new image display program that can also be used to re-group particles in a more convenient manner. Select a model.star file from a previous 2D or 3D classification/refinement run, tick the "Regroup selected particles in number of groups: X" option, and provide the desired number of groups (X).
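If you prefer to add the rlnGroupName column by hand, the following is a minimal sketch of how that could be done with awk. The function name add_group_column and the two-column mapping file (micrograph name, group name) are our own assumptions for illustration and not part of RELION. The sketch also assumes that header lines have at most two fields, that all labels start with _rln, and that the data lines follow the labels directly.

```shell
# Hypothetical helper (not part of RELION): append an rlnGroupName column.
# map_file has two columns: micrograph name and the group to put it in.
# Usage: add_group_column groups.txt particles.star > particles_grouped.star
add_group_column() {
    map_file=$1
    star_file=$2
    awk -v map="$map_file" '
        BEGIN {
            # Remember which group each micrograph belongs to.
            while ((getline line < map) > 0) {
                split(line, f, " ")
                group[f[1]] = f[2]
            }
        }
        # Count label lines and note where rlnMicrographName sits.
        /^_rln/ { ncol++; if ($1 == "_rlnMicrographName") mic = ncol }
        # Header lines (at most 2 fields) are copied unchanged.
        NF <= 2 { print; next }
        # Give up if the micrograph column was never found.
        !mic { exit 1 }
        # Before the first data line, declare the new column.
        !declared { printf "_rlnGroupName #%d\n", ncol + 1; declared = 1 }
        # Micrographs missing from the mapping keep their own name as group.
        { g = ($mic in group) ? group[$mic] : $mic; print $0, g }
    ' "$star_file"
}
```

Micrographs that share a group name in the mapping file end up in the same group; when choosing which ones to join, the advice on similar defocus values and signal-to-noise ratios given elsewhere in these FAQs still applies.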

What are groups for anyway?

If you're going to group micrographs together it is important to understand what groups are used for inside RELION. Although you may be familiar with the concept of defocus groups in other packages, RELION groups are NOT the same. Each particle has its own (possibly astigmatic) CTF model, and this is not affected by groups. Instead, for all particles inside each group RELION will estimate an average power spectrum for the noise (rlnSigma2Noise), as well as an average intensity scale factor. RELION will warn against using small numbers of particles inside a group, because these averages may become unstable. That in turn may lead to crashed runs that report errors like the sum of weights for certain particles being zero, or the scale factor being a strange number. As mentioned above, you may prevent small groups by grouping micrographs together, for example by using a semi-automated grouping procedure.

Do I need to downsize my images?

RELION will estimate the resolution of your model and downsize the images internally. Therefore, using downsized images will NOT be (much) faster than using large images. In general it is therefore NOT recommended to use downsized images: it could limit your resolution. The only reason to downsize images is if they are too big (e.g. 800x800 pixels) for the reconstruction(s) to fit into memory. If this is the case, you can use the Preprocessing procedure to downsize your images.

Can I also use the estimated CTF parameters from XMIPP3?

In principle yes. The conventions are the same as those in RELION. However, there is currently no script to do this automatically. In our experience, (re-)estimating CTF parameters in CTFFIND3 is sufficiently fast and robust.

How do you average your direct-electron detector movies?

As of release 1.3, RELION contains a program called relion_image_handler. Besides many other simple image operations (like lowpass filtering, B-factor sharpening, MTF correction, image/map addition or subtraction, etc.), it can also average movie frames.

Classification

Do you have an example of how to run 3D classification?

Yes, see the Classification example.

Should I refine my 3D classes in a different program to reach higher resolution?

This is not necessary: you may do so in RELION. Higher resolution often means using fine orientational and translational samplings, which may be prohibitive (speed and memory-wise) inside classification runs. One can however write out a new STAR file that contains only the images belonging to a certain class (see the next FAQ). That STAR file could then be used for a single-reference refinement with fine samplings in RELION.

How can I select images from a STAR file?

The following awk command may be used to select images belonging to classes number 3 or 4 (assuming rlnClassNumber is the 13th column):

awk '{if (NF<= 2) {print} else {if ($13==3 || $13==4) print }}' < classify_it000025_data.star > class3_only_input.star

Some knowledge of awk is REALLY useful! This light awk introduction may be helpful. The command above prints every line that has at most two fields (NF <= 2, i.e. the header lines); for all other lines (i.e. the data block) it only prints those whose 13th column ($13) equals 3 or 4. Note that one may replace the "==" sign with ">", "<=", etc. That way, one could select images above a certain MaxValueProbDistribution, within a certain AnglePsi range, etc.
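Hard-coding the column number breaks as soon as a STAR file has its columns in a different order. A minimal sketch of the same selection that looks up the column by its label instead is given below. The function name select_classes is hypothetical, and the sketch relies on the same assumption as the command above: header lines have at most two fields, and all labels start with _rln.

```shell
# Hypothetical helper (not part of RELION): keep only particles whose
# rlnClassNumber is one of the given class numbers.
# Usage: select_classes input.star 3 4 > output.star
select_classes() {
    star_file=$1
    shift
    awk -v wanted="$*" '
        BEGIN {
            # Turn the space-separated class list into a lookup table.
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) keep[w[i]] = 1
        }
        # Count the label lines to find the position of rlnClassNumber.
        /^_rln/ { ncol++; if ($1 == "_rlnClassNumber") col = ncol }
        # Header lines (at most 2 fields) are copied unchanged.
        NF <= 2 { print; next }
        # Data lines are printed only if their class is in the list.
        col > 0 && ($col in keep) { print }
    ' "$star_file"
}
```

For example, select_classes classify_it000025_data.star 3 4 would select the same particles as the command above, wherever the rlnClassNumber column happens to be.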

3D refinement

How can I make a plot of the orientational distribution of my particles?

RELION outputs all the information in the STAR files for each iteration, but at the moment will not make a plot of this. You can use the make_orientational_distribution_plot.csh script to make a BILD file (using XMIPP-2.4), which can then be read into UCSF Chimera. Note that as of release 1.3, RELION will output .bild files that can be loaded into UCSF Chimera directly.

I have run RELION but get a nonsense map as a result, e.g. a spherical blob

Make sure the following things are correct:

  • You have normalized all particles to zero-mean background with a standard deviation in the noise of one.
  • Your STAR file header is correct: each label corresponds to the correct column
  • You have used the correct pixel size
  • There are no particles with very large or very small pixel values.
  • You have indicated the starting map is not on the correct greyscale (if the map does not come from RELION itself or XMIPP)

(The first three items on this list are all taken care of if you use the PreProcessing GUI.)
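If you assembled the STAR file yourself, a quick structural check of the header can catch some of these mistakes. The sketch below uses a hypothetical function name, check_star_header, and assumes that all labels start with _rln and that data lines have more than two fields. It reports labels whose #N does not match their position and data lines with the wrong number of fields. It cannot tell whether a label actually describes the values underneath it, so a visual check remains necessary.

```shell
# Hypothetical helper (not part of RELION): structural check of a STAR header.
# Exits with status 0 if no problems were found, 1 otherwise.
# Usage: check_star_header particles.star
check_star_header() {
    awk '
        /^_rln/ {
            ncol++
            # A label written as "_rlnXxx #N" should be the N-th label.
            if (NF == 2 && substr($2, 2) + 0 != ncol) {
                printf "%s is label number %d but declares %s\n", $1, ncol, $2
                bad = 1
            }
            next
        }
        # Every data line should have one field per label.
        NF > 2 && NF != ncol {
            printf "line %d has %d fields, expected %d\n", NR, NF, ncol
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```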

Upon restarting I get an error: incorrect table model_group_xx

Make sure all groups have at least 20-50 particles in them. The error above is likely due to an empty group for one of the two independent halves of the data. Note you can join multiple micrographs into one group by giving them the same rlnMicrographName in the input STAR file. If you do this, try to join micrographs with similar defocus values and similar apparent signal-to-noise ratios.
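To see which groups are too small before starting a run, you could count the particles per group. The following is a minimal sketch with a hypothetical function name, small_groups. It takes the label of the column that defines the groups as an argument, and assumes that labels start with _rln and that data lines have more than two fields. Note that it counts over the whole data set, whereas each of the two independent halves will only contain roughly half of those particles.

```shell
# Hypothetical helper (not part of RELION): list groups with fewer than
# min_particles particles, as "group_name particle_count" lines.
# Usage: small_groups particles.star _rlnMicrographName 20
small_groups() {
    star_file=$1
    label=$2            # column that defines the groups
    min_particles=$3
    awk -v label="$label" -v min="$min_particles" '
        # Count the label lines to find the position of the group column.
        /^_rln/ { ncol++; if ($1 == label) col = ncol }
        # Tally the particles of each group over all data lines.
        NF > 2 && col > 0 { count[$col]++ }
        # Report only the groups below the threshold.
        END { for (g in count) if (count[g] < min + 0) print g, count[g] }
    ' "$star_file" | sort
}
```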

The resolution of the RELION output map is lower than I expected

There are two answers to this. Firstly, if your expectations are based on refinement in a different program, and that program does not strictly prevent overfitting, then it might be that your expectations are wrong: perhaps the other program overfitted your data and therefore has given you a false high-resolution estimate. Secondly, we have now observed that RELION slightly underestimates resolution. However, you may still get your high-resolution map as explained on the Analyse results page. Please write to us if you genuinely believe RELION has done a bad job at refining your structure. Perhaps we may learn how to improve RELION from your case.

How do you create soft masks for refinement?

As of release 1.3, there is the relion_mask_create program. Provide it with any input density map (e.g. a map created from a PDB model is often useful for making masks); a threshold to use for initial binarisation of that map; the number of pixels to grow the initial binarised map; and the width of a raised-cosine edge (in pixels) that will be added to create a soft mask. The program also has some nifty options to perform logical operations on pairs of masks, which are useful when creating more complicated masks.


Computational issues

I am buying new GPUs, what do you recommend to run RELION on?

Our collaborator in Stockholm, Erik Lindahl, has made a useful blog with GPU hardware recommendations. Briefly, you'll need an NVIDIA GPU with a CUDA compute capability of at least 3.5, but you don't need the expensive double-precision NVIDIA cards, i.e. the high-end gamer cards will also do, but do see Erik's blog for details! Note that 3D auto-refine will benefit from 2 GPUs, while 2D and 3D classification can be run just as well with 1 GPU. Apart from your GPUs you'll need a decent amount of RAM on the CPU (at least 64 GB), and you may also benefit from a fast scratch disk (e.g. a 400 GB SSD!), especially if your working directories are mounted over the network connecting multiple machines.

How do I use my GPUs?

There is a good description of how to use your GPUs in our betaGuide.

I am buying a new cluster, what do you recommend to run RELION on?

This will of course depend on how much money you are willing to spend, and what kind of jobs you are planning to run. RELION is memory-intensive. Fortunately, its hybrid parallelisation makes good use of modern clusters that consist of many multi-core nodes. In this set-up, MPI parallelisation provides scalability across the many nodes, while pthreads allow the memory available on each node to be shared without leaving its multiple cores idle. Therefore, as long as each node has sufficient memory in total, one can always run multiple threads (and only one or a few MPI processes) on each node. RAM/node is thus probably a more important feature than RAM/core.

The bigger the boxed particles, the higher the RAM usage. For our high-resolution ribosome refinements (in boxes of ~400x400 pixels) we use somewhere between 15 and 25 GB of RAM per MPI process (the most expensive part in terms of RAM is the last iteration, which is done at the full image scale). We have 12-core nodes with 60 GB of RAM in total, so we can run 2 MPI processes on each node. If you're planning to do atomic-resolution structures, I wouldn't recommend buying anything that has less than 32 GB per node. Having 64 GB or more will probably keep your cluster up-to-date for longer.

How many of those nodes you buy will then probably depend on your budget (and possibly cooling limitations). We do 3.x Angstrom ribosome reconstructions from, say, 100-200 thousand particles in approximately two weeks using around 200-300 cores in parallel. Using more cores in parallel (e.g. 1,000) may cause serious scalability issues.
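The node figures above can be turned into a small back-of-the-envelope calculation. The sketch below, with a hypothetical function name plan_node, first fits as many MPI processes into the RAM of a node as the peak memory per process allows, and then spreads the cores over those processes as threads. Using the upper end of our memory range (25 GB) reproduces the 2 MPI processes per node mentioned above; the resulting 6 threads per process is simply what the arithmetic gives for 12 cores and is not a measured optimum.

```shell
# Hypothetical helper: suggest how to fill one node, limited by RAM first.
# Usage: plan_node CORES RAM_GB RAM_PER_MPI_PROCESS_GB
plan_node() {
    cores=$1
    ram_gb=$2
    ram_per_proc_gb=$3
    # As many MPI processes as fit in memory (integer division rounds down)...
    procs=$(( ram_gb / ram_per_proc_gb ))
    # ...but never more than one per core.
    if [ "$procs" -gt "$cores" ]; then procs=$cores; fi
    if [ "$procs" -lt 1 ]; then
        echo "not enough RAM for a single MPI process" >&2
        return 1
    fi
    # Spread the cores over the processes as threads.
    threads=$(( cores / procs ))
    echo "$procs MPI processes x $threads threads"
}

plan_node 12 60 25   # prints "2 MPI processes x 6 threads"
```

Planning for the peak memory per process is the safer choice, because the last iteration at the full image scale is the one most likely to run out of RAM.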

My runs keep crashing at seemingly random points in the refinement

We noticed similar problems on our Dell cluster. Things became much more stable when we switched off TCP segmentation offloading, by using the command mentioned on this linux page.

Please note that due to limited resources we cannot provide support related to high-performance computing issues... If your jobs seem to die at seemingly random points, please do not email us, but speak to your system administrator... RELION makes use of some pretty intensive high-performance computing, and setting this up satisfactorily is not always straightforward.

Although I ask for more threads, my MPI processes only take 100% or 200% CPU in top

If using OpenMPI compiled with NUMA locking support, MPI processes get bound to assigned cores automatically. This is to prevent context switches and cache misses. You can use `mpirun --bind-to none` to have each MPI process use multiple cores.

How can I minimise computational requirements?

See this page for an explanation about the computational requirements. Understanding these may help you make more informed decisions on how to minimise running costs. More information is also given in the 2012 JSB paper.

Why do I run out of disc space during the run?

As of version 1.2, RELION uses temporary files that are stored to disc to distribute all probability-weighted sums of the model. These files may be very large (depending on the size of your refinement, but possibly several GB), and one will be written for each MPI process. If disc space becomes a problem, you may also distribute all these sums through the network using MPI. To do so, add the option --dont_combine_weights_via_disc to the command line (you can ignore the warning that this option is not recognized). However, we suspect that the problems with TCP offloading described above are much worse when this option is used.

My calculations take forever, what can I do?

Take smaller data sets of better quality. Some people routinely collect data sets with millions of particles. This may not always be the most efficient route to a high-resolution structure. We prefer to carefully collect relatively small data sets of very high quality. This reduces computational loads and leads to relatively clean data sets to start with. If your reconstruction requires hundreds of thousands of particles (from a direct-electron detector) to get to high resolution, then something else is probably wrong. The best reason to collect millions of particles would be if there is a very large degree of structural heterogeneity in your sample. In that case: you'd better be prepared to sweat anyway. ;-)

The expansion of my movie frames takes forever. What can I do?

Try the make_movie_subset.csh script, as explained on the Process movies page.