Classification example


Standard classification benchmark: ribosomes with/without EF-G

This example was published in the Scheres (2012) JMB paper.

Download data and reference

The test data used below were deposited by Joachim Frank and represent a standard benchmark for 3D classification algorithms. These 10,000 images may be downloaded from the EMDB at EBI, together with the corresponding metadata, which is stored in this PDF file. Save both files in your working directory. This example uses EMDB entry 1056 as the initial reference map; save this file in the same working directory. Note that this reference has the same pixel size (2.8 A) and the same box size (130x130 pixels) as the data set above, so no re-scaling or windowing operations are necessary. Although this map has a resolution of ~9 A, it will be low-pass filtered to 100 A in the RELION run below.

Unpack the data (and rename the MRC-format map to have a .mrc extension as expected by RELION) as follows:

tar -xf J-Frank_70s_real_data.tar
gunzip emd_1056.map.gz
mv emd_1056.map emd_1056.mrc

Note that MRC maps should have the extension .mrc, as explained on the Conventions & File_formats#Image_I/O page.

For the sake of simplicity, these data are classified directly against an external reference. Please do read the Important note for 3D classification on the use of external references.

Prepare input files

As explained on the Prepare input files page, RELION requires an input STAR file to link individual images to their CTF information (an input image stack may be used only if no CTF correction is to be performed). The following will generate the STAR file for the data provided by Joachim Frank.

From the PDF, select the following 14 lines and save them in a text file called defocus.dat:

1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
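
Because these lines are copied by hand from a PDF, a quick consistency check may be worthwhile before building the STAR file. The following is a minimal sketch in Python, assuming the column meanings that make_star.csh relies on (column 3 is the number of images in a group, column 4 the cumulative image count, column 5 the defocus). The function name check_defocus_table is hypothetical and not part of RELION.

```python
# The 14 lines of defocus.dat, embedded so the sketch is self-contained.
DEFOCUS_DAT = """\
1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
"""


def check_defocus_table(text):
    """Return (n_groups, n_images) after verifying the cumulative column."""
    running = 0
    n_groups = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        count = int(float(fields[2]))       # images in this defocus group
        cumulative = int(float(fields[3]))  # running total including this group
        running += count
        if running != cumulative:
            raise ValueError(
                f"group {fields[0]}: expected {running}, found {cumulative}"
            )
        n_groups += 1
    return n_groups, running


if __name__ == "__main__":
    print(check_defocus_table(DEFOCUS_DAT))
```

To check your own copy instead of the embedded text, pass the contents of the file, for example check_defocus_table(open("defocus.dat").read()). A result of 14 groups and 10,000 images matches the data set described above.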

Then save the following lines as a file called make_star.csh


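#!/usr/bin/env csh
# List all windowed particle images, one file name per line
ls -l win/*dat | awk '{print $NF}' >imagelist
# Write the STAR loop header with the six column labels
relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
# Loop over the 14 defocus groups listed in defocus.dat
set ngr = 14
set gr = 0
while ($gr < $ngr)
 @ gr++
 # Column 3: number of images in this group
 set nn=`head -n $gr defocus.dat  | tail -1 | awk '{print int($3)}'`
 # Column 4: cumulative number of images up to and including this group
 set tot=`head -n $gr defocus.dat | tail -1 | awk '{print int($4)}'`
 # Column 5: defocus value of this group
 set def=`head -n $gr defocus.dat | tail -1 | awk '{print $5}'`
 # Append one row per image: name, group number (used as micrograph name),
 # defocus, voltage (200 kV), spherical aberration (2 mm), amplitude contrast (0.15)
 head -n ${tot} imagelist | tail -n ${nn} |awk -v"def=$def" -v"gr=$gr" '{print $1, gr, def, 200, 2, 0.15}' >> all_images.star
end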


Execute it to generate the input STAR file with all image names and CTF information, using the command:

csh make_star.csh


The resulting file, all_images.star, can be used directly as input for the relion_preprocess program, which will be used to normalise the images. Normalisation subtracts the average of the background pixel values from each image and divides the result by the standard deviation of the background pixels. The background pixels are those outside a circle with a radius of 60 pixels.


relion_preprocess --operate_on all_images.star --o all_images_norm --norm --bg_radius 60
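
The arithmetic of this normalisation can be illustrated with a small pure-Python sketch. This is not RELION's implementation; it only mirrors the description above. It assumes that the circle is centred on pixel (size // 2, size // 2) and that the population standard deviation is used, and the function name normalise is hypothetical.

```python
import math


def normalise(image, bg_radius):
    """Subtract the background mean and divide by the background std.

    image is a square list of rows. Background pixels are those farther
    than bg_radius from the assumed centre (size // 2, size // 2).
    """
    size = len(image)
    centre = size // 2
    background = [
        image[y][x]
        for y in range(size)
        for x in range(size)
        if (x - centre) ** 2 + (y - centre) ** 2 > bg_radius ** 2
    ]
    if not background:
        raise ValueError("bg_radius leaves no background pixels")
    mean = sum(background) / len(background)
    variance = sum((v - mean) ** 2 for v in background) / len(background)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("background is constant; cannot normalise")
    # Every pixel is rescaled, but the statistics come from the background only.
    return [[(v - mean) / std for v in row] for row in image]
```

After this operation the background pixels of each image have zero mean and unit standard deviation, while the particle density inside the circle is expressed relative to that background noise level.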

Run RELION

Save the following lines in a file called example_gui3d.settings:

is_continue == false
Output rootname: == example/K4
Continue from here:  == 
Input images: == /lmb/home/scheres/work/relion/ribo_test_case_new/all_images_norm.star
Reference map: == /lmb/home/scheres/work/relion/ribo_test_case_new/emd_1056.mrc
Ref. map is on absolute greyscale? == No
Initial low-pass filter (A): == 100
Particle mask diameter (A): == 340
Pixel size (A): == 2.82
Number of iterations: == 25
Regularisation parameter T: == 4
Mask individual particles with zeros? == Yes
Reference mask (optional): == 
Do CTF-correction? == Yes
Have data been phase-flipped? == No
Ignore CTFs until first peak? == No
Has reference been CTF-corrected? == No
Symmetry group: == C
Symmetry number: == 1
Angular sampling interval: == 7.5 degrees
Perform local angular searches? == No
Local angular search range: == 5
Number of classes: == 4
Offset search range (pix): == 6
Offset search step (pix): == 1
Number of MPI procs: == 7
Number of threads: == 8
Submit to queue? == Yes
Queue name:  == openmpi_8
Queue submit command: == qsub
Standard submission script: == /lmb/home/scheres/app/relion/gui/qsub.csh
Additional arguments: == --random_seed 1
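
Some of the values in these settings are given in Angstrom, whereas the box size and the --bg_radius option of relion_preprocess are given in pixels. The short Python sketch below converts between the two as a sanity check. It uses the pixel size of 2.82 A from the settings file and the 130-pixel box size stated for the data set; the helper name angstrom is hypothetical.

```python
def angstrom(pixels, pixel_size):
    """Convert a length in pixels to Angstrom."""
    return pixels * pixel_size


BOX_SIZE_PIX = 130      # box size stated for the data set
PIXEL_SIZE_A = 2.82     # "Pixel size (A)" in the settings file
MASK_DIAMETER_A = 340   # "Particle mask diameter (A)" in the settings file
BG_RADIUS_PIX = 60      # --bg_radius passed to relion_preprocess

box_a = angstrom(BOX_SIZE_PIX, PIXEL_SIZE_A)
bg_diameter_a = angstrom(2 * BG_RADIUS_PIX, PIXEL_SIZE_A)

print(f"box edge: {box_a:.1f} A")
print(f"background circle diameter: {bg_diameter_a:.1f} A")
print(f"mask fits in box: {MASK_DIAMETER_A < box_a}")
```

With these numbers the box edge is about 366.6 A, so the 340 A particle mask fits inside the box, and the 60-pixel background circle has a diameter of about 338.4 A, which is close to the mask diameter.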


Start the RELION GUI by typing relion on the command line (inside the working directory). Load example_gui3d.settings through the File menu. Then, in the Running tab, adapt the parameters for your particular cluster setup, and submit the job by clicking on the orange "Run!" button.

That's it! Just wait until your job is finished. Using 8 MPI nodes, each with 8 threads, this calculation took 19 wall-clock hours on the LMB cluster.

Anticipated results

On our cluster, the settings above yield two very similar classes (1 and 3) that are interpreted as 70S ribosomes with EF-G and 1 tRNA, one class (4) that is interpreted as a 70S ribosome without EF-G and with 3 tRNAs, and one minority class (2) that is interpreted as a previously unobserved structure for this data set.

The class distribution (from K4_it025_model.star) is as follows:

data_model_classes

loop_ 
_rlnReferenceImage #1 
_rlnClassDistribution #2 
_rlnAccuracyRotations #3 
_rlnAccuracyTranslations #4 
example/K4_it025_class001.mrc     0.330351     2.090000     0.845000 
example/K4_it025_class002.mrc     0.063993     2.740000     1.290000 
example/K4_it025_class003.mrc     0.260153     2.150000     0.858000 
example/K4_it025_class004.mrc     0.345502     1.874000     0.752000
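
To relate these fractions to image numbers, the table can be read with a few lines of Python. This is a rough sketch: it treats the class distribution as a simple fraction of the 10,000 images, so the counts are approximate, and the function name class_counts is hypothetical.

```python
# Data rows of the data_model_classes table shown above.
MODEL_CLASSES = """\
example/K4_it025_class001.mrc     0.330351     2.090000     0.845000
example/K4_it025_class002.mrc     0.063993     2.740000     1.290000
example/K4_it025_class003.mrc     0.260153     2.150000     0.858000
example/K4_it025_class004.mrc     0.345502     1.874000     0.752000
"""


def class_counts(table, n_images):
    """Map each reference map to its approximate number of images."""
    counts = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and STAR header lines
        name = fields[0]
        fraction = float(fields[1])  # second column: class distribution
        counts[name] = round(fraction * n_images)
    return counts


if __name__ == "__main__":
    for name, count in class_counts(MODEL_CLASSES, 10000).items():
        print(name, count)
```

For the run above this gives roughly 3304, 640, 2602 and 3455 images for classes 1 to 4, so the two EF-G-containing classes (1 and 3) together account for about 59% of the data.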