Classification example

From Relion
Latest revision as of 13:19, 13 November 2017

Standard classification benchmark: ribosomes with/without EF-G

This example was published in the Scheres (2012) JMB paper.

Download data and reference

The test data used below were deposited by Joachim Frank and represent a standard benchmark for 3D classification algorithms. These 10,000 images may be downloaded from the EMDB at EBI (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J-Frank_70s_real_data.tar), and the corresponding metadata is stored in this PDF file (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J_FRANK_70S_REAL/Ribosome_information_01_13_09.pdf). Save both files in your working directory. This example uses EMDB entry 1056 (ftp://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-1056/map/emd_1056.map.gz) as the initial reference map. Save this file in the same working directory. Note that this reference has the same pixel size (2.8 A) and the same box size (130x130) as the data set above, so no re-scaling or windowing operations are necessary. Although this map has a resolution of ~9 A, it will be low-pass filtered to 100 A in the RELION run below.

Unpack the data (and rename the MRC-format map to have a .mrc extension as expected by RELION) as follows:

tar -xf J-Frank_70s_real_data.tar
gunzip emd_1056.map.gz
mv emd_1056.map emd_1056.mrc

Note that MRC maps should have the extension .mrc, as explained on the Conventions & File_formats#Image_I/O page.

For the sake of simplicity, these data are classified directly using an external reference. Even so, please do read the Important note for 3D classification on the use of external references.

Prepare input files

As explained on the Prepare input files page, RELION requires an input STAR file to link individual images to their CTF information (an input image stack may be used instead only if no CTF correction is to be performed). The following will generate the STAR file for the data provided by Joachim Frank.

From the PDF, select the following 14 lines and save them in a text file called defocus.dat:

1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
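
Before building the STAR file, it can be worth checking that the lines were copied from the PDF without errors. The following is a minimal Python sketch, not part of RELION. It assumes, based on how the make_star.csh script on this page reads the file, that column 3 is the number of images in a defocus group, column 4 the cumulative image count, and column 5 the defocus value. The function names are hypothetical.

```python
# Lines copied verbatim from defocus.dat on this page.
DEFOCUS_DAT = """\
1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
"""


def parse_defocus_table(text):
    """Return (group, n_images, cumulative, defocus) tuples, one per line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Column meanings are an assumption inferred from make_star.csh.
        rows.append((int(fields[0]), int(float(fields[2])),
                     int(float(fields[3])), float(fields[4])))
    return rows


def check_cumulative(rows):
    """Check that column 4 is the running sum of column 3; return the total."""
    running = 0
    for group, n_images, cumulative, _defocus in rows:
        running += n_images
        if running != cumulative:
            raise ValueError(
                f"group {group}: expected {running}, found {cumulative}")
    return running


rows = parse_defocus_table(DEFOCUS_DAT)
total = check_cumulative(rows)
print(len(rows), "groups,", total, "images")
```

If a line was mistyped, check_cumulative raises an error naming the first group whose cumulative count does not match.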

Then save the following lines as a file called make_star.csh:


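#!/usr/bin/env csh
# List all particle images, one file name per line.
ls -l win/*dat | awk '{print $NF}' >imagelist
#
# Write the STAR header with the six metadata labels.
relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
#
# Loop over the 14 defocus groups listed in defocus.dat.
set ngr = 14
set gr = 0
while ($gr < $ngr)
 @ gr++
 # Column 3 is the number of images in this group, column 4 the cumulative count, column 5 the defocus.
 set nn=`head -n $gr defocus.dat  | tail -1 | awk '{print int($3)}'`
 set tot=`head -n $gr defocus.dat | tail -1 | awk '{print int($4)}'`
 set def=`head -n $gr defocus.dat | tail -1 | awk '{print $5}'`
 # Append one row per image in this group: name, group number, defocus, voltage (200), Cs (2), amplitude contrast (0.15).
 head -n ${tot} imagelist | tail -n ${nn} |awk -v"def=$def" -v"gr=$gr" '{print $1, gr, def, 200, 2, 0.15}' >> all_images.star
end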


And execute it to generate the input STAR file with all image names and CTF information, using the command:

csh make_star.csh
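
The way make_star.csh assigns images to defocus groups (head -n to the cumulative count, then tail -n for the group size) can be sketched in Python as a list slice. This is only an illustration of the grouping logic, not a replacement for the script: the function name star_rows and the toy file names are hypothetical, and the STAR header written by relion_star_loopheader is not reproduced.

```python
def star_rows(image_names, groups, voltage=200, cs=2, amp_contrast=0.15):
    """Return one STAR data row per image, mirroring the head/tail slicing.

    groups holds (n_images, cumulative, defocus) tuples in file order;
    the defocus is passed through as text, as in the csh script.
    """
    rows = []
    for group_number, (n_images, cumulative, defocus) in enumerate(groups, start=1):
        # "head -n cumulative | tail -n n_images" selects this slice.
        start = cumulative - n_images
        for name in image_names[start:cumulative]:
            rows.append(
                f"{name} {group_number} {defocus} {voltage} {cs} {amp_contrast}")
    return rows


# Hypothetical toy input: five image names split over two defocus groups.
names = [f"win/img{i:05d}.dat" for i in range(1, 6)]
groups = [(2, 2, "21580."), (3, 5, "24833.")]
toy_rows = star_rows(names, groups)
for row in toy_rows:
    print(row)
```

Each row carries the image name, the group number (stored under rlnMicrographName), the defocus, and the three constants that the script hard-codes.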


The resulting file, all_images.star, can be used directly as input for the relion_preprocess program, which will be used to normalise the images. Normalisation subtracts the average of the background pixel values from the image and divides by the standard deviation of the background pixels. The background pixels are those outside a circle with a radius of 60 pixels.


relion_preprocess --operate_on all_images.star --o all_images_norm --norm --bg_radius 60
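
To make the normalisation described above concrete, here is a minimal pure-Python sketch of the arithmetic on a single image. It is not RELION's implementation: the circle centre (box size // 2), the use of the population standard deviation, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics


def background_pixels(image, bg_radius):
    """Collect the pixel values lying outside a centred circle."""
    size = len(image)
    centre = size // 2  # assumed origin convention
    return [
        image[y][x]
        for y in range(size)
        for x in range(size)
        if math.hypot(x - centre, y - centre) > bg_radius
    ]


def normalise_background(image, bg_radius):
    """Subtract the background mean and divide by the background std."""
    background = background_pixels(image, bg_radius)
    mean = statistics.fmean(background)
    std = statistics.pstdev(background)
    return [[(value - mean) / std for value in row] for row in image]


# Synthetic 130x130 test image with an arbitrary offset and scale.
random.seed(0)
image = [[random.gauss(5.0, 2.0) for _ in range(130)] for _ in range(130)]
normalised = normalise_background(image, bg_radius=60)

check = background_pixels(normalised, 60)
print(f"background mean {statistics.fmean(check):.3f}, "
      f"std {statistics.pstdev(check):.3f}")
```

After normalisation the background region has zero mean and unit standard deviation, whatever the offset and scale of the input image.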

Run RELION

Save the following lines in a file called example_gui3d.settings:

is_continue == false
Output rootname: == example/K4
Continue from here:  == 
Input images: == /lmb/home/scheres/work/relion/ribo_test_case_new/all_images_norm.star
Reference map: == /lmb/home/scheres/work/relion/ribo_test_case_new/emd_1056.mrc
Ref. map is on absolute greyscale? == No
Initial low-pass filter (A): == 100
Particle mask diameter (A): == 340
Pixel size (A): == 2.82
Number of iterations: == 25
Regularisation parameter T: == 4
Mask individual particles with zeros? == Yes
Reference mask (optional): == 
Do CTF-correction? == Yes
Have data been phase-flipped? == No
Ignore CTFs until first peak? == No
Has reference been CTF-corrected? == No
Symmetry group: == C
Symmetry number: == 1
Angular sampling interval: == 7.5 degrees
Perform local angular searches? == No
Local angular search range: == 5
Number of classes: == 4
Offset search range (pix): == 6
Offset search step (pix): == 1
Number of MPI procs: == 7
Number of threads: == 8
Submit to queue? == Yes
Queue name:  == openmpi_8
Queue submit command: == qsub
Standard submission script: == /lmb/home/scheres/app/relion/gui/qsub.csh
Additional arguments: == --random_seed 1
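
As a quick consistency check on these settings, the particle mask diameter can be converted from Angstroms to pixels and compared with the box size and with the background radius used for relion_preprocess. The sketch below parses the "label == value" lines with a hypothetical helper, parse_settings. It is only an illustration and does not reflect how the RELION GUI reads the file.

```python
# Two lines copied from example_gui3d.settings on this page.
SETTINGS = """\
Particle mask diameter (A): == 340
Pixel size (A): == 2.82
"""

BOX_SIZE = 130   # box size in pixels, as stated in the download section
BG_RADIUS = 60   # --bg_radius passed to relion_preprocess


def parse_settings(text):
    """Split 'label: == value' lines into a dictionary of strings."""
    settings = {}
    for line in text.splitlines():
        if "==" not in line:
            continue
        key, _, value = line.partition("==")
        settings[key.strip().rstrip(":")] = value.strip()
    return settings


settings = parse_settings(SETTINGS)
pixel_size = float(settings["Pixel size (A)"])
mask_diameter_px = float(settings["Particle mask diameter (A)"]) / pixel_size
bg_diameter_angstrom = 2 * BG_RADIUS * pixel_size

print(f"mask diameter: {mask_diameter_px:.1f} px (box is {BOX_SIZE} px)")
print(f"background circle diameter: {bg_diameter_angstrom:.1f} A")
```

With these values the 340 A mask spans about 120.6 pixels, which fits inside the 130-pixel box and closely matches the 120-pixel (338.4 A) diameter of the background circle.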


Start the RELION GUI by typing relion from the command line (inside the working directory). Load the example_gui3d.settings through the File Menu option. Then in the Running tab adapt the parameters for your particular cluster setup, and submit the job by clicking on the orange "Run!" button.

That's it! Just wait until your job is finished. Using 8 MPI nodes, each with 8 threads, this calculation took 19 wall-clock hours on the LMB cluster.

Anticipated results

On our cluster, the settings above yield two very similar classes (1 and 3) that are interpreted as 70S ribosomes with EF-G and 1 tRNA, one class (4) that is interpreted as a 70S ribosome without EF-G and with 3 tRNAs, and one minority class (2) that is interpreted as a previously unobserved structure for this data set.

The class distribution (from K4_it025_model.star) is as follows:

data_model_classes

loop_ 
_rlnReferenceImage #1 
_rlnClassDistribution #2 
_rlnAccuracyRotations #3 
_rlnAccuracyTranslations #4 
example/K4_it025_class001.mrc     0.330351     2.090000     0.845000 
example/K4_it025_class002.mrc     0.063993     2.740000     1.290000 
example/K4_it025_class003.mrc     0.260153     2.150000     0.858000 
example/K4_it025_class004.mrc     0.345502     1.874000     0.752000
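
The class distribution table can be read programmatically to turn the fractions into approximate image counts. The following is a minimal sketch that handles only a single simple loop_ block like the one above. It is not a general STAR parser, and parse_loop is a hypothetical helper. The counts are approximate because the distribution is a fraction of the 10,000 images rather than a hard assignment.

```python
# Table copied from K4_it025_model.star as shown on this page.
MODEL_CLASSES = """\
data_model_classes

loop_
_rlnReferenceImage #1
_rlnClassDistribution #2
_rlnAccuracyRotations #3
_rlnAccuracyTranslations #4
example/K4_it025_class001.mrc     0.330351     2.090000     0.845000
example/K4_it025_class002.mrc     0.063993     2.740000     1.290000
example/K4_it025_class003.mrc     0.260153     2.150000     0.858000
example/K4_it025_class004.mrc     0.345502     1.874000     0.752000
"""

N_IMAGES = 10000  # size of the data set used in this example


def parse_loop(text):
    """Parse one simple STAR loop into a list of label -> value dicts."""
    labels, records = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("data_") or line == "loop_":
            continue
        if line.startswith("_"):
            labels.append(line.split()[0])  # drop the trailing "#n" index
        else:
            records.append(dict(zip(labels, line.split())))
    return records


classes = parse_loop(MODEL_CLASSES)
fractions = [float(record["_rlnClassDistribution"]) for record in classes]
for record, fraction in zip(classes, fractions):
    print(record["_rlnReferenceImage"],
          f"{fraction:.1%}",
          f"about {round(fraction * N_IMAGES)} images")
```

The four fractions sum to 1 within rounding, and classes 1 and 3 together account for roughly 59% of the images.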