Classification example: Difference between revisions

Latest revision as of 13:19, 13 November 2017

Standard classification benchmark: ribosomes with and without EF-G

This example was published in the Scheres (2012) JMB paper (http://dx.doi.org/10.1016/j.jmb.2011.11.010).

Download data and reference

The test data used below were deposited by Joachim Frank and represent a standard benchmark for 3D classification algorithms. The 10,000 images may be downloaded from the EMDB at EBI (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J-Frank_70s_real_data.tar), and the corresponding metadata are stored in a PDF file (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J_FRANK_70S_REAL/Ribosome_information_01_13_09.pdf). Save both files in your working directory. This example uses EMDB entry 1056 (ftp://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-1056/map/emd_1056.map.gz) as the initial reference map; save this file in the same working directory. The reference has the same pixel size (2.8 Å) and the same box size (130x130 pixels) as the data set, so no re-scaling or windowing operations are necessary. Although the map has a resolution of ~9 Å, it will be low-pass filtered to 100 Å in the RELION run below.

Unpack the data (and rename the MRC-format map to have a .mrc extension as expected by RELION) as follows:

tar -xf J-Frank_70s_real_data.tar
gunzip emd_1056.map.gz
mv emd_1056.map emd_1056.mrc

Note that MRC maps should have the extension .mrc, as explained in the Image I/O section of the Conventions & File formats page.

Although, for the sake of simplicity, these data are classified directly against an external reference, please read the Important note for 3D classification on the use of external references.
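Before preparing the input files, it may be worth confirming that the archive unpacked completely. The following is a minimal sketch, assuming the images end up in a directory called win with names ending in dat (as the make_star.csh script on this page expects); the function name count_images is illustrative and not part of RELION.

```shell
#!/bin/sh
# count_images DIR: print the number of files in DIR whose names end in "dat".
# find is used instead of a shell glob so that a very large number of files
# cannot overflow the command-line length limit.
count_images() {
    find "$1" -maxdepth 1 -type f -name '*dat' | wc -l | tr -d ' '
}

# Usage (assumes the tarball was unpacked in the current directory):
#   n=$(count_images win)
#   [ "$n" -eq 10000 ] || echo "expected 10000 images, found $n" >&2
```

If the count differs from 10,000, re-download and unpack the archive before continuing.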

Prepare input files

As explained on the Prepare input files page, RELION requires an input STAR file to link individual images to their CTF information (an input image stack may be used instead only if no CTF correction is to be performed). The following will generate the STAR file for the data provided by Joachim Frank.

From the PDF, select the following 14 lines and save them in a text file called defocus.dat. As read by the script below, column 3 is the number of images in each defocus group, column 4 is the cumulative image count, and column 5 is the defocus value:

1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
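Because the lines are copied by hand from a PDF, a quick consistency check can catch copy-and-paste errors. The sketch below assumes the column meanings described above (column 3 per-group count, column 4 running total); check_defocus is a hypothetical helper name, not a RELION tool.

```shell
#!/bin/sh
# check_defocus FILE: verify that column 4 is the running total of column 3.
# Prints the final image count on success; returns non-zero on the first
# mismatch and reports it on standard error.
check_defocus() {
    awk '
        NF == 0 { next }                      # skip blank lines
        { sum += $3 }                         # running total of group sizes
        int($4) != sum {
            printf "group %s: cumulative count %d, expected %d\n", $1, $4, sum > "/dev/stderr"
            bad = 1
            exit
        }
        END { if (bad) exit 1; print sum }
    ' "$1"
}

# Usage:
#   check_defocus defocus.dat
```

For the 14 lines listed above the group sizes add up to 10000, matching the number of images in the data set.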

Then save the following lines as a file called make_star.csh


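#!/usr/bin/env csh
# List all particle images, one file name per line
ls -l win/*dat | awk '{print $NF}' >imagelist
# Write the STAR loop header with the six metadata labels
relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
# Loop over the 14 defocus groups listed in defocus.dat
set ngr = 14
set gr = 0
while ($gr < $ngr)
  @ gr++
  # nn = number of images in this group, tot = cumulative image count, def = defocus
  set nn=`head -n $gr defocus.dat  | tail -1 | awk '{print int($3)}'`
  set tot=`head -n $gr defocus.dat | tail -1 | awk '{print int($4)}'`
  set def=`head -n $gr defocus.dat | tail -1 | awk '{print $5}'`
  # Append one row per image: name, group number, defocus, voltage (200 kV), Cs (2 mm), amplitude contrast (0.15)
  head -n ${tot} imagelist | tail -n ${nn} |awk -v"def=$def" -v"gr=$gr" '{print $1, gr, def, 200, 2, 0.15}' >> all_images.star
end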


And execute it to generate the input STAR file with all image names and CTF information, using the command:

csh make_star.csh
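To confirm that the script assigned the right number of images to each defocus group, the rows of the STAR file can be tallied per group. This is a sketch only: it assumes that data rows are exactly the lines with six whitespace-separated fields (the six labels written by make_star.csh) and that header lines have fewer fields; count_per_group is an illustrative name.

```shell
#!/bin/sh
# count_per_group STAR: print "<group> <number of images>" for every value
# found in the second column (rlnMicrographName) of the data rows.
count_per_group() {
    awk 'NF == 6 { n[$2]++ } END { for (g in n) print g, n[g] }' "$1" | sort -n
}

# Usage:
#   count_per_group all_images.star
```

The counts printed for groups 1 to 14 should match column 3 of defocus.dat.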


The resulting file, all_images.star, can be used directly as input for the relion_preprocess program, which is used here to normalise the images. Normalisation subtracts the average of the background pixel values from each image and divides by the standard deviation of the background pixels. The background pixels are those outside a circle with a radius of 60 pixels.


relion_preprocess --operate_on all_images.star --o all_images_norm --norm --bg_radius 60
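The normalisation described above can be written out per image. This is a sketch of the stated operation only: the symbols (x_j for the value of pixel j, r_j for its distance from the image centre, B for the set of background pixels) are notation introduced here, and both the centring of the circle and the use of the population form of the standard deviation are assumptions rather than something this page states.

```latex
B = \{\, j : r_j > 60 \text{ pixels} \,\}, \qquad
\mu_B = \frac{1}{|B|} \sum_{j \in B} x_j, \qquad
\sigma_B = \sqrt{\frac{1}{|B|} \sum_{j \in B} \left(x_j - \mu_B\right)^2}, \qquad
x'_j = \frac{x_j - \mu_B}{\sigma_B}
```

By construction, the background of each normalised image then has mean 0 and standard deviation 1.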

Run RELION

Save the following lines in a file called example_gui3d.settings:

is_continue == false
Output rootname: == example/K4
Continue from here:  == 
Input images: == /lmb/home/scheres/work/relion/ribo_test_case_new/all_images_norm.star
Reference map: == /lmb/home/scheres/work/relion/ribo_test_case_new/emd_1056.mrc
Ref. map is on absolute greyscale? == No
Initial low-pass filter (A): == 100
Particle mask diameter (A): == 340
Pixel size (A): == 2.82
Number of iterations: == 25
Regularisation parameter T: == 4
Mask individual particles with zeros? == Yes
Reference mask (optional): == 
Do CTF-correction? == Yes
Have data been phase-flipped? == No
Ignore CTFs until first peak? == No
Has reference been CTF-corrected? == No
Symmetry group: == C
Symmetry number: == 1
Angular sampling interval: == 7.5 degrees
Perform local angular searches? == No
Local angular search range: == 5
Number of classes: == 4
Offset search range (pix): == 6
Offset search step (pix): == 1
Number of MPI procs: == 7
Number of threads: == 8
Submit to queue? == Yes
Queue name:  == openmpi_8
Queue submit command: == qsub
Standard submission script: == /lmb/home/scheres/app/relion/gui/qsub.csh
Additional arguments: == --random_seed 1
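The settings above mix lengths in A and in pixels, so converting between the two is a useful sanity check on the mask diameter. The sketch below uses only values listed above (340 A mask diameter, 2.82 A pixel size, 130-pixel box); the function names to_pixels and to_angstrom are illustrative.

```shell
#!/bin/sh
# to_pixels ANGSTROM PIXEL_SIZE: convert a length in Angstrom to pixels.
to_pixels() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.1f\n", a / p }'
}

# to_angstrom PIXELS PIXEL_SIZE: convert a length in pixels to Angstrom.
to_angstrom() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.1f\n", n * p }'
}

# Usage:
#   to_pixels 340 2.82     # particle mask diameter in pixels
#   to_angstrom 130 2.82   # box edge length in Angstrom
```

With these values the mask diameter is about 120.6 pixels (a radius of about 60.3 pixels), which fits inside the 130-pixel box (366.6 A) and is close to the 60-pixel background radius used for normalisation.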


Start the RELION GUI by typing relion from the command line (inside the working directory). Load the example_gui3d.settings through the File Menu option. Then in the Running tab adapt the parameters for your particular cluster setup, and submit the job by clicking on the orange "Run!" button.

That's it! Just wait until your job is finished. Using 8 MPI nodes, each with 8 threads, this calculation took 19 wall-clock hours on the LMB cluster.

Anticipated results

On our cluster, the settings above yield two very similar classes (1 and 3) that are interpreted as 70S ribosomes with EF-G and 1 tRNA, one class (4) that is interpreted as a 70S ribosome without EF-G and with 3 tRNAs, and one minority class (2) that is interpreted as a previously unobserved structure for this data set.

The class distribution (from K4_it025_model.star) is as follows:

data_model_classes

loop_ 
_rlnReferenceImage #1 
_rlnClassDistribution #2 
_rlnAccuracyRotations #3 
_rlnAccuracyTranslations #4 
example/K4_it025_class001.mrc     0.330351     2.090000     0.845000 
example/K4_it025_class002.mrc     0.063993     2.740000     1.290000 
example/K4_it025_class003.mrc     0.260153     2.150000     0.858000 
example/K4_it025_class004.mrc     0.345502     1.874000     0.752000
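The class distribution values are fractions of the data set, so they can be turned into percentages and approximate particle counts. The sketch below assumes the table layout shown above (reference image name in column 1, _rlnClassDistribution in column 2) and a total of 10,000 particles; class_summary is a hypothetical helper, and because the fractions are not hard assignments the counts are only approximate.

```shell
#!/bin/sh
# class_summary MODEL_STAR NPART: for every row whose first field is a class
# map (..._classNNN.mrc), print the map name, the class share in percent and
# the approximate number of particles (share times NPART, rounded).
class_summary() {
    awk -v n="$2" '$1 ~ /_class[0-9]+\.mrc$/ {
        printf "%s %.1f%% ~%d\n", $1, 100 * $2, $2 * n + 0.5
    }' "$1"
}

# Usage:
#   class_summary example/K4_it025_model.star 10000
```

For the table above this gives roughly 33.0%, 6.4%, 26.0% and 34.6% for classes 1 to 4, so the two similar classes (1 and 3) together account for about 59% of the particles.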