Classification example
Latest revision as of 13:19, 13 November 2017
Standard classification benchmark: ribosomes with and without EF-G
This example was published in the Scheres (2012) JMB paper (http://dx.doi.org/10.1016/j.jmb.2011.11.010).
Download data and reference
The test data used below were deposited by Joachim Frank and represent a standard benchmark for 3D classification algorithms. These 10,000 images may be downloaded from the EMDB at EBI (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J-Frank_70s_real_data.tar), together with the corresponding metadata, which are stored in a PDF file (ftp://ftp.ebi.ac.uk/pub/databases/emtest/SPIDER_FRANK_data/J_FRANK_70S_REAL/Ribosome_information_01_13_09.pdf). Save both files in your working directory. This example uses EMDB entry 1056 (ftp://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-1056/map/emd_1056.map.gz) as the initial reference map; save this file in the same working directory. Note that this reference has the same pixel size (2.8 Å) and the same box size (130x130) as the data set above, so no re-scaling or windowing operations are necessary. Although this map has a resolution of ~9 Å, it will be low-pass filtered to 100 Å in the RELION run below.
Unpack the data (and rename the MRC-format map to have a .mrc extension as expected by RELION) as follows:
tar -xf J-Frank_70s_real_data.tar
gunzip emd_1056.map.gz
mv emd_1056.map emd_1056.mrc
Note that MRC maps should have the extension .mrc, as explained on the Conventions & File_formats#Image_I/O page.
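To check that the reference map really matches the data (130-pixel box, pixel size of about 2.8 Å) and that the renamed file is in MRC format, a few fields of the MRC header can be read directly. The sketch below is a hypothetical helper (mrc_info is not part of RELION). It assumes a little-endian MRC file read on a little-endian machine, and uses the standard MRC header layout: NX, NY, NZ in the first three 32-bit words, MX in word 8, the cell length along X (in Å) in word 11, and the string "MAP " at byte 208 in MRC2000 and later files.

```shell
#!/bin/sh
# Hypothetical helper: report box size, pixel size and format tag of an MRC file.
# Assumes little-endian data read on a little-endian host.
mrc_info() {
    f=$1
    # Words 1-3 (bytes 0-11): NX, NY, NZ as 32-bit integers
    dims=$(od -An -t d4 -N 12 "$f" | awk '{ print $1 "x" $2 "x" $3 }')
    # Word 8 (bytes 28-31): MX, the number of intervals along X
    mx=$(od -An -t d4 -j 28 -N 4 "$f" | awk '{ print $1 }')
    # Word 11 (bytes 40-43): cell length along X in Angstrom, 32-bit float
    xlen=$(od -An -t f4 -j 40 -N 4 "$f" | awk '{ print $1 }')
    # Bytes 208-211: the string "MAP " (trailing space removed here)
    tag=$(dd if="$f" bs=1 skip=208 count=4 2>/dev/null | tr -d ' ')
    # Pixel size = cell length / number of intervals
    apix=$(awk -v l="$xlen" -v m="$mx" 'BEGIN { printf "%.3f", l / m }')
    echo "box=$dims pixel=$apix tag=$tag"
}
```

Running mrc_info emd_1056.mrc should then report a 130-pixel box and a pixel size close to 2.8 Å if the statements above hold; a missing MAP tag suggests the file is not an MRC2000-style map.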
For the sake of simplicity, these data are classified directly against an external reference; please do read the Important note for 3D classification on the use of external references.
Prepare input files
As explained on the Prepare input files page, RELION requires an input STAR file to link individual images to their CTF information (an input image stack may be used only if no CTF correction is to be performed). The following will generate the STAR file for the data provided by Joachim Frank.
From the PDF, select the following 14 lines and save them in a text file called defocus.dat:
1 3 1347.0 1347.0 21580.
2 3 505.00 1852.0 24833.
3 3 989.00 2841.0 26450.
4 3 857.00 3698.0 28320.
5 3 475.00 4173.0 30993.
6 3 349.00 4522.0 33150.
7 3 478.00 5000.0 34588.
8 3 1242.0 6242.0 21580.
9 3 713.00 6955.0 24833.
10 3 1255.0 8210.0 26450.
11 3 1022.0 9232.0 28320.
12 3 304.00 9536.0 30993.
13 3 232.00 9768.0 33150.
14 3 232.00 10000. 34588.
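Before running the script below, it may be worth checking that the table was copied correctly. Judging from how make_star.csh uses it (an inference from the script, not from the PDF), column 3 is the number of images on each micrograph, column 4 the running total and column 5 the defocus in Å. The hypothetical check_defocus function verifies that the running totals are consistent:

```shell
#!/bin/sh
# Hypothetical helper: check that the per-micrograph image counts in
# defocus.dat (column 3) add up to the cumulative counts (column 4).
# Column meanings are inferred from how make_star.csh uses them.
check_defocus() {
    awk '
        {
            sum += $3
            if (int($4) != sum) {
                printf "line %d: column 4 is %d, running total is %d\n", NR, int($4), sum
                bad = 1
            }
        }
        END {
            if (bad) exit 1
            printf "OK: %d micrographs, %d images\n", NR, sum
        }
    ' "$1"
}
```

With the 14 lines above, check_defocus defocus.dat prints "OK: 14 micrographs, 10000 images", matching the 10,000 images in the data set; a copying error in column 3 or 4 is reported with its line number and a non-zero exit status.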
Then save the following lines as a file called make_star.csh:
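#!/usr/bin/env csh
# List the image files, one name per line, in alphabetical order
ls -l win/*dat | awk '{print $NF}' >imagelist
#
# Write the STAR header with the six metadata labels
relion_star_loopheader rlnImageName rlnMicrographName rlnDefocusU rlnVoltage rlnSphericalAberration rlnAmplitudeContrast > all_images.star
#
# Loop over the 14 micrographs listed in defocus.dat
set ngr = 14
set gr = 0
while ($gr < $ngr)
  @ gr++
  # Column 3: number of images on this micrograph
  set nn=`head -n $gr defocus.dat | tail -1 | awk '{print int($3)}'`
  # Column 4: cumulative number of images up to and including this micrograph
  set tot=`head -n $gr defocus.dat | tail -1 | awk '{print int($4)}'`
  # Column 5: defocus in Angstrom
  set def=`head -n $gr defocus.dat | tail -1 | awk '{print $5}'`
  # Append image name, micrograph number, defocus, voltage (200 kV),
  # spherical aberration (2 mm) and amplitude contrast (0.15)
  head -n ${tot} imagelist | tail -n ${nn} |awk -v"def=$def" -v"gr=$gr" '{print $1, gr, def, 200, 2, 0.15}' >> all_images.star
end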
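make_star.csh re-reads defocus.dat three times per micrograph and slices imagelist with head and tail. On systems without csh, the same assignment can be sketched in POSIX sh with a single awk pass. This is an illustration under assumptions: make_star is a hypothetical function name, the STAR header is written by hand rather than by relion_star_loopheader, and, like the original script, it relies on the image list being in the same order as the cumulative counts in defocus.dat.

```shell
#!/bin/sh
# Hypothetical POSIX sh/awk alternative to make_star.csh.
# $1: defocus.dat, $2: image list (one name per line, in the same order
# as the cumulative counts), $3: output STAR file.
make_star() {
    {
        # Minimal STAR loop header, written by hand instead of calling
        # relion_star_loopheader
        printf 'data_\n\nloop_\n'
        for label in rlnImageName rlnMicrographName rlnDefocusU \
                     rlnVoltage rlnSphericalAberration rlnAmplitudeContrast; do
            printf '_%s\n' "$label"
        done
        awk '
            # First file (defocus.dat): cumulative count and defocus per micrograph
            NR == FNR { ngr++; tot[ngr] = int($4); def[ngr] = $5; next }
            # Second file (image list): start at the first micrograph
            FNR == 1 { g = 1 }
            {
                # Image number FNR belongs to the first micrograph whose
                # cumulative count is >= FNR
                while (g <= ngr && FNR > tot[g]) g++
                if (g > ngr) exit   # more images than defocus.dat accounts for
                print $1, g, def[g], 200, 2, 0.15
            }
        ' "$1" "$2"
    } > "$3"
}
```

A possible invocation is ls win/*dat > imagelist followed by make_star defocus.dat imagelist all_images.star. Each image k is given the first micrograph whose running total in column 4 is at least k, which is the same selection that head -n tot | tail -n nn makes when the counts are consistent.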
And execute it to generate the input STAR file with all image names and CTF information, using the command:
csh make_star.csh
The resulting file, all_images.star, can be used directly as input for the relion_preprocess program, which will normalise the images. Normalisation subtracts the average of the background pixel values from the image and divides by the standard deviation of the background pixels. The background pixels are those outside a circle with a radius of 60 pixels.
relion_preprocess --operate_on all_images.star --o all_images_norm --norm --bg_radius 60
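The normalisation described above can be illustrated on a tiny image. The sketch below is not RELION code: normalise_bg is a hypothetical function that reads an image stored as text (one row per line), and it assumes that the centre lies at pixel (width/2, height/2) counted from zero, that background pixels are those farther than the radius from the centre, and that the population standard deviation is used. relion_preprocess may differ in these details.

```shell
#!/bin/sh
# Hypothetical illustration of background normalisation (not RELION code).
# $1: image as text, one row per line, whitespace-separated values
# $2: background radius in pixels (plays the role of --bg_radius)
# Assumptions: centre at (int(width/2), int(height/2)) in 0-based indices,
# background = pixels farther than the radius from the centre,
# population standard deviation.
normalise_bg() {
    awk -v r="$2" '
        { for (j = 1; j <= NF; j++) v[NR - 1, j - 1] = $j; w = NF; h = NR }
        END {
            cx = int(w / 2); cy = int(h / 2)
            # Accumulate sum and sum of squares over background pixels
            for (y = 0; y < h; y++)
                for (x = 0; x < w; x++)
                    if ((x - cx)^2 + (y - cy)^2 > r * r) {
                        s += v[y, x]; s2 += v[y, x]^2; k++
                    }
            if (k == 0) exit 1                  # no background pixels
            mean = s / k; var = s2 / k - mean^2
            if (var <= 0) exit 1                # flat background
            sd = sqrt(var)
            # Print (pixel - background mean) / background standard deviation
            for (y = 0; y < h; y++) {
                line = ""
                for (x = 0; x < w; x++)
                    line = line (x > 0 ? " " : "") sprintf("%.3f", (v[y, x] - mean) / sd)
                print line
            }
        }
    ' "$1"
}
```

For example, in a 3x3 image whose eight outer pixels alternate between 0 and 2 around a central value of 5, radius 0 makes those eight pixels the background (mean 1, standard deviation 1), so the centre becomes 4.000 and the outer pixels become -1.000 or 1.000.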
Run RELION
Save the following lines in a file called example_gui3d.settings:
is_continue == false
Output rootname: == example/K4
Continue from here: ==
Input images: == /lmb/home/scheres/work/relion/ribo_test_case_new/all_images_norm.star
Reference map: == /lmb/home/scheres/work/relion/ribo_test_case_new/emd_1056.mrc
Ref. map is on absolute greyscale? == No
Initial low-pass filter (A): == 100
Particle mask diameter (A): == 340
Pixel size (A): == 2.82
Number of iterations: == 25
Regularisation parameter T: == 4
Mask individual particles with zeros? == Yes
Reference mask (optional): ==
Do CTF-correction? == Yes
Have data been phase-flipped? == No
Ignore CTFs until first peak? == No
Has reference been CTF-corrected? == No
Symmetry group: == C
Symmetry number: == 1
Angular sampling interval: == 7.5 degrees
Perform local angular searches? == No
Local angular search range: == 5
Number of classes: == 4
Offset search range (pix): == 6
Offset search step (pix): == 1
Number of MPI procs: == 7
Number of threads: == 8
Submit to queue? == Yes
Queue name: == openmpi_8
Queue submit command: == qsub
Standard submission script: == /lmb/home/scheres/app/relion/gui/qsub.csh
Additional arguments: == --random_seed 1
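Some of these settings are given in Å and others in pixels. A quick conversion, using the 2.82 Å pixel size from the settings file, shows how they relate to the 130-pixel box and to the 60-pixel background radius used with relion_preprocess. size_check is a hypothetical helper that only does the arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: convert between pixels and Angstrom for the sizes
# used in this example.
# $1: pixel size (A), $2: box size (px), $3: mask diameter (A),
# $4: background radius (px) given to relion_preprocess --bg_radius
size_check() {
    awk -v a="$1" -v b="$2" -v m="$3" -v r="$4" 'BEGIN {
        printf "box edge: %.1f A\n", b * a
        printf "mask diameter: %.1f px\n", m / a
        printf "background circle diameter: %.1f A\n", 2 * r * a
    }'
}
```

size_check 2.82 130 340 60 reports a box edge of 366.6 A, a mask diameter of 120.6 px and a background circle diameter of 338.4 A. So the 340 Å particle mask fits inside the 130-pixel box, and it nearly coincides with the circle outside which relion_preprocess computed the background statistics.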
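Start the RELION GUI by typing relion on the command line (inside the working directory). Load example_gui3d.settings through the File menu. Then, in the Running tab, adapt the parameters to your particular cluster setup, and submit the job by clicking the orange "Run!" button. The paths to the input images, the reference map and the submission script in the settings file point to the author's directories and should also be changed to match your own setup.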
That's it! Just wait until your job is finished. Using 8 MPI nodes, each with 8 threads, this calculation took 19 wall-clock hours on the LMB cluster.
Anticipated results
On our cluster, the settings above yield two very similar classes (1 and 3), interpreted as 70S ribosomes with EF-G and one tRNA; one class (4), interpreted as a 70S ribosome without EF-G and with three tRNAs; and one minority class (2), interpreted as a previously unobserved structure for this data set.
The class distribution (from K4_it025_model.star) is as follows:
data_model_classes
loop_
_rlnReferenceImage #1
_rlnClassDistribution #2
_rlnAccuracyRotations #3
_rlnAccuracyTranslations #4
example/K4_it025_class001.mrc 0.330351 2.090000 0.845000
example/K4_it025_class002.mrc 0.063993 2.740000 1.290000
example/K4_it025_class003.mrc 0.260153 2.150000 0.858000
example/K4_it025_class004.mrc 0.345502 1.874000 0.752000
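The class fractions can be turned into approximate particle numbers. The sketch below uses a hypothetical class_counts function that reads only the data_model_classes block of a model.star file, locates the columns by their labels, and multiplies each _rlnClassDistribution value by the number of particles (10,000 here).

```shell
#!/bin/sh
# Hypothetical helper: read the data_model_classes block of a RELION
# model.star file and convert each class fraction into an approximate
# particle count for a data set of $2 particles.
class_counts() {
    awk -v n="$2" '
        # Track whether we are inside the data_model_classes block
        /^data_/ { inblock = ($1 == "data_model_classes"); ncol = 0; next }
        !inblock || /^loop_/ { next }
        # Label lines: remember the column number of each label, in order
        /^_/ { ncol++; col[$1] = ncol; next }
        # Data rows: image name, fraction, rounded fraction x total
        ncol > 0 && NF >= ncol {
            frac = $(col["_rlnClassDistribution"])
            printf "%s %.4f %d\n", $(col["_rlnReferenceImage"]), frac, int(frac * n + 0.5)
            sum += frac
        }
        END { printf "sum of fractions: %.6f\n", sum }
    ' "$1"
}
```

class_counts example/K4_it025_model.star 10000 gives about 3304, 640, 2602 and 3455 particles for classes 1 to 4. These are fraction x total estimates rather than counts of particles assigned to each class, and after rounding they add up to 10001 rather than 10000.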