Pre-distribution version, for Linux, Mac OS X and Windows. Contact Rob Nicholls for more information.
This software is under development. If any aspect of the functionality or implementation is considered undesirable, or if any bugs, strange behavior or unexpected results are encountered, please email the author. Any comments, questions and suggestions are very much appreciated.
ProSMART (Procrustes Structural Matching Alignment and Restraint Tool) is a software tool designed for the conformation-independent structural comparison of protein chains. Currently, ProSMART has two components:
ProSMART ALIGN allows the pairwise alignment of chain-pairs, and also allows the batch-processing of multiple pairwise alignments. Similarity is based on the conservation of local structure, and is consequently independent of global conformation. For each chain-pair, multiple superpositions may be provided: a global superposition based on the achieved alignment, and a superposition for each identified common rigid substructure (e.g. domain), if any are identified. Output includes residue-based, sidechain-based and global similarity scores, thus providing a multi-resolution view of chain similarity. Residue-based and sidechain-based scores may be viewed in color, using the graphics software PyMOL. Experimental ProSMART analysis features are under development and available in the latest versions of CCP4mg.
ProSMART RESTRAIN uses the results from ProSMART ALIGN in order to generate restraints for a target protein, based on one (or multiple) other homologous protein structures. Restraints may also be generated for individual fragments (e.g. secondary structure elements). The use of restraints from ProSMART is intended to improve refinement by REFMAC5. Note that parameter tweaking in REFMAC5 will almost certainly be required. If you have a well-refined higher-resolution structure, you can attempt to use information from it to improve the reliability of refinement, provided that this "external reference" structure is sufficiently similar to the "target" structure that you are trying to refine. Generally, external structures should be identical or close homologs, although in theory even structures with low sequence similarity might be used under the employed formalism; the user must decide what is sensible/appropriate.
A basic understanding of directory navigation in the unix command line is assumed.
ProSMART is provided as a prosmart_source_XXX.tar.gz file (with the X's replaced by the appropriate date). After downloading this file, use the command line to navigate to the directory where you saved it. It may be unpacked by typing (with the X's replaced, as appropriate):
tar zxf prosmart_source_XXX.tar.gz
A directory called ProSMART/ will have been created within the current working directory (this may be confirmed by typing ls to list the contents of the current working directory). In the rest of this installation guide, we shall refer to the ProSMART/ directory as the 'ProSMART directory'.
Installation of ProSMART is achieved using the makefile located in the ProSMART directory. On most systems, the default configuration settings will be suitable. However, on other systems it may be necessary or desirable to alter ProSMART's installation directories, particularly if you don't have sufficient permissions to install into the default directories (which may require superuser/administrator privileges).
To reconfigure ProSMART, open the makefile (located at ProSMART/makefile) using a text editor. Near the top of this file, you will see these lines (or similar, depending on the particular version):
BIN_DIR = /usr/local/bin/
LIB_DIR = /usr/local/share/
You may choose to change any of the locations specified by these variables. For example, to put the binaries and library in CCP4 directories, set BIN_DIR = $(CBIN) and LIB_DIR = $(CLIBD) (sufficient privileges will be required to install). For more information, see the makefile (version 0.807 or later).
The BIN_DIR variable specifies the location that the ProSMART binaries will be copied to upon installation. It is important that this directory is listed in the PATH environment variable, so that ProSMART can be executed from any directory. Type echo $PATH to list the directories specified in the PATH environment variable.
The LIB_DIR variable specifies the location that the ProSMART library will be copied to upon installation. This can be any directory that has appropriate permissions.
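As a minimal sketch of the PATH check described above, the following shell snippet tests whether a directory is listed in PATH. The helper name on_path is hypothetical (it is not part of ProSMART), and the BIN_DIR value is simply the default shown in the makefile excerpt.

```shell
# Hypothetical helper (not part of ProSMART): succeeds if directory $1 is listed in PATH.
on_path() {
    dir="${1%/}"    # ignore a trailing slash, as used in BIN_DIR
    case ":${PATH}:" in
        *":${dir}:"*|*":${dir}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

BIN_DIR=/usr/local/bin/    # default value from the makefile excerpt
if on_path "$BIN_DIR"; then
    echo "$BIN_DIR is listed in PATH"
else
    echo "$BIN_DIR is not listed in PATH"
fi
```

If the directory is not listed, it can be appended in the usual way for your shell, for example with export PATH="$PATH:/usr/local/bin" in a Bourne-style shell.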
Installation of ProSMART is done using the makefile located in the ProSMART directory. From within that directory:
Type make and hit enter. This will compile ProSMART. Upon success, three binaries will be created in the ProSMART directory.
Type make install and hit enter. Upon success, this will copy the three binaries to BIN_DIR and copy the library to LIB_DIR (as specified in the makefile).
Optional: to clean up the ProSMART directory, type make clean and hit enter. This will remove all created object files and binaries from the current directory.
Upon successful installation, ProSMART can be executed by typing:
prosmart. This will display a list of command line arguments, equivalent to typing
If installation was unsuccessful, the most common problem is insufficient privileges to write to the desired directories. In this case, you may wish to ask your system administrator for help, or obtain appropriate permissions (e.g. typing sudo make install may work, provided you have superuser privileges). Otherwise, you could reconfigure the makefile, changing the ProSMART installation directories to directories for which you do have appropriate permissions (see above).
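Before running make install, it can help to check whether the installation directories are writable by the current user. The following is a minimal sketch; the function name check_writable is hypothetical, and the two directories are the defaults from the makefile excerpt above.

```shell
# Hypothetical helper (not part of ProSMART): report whether each
# directory exists and is writable by the current user.
check_writable() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return "$status"
}

# Default BIN_DIR and LIB_DIR from the makefile excerpt.
check_writable /usr/local/bin/ /usr/local/share/ \
    || echo "Consider sudo make install, or edit BIN_DIR and LIB_DIR in the makefile."
```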
The experimental Windows version of ProSMART has been tested on Windows 7, and assumes that the CCP4 suite is installed. For more information on installation without CCP4 (or on different versions of Windows), please contact the author. The Windows version of ProSMART does not support the simultaneous execution of multiple child processes (for better performance, try using the Linux/MacOSX version).
Unzip the prosmart_windows_XXX.zip file (in Windows 7, right click on the compressed folder and select the "Extract All..." option).
From within the unzipped folder, double-click the install.bat file (warning: do not attempt to run install.bat from within the compressed folder).
The install.bat batch file will copy the three ProSMART executables (including prosmart_restrain.exe) and the directory Prosmart_Library into the CCP4 directory for binaries (specified by the CBIN environment variable, e.g. C:\CCP4-Packages\ccp4-6.2.0\bin). Upon successful installation, ProSMART should be runnable from the Command Prompt (by typing prosmart) and from within CCP4i, if the appropriate CCP4i task interfaces have been installed. To uninstall ProSMART, double-click the supplied uninstall.bat file, which removes ProSMART from the CBIN directory.
If any unknown errors occur, or if you would like assistance, please seek help from your system administrator or contact the author.
After downloading the CCP4i task interface .tar.gz file, open CCP4i.
In the "System Administration" drop-down menu on the right side of the screen, select "Install/uninstall Tasks", which should open a pop-up window.
Select "Run the Installation Manager to Install a new task", and "Perform automatic installation of tasks into user's local CCP4i area" (if wanting to install for all users, change "user's local CCP4i" to "main CCP4i").
In the "Task archive" field, browse to find the downloaded .tar.gz file (the name of the package and version number should automatically appear). Click "Apply".
Note: it may be advisable to uninstall any existing interface versions before attempting to install a new task interface.
Run ProSMART. From the command line, run the executable prosmart with the appropriate arguments (see below). Arguments can be passed to the program on the command line and/or via a configuration file (see the -f argument below).
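As an illustration of the configuration-file route, the sketch below writes a small argument file and passes it with -f. The file name prosmart_args.txt is hypothetical, the PDB file names are the placeholders used in the examples later in this guide, and it is assumed that -f takes the file path as its following argument.

```shell
# Hypothetical argument file; arguments may be space and/or newline separated.
ARGS_FILE=prosmart_args.txt
cat > "$ARGS_FILE" <<'EOF'
-p1 mypdb1.pdb
-p2 mypdb2.pdb
-c1 A -c2 B
EOF

# Only attempt the run if prosmart is actually installed and on PATH.
if command -v prosmart >/dev/null 2>&1; then
    prosmart -f "$ARGS_FILE"
else
    echo "prosmart not found on PATH; would run: prosmart -f $ARGS_FILE"
fi
```

Keeping the arguments in a file makes it easier to record exactly which settings produced a given set of results.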
Note: never run the
prosmart_restrain binaries - these are called automatically by
The residue alignment must be reasonable in order to successfully generate external restraints for use in refinement. Consequently, if external restraints are to be generated, it is advised to manually check the ProSMART residue alignment (by inspecting the generated alignment file, and by viewing the output PDB files in PyMOL with the output PyMOL color scripts) to confirm that it is correct, or at least reasonable. If it is not, then try different alignment parameters or seek help.
Importantly, note also that the argument -id can be used to force the correct alignment of sequence-identical structures, which is particularly useful when intending to generate restraints using identical structures, as well as for other applications.
Output is communicated through an HTML-format results page, called ProSMART_Results.html, which may be found in the output directory. The log files automatically generated by ProSMART (such as prosmart_restrain_logfile.txt) provide useful information, indicate which files have been created, and may help with troubleshooting.
The strength and qualitative nature of the generated atomic bond restraints (and other parameters) can be adjusted within ProSMART (see below), and can also be adjusted within REFMAC5 (version 5.7.0005 or later). These parameters may need to be adjusted by trial and error in order to get a successful refinement using the external restraints. Appropriate parameters will depend on your particular case: data quality, resolution, similarity of the external structure, etc.
Suppose you are trying to refine a low-resolution structure mypdb1.pdb, and want to utilise information from a known higher-resolution structure mypdb2.pdb during refinement. The simplest example of aligning and generating restraints for all chains in mypdb1.pdb using all chains in mypdb2.pdb as external information is:
prosmart -p1 mypdb1.pdb -p2 mypdb2.pdb
To align chain A of mypdb1.pdb with chain B of mypdb2.pdb, the alignment and generation of external restraints from mypdb2_B for use in the refinement of mypdb1_A can be achieved as follows:
prosmart -p1 mypdb1.pdb -p2 mypdb2.pdb -c1 A -c2 B
To perform alignment, but not generate restraints, use the -a argument:
prosmart -p1 mypdb1.pdb -p2 mypdb2.pdb -a
To generate restraints without performing alignment, use the -r argument (note that a valid alignment file must already exist; this functionality is useful if you want to edit the alignment before generating restraints based on it):
prosmart -p1 mypdb1.pdb -p2 mypdb2.pdb -r
The chains do not need to be specified. To perform pairwise alignment of all chains from mypdb1.pdb with all chains in mypdb2.pdb, then generate restraints for all chains in mypdb1.pdb, the following command would be appropriate:
prosmart -p1 mypdb1.pdb -p2 mypdb2.pdb
If the second PDB file is not specified, then an all-on-all alignment will be performed. All chains within mypdb1.pdb will be pairwise aligned, and restraints generated for each other, e.g. this is valid:
prosmart -p1 mypdb1.pdb
Note that, in an all-on-all alignment, each chain will not be aligned/restrained to itself by default.
For fragment motif alignment, and generation of motif restraints, appropriate syntax may be:
prosmart -p1 mypdb1.pdb -helix
This command will identify sufficiently helical regions, and generate the corresponding restraints. Note that the default library consists of two entries: an ideal helix and a typical strand. These are located in the ProSMART library. This library (and alignment rules) may be edited and extended by the user. To run ProSMART using all fragments in the local library, use the -lib argument. Note also that use of the -lib or -helix arguments automatically overrides any secondary reference PDB files specified using -p2.
It is possible to specify multiple PDB files and chains in order to automatically perform multiple pairwise alignments, e.g. this is valid:
prosmart -p1 mypdb1.pdb mypdb2.pdb -p2 mypdb3.pdb mypdb4.pdb mypdb5.pdb
If more than one PDB file is specified and chains are specified, then the PDB files must be repeated for each chain so that a one-to-one correspondence exists between the arguments of -p1 and -c1, and between those of -p2 and -c2. E.g. this is a valid command to perform an all-on-all alignment of three chains (chain A of mypdb1.pdb, and chains A and B of mypdb2.pdb):
prosmart -p1 mypdb1.pdb mypdb2.pdb mypdb2.pdb -c1 A A B -a
However, this is not:
prosmart -p1 mypdb1.pdb mypdb2.pdb -c1 A A B -a
If only one PDB file is specified then multiple chains may be selected, e.g. this is valid:
prosmart -p1 mypdb1.pdb -c1 A B C -a
These rules apply separately for the target (-p1 and -c1) and the secondary (-p2 and -c2) structures.
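The correspondence rule above can be expressed as a small pre-flight check before calling prosmart. This is only a sketch of the rule as stated in this guide; the function names are hypothetical, and ProSMART performs its own validation.

```shell
# Hypothetical helper: count whitespace-separated words in a string.
count_words() {
    set -- $1    # deliberate word splitting
    echo "$#"
}

# Hypothetical check of the rule for one side (-p1/-c1 or -p2/-c2):
# valid if no chains are given, if only one PDB file is given,
# or if the number of PDB files equals the number of chains.
check_pdb_chain_counts() {
    n_pdb=$(count_words "$1")
    n_chain=$(count_words "$2")
    [ "$n_chain" -eq 0 ] || [ "$n_pdb" -eq 1 ] || [ "$n_pdb" -eq "$n_chain" ]
}

if check_pdb_chain_counts "mypdb1.pdb mypdb2.pdb mypdb2.pdb" "A A B"; then
    echo "valid: one chain per PDB file"
fi
if ! check_pdb_chain_counts "mypdb1.pdb mypdb2.pdb" "A A B"; then
    echo "invalid: 2 PDB files but 3 chains"
fi
```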
Suppose you have two structures: the low-resolution structure that you want to refine (target.pdb) and a sequence-identical higher-resolution structure that you want to use as prior information (external.pdb). Then to generate restraints for all chains in target.pdb, using all chains from external.pdb, type:
prosmart -id -p1 target.pdb -p2 external.pdb
The "-id" keyword indicates that the structures are sequence-identical, so ProSMART will assume that the residues are equivalent, bypassing the alignment stage and assuming equivalence of residue numbering. This should be used for identical structures. However, if the structures are homologous in structure but non-identical in sequence, then do not use the "-id" keyword, as the generated restraints would be nonsensical and likely destructive during refinement.
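Since -id relies on identical sequence and residue numbering, a quick check of the two files before using it may be worthwhile. The following is a rough sketch, not a ProSMART feature: the function names are hypothetical, it reads the fixed-width columns of PDB ATOM records, it compares only C-alpha atoms of one chain from each file, and it does not treat alternative conformations specially.

```shell
# Hypothetical helper: for CA atoms of one chain, print residue number
# (plus insertion code) and residue name, using PDB fixed-width columns.
ca_residues() {
    awk -v chain="$2" '
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " && substr($0, 22, 1) == chain {
            print substr($0, 23, 5), substr($0, 18, 3)
        }' "$1"
}

# Succeeds only if both chains are non-empty and have the same
# residue numbering and residue names.
same_numbering_and_sequence() {
    a=$(ca_residues "$1" "$2")
    b=$(ca_residues "$3" "$4")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example with the file names used above (chain A in both files is an assumption).
if [ -f target.pdb ] && [ -f external.pdb ]; then
    if same_numbering_and_sequence target.pdb A external.pdb A; then
        echo "chains match: -id looks applicable"
    else
        echo "chains differ: do not use -id"
    fi
fi
```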
Other options are available, such as generating restraints using only the best-scoring chains (see the restraint arguments below). However, note that it may be reasonable to use the default all-on-all chain restraint generation in some cases, since REFMAC5 currently selects only the best restraint for each atom-pair when multiple restraints are generated.
An example of a possible execution of REFMAC5 with external restraints is:
refmac5 \
XYZIN pdb_in.pdb \
HKLIN mtz_in.mtz \
XYZOUT pdb_out.pdb \
HKLOUT mtz_out.mtz \
<<EOF
NCYC 20
EXTERNAL USE MAIN
EXTERNAL DMAX 4.2
EXTERNAL WEIGHT SCALE 10
EXTERNAL WEIGHT GMWT 0.15
@prosmart_restraints_file.txt
MONI DIST 1000000
END
EOF
Description of used keywords:
|NCYC 20|Number of REFMAC5 refinement cycles. Note that external restraints will seem to have little effect if using only a few refinement cycles (e.g. 5). Something like 20-40 cycles may be required, and even more if also using jelly-body restraints (see below).|
|EXTERNAL USE MAIN|Discards any side-chain restraints that may be present in the external restraints file (remove this keyword to keep side-chains restraints; the
|EXTERNAL DMAX 4.2|Maximum restraint interatomic distance. Highly recommended value: 4.2.|
|EXTERNAL WEIGHT SCALE 10|External restraints weight. Increasing this weight increases the influence of external restraints during refinement. Optimal value: varies dramatically; note that the value 10 shown here is rather arbitrary.|
|EXTERNAL WEIGHT GMWT 0.15|Geman-McClure parameter, which controls robustness to outliers. Increasing this value reduces the influence of outliers (i.e. restraints that are very different from the current interatomic distance). However, setting this value too high results in too many restraints being considered outliers, which means that only the restraints that are very similar to the current interatomic distances will have much effect. Optimal value: varies dramatically; note that the value 0.15 shown here is rather arbitrary. Note also that the optimal value is highly correlated with the external restraints weight parameter.|
|@prosmart_restraints_file.txt|Specifies the location of the external restraints file generated by ProSMART, e.g. called
|MONI DIST 1000000|Only monitor distances (i.e. identify individual restraints as potential outliers in the log file) greater than this value relative to the restraint sigma (i.e. 1000000*sigma in this example). Consequently, since the default value is 10, many unnecessary extra lines will be written to the REFMAC5 log file by default when using external restraints - this is avoided by setting the value of
Appropriate values of EXTERNAL WEIGHT SCALE and EXTERNAL WEIGHT GMWT will depend on many factors, including the existing geometry weight, the resolution and quality of the data and reference structure(s), the number of NCS-restrained chains, and the local similarity between the target and external structures. Different values will generally have to be tried in order to have any chance of successfully using external restraints.
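Because several combinations of EXTERNAL WEIGHT SCALE and EXTERNAL WEIGHT GMWT usually have to be tried, a small loop can generate one keyword file per combination. This is a sketch under stated assumptions: the trial values are arbitrary, the file names (refmac_*.com, out_*) are hypothetical, and the input names are the placeholders from the example above. REFMAC5 is only run if the binary and input files are present.

```shell
RESTRAINTS=prosmart_restraints_file.txt    # placeholder name from the example above

for scale in 5 10 20; do          # arbitrary trial values
    for gmwt in 0.05 0.15; do     # arbitrary trial values
        tag="scale${scale}_gmwt${gmwt}"
        # Write one keyword file per parameter combination.
        cat > "refmac_${tag}.com" <<EOF
NCYC 20
EXTERNAL USE MAIN
EXTERNAL DMAX 4.2
EXTERNAL WEIGHT SCALE ${scale}
EXTERNAL WEIGHT GMWT ${gmwt}
@${RESTRAINTS}
MONI DIST 1000000
END
EOF
        # Run REFMAC5 only when it and the input files are available.
        if command -v refmac5 >/dev/null 2>&1 && [ -f pdb_in.pdb ] && [ -f mtz_in.mtz ]; then
            refmac5 XYZIN pdb_in.pdb HKLIN mtz_in.mtz \
                XYZOUT "out_${tag}.pdb" HKLOUT "out_${tag}.mtz" \
                < "refmac_${tag}.com" > "refmac_${tag}.log"
        fi
    done
done
```

The resulting log files can then be compared (for example by R-factors) to choose a combination; which criterion is appropriate depends on your case.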
For more information about external restraints, including the external restraints weight and the Geman-McClure parameter, see Nicholls et al. (2012).
Note that the "RIDGE DISTANCE SIGMA 0.1" keyword can be used to apply jelly-body restraints (also called harmonic restraints) in regions where there are no external restraints. For more information, see Murshudov et al. (2011), and for more information about the RIDGE keywords, see the REFMAC5 keyword documentation.
|-a|run only ProSMART ALIGN, default alignment method. This uses all mainchain atoms for superposition and scoring.|
||run only ProSMART ALIGN, alternative method 1 (faster). This uses only C-alpha atoms for superposition and scoring.|
||run only ProSMART ALIGN, alternative method 2 (slower). This uses all mainchain atoms for superposition, then uses only C-alpha atoms for scoring.|
|-r|run only ProSMART RESTRAIN (assumes alignment exists in correct location).|
||directory for output files relative to current directory (default: |
|-f|external text file used as an alternative way to provide arguments to the program. Arguments can be space and/or newline separated.|
||max number of threads - max number of child processes to be executed simultaneously by ProSMART. For optimal performance, this value should be equal to the number of logical cores (threads) in the CPU. Alternatively, this value may be reduced in order to free cpu resources, increasing system performance at the expense of ProSMART.|
||output a log file in XML format, called |
||output a log file in XML format, to the location specified.|
||name of REFMAC binary executable (default: |
||specifies that input structure is an NMR/MD ensemble (this functionality is experimental). Only one target PDB file should be specified, and no external reference. The first state (chain) will be used as target, and all other states as secondary references.|
||allows oligomers/complexes to be considered as individual structural units by merging multiple chains within a PDB file. This is achieved by combining all (or desired) chains into a single chain; all residue numbers are renamed accordingly.|
|-p1|location of target PDB file(s) (required).|
|-p2|location of secondary external reference PDB file(s).|
|-c1|chains of interest in PDB file(s) 1 (if unspecified, all will be used).|
|-c2|chains of interest in PDB file(s) 2 (if unspecified, all will be used).|
|-id|specifies that the chains to be aligned are sequence-identical. More specifically, assumes that the residue numbering is the same between the two structures.|
||if only target chains are specified (i.e. |
||specify residue ranges, so that only portion of PDB files may be
used during alignment (applied during PDB file processing - simply
removes residues outside this range). The |
||specify individual residues to be removed (applied during PDB file
processing - simply removes these residues from consideration prior to
structural alignment). The |
||specifies to run alignment of all fragments in the ProSMART library
instead of pairwise alignment of two chains (any secondary PDB file(s)
specified using |
||specifies to run fragment alignment of the helix entry in the
ProSMART library instead of pairwise alignment (any secondary PDB
file(s) specified using |
||specifies to run fragment alignment of the strand entry in the
ProSMART library instead of pairwise alignment (any secondary PDB
file(s) specified using |
||provide a library configuration file, to use instead of the default |
||provide the location of a library to be used instead of the default |
||override all Procrustes score thresholds in the fragment library (warning - will be applied to all fragments).|
||override all fragment lengths in the fragment library (warning - will be applied to all fragments).|
||length of fragment, in residues (default |
||Procrustes score dissimilarity threshold. Can be used to remove regions from the alignment that are not locally conserved. By default, no threshold is used.|
||only fragments with Procrustes scores less than this value will be used for producing the global superposition.|
||helix-sharing is used to help align pairs of structures. This value is the cutoff for defining helix similarity at the chosen fragment length (note: this has nothing to do with helix alignment).|
||helix-sharing is used to help align pairs of structures. This value is the dynamic alignment penalty for pairs of helix fragments (note: this has nothing to do with helix alignment).|
||specifies to perform pure structure-based alignment, ensuring that sequence conservation has no influence.|
||fragment-pairs whose corresponding residues are all sequence-identical are assigned a fixed dissimilarity score (which may be negative) instead of the ordinary Procrustes score during alignment. This effectively forces the alignment to obey sequence conservation, for structure-pairs with very high sequence identity.|
||specifies that fragment-based alignment refinement and residue-based alignment optimisation should not be performed, taking the raw alignment resulting from dynamic programming.|
||specifies for all main chain atoms (N, CA, C and O) to contribute to side chain scores. Only CA is included by default.|
||interatomic distance threshold for the 'NumDist' score, which is the number of corresponding side chain atoms that deviate more than the threshold, after local backbone superposition.|
||specifies to not account for potential side chain flips during scoring. By default, flips are applied in order to achieve the lowest net interatomic distance after local superposition.|
||do not perform the rigid substructure identification functionality.|
||Procrustes dissimilarity score threshold; only fragments with scores below this value will be used for rigid substructure identification.|
||intrafragment rotational dissimilarity score threshold (unit: angle in degrees); only fragments with scores below this value will be used for rigid substructure identification.|
||minimum number of fragments (size of cluster) when performing rigid substructure identification.|
||single linkage clustering threshold for rigid substructure identification (unit: cosine of angle).|
||controls final cluster rigidity of identified rigid substructures (unit: cosine of angle). Increasing this value will force the superposition to agree better with fewer fragments in the centre of the cluster, rather than being based on agreement with the whole substructure. Higher values are better for superposition, although may result in rigid substructures not being identified at all.|
||scales cluster colour resolution (i.e. dissimilarity threshold) in outputted PyMOL colour scripts.|
||output cluster distance matrices, for subsequent inspection using other software (e.g. R).|
||display intrafragment rotation scores as a cosine distance ( = 1 - cos(theta) ), rather than as the default angle (degrees).|
||do not output superposed PDB files and PyMOL colour code scripts.|
||output superposed PDB files.|
||any output superposed PDB files will comprise all chains, rather than just the particular chain of interest (i.e. the one aligned and superposed).|
||output PyMOL colour code scripts.|
||adjusts the color resolution in the output PyMOL color files corresponding to main chain scores.|
||adjusts the color resolution in the output PyMOL color files corresponding to side chain scores.|
||RGB colour code used for defining similarity.|
||RGB colour code used for defining dissimilarity.|
||generates generic self-restraints for the primary structures, ignoring any secondary input structures. This feature is generalised and may be applied to any molecules, e.g. can be used for DNA/RNA.|
||generates restraints for all target chains, using all secondary chains as external reference structures.|
||only use the best external reference chain. For each of the target chains, only one of the external reference chains is used for restraint generation. Which chain is the best is determined by the global alignment score, which is based on net agreement of local structure, independent of global conformation.|
||by default, self-restraints are not generated (i.e. restraints for pdb1.pdb chain A will not be generated using pdb1.pdb chain A as an external reference). This parameter allows self-restraints to be generated.|
||max restraint distance - size of sphere around atom in which restraints can exist.|
||min restraint distance - restraints of length below this value are not included.|
||default sigma used to weight atomic distance restraints in external refinement (only if sigma parameter estimation is not used, or fails to find a suitable solution).|
||minimum possible value of sigma, which overrides sigma parameter estimation where appropriate.|
||possible values are |
||alignment score cutoff for restraints (default |
||"side chain average" score cutoff for restraints (default |
||value (default |
||scales sigmas so that restraints have different weighting (default |
||specifies that restraints on bonds/angles will be removed (REFMAC5 may be executed to generate list of bonded atom-pairs, if required). If bonds/angles are not removed, then default sigmas will be used, since the assumptions for estimated sigmas will be violated. Specification to use estimated sigmas will by default cause bonds/angles to be removed.|
||specifies that only restraints for main chain atoms should be generated; side chain atoms will be ignored.|
||specifies that restraints for both main chain and side chain atoms should be generated.|
||specify residue ranges, so that restraints are only generated for a portion of a structure (similar to the |
||specify individual residues to be removed (similar to the |
||Further to chain-chain restraints files and the final pdb-all restraints files, also output the intermediate pdb-chain restraints files. This is disabled by default, to save disk space.|
||don't copy final restraints files to main output dir.|
||Specifies the REFMAC5 restraint type (as specified here). Possible values: |
If -h is specified then general main-chain h-bond restraints, including those for helices and sheets, will be generated together. In contrast, if
-h_sheet are both specified then restraints for helices and sheets will be generated separately.
-strict keyword is used.
||generate generic bond restraints. By default, these restraints represent h-bonds, and are generated for the whole main chain, including all helices, sheets, loops, etc. according to detected hydrogen bonding patterns.|
||generate generic bond restraints for helices. This includes all types of helices (not just alpha). Types of helix restraints can be specified using keywords |
||generate generic bond restraints across beta-sheets.|
||require strict structural conservation to helix/strand conformations in order for generic restraints to be generated for those regions. Uses fragment library to determine which regions are sufficiently helical/strand-like.|
||generate restraints for potential 3_10-helices. Specifically, requires residue separation of 3 residues.|
||generate restraints for potential alpha-helices. Specifically, requires residue separation of 4 residues.|
||generate restraints for potential pi-helices. Specifically, requires residue separation of 5 residues.|
||target value of the generic bond restraints.|
||minimum interatomic distance for restrained atom-pairs.|
||maximum interatomic distance for restrained atom-pairs.|
||minimum number of residues between restrained atom-pairs.|
||maximum number of residues between restrained atom-pairs.|
||specify to allow only specific number(s) of residues between restrained atom-pairs.|
||specify to disallow specific number(s) of residues between restrained atom-pairs.|
||controls how atom-pairs are selected, i.e. which atom-types can form bonds. Possible values: |
||overrides the default number of allowed bonds per atom. Default is 1 for N, 2 for O.|
||only for use in special cases where the target PDB file (specified with |
If alternative residue conformations are detected, only the first conformation present is used for alignment and scoring.
Alignments achieved using ProSMART are forced to maintain order of sequence.
If ProSMART is executed more than once with the same PDBs/chains, then files generated during previous executions will be overwritten, even if some of the command line arguments are different. To avoid files being overwritten, use a different output directory for each execution (see the output directory argument above).
When performing more than one alignment in a single ProSMART execution, the pairwise alignments are by default executed in parallel as multiple jobs in order to utilise the multi-threading capabilities of modern multi-core processors (the maximum number of simultaneous child processes can be set using the maximum-threads argument described above). This allows a dramatic increase in the usage of system resources and processing power. Consequently, running multiple chain alignments concurrently in one execution is often much quicker than performing single pairwise alignments consecutively (performance scales approximately linearly with the number of physical cores in the CPU, and with CPU frequency). Furthermore, if both alignment and restraint generation are performed in the same execution, and atomic bonds are to be generated, then the generation of atomic bonds (using REFMAC5) and alignment of structures will be performed in parallel in order to