CS-ROSETTA web server user manual
The user manual provides a step-by-step explanation of all settings available at every step of the 3D structure generation process. Example projects ready to be downloaded as well as pre-loaded web forms are used to better explain several of the core functions.
General introduction

What is CS ROSETTA

The CS-ROSETTA webserver makes the generation of 3D models of monomeric proteins accessible to the larger scientific community. The advantage of the CS-ROSETTA protocol is that it requires only the 13CA, 13CB, 13C', 15N, 1HA and 1HN NMR chemical shifts as input for the structure calculations. The power of this webserver lies in its connection to the eNMR grid: calculations that take months on a single CPU are finished in several days.
The generation of the models can be divided into three steps:

  1. Generation of a fragment library
  2. Assembly of the protein
  3. Rescoring of the generated models

Generation of fragment library:
Based on the chemical shifts, CS-ROSETTA uses a SPARTA-based selection procedure to select fragments from a fragment library (Molecular Fragment Replacement - MFR). This library consists of fragments for which both the chemical shifts and the torsion angles are known. The selected fragments form the building blocks in the subsequent ROSETTA assembly step.

Assembly of the protein:
The assembly of the protein uses a regular ROSETTA Monte Carlo assembly and relaxation procedure. To make a reliable prediction of the 3D structure, 10000-50000 models are generated for each query protein. Note that the starting package is the same for each model. This step takes the most time and is run on the eNMR grid.

Rescoring of the generated models:
The assembled models are re-evaluated by adding a chemical shift term to the all-atom energy score. This term compares the original chemical shifts with the chemical shifts back-calculated from a generated model (using SPARTA). Finally, the "new" lowest-energy model is compared to the other generated models. If the differences between the models, set against the all-atom score, show convergence, the lowest-energy model is chosen.


Figure 1: A flow chart of the CS-ROSETTA protocol.


How do I request a username and password
Two certificates are necessary to use the CS-ROSETTA webserver: first an eNMR certificate, and second a CS-ROSETTA webserver-specific certificate (username and password). CS-ROSETTA requires extensive computing power and is therefore connected to the eNMR grid. To request an eNMR certificate, please go to http://www.enmr.eu/eNMR-registration and select your country. After uploading the eNMR certificate to your browser, you can request a CS-ROSETTA webserver certificate. Fill in the webform at http://haddock.science.uu.nl/enmr/signup-csr-grid.html and your username and password will be emailed to you.
Submitting to the CS-ROSETTA3 web portal
You can find the CS-ROSETTA web portal through the WeNMR website:
http://www.wenmr.eu/wenmr/structure-calculation-software/
Or directly at: http://haddock.science.uu.nl/enmr/services/CS-ROSETTA3/

The current webserver requires seven pieces of information: a name for your run, the data format, the chemical shift list, the number of models to be generated, and the optional removal of flexible ends. Finally, there are two options for rescoring your models: the standard CS-rescoring and DP rescoring.
You sign your submission by supplying your personal username and password.
Give your run a name: The run name will be used as the name of the folder in which you can find the results of the calculation. Only alphanumeric characters, "-" and "_" can be used. Furthermore, the run name has a maximum length of 20 characters.
File format: Currently three file formats are supported: TALOS, BMRB2.1 and BMRB3.1. The TALOS format is explained below. The BMRB formats can be found at !!!!
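The run-name rules can be checked before submitting. The sketch below is only an illustration: the function name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and a minimum length of one character is assumed.

```python
import re

# Letters, digits, "-" and "_", between 1 and 20 characters.
# Assumptions: ASCII only, and an empty name is rejected.
RUN_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,20}")


def is_valid_run_name(name: str) -> bool:
    """Return True if `name` satisfies the run-name rules described above."""
    return RUN_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["UvrC", "UvrC_run-1", "my run", "x" * 21]:
        print(candidate, is_valid_run_name(candidate))
```

The server may apply additional checks; this only mirrors the two rules stated above.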
The TALOS format is defined at http://spin.niddk.nih.gov/NMRPipe/talos/#preparing shifts.
  • The file can only contain 13CA, 13CB, 13C', 15N, 1HA and 1HN shifts, named CA, CB, C, N, HA and HN respectively.
  • The HA of glycine form an exception: they are named HA2 and HA3 (assigned arbitrarily, see example below).
  • The protein sequence starts with DATA SEQUENCE, and space characters are ignored. The sequence can be divided over several lines as long as each line starts with DATA SEQUENCE.
  • The file must contain a VARS line with the names of the columns and a FORMAT line which contains the format of the rows.
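The rules above can be turned into a small reader. This is a minimal sketch, not the validation the server performs: the function name and the sample shift values are hypothetical, the column names RESID, RESNAME, ATOMNAME and SHIFT are the ones commonly used in TALOS files, and REMARK lines are assumed to be comments.

```python
from typing import Dict, List, Tuple

# Atom names permitted by the rules above (HA2/HA3 are the glycine exception).
ALLOWED_ATOMS = {"CA", "CB", "C", "N", "HA", "HA2", "HA3", "HN"}

# Hypothetical sample: the sequence and shift values are made up for illustration.
SAMPLE = """\
DATA SEQUENCE MQIG KTL
DATA SEQUENCE TGA

VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f

   1 M   CA   55.400
   1 M   HA    4.210
   4 G  HA2    3.950
   4 G  HA3    3.870
"""


def parse_talos(text: str) -> Tuple[str, List[str], List[Dict[str, str]]]:
    """Return (sequence, column names, shift rows) from TALOS-format text."""
    sequence_parts: List[str] = []
    columns: List[str] = []
    has_format = False
    rows: List[Dict[str, str]] = []

    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "REMARK":
            continue
        if fields[0] == "DATA":
            # Only DATA SEQUENCE lines are used; spaces in the sequence are ignored.
            if len(fields) > 1 and fields[1] == "SEQUENCE":
                sequence_parts.extend(fields[2:])
        elif fields[0] == "VARS":
            columns = fields[1:]
        elif fields[0] == "FORMAT":
            has_format = True
        else:
            if not columns:
                raise ValueError("data row found before the VARS line")
            if len(fields) != len(columns):
                raise ValueError(f"expected {len(columns)} fields: {line!r}")
            row = dict(zip(columns, fields))
            if row.get("ATOMNAME") not in ALLOWED_ATOMS:
                raise ValueError(f"unsupported atom name in row: {line!r}")
            rows.append(row)

    if not columns or not has_format:
        raise ValueError("file must contain both a VARS and a FORMAT line")
    return "".join(sequence_parts), columns, rows


if __name__ == "__main__":
    sequence, columns, rows = parse_talos(SAMPLE)
    print(sequence, columns, len(rows))
```

Joining the DATA SEQUENCE fragments without separators implements the "space characters are ignored" rule, including sequences split over several lines.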

Below you see an example of the talos format:


NOTE WELL!
  1. Missing chemical shift data are allowed, but the amino acid sequence shown in the header MUST be the full sequence of the protein and MUST start from residue #1. CS-ROSETTA will generate protein structures with the sequence defined in the header.
  2. The chemical shifts have to be properly referenced. To make sure that your referencing is correct, please go to How do I make sure that my referencing is correct?
  3. Any tags MUST be excluded, because they do not belong to the structure of the protein and may influence it (they also change the numbering and the sequence).
Number of Models: In this field you can fill in the number of models you want to calculate. The maximum number of models a user can generate is determined by the user's level of access.
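Part of the first note can be checked mechanically: every shift row should point at a residue that exists in the header sequence and carries the same residue type. The sketch below assumes rows given as (residue number, one-letter residue name, atom name); the function name is hypothetical. It cannot detect a leftover tag, since a tag that is consistently numbered looks like ordinary sequence.

```python
from typing import List, Tuple


def check_shift_rows(sequence: str, rows: List[Tuple[int, str, str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems: List[str] = []
    for resid, resname, atom in rows:
        # Residue numbering is 1-based and must stay inside the header sequence.
        if not 1 <= resid <= len(sequence):
            problems.append(f"residue {resid} is outside the sequence (1-{len(sequence)})")
            continue
        expected = sequence[resid - 1]
        if resname != expected:
            problems.append(f"residue {resid} is {resname} in the shift list but {expected} in the header")
        # HA2/HA3 are reserved for glycine.
        if atom in ("HA2", "HA3") and expected != "G":
            problems.append(f"residue {resid}: {atom} is only used for glycine")
    return problems


if __name__ == "__main__":
    print(check_shift_rows("MQIG", [(1, "M", "CA"), (4, "G", "HA2"), (5, "A", "CA")]))
```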

Level 1: 15000 models (default)
Level 2: 30000 models (upon request)
Level 3: 50000 models (upon request)
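As an illustration of the limits listed above, a request can be compared with the maximum for an access level. The function name is hypothetical, and how the server reacts to a request above the limit is not specified here.

```python
# Maximum number of models per access level, as listed above.
MAX_MODELS_BY_LEVEL = {1: 15000, 2: 30000, 3: 50000}


def is_request_allowed(requested_models: int, access_level: int = 1) -> bool:
    """Return True if the requested number of models fits the access level."""
    if access_level not in MAX_MODELS_BY_LEVEL:
        raise ValueError(f"unknown access level: {access_level}")
    return 0 < requested_models <= MAX_MODELS_BY_LEVEL[access_level]


if __name__ == "__main__":
    # A request for 20000 models exceeds the default level 1 limit.
    print(is_request_allowed(20000, access_level=1))
    print(is_request_allowed(20000, access_level=2))
```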


"By using the current method implemented in CS-ROSETTA package, 5,000 to 20,000 predicted CS-ROSETTA models are generally required to obtain convergence. For small proteins - proteins smaller than 80 aminoacids - 1,000 to 5,000 CS-ROSETTA models often suffice. ROSETTA takes about 5-10 minutes to calculate one all-atom model on a single 2.4GHz CPU." (from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html).
Truncate flexible ends: The truncate option uses the TALOS+ prediction (Shen et al., 2009) to determine whether the protein has flexible ends. Because this step is important, we recommend doing it manually.
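The quoted timing makes the single-CPU cost easy to estimate. The sketch below only converts the quoted 5-10 minutes per model into CPU-days; the function name is hypothetical and grid overhead is ignored.

```python
def cpu_days(n_models: int, minutes_per_model: float) -> float:
    """Single-CPU time in days needed to generate `n_models` models."""
    return n_models * minutes_per_model / (60 * 24)


if __name__ == "__main__":
    for n_models in (5000, 20000, 50000):
        low = cpu_days(n_models, 5)
        high = cpu_days(n_models, 10)
        print(f"{n_models} models: {low:.1f} to {high:.1f} CPU-days")
```

For 20000 models this gives roughly 69 to 139 CPU-days, in line with the "months on a single CPU" mentioned in the introduction.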
Example of Input
The example in this tutorial is UvrC, a DNA-binding protein of 42 amino acids. Previous studies have shown that the first X and the last C amino acids are flexible, so they were excluded beforehand. The user requested 20000 models. There are no similar proteins in the reference database and, because of the manual intervention, truncation is not necessary. The input file can be seen here. Figure 2 shows a snapshot of the filled-out webform.


Figure 2: A snapshot of the filled-out webform for UvrC.
Step 2: Calculations
After successfully submitting the job UvrC, the user receives an email containing a job-specific link. This link can be used first to check the status of the job and later to retrieve the results. The input file is validated and saved as talos.tab.

The first step in the procedure is the generation of the fragment library. The user receives an email after this process has been initialised. NOTE: This process might fail due to referencing problems. To solve referencing problems, go to How do I make sure that my referencing is correct?
The second step is the assembly step on the GRID. The number of generated models can be followed by checking your job-specific link. Here it will show "RUNNING on the GRID - generating models with rosetta" and the number of structures generated so far.
In the final step the models are rescored and the output is prepared. The user will receive an email with the job-specific link after the output is finalized. The job-specific link will now direct to a page where the output can be retrieved (see example).
Step 3: Analysis of the results
How to select CS-ROSETTA models:

After CS-ROSETTA structure generation has finished, the user has to decide whether the generated ROSETTA models are acceptable. For this purpose, it is convenient to plot the (re-scored) ROSETTA full-atom energies of all models against the CA RMSD values relative to the lowest-(rescored)-energy model, using the data stored in the files "name.rms.rescore.txt" and "name.rms.orgener.txt". The plots are saved as "name.rms.rescore.png" and "name.rms.orgener.png".

1. If the 10 lowest energy models all differ by less than 2 angstrom CA RMSD from the model with the lowest (re-scored) energy (see plot of protein GB3 below, from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html ), the structure prediction is deemed successful and the 10 lowest energy models are accepted.
NOTE: The error accumulates as the protein size increases. For small proteins, the 2 angstrom limit should be applied strictly, but for a 120-residue protein it can be applied less strictly.

2. If no clustering around low energy models is observed (see plot of protein nsp1 below, from http://spin.niddk.nih.gov/bax/software/CSROSETTA/index.html ), the structure prediction has not converged and the low energy models cannot be accepted at this stage.
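The two cases above amount to one test on the lowest-energy models. The sketch below is an illustration under stated assumptions: each model is given as (name, energy, CA RMSD to the lowest-energy model), and the function name and parameters are hypothetical. The sample values are the five rescored 2png models listed in the results section of this manual, so the check is run with five models instead of ten.

```python
from typing import List, Tuple

Model = Tuple[str, float, float]  # (name, energy, CA RMSD to the lowest-energy model)


def is_converged(models: List[Model], n_lowest: int = 10, rmsd_cutoff: float = 2.0) -> bool:
    """True if the `n_lowest` lowest-energy models all lie within `rmsd_cutoff` angstrom."""
    if len(models) < n_lowest:
        raise ValueError(f"need at least {n_lowest} models, got {len(models)}")
    # Lower energy is better, so sort ascending and keep the first n_lowest.
    lowest = sorted(models, key=lambda model: model[1])[:n_lowest]
    return all(rmsd < rmsd_cutoff for _, _, rmsd in lowest)


# Top five rescored models of the 2png example run.
TOP5_2PNG = [
    ("S_3044_14", -152.78, 0.00),
    ("S_1034_16", -151.93, 0.87),
    ("S_2513_16", -151.51, 0.75),
    ("S_1989_01", -151.45, 0.53),
    ("S_2930_19", -151.31, 0.63),
]

if __name__ == "__main__":
    print(is_converged(TOP5_2PNG, n_lowest=5))
```

As the note on protein size says, the cutoff is a guideline; `rmsd_cutoff` is a parameter for that reason.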

The result page of UvrC can be found below. All results are zipped together in outdir.tar.gz, which contains the following files:
  • The input file after validation and, possibly, truncation: talos.tab
  • The selected fragments for MFR: aat000_03_05.200_v1_3 and aat000_09_05.200_v1_3
  • The output file of the ROSETTA step: final.out
  • The top 100 rescored models, named S_****_**.pdb
  • The Chemical Shift chi2 score: CS_chi2.txt
  • Files with the name, RMSD and respective energy (raw or rescored): name.rms.rawscore.txt and name.rms.rescore.txt
  • Other files, which contain the same information as the files mentioned above but in different combinations. These files are useful for plotting the different variables.
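The model files can be listed straight from the archive with the Python standard library. This is a sketch: the function name is hypothetical, and the internal folder layout of outdir.tar.gz is an assumption, so only the base name of each member is matched against the S_****_**.pdb pattern.

```python
import fnmatch
import posixpath
import tarfile
from typing import List


def list_model_files(archive_path: str) -> List[str]:
    """Return the archive members whose file name looks like S_****_**.pdb."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    # Match on the base name so that any folder prefix inside the archive is ignored.
    return sorted(
        name for name in names
        if fnmatch.fnmatchcase(posixpath.basename(name), "S_*_*.pdb")
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for member in list_model_files(sys.argv[1]):
            print(member)
```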

Results

Below you find the output for 2png (E. Ploskon et al., 2008 J.Biol.Chem. 283: 518-528). This run converged very well: the plot of the ROSETTA score against the CA RMSD from the lowest-energy model, S_3044_14, shows a clear funnel. The lowest-energy models are therefore very similar to each other, which in turn makes the model generation robust for this protein. Comparing the ten lowest-energy models, for both the rescored and the original ROSETTA energies, to the model deposited in the PDB database gives the following results.

Rescored                                  Original
Name        RMSD from NMR structure (Å)   Name        RMSD from NMR structure (Å)
S_3044_14   2.15                          S_3044_14   2.15
S_1034_16   2.29                          S_2513_16   2.17
S_2513_16   2.17                          S_1034_16   2.29
S_1989_01   2.34                          S_1989_01   2.34
S_2930_19   2.33                          S_3311_17   2.36
S_3361_11   2.37                          S_3361_11   2.37
S_2714_20   2.38                          S_1387_02   2.54
S_3311_17   2.36                          S_3109_14   2.24
S_1387_02   2.54                          S_2714_20   2.38
S_1457_01   2.40                          S_2930_19   2.33
Average     2.33                          Average     2.32
Table 1: RMSD from NMR-solved structure
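The averages in Table 1 can be reproduced from the listed values. The snippet is only an arithmetic check; the variable and function names are illustrative.

```python
from typing import List

# RMSD values (angstrom) from Table 1, in the order listed.
RESCORED_RMSD = [2.15, 2.29, 2.17, 2.34, 2.33, 2.37, 2.38, 2.36, 2.54, 2.40]
ORIGINAL_RMSD = [2.15, 2.17, 2.29, 2.34, 2.36, 2.37, 2.54, 2.24, 2.38, 2.33]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"rescored: {mean(RESCORED_RMSD):.2f}")  # 2.33
    print(f"original: {mean(ORIGINAL_RMSD):.2f}")  # 2.32
```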

Table 1 shows that the ten lowest-energy models lie between 2.15 and 2.54 Å from the NMR structure (about 2.3 Å on average), so the protocol proved reliable for this protein as well. The rescoring step made little difference to the selection of models.
CS ROSETTA status for 2png
2017-11-19 15:12:37
The current status of your request is: FINISHED
Your CS ROSETTA run has successfully completed. The complete run can be downloaded as a gzipped tar file here
Results
In total, 51861 models were generated (requested: 50000).

BEST MODEL, Rescored Energy: -152.78
BEST MODEL, Original Energy: -154.83 (1)

The first table shows the top five models after rescoring, the rescored energy, the original rank, the Chi score and the difference from the best rescored model (RMSD).
The second table shows the top five models before rescoring, their energy term, the rescored rank, Chi score and the difference from the best original score model (RMSD).
Rescore
Name        Score     Orig Rank   Chi     RMSD
S_3044_14   -152.78   1           8.20    0.00
S_1034_16   -151.93   3           7.43    0.87
S_2513_16   -151.51   2           10.30   0.75
S_1989_01   -151.45   4           9.09    0.53
S_2930_19   -151.31   10          5.67    0.63


Original
Name        Score     Resc Rank   Chi     RMSD
S_3044_14   -154.83   1           8.20    0.00
S_2513_16   -154.09   3           10.30   0.75
S_1034_16   -153.79   2           7.43    0.87
S_1989_01   -153.72   4           9.09    0.53
S_3311_17   -153.22   8           9.40    0.59
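The two tables above also show how the chemical shift term enters the rescored energy. For the four models that appear in both tables, the numbers are consistent with rescored energy = original energy + 0.25 x Chi. The weight 0.25 is inferred from these values and is an assumption here, not a documented server setting; the names in the sketch are illustrative.

```python
# Weight of the chemical shift term. ASSUMPTION: inferred from the tables above.
CS_WEIGHT = 0.25

# name: (original energy, Chi, reported rescored energy), copied from the tables.
MODELS = {
    "S_3044_14": (-154.83, 8.20, -152.78),
    "S_2513_16": (-154.09, 10.30, -151.51),
    "S_1034_16": (-153.79, 7.43, -151.93),
    "S_1989_01": (-153.72, 9.09, -151.45),
}


def rescore(original_energy: float, chi: float, weight: float = CS_WEIGHT) -> float:
    """Add the weighted chemical shift term to the all-atom energy."""
    return original_energy + weight * chi


if __name__ == "__main__":
    for name, (original, chi, reported) in MODELS.items():
        difference = rescore(original, chi) - reported
        print(f"{name}: difference from reported rescored energy {difference:+.3f}")
```

Because a model with a high Chi is penalised more, the ranking can change after rescoring, as it does for S_2513_16 and S_1034_16.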


Frequently asked questions
How do I make sure that my referencing is correct:
If you are not sure whether you have used the correct referencing, please convert your input file to BMRB format. The RCI server can re-reference your chemical shifts. Go to http://wishart.biology.ualberta.ca/rci/cgi-bin/rci_cgi_1_e.py and upload your BMRB file.
In the advanced options select "6) Correct referencing of chemical shifts - Yes"

In the following screen select: Other files - Rereferenced chemical shifts

In the file you will notice the offset of the chemical shifts. Save this file, convert it to TALOS format, and submit it to the CS-ROSETTA webserver.