Introduction
Instance mapping procedure
How to query phospho3D
Structural comparison
P3Dscan
Structural descriptors
Notice on MS data
Data and annotation sources
Minimum system requirements
Using Jmol


The phosphorylation of specific protein residues is a crucial event in the regulation of several cellular processes, acting on activation, deactivation or recognition of the target protein. A large fraction of eukaryotic proteins (30 to 50% of the total) undergoes this post-translational modification.

Recent improvements in the experimental identification of phosphoproteins and phosphoresidues have dramatically increased the amount of phosphorylation site data, and the need for computational tools to collect and analyse these data has grown accordingly.

In past years, several sequence-based methods to predict phosphorylation sites have been developed using different approaches, such as regular expressions with context-based rules, Position-Specific Scoring Matrices (PSSMs) and artificial neural networks.

The structural basis and the determinants of phosphorylation specificity are often unclear.

One hypothesis that would help explain the difficulties encountered so far in unraveling the rules of kinase specificity is that there are structural determinants which only sometimes overlap with sequence consensus motifs and which might be independent of the residue order in protein sequences. To explore the explanatory power of this hypothesis, we have developed a procedure for the annotation and analysis of the three-dimensional structure of experimentally verified protein phosphorylation sites, also called instances, retrieved from the Phospho.ELM database.


The similarity search program BLASTp is used to search all the Phospho.ELM sequences against the complete PDB database. For each sequence, hits are retained if they display at least 98% sequence identity in non-gapped regions and 100% identity at the P-site, at least 30 residues are aligned, gaps account for no more than 15% of the alignment, and the E-value is < 10⁻⁶.
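The retention criteria above can be written as a single filter. The sketch below is only illustrative: the `Hit` record and its field names are hypothetical, and it assumes that the identity, gap fraction and P-site conservation have already been extracted from the BLASTp alignment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one BLASTp hit (field names are assumptions)."""
    identity_pct: float     # % identity over non-gapped aligned positions
    psite_identical: bool   # True if the P-site residue is identical in the hit
    aligned_residues: int   # number of aligned residues
    gap_fraction: float     # gaps / alignment length, between 0 and 1
    evalue: float           # BLASTp E-value


def keep_hit(hit: Hit) -> bool:
    """Apply the five retention criteria listed in the text."""
    return (
        hit.identity_pct >= 98.0
        and hit.psite_identical
        and hit.aligned_residues >= 30
        and hit.gap_fraction <= 0.15
        and hit.evalue < 1e-6
    )


# Made-up hits, for illustration only.
hits = [
    Hit(99.2, True, 120, 0.02, 1e-40),   # passes every criterion
    Hit(99.2, False, 120, 0.02, 1e-40),  # P-site residue not conserved
    Hit(98.5, True, 25, 0.0, 1e-8),      # fewer than 30 aligned residues
]
retained = [h for h in hits if keep_hit(h)]
```

Only the first of the three example hits is retained.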

For each phosphorylation site, or instance, mapped onto PDB chains, a structural neighbourhood, which we call a 3D zone, is defined by a distance criterion: it comprises the residues in the protein structure that fall at a distance < 10 Å from the P-site.
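As an illustration of the distance criterion, the sketch below collects the residues within 10 Å of a P-site. It rests on assumptions that the text does not state: the distance between two residues is taken as the minimum distance between any pair of their atoms, and the P-site itself is left out of the returned list. The function name and the data layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
ResidueId = Tuple[str, int]  # (chain identifier, residue number)


def zone_residues(
    atoms: Dict[ResidueId, List[Coord]],
    psite: ResidueId,
    cutoff: float = 10.0,
) -> List[ResidueId]:
    """Return residues with at least one atom closer than `cutoff` (in Å)
    to any atom of the P-site (assumed minimum atom-atom distance)."""
    psite_atoms = atoms[psite]
    zone = []
    for res_id, coords in atoms.items():
        if res_id == psite:
            continue  # assumption: the P-site is not listed as its own neighbour
        closest = min(math.dist(a, b) for a in coords for b in psite_atoms)
        if closest < cutoff:
            zone.append(res_id)
    return sorted(zone)


# Toy coordinates, one atom per residue, for illustration only.
toy = {
    ("A", 10): [(0.0, 0.0, 0.0)],    # P-site
    ("A", 11): [(3.8, 0.0, 0.0)],    # 3.8 Å away: in the zone
    ("A", 50): [(0.0, 9.9, 0.0)],    # 9.9 Å away: in the zone
    ("A", 90): [(0.0, 0.0, 10.0)],   # exactly 10 Å: excluded (strict <)
    ("A", 120): [(25.0, 0.0, 0.0)],  # far away: excluded
}
zone = zone_residues(toy, ("A", 10))
```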


Phospho3D can be searched by PDB code, PDB keyword, kinase name or UniProtKB accession.

Queries can return either all the P-sites in the database or a non-redundant subset of them. Non-redundant sets have been obtained using the PISCES resource. The supplied non-redundant PDB sets are PDB100 (non-identical structures), PDB90 (less than 90% sequence identity among the proteins of the set), PDB70, PDB50, PDB30 and PDB20.

The information returned to the user consists of a brief description of the structure extracted from the following PDB fields: code, title, keywords, release date, resolution and experimental technique.

In addition, a list of instances (i.e. mapped phosphorylation sites) is provided. For each of them, the following information is supplied: chain identifier, residue number (both on structure and sequence, since the sequence numbering in PDB often does not correspond to the one used in UniProtKB), the instance flanking sequence, the kinase(s) responsible for phosphorylation (when known) and the PubMed ID of the publication supporting the experimental evidence.
In grey, under the header redundancy, we also report the list of non-redundancy group(s) for which the P-site PDB chain is a representative.

The user is given three viewing options:

3D view: the user can visualise the three-dimensional structure of the given phosphorylation zone via the Jmol viewer. The instance is depicted in orange, the flanking sequence in blue and the neighbour residues in white.
Tabular view: the user can retrieve the annotation at residue level for the given zone. For each zone residue, the structural descriptors listed in the Structural descriptors section (see below) are presented.
Comparison: the results of a large-scale local structural comparison for each zone can be retrieved. The comparison was performed against a representative dataset of PDB chains. Each match can be visualised using the Jmol viewer.


Local structural comparison was carried out using a new version of Query3D, a sequence/fold independent algorithm [1]. Structural matches are assessed by residue similarity and root mean square deviation (rmsd).

The large-scale local structural comparison results stored in the Phospho3D database were obtained using an rmsd threshold of 0.7 Å, and running the Query3D algorithm between the whole set of phosphorylation zones collected in the database and a representative set (sequence identity ≤ 20%) of 490 PDB X-ray protein chains with experimental resolution ≤ 1.5 Å, extracted from eukaryotic organisms. The representative set of PDB chains was obtained by first using the advanced search at RCSB to select structures from eukaryotic organisms and then applying the PISCES resource to reduce redundancy and discard non-X-ray structures.

In Phospho3D version 2.0, the thresholds used both in the comparison procedure and in the selection of the representative set of structures are more stringent because of the extremely high number of PDB structures available.


P3Dscan is a function that allows the user to upload a PDB-formatted structure and perform a local structural comparison against the 4058 3D zones (one for each Phospho.ELM mapped instance) stored in the database, with the aim of identifying local structural similarities between the user query structure and P-site 3D neighbourhoods. The comparison algorithm, which P3Dscan runs on the fly, is the same as the one described in the structural comparison section and used for the large-scale comparison whose results are stored in the database, albeit with more stringent rmsd and score thresholds.

Specifically, in order to reduce the computational time and to display only significant 3D matches, the P3Dscan rmsd and score thresholds have been set to 0.5 Å and 4, respectively. This means that matches with fewer than four matching residue pairs, or whose 3D superimposition displays an rmsd greater than 0.5 Å, are rejected.
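The effect of the two thresholds can be sketched as follows. This is not the Query3D implementation: the `rmsd` helper simply computes the root mean square deviation over paired coordinates that are assumed to be already superimposed, and the way Query3D chooses representative atoms or performs the superimposition is not covered here. Function names and default arguments are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, already superimposed points."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def accept_match(score: int, rmsd_value: float,
                 min_score: int = 4, max_rmsd: float = 0.5) -> bool:
    """Keep a match only if it has at least `min_score` matching residue
    pairs and an rmsd (in Å) not greater than `max_rmsd`."""
    return score >= min_score and rmsd_value <= max_rmsd


# Toy coordinates for four matched residue pairs, for illustration only.
probe = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
target = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
kept = accept_match(score=len(probe), rmsd_value=rmsd(probe, target))
```

With these toy values the rmsd is 0.25 Å over four pairs, so the match is kept.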

The user has to upload a PDB-formatted file and run the comparison by clicking the ‘p3d scan’ button. Due to the large number of 3D zones used as target in the comparison, the search can take up to a minute per protein chain. The resulting matches are displayed in tabular format. Each line of the output table reports the information of a single match.

The legend of the P3Dscan output table is as follows:

match ID: an integer that uniquely identifies each match.
probe chain ID: the identifier of the chain in the user-uploaded structure where the match is found.
target: the 3D zone where the match with the user query structure is found. p3dID is the Phospho3D instance identifier.
target Psite: residue type and number of the P-site belonging to the matching 3D zone.
rmsd: rmsd of the matching residue 3D superimposition.
score: number of matching residue pairs.
pairs of matching residues: type and number of matching residues.

A match may or may not contain the P-site of the matching 3D zone; this information is indicated by orange-coloured lines in the output table. The comparison results are also provided in text format for download. The user can choose to download either all the structural matches or only those containing the 3D zone P-site.


Structure code: PDB code.
Chain identifier: PDB chain identifier.
Position in the structure: residue number in the chain.
Residue type: three-letter amino acid code.
Secondary structure: secondary structure has been assigned according to the DSSP nomenclature [2]: H="alpha helix", B="residue in isolated beta-bridge", E="extended strand, participates in beta ladder", G="3-helix (3/10 helix)", I="5 helix (pi helix)", T="hydrogen bonded turn" and S="bend".
Absolute solvent accessibility: DSSP solvent accessibility.
Percentage solvent accessibility: 100 × (DSSP solvent accessibility) / (maximum solvent accessibility as calculated in [3]).
B-factor: The temperature factor is taken from the PDB file for each atom in a residue, and then averaged over the whole residue. It is reported as mean ± standard error of the mean.
B-factor standardized: obtained by subtracting from the B-factor the mean of the B-factors in the chain (excluding the highest and lowest values) and dividing the result by the standard deviation.
Cavity: The program SURFNET [4] generates internal cavities and surface grooves within a PDB structure. We report the rank (1 = biggest cavity) and the volume of the cavity where the residue is found, and the total number of cavities with volume > 200 Å³ in the structure. Notice that, even if only one atom of a residue appears in a cavity, the residue is considered to be part of the cavity.
Protrusion index (CX): CX is the ratio between the external (volume of empty space) and internal (volume occupied by protein) volumes of a sphere of predetermined radius (10 Å) centered around each non-hydrogen atom. Atoms in protruding regions have a high ratio (CX) between the external and the internal volume [5].
Depth index (DPX): the depth of an atom i is defined as its distance (Å) from the closest solvent accessible atom j. DPX is ≈0 for solvent accessible atoms and >0 for atoms buried in the protein interior [6].

Protrusion and depth indexes are determined using the PSAIA package, a software tool that integrates several algorithms for protein interactions and structure geometry analysis of protein complexes in a single application [7].

Conservation score: The evolutionary conservation scores are derived from CONSURFDB, a repository of pre-calculated protein evolutionary conservation profiles [8] based on the ConSurf application [9].
Disorder probabilities: DisEMBL [10] is used to predict protein disorder. For each residue, the disorder probability can be studied according to three different criteria. LOOPS/COILS: residues assigned to the DSSP states T, S, B or I are considered as loops (coils). Loops/coils are not necessarily disordered, but protein disorder is generally found within loops. HOT LOOPS: loops with a high degree of mobility, as determined from temperature (B-)factors. REMARK-465: missing coordinates in X-ray structures, as defined by the REMARK 465 entries in PDB files. Non-assigned electron densities most often reflect intrinsic disorder.
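Two of the descriptors above, the percentage solvent accessibility and the standardized B-factor, are simple formulas and can be sketched in a few lines. The sketch makes assumptions that the text leaves open: exactly one highest and one lowest value are excluded, and the standard deviation is the sample standard deviation of the same trimmed set. The function names and the numbers are hypothetical, and the maximum accessibility is passed in rather than taken from [3].

```python
import statistics
from typing import List


def percent_accessibility(absolute: float, maximum: float) -> float:
    """100 x (DSSP solvent accessibility) / (maximum solvent accessibility)."""
    return 100.0 * absolute / maximum


def standardize_bfactors(bfactors: List[float]) -> List[float]:
    """Subtract the trimmed chain mean and divide by the standard deviation.

    Assumptions: one highest and one lowest value are dropped before the
    mean is computed, and the standard deviation is the sample standard
    deviation of that same trimmed set.
    """
    if len(bfactors) < 4:
        raise ValueError("need at least four residues to trim and standardize")
    trimmed = sorted(bfactors)[1:-1]
    mean = statistics.mean(trimmed)
    sd = statistics.stdev(trimmed)
    if sd == 0:
        raise ValueError("standard deviation is zero")
    return [(b - mean) / sd for b in bfactors]


# Made-up per-residue B-factors for one chain, for illustration only.
chain_b = [10.0, 20.0, 30.0, 40.0, 90.0]
z_scores = standardize_bfactors(chain_b)

# Made-up accessibility values (Å^2); the maximum is NOT a value from [3].
pct = percent_accessibility(absolute=50.0, maximum=200.0)
```

Excluding the extreme values keeps a single very mobile or very rigid residue from dominating the chain mean.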


Notice that P-sites derived from MS (mass spectrometry) experiments should be regarded with caution. Due to the current procedures for MS data deposition, it is difficult to systematically detect whether a phospho-instance was identified in physiologically abnormal conditions (e.g. in proteins extracted from oncogenic tissues or that do not undergo phosphorylation, such as hemoglobin) [11]. In order to help users detect such potentially problematic cases, we report, for each P-site, the nature of the original experiment (Low- or High-Throughput) and the corresponding literature reference (PMID).
Moreover, we encourage users to carefully analyse the structural context of P-sites, which might be indicative of non-reliable original data. One example is represented by the Tyr phosphorylation site mapped to position 133 of the human hemoglobin subunit beta (UniProtKB:P68871), for which Phospho3D stores 43 PDB structures. In most of the reported structures, the solvent accessibility of Y133 is zero and it is never greater than 3.5%. This structural information suggests that the original data mapping might not be reliable.
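A check of this kind can be sketched as follows. The function is not part of Phospho3D: its name, the 5% cut-off and the example values are assumptions chosen only to mirror the hemoglobin case, where the percentage solvent accessibility of the P-site never exceeds 3.5%.

```python
from typing import Iterable


def psite_looks_buried(percent_accessibilities: Iterable[float],
                       max_allowed: float = 5.0) -> bool:
    """Return True if the P-site is buried in every structure, i.e. its
    percentage solvent accessibility never exceeds `max_allowed`
    (a hypothetical cut-off, not a Phospho3D parameter)."""
    values = list(percent_accessibilities)
    if not values:
        raise ValueError("no accessibility values supplied")
    return max(values) <= max_allowed


# Made-up accessibility values (%) of one P-site across several structures.
example = [0.0, 0.0, 1.2, 3.5]
flagged = psite_looks_buried(example)
```

A flagged P-site is not necessarily wrong, but it is a good candidate for checking the original experiment through the reported PMID.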


Phospho.ELM: Version 9.0, August 2010
SCOP: version 1.75
CATH: version 3.3.0
PDB: Aug 2010 update
ConsurfDB: Jul 2009 update


CPU: 600 MHz
Memory: 512 MB RAM
Operating System: Windows (XP, Vista), Mac OS X (10.4 or greater), Linux
Tested browsers: Firefox 2.0 (Linux), Firefox 3.* (Windows Vista, Mac OS X, Linux), Internet Explorer 8 (Windows Vista), Safari 5 (Mac OS X), Google Chrome (Windows Vista)
Java: Version > 1.4


Jmol is a free, open source molecule viewer for students, educators, and researchers in chemistry and biochemistry. It is cross-platform, running on Windows, Mac OS X, and Linux/Unix systems.

Basic Jmol commands in the Phospho3D database:

Mac OS X, Linux and Windows
Rotation: press and hold the mouse left button and move the mouse.
Translation: press and hold the 'ctrl' button plus the mouse right button and move the mouse.
Zoom: press and hold the 'shift' button and move the mouse up/down, or use the mouse wheel.

For more documentation and updated information, check the Jmol website.


[1] Gherardini PF, Ausiello G, Helmer-Citterich M. PLoS ONE 2010 5(8): e11988.
[2] Kabsch W, Sander C. Biopolymers. 1983; 22(12): 2577-637.
[3] Miller S, Janin J, Lesk AM, Chothia C. J Mol Biol. 1987; 196(3): 641-56.
[4] Laskowski RA. J Mol Graph. 1995 Oct;13(5):323-30, 307-8
[5] Pintar A, Carugo O, Pongor S. Bioinformatics. 2002 Jul;18(7):980-4.
[6] Pintar A, Carugo O, Pongor S. Bioinformatics. 2003 Jan 22;19(2):313-4
[7] Mihel J, Sikić M, Tomić S, Jeren B, Vlahovicek K. BMC Struct Biol. 2008 Apr 9;8:21.
[8] Goldenberg O, Erez E, Nimrod G, Ben-Tal N. Nucleic Acids Res. 2009 Jan;37(Database issue):D323-7.
[9] Armon A, Graur D, Ben-Tal N. J Mol Biol. 2001 Mar 16;307(1):447-63
[10] Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB. Structure. 2003 Nov;11(11):1453-9.
[11] Nichols AM, White FM. Manual validation of peptide sequence and sites of tyrosine phosphorylation from MS/MS spectra. Methods Mol Biol. 2009;492:143-60.


September 2010