NMR-Observed Atomic Bond Length Stability Supports a Dimensionality Shift in Protein Main Chain 3D Structure Description and Representation
Wei Li*
Department of Pharmacology, Shantou University Medical College, Shantou City, Guangdong Province, P. R. China
*Corresponding author: Wei Li, Department of Pharmacology, Shantou University Medical College, No. 22, Xinling Road, Shantou City, Guangdong Province, P. R. China. Tel: +8615817969015; Email: liweiqidong@stu.edu.cn ; wli23@126.com
Received Date: 07 August, 2018; Accepted Date: 21 August, 2018; Published Date: 27 August, 2018
Citation: Wei Li (2018) NMR-Observed Atomic Bond Length Stability Supports a Dimensionality Shift in Protein Main Chain 3D Structure Description and Representation. Curr Res Bioorg Org Chem: CRBOC-112. DOI: 10.29011/CRBOC -112. 100012
1. Abstract
To date, the Cartesian (x, y, z) coordinate system is the default system used in the Protein Data Bank to specify atomic positions in protein structures. Presented here is an alternative spherical coordinate system approach for a three-dimensional lossless deconstruction of protein main chain structures experimentally determined by NMR spectroscopy. As an alternative to the default Cartesian system and a previously reported global spherical approach, this local spherical approach provides a geometric description of the three-dimensional structure of protein main chains that requires only two parameters (θ and φ) instead of the default three (x, y, z). Intrinsically simpler than the default and previously reported approaches, this 2018 approach induces a dimensionality shift from three to two, which could increase the efficiency of protein structure-centered research.
2. Keywords: Atomic Bond Length Stability; Dimensionality Shift; Frequency Distribution; Protein Main Chain Structure; Spherical Coordinate System
In 2011 and 2015, it was proposed for the first time that protein 3D structures be represented in spherical coordinates (r, φ, θ), with the aim of expressing all protein 3D structures deposited in the PDB in spherical coordinates [2]. This 2011-2015 approach is a global spherical coordinate system, in which the geometric centroid of the protein is taken as the single origin for all atoms in the protein molecule. It led to two applications: the separation of the protein outer layer from its inner core, and the identification of protrusions and invaginations on the protein surface [2].
· The atom pair of Ni and CAi
· The atom pair of CAi and COi
· The atom pair of COi and Ni+1
As a result of the three types of directly bonded atom pairs and the three spherical parameters, there are twelve variables in total, as listed in Table 1.
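The conversion from Cartesian coordinates to local spherical parameters for each directly bonded atom pair can be sketched as follows. This is a minimal illustration, not the author's code: the function names and the sample coordinates are hypothetical, and the angle convention (θ measured from the +z axis, φ taken as the azimuth in the xy plane) is an assumption chosen to match the ranges stated in this article (θ from 0 to π, φ from -π to π).

```python
import math

def to_local_spherical(origin, target):
    """Return (r, theta, phi) of the vector from origin to target.

    Assumed convention: theta is the polar angle from the +z axis (0..pi),
    phi is the azimuth in the xy plane (-pi..pi).
    """
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    dz = target[2] - origin[2]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0:
        raise ValueError("origin and target atoms coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, dz / r)))
    phi = math.atan2(dy, dx)
    return r, theta, phi

def chain_to_spherical(atoms):
    """Apply the local conversion to every consecutive atom pair."""
    return [to_local_spherical(a, b) for a, b in zip(atoms, atoms[1:])]

# Made-up coordinates (in angstroms) for N(i), CA(i), CO(i), N(i+1).
backbone = [
    (0.0, 0.0, 0.0),
    (1.458, 0.0, 0.0),
    (1.458, 1.525, 0.0),
    (1.458, 1.525, 1.329),
]
local_params = chain_to_spherical(backbone)
```

Each entry of `local_params` holds one (r, θ, φ) triple, with the preceding atom acting as the local origin for the next one.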
As shown by the four frequency distribution plots in Figure 1, the atomic bond lengths (as experimentally determined by NMR spectroscopy) of the three types of atom pairs appear rather stable, all with sharp peaks centered at the average r values listed in Table 1. In light of this experimentally observed bond length stability, a dimensionality shift from three to two arises from this 2018 approach: with r treated as effectively constant, the three-dimensional structure of protein main chains can be described geometrically by two parameters (φ and θ) instead of the x, y, z coordinates. This dimensionality reduction makes the approach intrinsically simpler and is of potential significance for a wide range of protein structure-centered research fields.
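To illustrate how a two-parameter description could be turned back into coordinates, the sketch below rebuilds a main chain from a list of (θ, φ) pairs and fixed bond lengths. It is a hedged example under stated assumptions: the bond lengths are approximate textbook values rather than the Table 1 averages, the chain is assumed to start at an N atom, the function name `rebuild_chain` is hypothetical, and θ is again assumed to be measured from the +z axis.

```python
import math

# Approximate textbook bond lengths in angstroms for N-CA, CA-CO, CO-N(next).
# These are illustrative assumptions, NOT the averages reported in Table 1.
BOND_LENGTHS = (1.46, 1.52, 1.33)

def rebuild_chain(start, angles, bond_lengths=BOND_LENGTHS):
    """Rebuild Cartesian coordinates from (theta, phi) pairs.

    start  : (x, y, z) of the first atom, assumed to be an N atom
    angles : one (theta, phi) pair per bond, in chain order
    """
    coords = [tuple(start)]
    for k, (theta, phi) in enumerate(angles):
        # Bond types repeat in the order N-CA, CA-CO, CO-N along the chain.
        r = bond_lengths[k % len(bond_lengths)]
        x, y, z = coords[-1]
        coords.append((
            x + r * math.sin(theta) * math.cos(phi),
            y + r * math.sin(theta) * math.sin(phi),
            z + r * math.cos(theta),
        ))
    return coords

angles = [(math.pi / 2, 0.0), (math.pi / 2, math.pi / 2), (0.0, 0.0)]
chain = rebuild_chain((0.0, 0.0, 0.0), angles)
```

Because each r is replaced by a fixed value, the rebuilt coordinates match the original ones only to the extent that the actual bond lengths stay close to those constants.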
With the results of the statistical analysis in place, I present below the results of statistical modeling of the protein main chain geometry as determined by (mainly solution-state) NMR spectroscopy. In Table 1, a set of equations is listed to model the frequency distributions of θ and φ. From a close visual inspection of Figure 2 and a quantitative analysis, the frequency distribution of θ exhibits a largely symmetric parabolic pattern, with its value ranging from 0 to π. The frequency of θ reaches its peak when θ ≈ 0.5π, i.e., assuming θ is the polar angle measured from the z axis (consistent with its 0 to π range), when the local vector from the origin atom j to the ending atom j+1 lies in the xy plane, perpendicular to the z axis, rather than parallel (upwards) or anti-parallel (downwards) to it.
From a visual inspection of Figure 3, the frequency distribution of φ appears largely random, with its value ranging from -π to π, i.e., the projection onto the xy plane of the local vector from the origin atom j to the ending atom j+1 is largely randomly oriented.
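The frequency distributions discussed above are histograms over fixed ranges. A minimal sketch of such a binning step is given below; the helper name `frequency_distribution` and the sample angle values are hypothetical, and the default of 100 bins follows the figure captions of this article.

```python
import math

def frequency_distribution(values, lower, upper, bins=100):
    """Count how many values fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (upper - lower) / bins
    for v in values:
        if v < lower or v > upper:
            continue  # ignore values outside the stated range
        # The upper edge itself is assigned to the last bin.
        idx = min(int((v - lower) / width), bins - 1)
        counts[idx] += 1
    return counts

# Made-up angle samples, in radians.
thetas = [0.1, 1.6, 3.0]
phis = [-3.0, 0.5, 2.0]
theta_counts = frequency_distribution(thetas, 0.0, math.pi)
phi_counts = frequency_distribution(phis, -math.pi, math.pi)
```

With real data, `thetas` and `phis` would hold one value per bonded atom pair across all analyzed structures.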
After the data fitting process shown in Figures 2 and 3, a set of Chi-square goodness-of-fit tests was conducted to examine the agreement between the observed and expected distributions. In these tests, the p-values were found to be 1.0 for all mathematical models listed in Table 1, i.e., the tests found no statistically significant deviation of the observed distributions of the two spherical parameters from the expected distributions described by the equations in Table 1, for both θ and φ.
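For reference, the statistic behind a Chi-square goodness-of-fit test can be sketched as below. This is an illustrative example with made-up bin counts, not the analysis reported here; the function name `chi_square_statistic` is hypothetical, and the degrees of freedom shown assume that no model parameters were estimated from the data.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over bins of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        stat += (o - e) ** 2 / e
    return stat

# Made-up bin counts for illustration only.
observed = [18, 22, 20, 40]
expected = [20, 20, 20, 40]
stat = chi_square_statistic(observed, expected)
# Degrees of freedom if no parameters were fitted; subtract one more
# for each parameter estimated from the data.
dof = len(observed) - 1
```

The p-value is the upper-tail probability of the Chi-square distribution at `stat` with `dof` degrees of freedom; if SciPy is available, it can be obtained with `scipy.stats.chi2.sf(stat, dof)`.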
5. Conclusion
As an alternative to the default Cartesian coordinate system of the PDB and the 2011-2015 global approach, this article proposes a local spherical coordinate system approach that describes protein main chain geometry with only two parameters (θ and φ). The result is a dimensionality shift from three to two and, consequently, a viable and simpler approach for protein main chain structure description and representation. With this dimensionality reduction, the approach may find application in increasing the efficiency of protein structure-centered research fields, such as protein structure alignment [3], comparison [4,5] and molecular dynamics simulations.
6. Discussion
In light of the basic chemical principle that the distances between directly bonded atoms should be relatively constant throughout the protein structure, it is conceivable that this 2018 approach could also be applied to describe and map protein side chain 3D structure, provided that the atomic bonding pattern is clearly and sequentially predefined for the side chain 3D structure of each and every amino acid residue, as is done here with the linear atom sequence mentioned above for protein main chain 3D structure description and representation.
In the four histograms in Figure 1, however, there is a range of outlying r values, according to the computational analysis of the protein main chain atomic bond length distributions. For example, in PDB file 1bhb.pdb, between Val34 and Pro37, there is a gap of two amino acid residues whose atomic coordinate information is missing. As a result, the r value for the atom pair CO34-N37 was calculated to be 24.876 Å. As another example, in PDB file 2rop.pdb, between Leu90 and Thr120, there is a gap of 29 amino acid residues whose atomic coordinate information is also missing. As a result, the r value for the atom pair CO90-N120 was calculated to be 50.102 Å. Both outliers arise from gaps in the experimentally determined protein structures, which calls for continued comprehensive (instead of fragmented [6] or gapped) experimental structure determination for proteins, the proteios building blocks of life.
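Outliers of this kind can be flagged with a simple distance threshold. The sketch below is illustrative only: the function name `find_chain_gaps`, the 2.0 Å cutoff and the first record are assumptions, while the two large r values are the ones quoted above (they come from two different PDB files and are combined here purely for demonstration).

```python
def find_chain_gaps(bond_records, max_length=2.0):
    """Return the (label, r) records whose r exceeds max_length (angstroms).

    max_length is a hypothetical cutoff, chosen well above a typical
    covalent bond length between main chain atoms.
    """
    return [(label, r) for label, r in bond_records if r > max_length]

records = [
    ("CO33-N34", 1.33),     # made-up regular peptide bond for contrast
    ("CO34-N37", 24.876),   # value quoted in the text for 1bhb.pdb
    ("CO90-N120", 50.102),  # value quoted in the text for 2rop.pdb
]
gaps = find_chain_gaps(records)
```

Pairs flagged in this way could be excluded before computing the average r values, so that missing residues do not distort the bond length statistics.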
7. Conflict of Interest and Ethical Statement
Regarding the publication of this article, the author declares that there is no conflict of interest and that no ethical approval is required.
Figure 1: Frequency distribution of r for the protein main chain structures. The histogram has 100 bins. The x-axis parameter and its unit are defined in Table 1.
Figure 2: Statistical models (blue) and frequency distributions (red) of θ. The histogram has 100 bins. The x-axis parameter and its unit are defined in Table 1.
Figure 3: Statistical model (blue) and frequency distribution (red) of φ. The histogram has 100 bins. The x-axis parameter and its unit are defined in Table 1.
Table 1: A list of figures and subfigures that illustrate the statistics of the three spherical coordinate system parameters. For the 191490 PDB files, this table summarizes the results of the statistical analysis and modeling of protein main chain geometry as experimentally determined by NMR spectroscopy, including the average values and the standard deviations of the twelve variables, and twelve equations (representing the frequency distributions) from the data fitting processes for the twelve variables.
© by the Authors & Gavin Publishers. This is an Open Access Journal Article Published Under Attribution-Share Alike CC BY-SA: Creative Commons Attribution-Share Alike 4.0 International License.