
Chemical Physics Letters 625 (2015) 91–97

Contents lists available at ScienceDirect

Chemical Physics Letters


journal homepage: www.elsevier.com/locate/cplett

Big data reduction by fitting mathematical functions

A search for appropriate functions to fit Ramachandran surfaces

Anita Rágyanszki a, Klára Z. Gerlei a, Attila Surányi a, András Kelemen b,
Svend J. Knak Jensen c,*, Imre G. Csizmadia a,d, Béla Viskolcz a

a Department of Chemical Informatics, University of Szeged, Boldogasszony sgt. 6., H-6725 Szeged, Hungary
b Department of Applied Informatics, University of Szeged, Boldogasszony sgt. 6., H-6725 Szeged, Hungary
c Department of Chemistry, Aarhus University, Langelandsgade 140, DK-8000 Aarhus C, Denmark
d Department of Chemistry, University of Toronto, 80 St. George Street, M5S 3H6 Toronto, Ontario, Canada

ARTICLE INFO

Article history:
Received 11 December 2014
In final form 17 February 2015
Available online 25 February 2015

ABSTRACT

The potential energy surface associated with internal rotation of a pair of geminal functional groups was studied using electronic structure calculations. The functional groups were attached to a methylene carbon and were chosen as saturated hydrocarbons, unsaturated hydrocarbons and heteroatom-containing moieties like amide bonds in various orientations. For the majority of the studied compounds, extended Fourier expansions augmented with Gaussian functions were needed to achieve accuracy within a few kJ/mol. The present letter aims to take the first steps of a bottom-up solution for protein folding by finding the functions of small peptide residues.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Protein folding is one of the biggest conundrums of our century. One of the reasons that this problem has not been solved yet is the big data associated with it. Reducing this data set is a possible approach toward simplification. In order to make these simplifications, we first have to deeply understand the mathematical properties of conformational spaces.

It has not been explored as yet how the complexity of the function must change with increasing complexity of the potential energy surface.

Conformational analysis of organic molecules was initiated before the middle of the 20th century. Simple hydrocarbons like ethane (H3C–CH3), propane (CH3–CH2–CH3) and n-butane (CH3–CH2–CH2–CH3) were the basic molecules that exhibited such structural characteristics that provided the basis of conformational analysis. Methyl rotations (C–CH3) and ethyl rotations (–CH2–CH2–) were very similar in a variety of compounds as illustrated in Figure 1.

The minima for methyl rotations occurred at −60°, +60° and 180° for ethane, propane and for the anti (a) orientation of n-butane. These are considered ideal values. For the gauche (g+ or g−) orientation in n-butane, the methyl rotation remained in the same vicinity (52.3°) [1], thus it is still regarded as ideal. The two methyl rotations in propane also produced an ideal surface where 3 × 3 = 9 minima occurred regularly at −60°, +60° and 180° values of the dihedral angles. In n-butane the dihedral angles associated with the rotation about the central C–C bond were −69.5°, +69.5° and 180°. On the basis of this it would have been easy to assume that this is the general rule for conformational analysis.

In 1963 Ramachandran [2] attempted to perform conformational analysis on simple peptides and the pattern for such a double rotor –CONH–CH2–CONH– turned out to be quite different from the propane surface. More recently it turned out that even as simple a compound as n-pentane behaved in non-ideal fashion [3]. Consequently, it now appears to be reasonable to assume that surface topology is a function of the rotating moieties.

To establish the topology of the surface a set of grid points needs to be computed. The locations of the minima on the surface represented by the grid points will lead to the topological image of the potential energy surface (PES). Fitting analytic functions to the grid points is a mathematical technique that has already been developed. Actually, any complete set of functions (power series [4], trigonometric or Gaussian) may be used to fit a function that describes a potential energy curve or surface. Power series can be used successfully to fit PES in the reaction subspace [5] while in the conformational subspace trigonometric functions are favoured [1].

Fitting mathematical functions to potential energy curves was investigated by several authors [6–10]. Pople and his co-workers attempted to use very simple trigonometric functions to achieve

* Corresponding author.
E-mail address: kemskj@chem.au.dk (S.J. Knak Jensen).

http://dx.doi.org/10.1016/j.cplett.2015.02.031
0009-2614/© 2015 Elsevier B.V. All rights reserved.

Figure 1. A: Modification of conformational potential energy curve by increasing degree of substitution. B: Conformational potential energy surface (PES) of propane.

[Scheme 1: structural formulas. General structure of geminally substituted methane (R–CH2–R). I: Propane. II: n-Pentane. III: N-Acetyl-Glycine N-methylamide. IV: N-Acetyl-Alanine N-methylamide. V: N-Acetyl-Valine N-methylamide.]

Scheme 1. Molecules studied as double rotors (φ and ψ) to generate the potential energy surface. The side chain orientation is measured by χ.

a reasonable accuracy. In 1988, Chung pointed out that a full Fourier expansion is needed [11]. In the meantime Peterson et al. [1] started to use an extended Fourier series in which Gaussian functions augmented the trigonometric expansion. Work has also continued using rather large multi-term Fourier expansions [12].

A variety of molecular systems were also fitted by quantitative analytic functions [13–18]. Such surfaces may be used for qualitative visualization. This is usually accomplished by 2D-contour or pseudo-3D plots. These were historically demonstrated in several cases such as CH3 [19], CH2S(O)H [20] and CH2S(O2)H [21].

This process was expanded, first for one dimensional cases [22], and the present letter deals with two dimensional problems.

2. Methods

The choices of basis set and level of theory are crucial. However, we need to compromise in our choice so that we can extend our current study to larger peptides in the future. Energies, E, associated with internal rotation were calculated quantum mechanically using the B3LYP/6-31G(d) implementation of density functional theory in the Gaussian 09 [23,24] software package. The choice of B3LYP/6-31G(d) was inspired by the finding that it gives results in good agreement with results obtained using a much higher level of theory [25]. The calculations were carried out for a range of dihedral angle values, φ and ψ, of the five molecules (Scheme 1) in the interval [−180°, 180°] with grid points at 15° increments in order to generate the surface. This required 25 points along each of the two independent variables, thus a total of 25² = 625 grid points had to be computed for a given 2D surface. Figure 2 defines the nomenclature of the idealized topology of the 2D conformational potential energy surface and the Ramachandran map.
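The grid described above can be sketched in a few lines of Python. This is only an illustration of the counting, not the authors' workflow: the variable names and the helper `grid_points` are hypothetical, and NumPy is assumed to be available. The helper also generalizes the count to a chain in which every residue contributes two dihedral angles, which is the relationship the letter later gives as Eq. (1).

```python
import numpy as np

# Dihedral grid: -180..180 degrees in 15-degree steps, both endpoints included.
STEP_DEG = 15
angles = np.arange(-180, 180 + STEP_DEG, STEP_DEG)

# All (phi, psi) pairs of the two-dimensional scan.
phi, psi = np.meshgrid(angles, angles, indexing="ij")

points_per_axis = angles.size      # 25 values along each dihedral
points_per_surface = phi.size      # 25**2 = 625 single-point energies


def grid_points(n_residues: int, per_axis: int = 25) -> int:
    """Grid size when each of n_residues contributes two scanned dihedrals.

    Hypothetical helper, written only to illustrate N = per_axis**(2n).
    """
    return per_axis ** (2 * n_residues)


print(points_per_axis, points_per_surface)
print(f"{grid_points(10):.2e}")    # order of 10**28 for ten residues
```

Including both −180° and 180° duplicates one periodic image along each axis, which is why 25 rather than 24 values per dihedral appear in the count.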
A Levenberg–Marquardt algorithm [26], a nonlinear least-squares method with a local minimizer, was used to fit the functions with two independent variables, φ and ψ, to the surface data. The fits were carried out using the MATLAB [27] software. The goodness of the fit was monitored by the calculated R² and RMSE values. Details of the formalism are found in the Supplementary Material section (Methodology 1), where RMSE shows the differences between the calculated grid points and the fitted function.

We would like to note that our approach to study PES for peptides has proved useful for several other systems [5,28,29].
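A minimal sketch of such a fit is given below, under several assumptions. It uses SciPy's `least_squares` with `method="lm"` rather than MATLAB, fits a truncated two-variable Fourier form (two terms instead of the six used in the letter) to a synthetic surface instead of DFT energies, and computes R² and RMSE with the common textbook definitions, which may differ in detail from the formalism in the Supplementary Material. All names and the synthetic coefficients are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

N_TERMS = 2                  # illustration only; the letter sums up to m = 6
OMEGA = np.pi / 180.0        # degrees -> radians


def fourier_surface(params, phi, psi, n_terms=N_TERMS):
    """a0 + sum_m [a1 cos(m phi w) + b1 cos(m psi w) + a2 sin(m phi w) + b2 sin(m psi w)]."""
    energy = np.full(phi.shape, params[0], dtype=float)
    coeffs = np.asarray(params[1:], dtype=float).reshape(n_terms, 4)
    for m, (a1, b1, a2, b2) in enumerate(coeffs, start=1):
        energy += (a1 * np.cos(m * phi * OMEGA) + b1 * np.cos(m * psi * OMEGA)
                   + a2 * np.sin(m * phi * OMEGA) + b2 * np.sin(m * psi * OMEGA))
    return energy


# Synthetic "computed" surface on the 25 x 25 grid (stand-in for DFT energies).
angles = np.arange(-180.0, 181.0, 15.0)
phi, psi = (a.ravel() for a in np.meshgrid(angles, angles, indexing="ij"))
true_params = np.array([5.0, 3.0, 2.0, -1.0, 0.5, 1.5, -0.8, 0.3, 0.2])
rng = np.random.default_rng(0)
energy = fourier_surface(true_params, phi, psi) + rng.normal(scale=0.05, size=phi.size)


def residuals(params):
    """Difference between the model and the grid energies, one entry per grid point."""
    return fourier_surface(params, phi, psi) - energy


# Levenberg-Marquardt needs at least as many residuals (625) as parameters (9).
fit = least_squares(residuals, x0=np.zeros(true_params.size), method="lm")

fitted = fourier_surface(fit.x, phi, psi)
rmse = np.sqrt(np.mean((energy - fitted) ** 2))
r2 = 1.0 - np.sum((energy - fitted) ** 2) / np.sum((energy - energy.mean()) ** 2)
print(f"RMSE = {rmse:.3f}, R2 = {r2:.4f}")
```

Because this model is linear in its coefficients, the local minimizer converges from a zero starting guess; once Gaussian terms with adjustable centres and widths are added, the problem becomes genuinely nonlinear and the starting values matter.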

3. Current scope and future prospective

Finding the right methodologies for data set reduction is arduous work. When proposing such methodologies it is reasonable to start with the description of small compounds, and aim for a bottom-up solution. The long term goal would require extensive research as illustrated by Scheme 2.

Scheme 2. A schematic flow-chart showing the phases of an extensive research from the optimization of fitted functions to the use of the results in a force field description.

Mathematical software, such as MATLAB, will make the least-squares fit of a given function to the grid points. However, it is up to the researcher to choose the explicit form of the mathematical function. The purpose of the present letter is to explore how the complexity of the mathematical function increases with the complexity of molecular structure. One needs to keep in mind two criteria:

(i) The fitting should achieve near chemical accuracy or values somewhat better.
(ii) The fitted function should have as few parameters as possible in order to achieve big data reduction effectively.

In the case of a single amino acid diamide the needed 625 grid points are easily manageable. However, the problem grows exponentially with the increasing size of the peptide molecule. For example, using 15° increments for an oligopeptide with 10 amino acids leads to 25 points along each of the φ and ψ angles, which requires something like 10^28 grid points due to the following relationship [30] (1):

$$N = 25^{2n} = 625^{n} = 625^{10} \approx 10^{28} \qquad (1)$$

The molecules that were studied are shown in Scheme 1. Of the five compounds the first two (I and II) have saturated rotating moieties attached to the central CH2 skeleton. The second set (III–V) contains amide bonds (–CONH–) with three different orientations in the rotating moieties. Compounds III–V have dissymmetrical structure: they are glycine diamide –CONH–CH2–CONH–, alanine diamide –CONH–CH(Me)–CONH– and valine diamide –CONH–CH(CHMe2)–CONH–.

Clearly in the first two compounds (I and II) the rotating moieties connected to the central methylene carbon are tetrahedral. In contrast, the last three compounds (III–V) have flat, trigonal planar, rotating moieties with heteroatoms.

This set of five compounds (I–V) represents increasing complexity of the potential energy surface. Of course the increasing complexity in appearance may lead to increasing complexity of the mathematical function to be fitted.

Figure 2. Schematic topology of conformational PES map showing the −360° to 360° range of dihedral angles of double rotors. A: Hydrocarbons, B: Peptides (Ramachandran map).

4. Results and discussion

The complexity of the mathematical function that describes the PES is expected to depend on the complexity of the corresponding molecular structure.
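How fast the function grows with structural complexity can be made concrete by counting parameters. The sketch below is an assumption-laden bookkeeping exercise, not taken from the letter's supplementary material: it assumes four coefficients per Fourier term plus a constant, five parameters per Gaussian (amplitude, two centres, two widths) and eight per extended Fourier term, for the three function families this section introduces as Eqs. (2), (3) and (4). With those assumptions the totals reproduce the 25, 70 and 118 fitted parameters reported in Table 1.

```python
GRID_POINTS = 25 ** 2        # 625 computed energies per surface

FOURIER_TERMS = 6            # Eq. (2): assumed 4 coefficients per term, plus a0
GAUSSIAN_TERMS = 9           # Eq. (3): assumed amplitude, 2 centres, 2 widths per term
EXT_FOURIER_TERMS = 6        # Eq. (4): assumed 8 coefficients (f1..f8) per term

n_fourier = 1 + 4 * FOURIER_TERMS
n_gaussian = 5 * GAUSSIAN_TERMS
n_ext_fourier = 8 * EXT_FOURIER_TERMS

# Cumulative parameter counts for the three levels of model complexity.
models = {
    "Eq. (2)": n_fourier,
    "Eqs. (2) + (3)": n_fourier + n_gaussian,
    "Eqs. (2) + (3) + (4)": n_fourier + n_gaussian + n_ext_fourier,
}

for name, n_params in models.items():
    ratio = GRID_POINTS / n_params
    print(f"{name:22s} {n_params:4d} parameters, {ratio:5.1f} grid points per parameter")
```

Even the largest model stores the surface in roughly a fifth of the numbers needed for the raw grid, which is the sense in which the fit acts as data reduction.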

Figure 3. Potential energy surfaces for the compounds I and II in Scheme 1 calculated for the gas phase at the level B3LYP/6-31G(d).

Figure 4. Fully relaxed potential energy surfaces for the compounds III–V in Scheme 1 calculated for the gas phase at the level B3LYP/6-31G(d).

Each surface was fitted with a minimal set of functions, Eq. (2), which includes a summation up to 6, yielding 6 × 4 = 24 terms of simplified Fourier expansions [31] with two independent variables.

$$E_a(\varphi,\psi) = a_0 + \sum_{m=1}^{6}\left(a_1\cos m\varphi\omega + b_1\cos m\psi\omega + a_2\sin m\varphi\omega + b_2\sin m\psi\omega\right) \qquad (2)$$

where a0 is the constant term in the series, ω is the conversion factor from degrees to radians and m indexes the terms (each m carries its own a1, b1, a2 and b2, hence the 24 coefficients). To achieve higher accuracy, a set of Gaussian functions, Eq. (3), was fitted to the recognizable critical points.

$$E_b(\varphi,\psi) = \sum_{m=1}^{9} A_m \exp\left[-\left(\frac{(\varphi-\varphi_{0m})^2}{2\sigma_{\varphi m}^2} + \frac{(\psi-\psi_{0m})^2}{2\sigma_{\psi m}^2}\right)\right] \qquad (3)$$

where m indexes the terms, A is the amplitude, φ0 and ψ0 define the center, and σφ and σψ are the φ and ψ extensions of the ellipsoid.

For the sake of the transformation of the coordinate system an extended two dimensional Fourier series, Eq. (4), is needed.

$$E_c(\varphi,\psi) = \sum_{m=1}^{6}\Big[f_1\cos(m\varphi\omega + m\psi\omega)\,f_2\cos(m\varphi\omega - m\psi\omega) + f_3\cos(m\varphi\omega + m\psi\omega)\,f_4\sin(m\varphi\omega - m\psi\omega) + f_5\sin(m\varphi\omega + m\psi\omega)\,f_6\cos(m\varphi\omega - m\psi\omega) + f_7\sin(m\varphi\omega + m\psi\omega)\,f_8\sin(m\varphi\omega - m\psi\omega)\Big] \qquad (4)$$

Figures 3 and 4 show the conformational PES for compounds I–V. The optimized parameters of the five molecules and the accuracy of the fitted surface are listed in Tables 1 and 2. The fitted parameters are listed in Supplementary Materials in Tables S1 and S2.

Tables 1 and 2 also summarize the symmetric properties of the surfaces studied and the corresponding fitted functions. The inhomogeneity of the molecule's main chain causes a reduction in the number of the symmetry axes and the surface becomes

Table 1
Summary of the minimum energy critical points of compounds I–IV and accuracy of the surface fit.

Compound  Unique   Optimized results               Fitted results                  R²     RMSE   Fitted function        Number of fitted  |Eopt − Efit|
          minima   φ        ψ        E             φ        ψ        E                                                  parameters
                                     (kJ mol−1)                      (kJ mol−1)

I         9         60.36    60.72    0.00          60.36    60.72    0.00         0.998  0.344  Eq. (2)                25                0

II        4         64.69    89.66   14.15          67.28    89.90    9.90         0.997  0.992  Eqs. (2) + (3)         70                4.25
          2         68.28    67.90    7.07          68.28    67.90    5.79                                                                1.28
          4         65.40   180.00    3.62          65.40   180.00    2.51                                                                1.11
          1        179.23   179.23    0.00         179.23   179.23    0.63                                                                0.63

III       L, D     122.78    21.93   10.24         120.20    19.39   12.65         0.929  4.631  Eqs. (2) + (3) + (4)   118               2.41
          L        179.99   180.00    2.58         178.8    178.8     2.66                                                                0.08
          L, D      82.15    68.54    0.00          81.43    50.41    2.28                                                                2.28

IV        D        165.06    30.94   29.28         159.3     28.79   24.50         0.911  5.725  Eqs. (2) + (3) + (4)   118               4.78
          D         67.92    26.68   24.21          67.17    21.11   28.07                                                                3.86
          L        126.55    21.42   13.04         117.10    19.10   14.71                                                                1.67
          D         73.52    57.39   10.88          67.33    67.17   10.05                                                                0.83
          L        158.33   163.41    5.97         155.50   167.00    6.06                                                                0.09
          L         82.87    72.60    0.00          86.36    78.89    2.23                                                                2.23

less symmetric; consequently, the function which describes that surface is increasingly more complicated. As a result, the R² values are gradually reduced from unity and the RMSE between the calculated grid points and the fitted functions increases in a similarly gradual fashion.

Tables 1 and 2 indicate that the PESs which are highly symmetrical can be fitted by a simple two dimensional Fourier expansion, Eq. (2). In contrast the PESs of the molecules that have strong intramolecular interactions can only be described with more complicated functions. The number of the Fourier expansions depends on the number of the higher maxima on the surface. If the surface has only diagonal and central symmetry, it is required to incorporate Gaussian functions, Eq. (3). The number of the higher maxima and the minima determine the number of these functions.

The molecules III–V contain a peptide bond and have a dissymmetrical structure, which is described by a surface with a symmetry center only for the glycine diamide (III). In contrast, the chiral alanine diamide (IV) and the valine diamide (V) have no symmetry whatsoever. The increasing complexity in the structure leads to increasing complexity of the fitted mathematical functions. The surfaces of glycine diamide, alanine diamide and valine diamide need 6 terms of two dimensional Fourier expansions, 9 terms of Gaussian functions and 6 terms of extended Fourier functions.

The PES was originally described by 625 energy values defined by the two rotated dihedral angles. Creating a function describing the conformational space of the species drastically reduced the amount of data (Tables 1 and 2) and made it possible to perform a more involved mathematical analysis on the functions than is ever doable on a matrix of data points.

While the Fourier expansion defines mostly the periodicity of the surface, the Gaussian function defines the minima and the maxima and the extended Fourier series defines the surface dissymmetry. It seems the complexity of the molecule and the number of the symmetry axes specify the complexity and the accuracy of the fitted function. Table S3 illustrates this point in the cases of propane and n-pentane. The number of functions needed is mostly

Table 2
Summary of the minimum energy critical points of the valine diamide (V) and accuracy of the surface fit. χ indicates the orientation of the sidechain (Scheme 1). The fit to the 625 DFT computed grid points was performed using the fitting functions of Eqs. (2) + (3) + (4) and 118 parameters. R² and RMSE were 0.913 and 6.364 kJ mol−1, respectively.

Compound  Unique    Optimized results                        Fitted results                  |Eopt − Efit|
          minima    φ        ψ        χ        E             φ        ψ        E
                                               (kJ mol−1)                      (kJ mol−1)

V         D (a)      62.93    39.08   164.98   29.08          60.00    38.18   31.19         2.11
          D (g−)     50.87    39.25    68.52   29.88          52.87    38.18   31.19         1.31
          D (g+)     50.91    42.45    69.38   34.46          52.73    41.82   35.01         0.55
          L (a)      86.81    22.80   173.73   22.46          85.45    23.64   16.98         5.48
          L (a)     126.58   132.80   175.43    7.31         125.50   136.40    3.95         3.36
          L (g−)    132.08   159.92    60.83    5.11         132.70   154.50    0.24         4.87
          L (g+)    152.50   153.74    71.60    5.68         150.90   150.90    1.45         4.23
          D (a)     128.55    65.43   178.76   31.50         128.50    63.64   28.42         3.08
          D (g−)    131.95    40.53    67.14   37.58         132.70    41.82   32.61         4.97
          D (g+)    160.47    36.22    64.15   29.76         160.50    34.55   23.91         5.85
          L (g−)    122.94    18.74    66.09   10.99         121.80    20.00   14.92         3.93
          L (g+)    130.10    21.79    71.94   16.24         129.10    20.00   16.91         0.67
          D (a)      77.03   142.69   177.34   40.48          78.18   143.60   43.02         2.54
          D (g−)     68.84   174.90    41.88   44.92          67.27   172.70   47.90         2.98
          D (g+)     74.35   164.18   102.17   47.55          74.55   165.50   49.77         2.22
          L (g−)    122.84    18.70    66.10   10.99         121.80    20.00   13.14         2.15
          L (g+)    130.11    21.79    71.94   16.24         129.10    20.00   16.91         0.67
          D (a)      73.51    60.81   173.19    9.10          70.91    60.64   14.18         5.08
          D (g−)     59.54    33.73    65.04   19.24          60.00    34.55   17.28         1.96
          D (g+)     62.65    38.66    72.09   20.76          63.64    35.44   19.81         0.95
          L (a)      83.49    78.96   175.19    0.00          78.18    81.82    3.28         3.28
          L (g−)     83.88    65.26    71.82    1.85          85.45    67.27    5.81         3.96
          L (g+)     84.28    72.33    60.61    3.74          85.45    74.55    8.15         4.41

determined by the complexity of the molecule. Although the fitting becomes more appropriate by applying more functions, there is a limit; the fitted surfaces of simple molecules are only worsened if an excessive number of functions is used.

In the gas phase, at this level of theory, not all of the peptide minima appear on the surface. In the case of glycine diamide (Figure S1) five conformers (L, L, D, L D) and in the case of alanine diamide (Figure S2) six conformers (D, L, L, D, L D) were found instead of the ideal nine conformers shown in Figure 1B. In the case of the valine diamide twenty three conformers (Table 2) instead of the possible twenty seven were located. Four of the minima, L (g−), L (g+), L (a) and L (a), could not be located. The comparison of the fitted surfaces suggested that for a given number of grid points, such as 25² = 625, simple hydrocarbons can be fitted more accurately than peptides. However, the Ramachandran surfaces of glycine, alanine and valine diamide can be fitted with acceptable R² values (0.91 ≤ R² ≤ 0.93). More important are the deviations of the optimized minimum energy points from the fitted ones. These are given in the last column of Tables 1 and 2. These 625 grid points can be represented by mathematical functions containing fewer than 120 parameters. It is hoped that this may lead to the successful big data reduction needed to understand protein folding. A fitted analytic function would reduce this big data into a more manageable set. The last column of Tables 1 and 2 shows that the maximum deviation between the optimized and the fitted energies of the minima is within 1.5 kcal/mol (about 6.3 kJ/mol).

5. Conclusion

The PES can be described with an accurate multi-variable mathematical function. Depending on the structural complexity of the given molecule, the surface can be characterized by a combination of Fourier series and Gaussian functions.

For molecules which have two symmetric rotational groups, the surfaces have all the bilateral symmetry axes and can be described with a linear combination of the single Fourier series. In the case when the molecule has heteroatoms, the surface becomes less symmetric, the fitted functions are increasingly more complicated and Gaussian functions are needed to fit the surface. If the surface has only central symmetry or has no symmetry, the fitting needs a dissymmetrical correction function in the form of an extended Fourier series.

The present study suggests that the potential energy surfaces and hypersurfaces of flexible molecules, such as peptides, containing several internal bond-rotations, may be reasonably well represented by these types of fitting method. For such macromolecules the grid points would be in the domain of big data.

Acknowledgements

The authors acknowledge the financial support within TÁMOP-4.2.2.A-11/1/KONV-2012-0047 New functional material and their biological and environmental answers, TÁMOP-4.2.2.C-11/1/KONV-2012-0010 Supercomputer the national virtual laboratory, HUSRB/1002/214/193 Bile Acid Nanosystems as Molecule Carriers in Pharmaceutical Applications and TÁMOP 4.2.4. A/2-11-1-2012-0001, National Excellence Program Elaborating and operating an inland student and researcher personal support system, subsidized by the European Union and co-financed by the European Social Fund.

The authors would like to thank Milán Szőri for helpful discussions. The authors thank M. Labádi and L. Müller for the administration of the computer clusters used for this work.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cplett.2015.02.031.

References

[1] M.R. Peterson, I.G. Csizmadia, J. Am. Chem. Soc. 100 (1978) 6911.
[2] G.N. Ramachandran, C. Ramakrishnan, V. Sasisekharan, J. Mol. Biol. 7 (1963) 95.
[3] G. Tasi, B. Nagy, G. Matisz, T.S. Tasi, Comput. Theor. Chem. 963 (2011) 378.
[4] D. Autrey, N. Meinander, J. Laane, J. Phys. Chem. A 108 (2003) 409.
[5] I. Szabó, G. Czakó, Nat. Commun. 6 (2015) 5972.
[6] J.D. Lewis, T.B. Malloy, T.H. Chao, J. Laane, J. Mol. Struct. 12 (1972) 427.
[7] L. Radom, J.W. Hehre, J.A. Pople, J. Am. Chem. Soc. 94 (1972) 2371.
[8] L. Radom, W.A. Lathan, W.J. Hehre, J.A. Pople, J. Am. Chem. Soc. 95 (1973) 693.
[9] L. Radom, J.A. Pople, J. Am. Chem. Soc. 94 (1970) 4786.
[10] M. Head-Gordon, J.A. Pople, J. Phys. Chem. 97 (1993) 1147.
[11] A. Chung-Philips, J. Chem. Phys. 88 (1988) 1764.
[12] T.A.K. Kehoe, M.R. Peterson, G.A. Chass, B. Viskolcz, L. Stacho, I.G. Csizmadia, J. Mol. Struct. THEOCHEM 666–667 (2003) 79.
[13] M.R. Peterson, I.G. Csizmadia, J. Am. Chem. Soc. 101 (1979) 1076.
[14] T.A. Modro, W.G. Liauw, M.R. Peterson, I.G. Csizmadia, J. Chem. Soc. Perkin 2 (1979) 1432.
[15] G.R. DeMare, O.P. Strausz, M.R. Peterson, I.G. Csizmadia, J. Comput. Chem. 1 (1980) 141.
[16] M.R. Peterson, G.R. DeMare, I.G. Csizmadia, O.P. Strausz, J. Mol. Struct. 86 (1981) 131.
[17] M.R. Peterson, I.G. Csizmadia, Theor. Org. Chem. 3 (1982) 190.
[18] M.R. Peterson, G.R. DeMare, I.G. Csizmadia, O.P. Strausz, J. Mol. Struct. 92 (1983) 239.
[19] R.E. Kari, I.G. Csizmadia, J. Chem. Phys. 1443 (1969).
[20] S. Wolfe, A. Rauk, I.G. Csizmadia, Can. J. Chem. 113 (1969).
[21] S. Wolfe, A. Rauk, I.G. Csizmadia, J. Am. Chem. Soc. (1969) 1567.
[22] A. Rágyanszki, A. Surányi, I.G. Csizmadia, A. Kelemen, S.J. Knak Jensen, S.Y. Uysal, B. Viskolcz, Chem. Phys. Lett. 599 (2014) 169.
[23] M.J. Frisch, et al., Gaussian 09, Revision A.1, Gaussian, Inc., Wallingford, CT, USA, 2009.
[24] A.D. Becke, J. Chem. Phys. 98 (1993) 5648.
[25] G. Endrédi, A. Perczel, O. Farkas, M.A. McAllister, G.I. Csonka, J. Ladik, I.G. Csizmadia, J. Mol. Struct. THEOCHEM 391 (1997) 15.
[26] J.J. Moré, Lect. Notes Math. 630 (1978) 105.
[27] MATLAB 2013b, The MathWorks, Natick, 2013.
[28] A.J.C. Varandas, Phys. Chem. Chem. Phys. 13 (2011) 9796.
[29] D. Autrey, N. Meinander, J. Laane, J. Phys. Chem. A 108 (2004) 409.
[30] I. Jákli, A. Perczel, B. Viskolcz, I.G. Csizmadia, Protein Model, Springer International Publishing, 2014.
[31] G.P. Tolstov, Fourier Series, Courier-Dover, 1976.
