Determination of Best-Fit Potential Parameters For A Reactive Force Field Using A Genetic Algorithm

J Mol Model (2012) 18:1049–1061 DOI 10.1007/s00894-011-1124-2
ORIGINAL PAPER
Poonam Pahari & Shashank Chaturvedi
Received: 9 February 2011 / Accepted: 9 May 2011 / Published online: 11 June 2011 © Springer-Verlag 2011
Abstract The ReaxFF interatomic potential, used for organic materials, involves more than 600 adjustable parameters, the best-fit values of which must be determined for different materials. A new method of determining the set of best-fit parameters for specific molecules containing carbon, hydrogen, nitrogen and oxygen is presented, based on a parameter reduction technique followed by genetic algorithm (GA) minimization. This work has two novel features. The first is the use of a parameter reduction technique to determine which subset of parameters plays a significant role for the species of interest; this is necessary to reduce the optimization space to manageable levels. The second is the application of the GA technique to a complex potential (ReaxFF) with a very large number of adjustable parameters, which implies a large parameter space for optimization. In this work, GA has been used to optimize the parameter set to determine best-fit parameters that can reproduce molecular properties to within a given accuracy. As a test problem, the use of the algorithm has been demonstrated for nitromethane and its decomposition products.
Keywords Genetic algorithm · Force field · Decomposition products · Potential parameters
PACS numbers 34.20.Cf · 83.10.Mj · 71.15.Pd · 31.50.-x · 33.15.-e
P. Pahari (✉) · S. Chaturvedi
Computational Analysis Division, Bhabha Atomic Research Centre, Visakhapatnam, India
e-mail: poonam.pahari@gmail.com
S. Chaturvedi
e-mail: shashankvizag@gmail.com
Introduction
Molecular dynamics (MD) simulations use a potential function to determine the forces acting on individual particles. The trajectories of the particles under the influence of this self-consistent force yield the temporal evolution of the system [1–4]. MD methods can be divided into ab initio and classical methods. Ab initio methods, although very accurate and general, are computationally extremely demanding; their application is thus restricted to relatively small systems and short simulation times [5]. On the other hand, classical force fields allow the dynamical simulation of millions of atoms, which makes them applicable to a wide variety of physical processes (e.g., shock waves, dislocation dynamics, fracture and oxidation) that require larger system sizes and longer simulation times. However, force fields are very difficult to develop, and their accuracy must be established for each application. The main challenge is to develop methodologies that retain the accuracy of quantum mechanics while allowing large-scale simulations.
The interaction between atoms in a polyatomic molecule, or in a solid, can be described in terms of the potential energy surface (PES), which specifies the potential energy of a system of atoms in terms of the coordinates of all the atoms. There are two techniques for obtaining the PES. The first involves generating potential energy data with ab initio electronic structure methods, followed by one of a variety of fitting procedures [6, 7]. The second involves fitting a functional form for the PES to experimental data, such as rotational spectra or vibrational energy levels, obtained from high-resolution spectroscopy [8] or scattering experiments [9, 10]. With the second technique, gathering enough information to construct a functional representation that can be used for MD simulations is a major task [11–13].
Hence, other options have also been explored, such as neural networks and interpolative moving least squares [14–16]. The use of genetic algorithms (GA) for determining best-fit potential parameters for nickel, by matching solid-state properties, is reported in [17]. For complex polyatomic molecules, particularly those involving CHNO atoms, the potentials must allow for bond formation and breaking, and for the influence of close-neighbor effects on molecular structures. Potentials that take these effects into account are called reactive potentials; examples include the Tersoff, Brenner, REBO, BEBO and ValBond potentials [2, 18–23]. Tersoff proposed an empirical interatomic potential for covalent systems. This potential has the form of the Morse pair potential, with the bond strength parameter depending on the local environment, and was the first attempt to explain the structural chemistry of covalent systems. The Brenner potential [3] is based on the Tersoff potential, but includes correction terms that account for the overbinding of radicals and for nonlocal effects; hence, it can be applied to hydrocarbons, graphite and diamond lattices [3]. The Brenner potential can describe bond breaking, but its formalism does not include nonbonding contributions such as Coulomb and van der Waals forces, which are important in predicting the structures and properties of many systems. Thus, this potential does not accurately predict the potential energy curves for hydrocarbons and graphite. Certain generalized forms of the Brenner potential include nonbonding forces, but are still unable to accurately predict the shapes of dissociation and reactive potential curves [5]. The reactive empirical bond order (REBO) potential [19] was also developed by modifying the Tersoff potential. The initial form of this potential depended only on interatomic distances and did not include any dependence on molecular shape. A more advanced version has eliminated this limitation, but it still ignores long-range interactions and partial charges [24, 25].
The bond energy bond order (BEBO) [20, 21] and ValBond [22, 23] potentials also suffer from the limitation that they cannot describe the fully bonded equilibrium geometries of complex molecules [5]. The reactive force field ReaxFF appears to address all of these problems; details are given in [5]. It uses a general relationship between bond distance and bond order on the one hand, and between bond order and bond energy on the other. Valence terms, including contributions from torsion and valence angles, are defined in terms of the same bond orders, so that all of these terms smoothly go to zero as the bonds break. ReaxFF uses Coulomb and van der Waals potentials to describe nonbonding interactions. This potential was initially developed for hydrocarbons, and was later extended to more complex systems consisting of carbon, hydrogen, nitrogen and oxygen [26]. It requires a total of 611 parameters for CHNO systems, and best-fit values of these parameters need to be determined for each particular system of atoms.
We are interested in MD studies of the pressure- and temperature-induced decomposition of CHNO materials [27]. Hence, it is necessary to obtain best-fit ReaxFF parameters for representative CHNO materials; that is the topic of the present work. Here, a best-fit form of the ReaxFF potential was obtained by attempting to match experimentally known molecular parameters (such as bond lengths, valence angles and torsion angles) for the molecular species of interest and for their decomposition products. This work has two novel features. The first is the application of the GA minimization technique to a complex potential (ReaxFF) with a very large number of adjustable parameters, which implies a large parameter space for optimization. The second is the use of a parameter reduction technique to determine which subset of the 611 parameters plays a significant role for the species of interest; this is necessary to reduce the optimization space to manageable levels.
In the Computational technique section, we describe the principles behind the parameter reduction technique and the GA. In Results for nitromethane and its decomposition products, we identify the relevant subset of ReaxFF parameters for a sample CHNO molecule and its decomposition products. The Results based on the genetic algorithm section describes the use of a GA to determine the best-fit values for this reduced set of ReaxFF parameters. The limitations of the GA study are then summarized, and conclusions are presented.
Computational technique
The ReaxFF potential uses more than 600 parameters for molecules containing C, H, N and O atoms. In the present work, our objective is to illustrate a new method for determining the best-fit values of these parameters for a given reactive molecule and its decomposition products. To reduce the computational cost, we choose a simple CHNO molecule, nitromethane (CH3NO2), together with its decomposition products. Table 1 lists the molecular species considered and the numbering scheme for the atoms of each species.
Table 1 Numbering for the atoms in the molecular species considered
Molecular species   Atom type and number
CH3NO2              C-1   H-2   H-3   H-4   N-5   O-6   O-7
CH3NO               C-1   H-2   H-3   H-4   N-5   O-6
CH2O                C-1   H-2   H-3   O-4
OH                  O-1   H-2
NO                  N-1   O-2
Overall procedure
For a given set of ReaxFF parameters, we perform geometry optimization separately for each species using molecular mechanics (MM). The minimum-energy state yielded by MM gives the geometric properties of a given species, such as bond lengths, bond energies, and valence and torsion angles. These properties are then compared with the experimentally known (reference) values. The error in each geometrical quantity is defined as
$$E_i = \left( \frac{V_R - V_M}{V_N} \right)^2 \tag{1}$$
where V_R is the reference value, V_M is the value yielded by energy minimization, and V_N is a suitable normalization factor. The reference values were taken from [28, 29] or computed using semiempirical molecular orbital calculations with the MOPAC code [30]. The first two columns of Table 2 list the molecular species and the geometrical properties that we seek to match by adjusting the values of the ReaxFF parameters. A total of 37 molecular properties, including bond lengths and valence and torsion angles, are taken into account when computing the deviation. The atoms involved in these 37 properties are listed in columns 3–6 of the table.
The normalization factors V_N are chosen as follows. According to [5], the allowed deviations in bond lengths and angles were 0.01 Å and 2°, respectively. Since bond lengths are of the order of 1 Å, and bond angles are typically of the order of 100°, this corresponds to allowed deviations of 1% and 2%, respectively. Hence, in Eq. 1, the normalizing factor for a bond length is taken to be equal to its reference value, and that for a bond angle to be twice its reference value. For the bond energy, the reference value is taken as V_N. The values of V_N and V_R are given in columns 7 and 8 of Table 2.
Table 2 Errors obtained in single-parameter optimization. Terr = 0.193 when single-parameter optimization was performed with 611 parameters
Species  Property        Atom numbers  V_N     V_R      V_M      E_i
CH3NO2   Bond distance   1  2          1.109   1.109    1.095    0.0002
CH3NO2   Bond distance   1  3          1.108   1.1086   1.0939   0.0002
CH3NO2   Bond distance   1  4          1.108   1.1086   1.0939   0.0002
CH3NO2   Bond distance   1  5          1.545   1.5456   2.1954   0.1769
CH3NO2   Bond distance   5  6          1.209   1.2098   1.2649   0.0021
CH3NO2   Bond distance   5  7          1.209   1.2098   1.2649   0.0021
CH3NO2   Valence angle   3  1  2       220.0   110.3    95.8     0.0043
CH3NO2   Valence angle   4  1  3       218.0   109.8    96.1     0.0040
CH3NO2   Valence angle   5  1  2       214.0   107.8    116.4    0.0016
CH3NO2   Valence angle   1  5  6       238.0   119.2    153.2    0.0204
CH3NO2   Valence angle   1  5  7       238.0   119.2    153.3    0.0204
CH3NO2   Torsion angle   2  3  1  4    242.0   121.0    96.5     0.0102
CH3NO2   Torsion angle   3  2  1  5    238.0   119.3    131.6    0.0027
CH3NO2   Torsion angle   2  1  5  6    178.0   89.9     74.0     0.0079
CH3NO2   Torsion angle   4  1  5  7    300.0   150.8    169.2    0.0037
CH2O     Bond distance   1  2          1.106   1.1061   1.1373   0.0008
CH2O     Bond distance   1  3          1.106   1.1061   1.1373   0.0008
CH2O     Bond distance   1  4          1.208   1.208    1.3046   0.0064
CH2O     Valence angle   3  1  2       235.0   117.463  105.243  0.0027
CH2O     Valence angle   4  1  2       242.5   121.267  127.378  0.0006
CH2O     Valence angle   4  1  3       242.5   121.271  127.379  0.0006
CH2O     Torsion angle   4  1  3       236.0   118.45   107.37   0.0022
CH3NO    Bond distance   1  2          1.094   1.094    1.047    0.0018
CH3NO    Bond distance   1  3          1.092   1.092    1.048    0.0016
CH3NO    Bond distance   1  4          1.1101  1.1101   1.0493   0.0030
CH3NO    Bond distance   1  5          1.482   1.482    1.2934   0.0162
CH3NO    Bond distance   6  5          1.211   1.211    2.431    1.0156
CH3NO    Valence angle   4  1  3       218.0   109.3    102.1    0.0011
CH3NO    Valence angle   5  1  2       222.0   111.1    116.5    0.0006
CH3NO    Valence angle   5  1  3       214.0   107.3    115.3    0.0014
CH3NO    Valence angle   6  5  1       226.0   113.3    115.6    0.0001
CH3NO    Valence angle   2  1  3       216.0   108.8    118.4    0.0003
CH3NO    Torsion angle   4  1  3  2    236.0   118.45   107.37   0.0022
CH3NO    Torsion angle   5  1  2  3    230.0   115.75   128.80   0.0032
CH3NO    Torsion angle   6  5  1  2    244.0   122.22   136.50   0.0034
OH       Bond distance   1  2          0.9396  0.9396   1.1456   0.0481
NO       Bond distance   1  2          1.1223  1.1223   1.2296   0.0091
Given the error in each geometrical quantity, we define the overall objective function as
$$T_{\mathrm{err}} = \left[ \sum_{i=1}^{N_o} E_i \Big/ N_o \right]^{1/2} \tag{2}$$
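Eq. 1 and the objective function Terr are simple enough to check directly against Table 2. The Python sketch below is a minimal illustration, not the authors' code; the function names error_term and total_error are hypothetical. Fed the 37 tabulated E_i values of Table 2 (N_o = 37), it recovers the quoted Terr = 0.193, and the C-N bond row of nitromethane recovers E_i = 0.1769.

```python
import math
from typing import Sequence


def error_term(v_ref: float, v_min: float, v_norm: float) -> float:
    """Eq. 1: squared, normalized deviation of one geometric property."""
    return ((v_ref - v_min) / v_norm) ** 2


def total_error(errors: Sequence[float]) -> float:
    """Terr: square root of the mean of the E_i over the N_o matched properties."""
    if not errors:
        raise ValueError("at least one property is required")
    return math.sqrt(sum(errors) / len(errors))


# C-N bond of CH3NO2 (Table 2): V_R = 1.5456, V_M = 2.1954, V_N = 1.545
e_cn = error_term(1.5456, 2.1954, 1.545)
print(f"E_i(C-N) = {e_cn:.4f}")  # E_i(C-N) = 0.1769

# The 37 tabulated E_i values of Table 2 (611-parameter single-parameter search)
table2_errors = [
    0.0002, 0.0002, 0.0002, 0.1769, 0.0021, 0.0021, 0.0043, 0.0040, 0.0016,
    0.0204, 0.0204, 0.0102, 0.0027, 0.0079, 0.0037, 0.0008, 0.0008, 0.0064,
    0.0027, 0.0006, 0.0006, 0.0022, 0.0018, 0.0016, 0.0030, 0.0162, 1.0156,
    0.0011, 0.0006, 0.0014, 0.0001, 0.0003, 0.0022, 0.0032, 0.0034, 0.0481,
    0.0091,
]
print(f"Terr = {total_error(table2_errors):.3f}")  # Terr = 0.193
```

The single large E_i of the CH3NO N=O bond (1.0156) dominates the sum, which illustrates why a root-mean-square objective is sensitive to the worst-fitted property.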
Here, N_o is the total number of properties being matched, and Terr is the function to be minimized.
Determination of significant parameters
The purpose of the present work is to illustrate a new best-fit procedure by applying it to a single reactive molecule and its decomposition products. For this restricted group of species, only a subset of the 611 parameters is expected to play a significant role, with the other parameters playing a relatively minor role. We therefore first determine the significant parameters, and then determine their best-fit values. The significant parameters are identified in two stages, as described below.
The first stage, which involves preliminary screening, is a single-parameter search [5]. The parameters are varied one at a time, with the rest being held constant. Each parameter is allowed to take on three different values, and the best-fit value for the parameter is chosen by locating the minimum of the parabola through these three points; this is somewhat similar to Brent's method. We observe that there are significant changes in Terr during optimization with respect to certain parameters, while other parameters yield relatively small changes. The parameters that lead to significant changes in Terr can thus be identified.
Fig. 1 Evolution of Terr during single-parameter optimization with respect to ReaxFF parameters. Ordinate shows 100 × Terr
If the set of parameters identified in this way is still too large, we resort to a second stage, which involves calculating the equivalent of the cross-correlation between the changes in each of these input parameters and the resulting changes in molecular properties. This second stage consists of the following steps:
1. Start with a nominal set of parameters P_j^0.
2. Apply a set of randomly chosen mutations to this vector to generate, say, 20,000 mutated vectors. Mutated values of each parameter are generated using the relation
$$P_j = P_j^0 \left( 1 + A_j R_n \right)$$
where A_j is the amplitude of the perturbation in the j-th parameter and R_n is a random number lying between 0 and 1. Larger values of A_j give access to a larger search space around the nominal point. On the other hand, very large values of A_j may lead to unphysical choices of some parameters, especially those that are used as exponents in the ReaxFF potential. Hence, the second stage should be repeated for different values of the amplitude A_j. In principle, the amplitudes A_j could be different for each parameter. However, since the amplitudes are normalized, we have opted to use one value (A_j1) for parameters that appear as exponents in ReaxFF, and another value (A_j2) for all other parameters.
3. Using a process called mating in GA theory, generate combinations of these 20,000 vectors, yielding a total of 40,000 final vectors. These vectors are stored as a matrix A with dimensions 40,000 × (number of parameters). Details of the mating process are explained in the Genetic algorithm procedure subsection.
4. For each of these vectors, perform molecular mechanics (MM) calculations for nitromethane and its product species, and obtain the minimum-energy state of each species. This yields the equilibrated values of the molecular properties listed in Table 2. The corresponding deviations from the reference values are stored as a matrix B with dimensions 40,000 × (number of properties). Each column of matrix A now contains 40,000 values of a particular parameter, while each column of matrix B contains the deviations in one molecular property.
5. Each element a_ij in a column of matrix A is then normalized as follows:
$$a_{ij}^{\mathrm{normalized}} = \frac{a_{ij} - \bar{a}}{\sigma}$$
where ā is the arithmetic mean and σ is the standard deviation of the elements in that column. The same process is applied to the elements of matrix B. This normalization is required because matrix A, consisting of the input parameters, and matrix B, consisting of the deviations in the molecular quantities, physically represent different quantities with very different numerical values.
6. Compute matrix C, given by
$$C = A^{T} B$$
where C has dimensions (number of input parameters) × (number of deviations in output properties). Each row of C corresponds to the cross-correlation of one parameter with each of the deviations in molecular properties. If all of the values in a row are small, it is reasonable to conclude that the parameter does not significantly affect the molecular properties.
This method is, in a sense, similar to calculating the cross-correlation (CC) between two random variables X and Y [31]. The CC gives an estimate of the linkage between changes in the two variables: if the two variables are only weakly coupled to each other, their cross-correlation is small. Here, instead of X and Y being scalars, we have vectors whose dimensions correspond to the number of input parameters and the number of deviations in output properties, respectively. The peak value of the CC in a given row is the quantity of interest, since a parameter must be retained in the optimization if it significantly affects even one deviation in a molecular quantity. We specify a cutoff value and select only those parameters whose peak CC is higher than the cutoff. The cutoff must lie between 0 and 1. The lower the cutoff, the larger the number of parameters retained, and the greater the cost of the subsequent optimization. As the cutoff approaches unity, most parameters are rejected, which reduces the optimization cost but risks eliminating important physics from the ReaxFF model. Cutoff values of 0.2, 0.4 and 0.6 have been examined in this study.
The above procedure yields a subset of ReaxFF parameters that significantly affect at least one of the molecular properties listed in Table 2. Following this parameter reduction, an optimization study is performed using a GA, as described in the next subsection.
Table 3 Errors obtained after single-parameter optimization with 190 parameters. Terr = 0.0513
Molecule Property        Atom types    Weight  Lit./exp. ReaxFF    Error
CH3NO2   Bond distance   1  2          1.109   1.109     1.090     0.3×10^-3
CH3NO2   Bond distance   1  3          1.108   1.1086    1.089     0.3×10^-3
CH3NO2   Bond distance   1  4          1.108   1.1086    1.088     0.3×10^-3
CH3NO2   Bond distance   1  5          1.545   1.5456    1.643     0.4×10^-2
CH3NO2   Bond distance   5  6          1.209   1.2098    1.233     0.4×10^-3
CH3NO2   Bond distance   5  7          1.209   1.2098    1.233     0.4×10^-3
CH3NO2   Valence angle   3  1  2       220.0   110.3     95.2      0.47×10^-2
CH3NO2   Valence angle   4  1  3       218.0   109.8     95.5      0.43×10^-2
CH3NO2   Valence angle   5  1  2       214.0   107.8     120.6     0.36×10^-2
CH3NO2   Valence angle   1  5  6       238.0   119.2     153.8     0.211×10^-1
CH3NO2   Valence angle   1  5  7       238.0   119.2     152.4     0.193×10^-1
CH3NO2   Torsion angle   2  3  1  4    242.0   121.0     95.8      0.109×10^-1
CH3NO2   Torsion angle   3  2  1  5    238.0   119.3     131.9     0.28×10^-2
CH3NO2   Torsion angle   2  1  5  6    178.0   89.9      93.6      0.4×10^-3
CH3NO2   Torsion angle   4  1  5  7    300.0   150.8     155.9     0.3×10^-3
CH2O     Bond distance   1  2          1.106   1.1061    1.101     0.000
CH2O     Bond distance   1  3          1.106   1.1061    1.101     0.000
CH2O     Bond distance   1  4          1.208   1.208     1.226     0.2×10^-3
CH2O     Valence angle   3  1  2       235.0   117.463   113.55    0.3×10^-3
CH2O     Valence angle   4  1  2       242.5   121.267   123.225   0.1×10^-3
CH2O     Valence angle   4  1  3       242.5   121.271   123.226   0.1×10^-3
CH2O     Torsion angle   4  1  3       360.0   179.99    180.0     0.000
CH3NO    Bond distance   1  2          1.094   1.094     1.148     0.24×10^-2
CH3NO    Bond distance   1  3          1.092   1.092     1.158     0.36×10^-2
CH3NO    Bond distance   1  4          1.1101  1.1101    1.143     0.9×10^-3
CH3NO    Bond distance   1  5          1.482   1.482     1.443     0.2×10^-3
CH3NO    Bond distance   6  5          1.211   1.211     1.23      0.2×10^-3
CH3NO    Valence angle   4  1  3       218.0   109.3     102.9     0.8×10^-3
CH3NO    Valence angle   5  1  2       222.0   111.1     115.4     0.4×10^-3
CH3NO    Valence angle   5  1  3       214.0   107.3     113.0     0.7×10^-3
CH3NO    Valence angle   6  5  1       226.0   113.3     125.7     0.3×10^-2
CH3NO    Valence angle   2  1  3       216.0   108.8     96.4      0.33×10^-2
CH3NO    Torsion angle   4  1  3  2    236.0   118.45    108.32    0.18×10^-2
CH3NO    Torsion angle   5  1  2  3    230.0   115.75    119.26    0.2×10^-3
CH3NO    Torsion angle   6  5  1  2    244.0   122.22    134.98    0.27×10^-2
OH       Bond distance   1  2          0.9396  0.9396    0.9438    0.000
NO       Bond distance   1  2          1.1223  1.1223    1.190     0.37×10^-2
Fig. 2 Cross-correlation of the 41st (curve 1) and 37th (curve 2) parameters with each of the 37 molecular deviations
Fig. 3 Maximum cross-correlation value for each of the 190 parameters
Fig. 4 Number of parameters selected as a function of the number of vectors for the amplitude set (0.05, 0.1). Results are shown for cross-correlation cutoff levels of 0.2, 0.4 and 0.6
Genetic algorithm procedure
The concept of genetic algorithms was inspired by Darwin's theory of evolution [32]. The idea is to apply natural selection to a population of parameter sets G in search of one that accurately describes the real system. The parameter sets are allowed to breed by mating and mutation, after which natural selection is carried out, eliminating the most poorly adapted members. The selected members are then allowed to breed by mating and mutation again, and so on for each genetic iteration.
A GA requires a starting population consisting of a certain number of parameter vectors, such as 256 or 1024. The starting population is generated as follows. Half the required number (e.g., 128 or 512) is generated by adding random fluctuations to the reference vector; this is adapted from the concept of mutation in GA. The remaining half is created by adapting the concept of mating: the parameter vectors generated above are cut at random positions and joined to yield new pairs of vectors. The procedure is best illustrated with an example. Let the starting pair of vectors be denoted by a_1(i) and a_2(i), 1 ≤ i ≤ 51, and suppose the random position is 20. The new pair of vectors b_1(i) and b_2(i) is then calculated as follows:
$$b_1(i) = \begin{cases} a_1(i), & i = 1,\ldots,20 \\ a_2(i), & i = 21,\ldots,51 \end{cases} \qquad b_2(i) = \begin{cases} a_2(i), & i = 1,\ldots,20 \\ a_1(i), & i = 21,\ldots,51 \end{cases}$$
For each of the vectors generated above, corresponding to a parameter set, molecular mechanics simulations are carried out using the ReaxFF potential for each of the species listed in Table 1. The steepest descent algorithm
is used to locate the energy minimum for each species. Once the energy is minimized, we determine the bond lengths, valence angles and torsion angles of the molecules in the equilibrium configuration. This yields the values of the deviations listed earlier and, finally, a single value of Terr for this parameter set. Each GA trial thus yields a vector of Terr values, with each element corresponding to one parameter set. Half of the parameter vectors (those that yield the lowest values of Terr) are then selected for the next GA trial, and the rest are eliminated.
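The population bookkeeping described above (mutation of a reference vector, one-point mating, and retention of the half with the lowest Terr) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the toy objective toy_t_err and all numeric values are hypothetical, because a real Terr requires ReaxFF molecular mechanics runs for every species in Table 1.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mutate(ref: Sequence[float], amps: Sequence[float],
           rng: random.Random) -> Vector:
    """P_j = P_j^0 * (1 + A_j * R_n), with R_n drawn uniformly from [0, 1)."""
    return [p * (1.0 + a * rng.random()) for p, a in zip(ref, amps)]


def mate(a1: Sequence[float], a2: Sequence[float],
         cut: int) -> Tuple[Vector, Vector]:
    """One-point crossover: positions 1..cut keep their parent, the rest swap."""
    b1 = list(a1[:cut]) + list(a2[cut:])
    b2 = list(a2[:cut]) + list(a1[cut:])
    return b1, b2


def initial_population(ref: Sequence[float], size: int,
                       amps: Sequence[float],
                       rng: random.Random) -> List[Vector]:
    """First trial: half mutants of the reference vector, half their offspring."""
    mutants = [mutate(ref, amps, rng) for _ in range(size // 2)]
    offspring: List[Vector] = []
    while len(offspring) < size - len(mutants):
        a1, a2 = rng.sample(mutants, 2)       # random pair of mutants
        cut = rng.randint(1, len(ref) - 1)    # random cut position
        offspring.extend(mate(a1, a2, cut))
    return mutants + offspring[: size - len(mutants)]


def select_best_half(population: List[Vector],
                     t_err: Callable[[Vector], float]) -> List[Vector]:
    """Keep the half of the population with the lowest Terr."""
    return sorted(population, key=t_err)[: len(population) // 2]


# Usage with a stand-in objective; a real Terr needs ReaxFF/MM runs.
rng = random.Random(0)
ref = [1.0, 2.0, 0.5, 3.0]        # hypothetical reference parameter vector
amps = [0.05, 0.1, 0.05, 0.1]     # illustrative A_j1 / A_j2 amplitudes
target = [1.02, 2.1, 0.51, 3.2]   # hypothetical "true" parameter values


def toy_t_err(p: Vector) -> float:
    return sum((x - t) ** 2 for x, t in zip(p, target)) ** 0.5


pop = initial_population(ref, 256, amps, rng)
survivors = select_best_half(pop, toy_t_err)
print(len(pop), len(survivors))  # 256 128
```

Sorting by Terr and slicing is the simplest form of truncation selection; the refill used in later trials can reuse mutate and mate on the surviving vectors. With the cut at position 20 of a 51-element vector, mate reproduces the b_1(i), b_2(i) example given in the text.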
Fig. 5 Number of parameters selected as a function of the number of vectors for the amplitude set (0.1, 0.2). Results are shown for cross-correlation cutoff levels of 0.2, 0.4 and 0.6
Table 4 Cutoff = 0.2
No. of vectors   Amplitude variation
                 0.05–0.1   0.1–0.2   0.4–0.5
28               190        190       190
190              48         48        30
380              23         28        0
570              20         24        0
1710             22         22        0
1900             21         22        0
3800             20         23        0
5130             21         23        0
15390            21         23        0
19000            21         23        0
38000            14         23        0
45000            13         23        0
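The two screening stages of the Determination of significant parameters section can likewise be sketched in plain Python. The sketch makes assumptions that the text does not spell out: the three trial values of the single-parameter search satisfy x0 < x1 < x2; the peak cross-correlation is taken in absolute value; and C = A^T B is divided by the number of vectors N, so that its entries are Pearson correlations bounded by 1 (the text states only that the cutoff lies between 0 and 1). The small synthetic matrices stand in for the 40,000-row matrices A and B, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def parabola_minimum(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Stage 1: vertex of the parabola through three (value, Terr) samples.

    Assumes x0 < x1 < x2. If the fitted parabola is not convex, the best
    sampled value is kept instead.
    """
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    # Second divided difference = leading coefficient of the parabola
    curvature = ((y2 - y1) / (x2 - x1) - (y1 - y0) / (x1 - x0)) / (x2 - x0)
    if curvature <= 0:
        return xs[min(range(3), key=lambda k: ys[k])]
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * num / den


def zscore_columns(m: Matrix) -> Matrix:
    """Step 5: replace each element by (a_ij - column mean) / column std."""
    n, k = len(m), len(m[0])
    out = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col = [row[j] for row in m]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return out


def cross_correlation(a: Matrix, b: Matrix) -> Matrix:
    """Step 6: C = A^T B on normalized columns, divided by the number of
    vectors N so that every entry is a Pearson correlation in [-1, 1]."""
    za, zb = zscore_columns(a), zscore_columns(b)
    n = len(za)
    return [[sum(za[i][p] * zb[i][q] for i in range(n)) / n
             for q in range(len(zb[0]))]
            for p in range(len(za[0]))]


def significant_parameters(c: Matrix, cutoff: float) -> List[int]:
    """Keep parameter p if its peak |C_pq| over all deviations q exceeds cutoff."""
    return [p for p, row in enumerate(c) if max(abs(v) for v in row) > cutoff]


# Synthetic check: 3 parameters, 2 property deviations; parameter 1 is inert.
rng = random.Random(1)
A = [[rng.random() for _ in range(3)] for _ in range(2000)]
B = [[2.0 * r[0] + 0.1 * rng.random(), -1.5 * r[2] + 0.1 * rng.random()]
     for r in A]
C = cross_correlation(A, B)
print(significant_parameters(C, 0.6))  # [0, 2]
```

With real data, A would hold the mutated and mated parameter vectors and B the MM deviations for the 37 properties of Table 2; the inert parameter in the synthetic example plays the role of the 37th parameter in Fig. 2, whose peak CC stays far below any of the cutoffs examined.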
Note that, in the first trial, the starting point is a single reference vector from which the desired population is generated by a combination of mutation and mating, as described above. In successive trials, however, 50% of the vector population is supplied by the previous trial. For example, if the total desired population has a size of 256, the previous trial yields 128 vectors. The remaining 50% (128 vectors) are generated as follows. We extract the half (64) of the vectors yielded by the previous trial that correspond to the lowest Terr values. Sixty-four vectors are then created by random mutations of these vectors, and the remaining 64 are created by randomly mating these 64. Thus, we have a total of 128 vectors obtained from the previous trial and 128 newly created vectors produced by the mutation-and-mating procedure.
Results for nitromethane and its decomposition products
Single-parameter search for parameter reduction
All 611 parameters of the ReaxFF potential are used in this step. Figure 1 shows the variation of Terr during this optimization process. Columns 9 and 10 of Table 2 show the results obtained at the end of this 611-parameter optimization. We observe that there are significant changes in Terr with respect to certain parameters, while other parameters yield relatively small changes. This study yields a set of 190 parameters that significantly affect Terr. We then repeat the single-parameter search with this reduced set of 190 parameters, yielding the final result given in Table 3. The error falls further, to 0.05. The 190 parameter values thus obtained form the starting point for the next stage of optimization: the cross-correlation calculation.
Cross-correlation calculation for parameter reduction
The set of 190 significant parameters identified above still defines a rather large search space. To reduce the number of input parameters further, we calculate the equivalent of the cross-correlation between each of these 190 input parameters and the 37 deviations, using the procedure defined in the Determination of significant parameters section. Figure 2 shows the variation in the CC of two parameters with the 37 molecular deviations. The highest values observed are 0.92 and 0.24 for the 41st and 37th parameters, respectively. This means that the 41st parameter is highly significant, while the 37th parameter is not. In the same way, we determine the highest CC for all 190 parameters; these values are shown in Fig. 3. Imposing a cutoff of 0.7 reduces the number of significant parameters to 51. As explained in the Determination of significant parameters section, the cross-correlation results are affected by three choices:
1. The amplitudes A_j1 and A_j2. To determine the sensitivity of the results, we have examined the sets (A_j1, A_j2) = (0.05, 0.1), (0.1, 0.2) and (0.4, 0.5). For higher amplitudes, many combinations of parameters are likely to yield unphysical results for molecular properties; such parameter sets are rejected altogether in this study.
2. The cutoff used for the peak cross-correlation value: we have examined 0.2, 0.4, 0.6 and 0.7.
3. The number of vectors generated. This sensitivity is examined below.
Table 5 Parameter vector set = 28
Amplitude variation   No. of significant parameters for a cutoff of
                      0.2    0.4    0.6    0.7
0.05–0.1              190    184    87     51
0.1–0.2               190    188    135    86
0.4–0.5               190    179    82     48
The number of vectors should naturally be much larger than the number of parameters (190), so that various combinations are sampled. The solution is to progressively increase the number of vectors until the number of parameters accepted after the cross-correlation study becomes constant. Hence, we have varied the number of vectors progressively from 28 (about 1/7 of the number of parameters) to 45,000 (about 237 times the number of parameters). For an amplitude set of (0.05, 0.1), Fig. 4 shows the maximum number of parameters selected as a function of the number of vectors. The following points may be noted:
1. As expected, the number of parameters retained is a sensitive function of the cutoff.
2. For a given cutoff, we would expect the number of parameters retained to become constant beyond a certain number of vectors. While this is true up to 19,000 vectors, there is a surprising fall in the number of parameters retained beyond that point. The probable explanation is that a higher number of vectors gives access to a larger number of combinations of ReaxFF parameters, which allows the significance of each parameter to be determined separately. The conclusion is that a minimum number of vectors is necessary for the CC study.
Figure 5 shows the same results for an amplitude set of (0.1, 0.2). The results become constant as the number of parameter sets is increased beyond 5,000, and the results between 5,000 and 19,000 are close to those in Fig. 4. However, there is no fall beyond 19,000. This is probably because the higher-amplitude perturbations require a larger number of vectors to properly sample the search space. The conclusion is that, for a given perturbation amplitude, the number of vectors must be progressively increased until the result becomes constant. The study must also be repeated for different amplitudes to cover a reasonably large search space.
Table 4 shows the number of parameters selected as a function of the number of vectors for different amplitudes and for a cutoff level of 0.2. For an amplitude set of (0.4, 0.5), only the case with 190 vectors leads to the selection of any parameters. This is because, owing to the large amplitude, most parameter sets lead to unphysical results and are thus rejected altogether.
So far, we have seen that using a larger amplitude for a given cutoff leads to the acceptance of a slightly higher number of parameters. It is then necessary to check whether this process has converged (i.e., whether the parameters selected are mostly the same for different amplitudes). Figure 6 shows the results for a fixed cutoff of 0.2 for three amplitudes; the abscissa lists the serial numbers of the parameters. Amplitude sets of (0.05, 0.1) and (0.1, 0.2) lead to the selection of essentially the same parameters, with only one exception, which shows that the process has converged. After convergence, only 5–15 parameters are selected for a cutoff threshold of 0.6.
Fig. 6 Serial numbers of parameters selected for a vector size of 45,000. Cutoff level is 0.2. Results are shown for two different amplitude levels
Fig. 7 Terr as a function of vector number after the 1st and 165th GA trials. Ordinate shows 100 × Terr
At this point, it is necessary to choose between different options before starting GA optimization. The first is to continue with this cutoff threshold and number of vectors; GA would then proceed with a rather small number of parameters, which is computationally attractive but may not yield a good optimum. The second is to reduce the cutoff while retaining the same number of vectors, but we might then include parameters that are known to be irrelevant. The third option is to retain a high cutoff but reduce the number of vectors. This ensures that only highly correlated parameters are included in the GA, although the vector space has not been adequately sampled. In the present study, the purpose is to illustrate the overall technique rather than to make a judgement about the
Table 6 Deviations in molecular quantities corresponding to GA trial 165 Name of molecule CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH3NO2 CH2O CH2O CH2O CH2O CH2O CH2O CH2O CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO CH3NO OH NO Terr =0.0164 Property Atom type 1 1 1 1 5 5 3 4 5 1 1 2 3 2 4 1 1 1 3 4 4 4 1 1 1 1 6 4 5 5 6 2 4 5 6 1 1 Atom type 2 3 4 5 6 7 1 1 1 5 5 3 2 1 1 2 3 4 1 1 1 1 2 3 4 5 5 1 1 1 5 1 1 1 5 2 2 Atom type Atom type Weight Literature/exp. value 1.109 1.10856 1.10862 1.5456 1.20976 1.20976 110.3 109.8 107.8 119.2 119.2 121.0 119.3 89.9 150.8 1.1061 1.1061 1.208 117.463 121.267 121.271 179.99 1.094 1.092 1.1101 1.482 1.211 109.3 111.1 107.3 113.3 108.8 118.45 115.75 122.22 0.9396 1.1223 ReaxFF computed value 1.1055 1.10372 1.10376 1.5446 1.2224 1.2224 110.1 110.2 108.1 119.2 119.3 121.6 119.2 89.5 151.2811 1.09007 1.0901 1.2558 111.034 124.474 124.49 180.0 1.1046 1.095 1.0853 1.487 1.214 108.3 109.9 113.0 117.7 105.962 115.34 118.67 139.58 0.9369 1.1223 Square error 0.941610 5 0.190510 4 0.191710 4 0.406610 6 0.109310 3 0.109510 3 0.864610 6 0.312410 5 0.147710 5 0.156410 6 0.140710 7 0.585910 5 0.174110 6 0.458510 5 0.257210 5 0.207810 3 0.206310 3 0.156610 2 0.748310 3 0.17510 3 0.17610 3 0.771410 9 0.939510 4 0.952510 5 0.498810 3 0.966810 5 0.761810 5 0.212810 4 0.255810 4 0.710 3 0.38107 10 3 0.172610 3 0.173410 3 0.161210 3 0.50626 10 2 0.257210 5 0.318210 9
2 3 2 6 7 1 1 5 5
4 5 6 7
1.109 1.108 1.108 1.545 1.209 1.209 220.0 218.0 214.0 238.0 238.0 242.0 238.0 178.0 300.0 1.106 1.106 1.208 235.0 242.5 242.5 360.0 1.094 1.092 1.1101 1.482 1.211 218.6 222.0 214.0 226.0 216.0 236.0 230.0 244.0 0.9396 1.1223
2 2 3 3
3 2 3 1 3 3 2 1
2 3 2
J Mol Model (2012) 18:10491061
1059
best choice for CC calculation. Hence, we have chosen to use the results for the smallest-sized vector set (28), but with a high cutoff of 0.7. Table 5 shows the CC results. A cutoff of 0.7 yields 51 useful parameters for an amplitude set of 0.050.1. This parameter set is used in the next stage: GA optimization.
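The Square error column of Table 6 appears consistent with a weighted squared deviation, ((computed - reference)/weight)². The short check below recomputes a few rows under that assumption. `square_error` and `ROWS` are hypothetical names, and the paper's own definition of the objective function is given in an earlier section. Small mismatches are expected because the printed ReaxFF values are rounded.

```python
def square_error(reference, computed, weight):
    """Weighted squared deviation ((computed - reference) / weight) ** 2."""
    return ((computed - reference) / weight) ** 2


# Selected rows of Table 6 (GA trial 165):
# (molecule, weight, literature/exp. value, ReaxFF value, tabulated square error)
ROWS = [
    ("CH3NO2", 1.209, 1.20976, 1.2224, 0.1093e-3),
    ("CH2O", 1.208, 1.208, 1.2558, 0.1566e-2),
    ("CH2O", 235.0, 117.463, 111.034, 0.7483e-3),
    ("CH3NO", 226.0, 113.3, 117.7, 0.38107e-3),
    ("CH3NO", 244.0, 122.22, 139.58, 0.50626e-2),
]

if __name__ == "__main__":
    for name, weight, ref, val, tabulated in ROWS:
        err = square_error(ref, val, weight)
        print(f"{name:7s} recomputed {err:.4e}   tabulated {tabulated:.4e}")
```

The last row, the CH3NO torsion angle, contributes by far the largest square error. This matches the remark that it is the one property not reproduced within acceptable limits.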
Results based on the genetic algorithm
The GA process is started with the reference vector yielded by single-parameter optimization, corresponding to Terr = 0.0513, as shown in Table 3. The optimization varies only the 51 significant parameters determined by the CC study. We have performed a large number of GA trials following the procedure given in the Genetic algorithm procedure section, using a population of 256 vectors. For the 1st and 165th GA trials, Fig. 7 shows the lowest 128 Terr values as a function of the vector number. The minimum Terr value yielded by the first trial is 0.02, and this falls to 0.0164 after the 165th trial. The individual deviations in molecular properties yielded by the 165th trial are shown in Table 6. The best vector obtained from 165 trials with a population size of 256 was then used to initiate a new GA study with a population size of 1024. Results are shown in Fig. 8. After 74 and 134 GA trials, the minimum Terr values are 0.01578 and 0.01556, respectively, indicating that the optimization is approaching its best result. The best result, Terr = 0.015, is obtained after 1250 GA trials, and Table 7 presents the deviations at this point. The individual deviations are generally within acceptable limits, except for a torsion angle in CH3NO.

Limitations of this study
1. The literature on genetic algorithms describes a number of ways of performing mutation and mating operations [7, 17, 33]. Only one of these schemes has been applied in this work. A more detailed study comparing different schemes could yield better results.
2. The cross-correlation study yields a reduced parameter set. The number of parameters in that set depends on the amplitudes, the cutoff level and the number of vectors chosen. The final optimization should be repeated for different choices in the CC process in order to increase the probability of finding a global minimum.
3. Chemical reactions such as molecular decomposition involve transition states far from equilibrium. Therefore, a reduced parameter set optimized to fit equilibrium molecular structures may not represent reactions properly. We plan to extend this methodology to match sections of the potential energy surface corresponding to distorted/extended states of these species.
4. The genetic algorithm converges slowly. The RMS error falls from 0.02 to 0.0164 by the 165th trial, and later to 0.015 after 1250 trials. One explanation for this slow convergence is that the present GA retains only the vectors yielding the lowest values of the objective function. The vectors therefore cluster progressively around a single minimum. This slows convergence and restricts the result to one minimum, which may turn out to be local. In alternative GA schemes, vectors yielding higher Terr values are also retained, with a probability inversely related to Terr [33]. Such a technique is more likely to locate the global minimum.
Fig. 8 Terr as a function of vector number (parameter vectors) after the 74th and 134th GA trials; each curve refers to its own y axis

Table 7 Results corresponding to GA trial 1250 (Terr = 0.015)
Molecule  Atom type 1  Atom type 2  Weight  Literature/exp. value  ReaxFF computed value  Square error
CH3NO2    1            2            1.109   1.109                  1.1086                 0.9921e-7
CH3NO2    1            3            1.108   1.10856                1.106                  0.5037e-5
CH3NO2    1            4            1.108   1.10862                1.1063                 0.4153e-5
CH3NO2    1            5            1.545   1.5456                 1.538                  0.224e-4
CH3NO2    5            6            1.209   1.20976                1.214                  0.1385e-4
CH3NO2    5            7            1.209   1.20976                1.214                  0.12335e-4
CH3NO2    3            1            220.0   110.3                  110.042                0.13647e-5
CH3NO2    4            1            218.0   109.8                  110.0347               0.11594e-5
CH3NO2    5            1            214.0   107.8                  108.0077               0.94234e-6
CH3NO2    1            5            238.0   119.2                  119.2                  0.2259e-8
CH3NO2    1            5            238.0   119.2                  119.2                  0.8956e-7
CH3NO2    2            3            242.0   121.0                  121.3                  0.22617e-5
CH3NO2    3            2            238.0   119.3                  119.3                  0.434e-7
CH3NO2    2            1            178.0   89.9                   89.7                   0.18777e-5
CH3NO2    4            1            300.0   150.8                  151.11                 0.1346e-5
CH2O      1            2            1.106   1.1061                 1.0961                 0.82295e-4
CH2O      1            3            1.106   1.1061                 1.0961                 0.81514e-4
CH2O      1            4            1.208   1.208                  1.184                  0.40849e-3
CH2O      3            1            235.0   117.463                110.58                 0.85778e-3
CH2O      4            1            242.5   121.267                124.7                  0.2004e-3
CH2O      4            1            242.5   121.271                124.719                0.2021e-3
CH2O      4            1            360.0   179.99                 180.0                  0.7714e-9
CH3NO     1            2            1.094   1.094                  1.102                  0.5723e-4
CH3NO     1            3            1.092   1.092                  1.0954                 0.9851e-5
CH3NO     1            4            1.1101  1.1101                 1.0803                 0.72236e-3
CH3NO     1            5            1.482   1.482                  1.485                  0.50418e-5
CH3NO     6            5            1.211   1.211                  1.203                  0.48259e-4
CH3NO     4            1            218.6   109.3                  109.2                  0.166302e-4
CH3NO     5            1            222.0   111.1                  109.2                  0.77436e-4
CH3NO     5            1            214.0   107.3                  108                    0.40815e-4
CH3NO     6            5            226.0   113.3                  118.1                  0.45154e-3
CH3NO     2            1            216.0   108.8                  109.2                  0.3112e-5
CH3NO     4            1            236.0   118.45                 115.34                 0.1734e-3
CH3NO     5            1            230.0   115.75                 118.67                 0.1612e-3
CH3NO     6            5            244.0   122.22                 139.58                 0.50626e-2
OH        1            2            0.9396  0.9396                 0.937                  0.763739e-5
NO        1            2            1.1223  1.1223                 1.1223                 0.2717e-9
Atom types 3 and 4 (angle and torsion entries only), as printed: 2 3 2 6 7 1 1 5 5 1 2 2 3 3 1 3 2 3 3 3 2 1 2 3 2; 4 5 6 7

Conclusions
We have obtained a best-fit form of the ReaxFF potential for nitromethane and its decomposition products by matching experimentally known molecular parameters, such as bond lengths, valence angles and torsion angles. This work has two novel features. The first is the use of a parameter reduction technique to determine which subset of the 611 ReaxFF parameters plays a significant role for the species of interest; this is necessary to reduce the optimization space to manageable levels. The second is the application of GA techniques to a complex potential (ReaxFF) with a very large number of adjustable parameters, which implies a large parameter space for optimization.
Using a subset of 51 ReaxFF parameters, we have obtained a reasonably good match to 37 molecular properties for nitromethane and its decomposition products, with a root mean square deviation of 1.5%. More sophisticated GA schemes are expected to yield an even better match to the reference data.
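Limitation 4 contrasts two ways of forming the next GA population. One keeps only the vectors with the lowest Terr; the other also retains worse vectors, with a probability inversely related to Terr. Below is a minimal sketch of both, under these assumptions:

- `toy_terr` is a hypothetical, strictly positive stand-in for Terr.
- Uniform crossover and Gaussian mutation are generic choices, not the mating and mutation scheme of the Genetic algorithm procedure section.
- All function names are illustrative.

The usage lines also mimic the restart described in the Results, where the best vector from one run seeds a new run with a larger population.

```python
import random


def toy_terr(x):
    # Hypothetical stand-in for Terr: strictly positive, minimum 0.01 at x = 0.
    return sum(xi * xi for xi in x) + 0.01


def select_parents(pop, scores, n_keep, rng, scheme):
    if scheme == "truncation":
        # Keep only the n_keep vectors with the lowest objective value.
        order = sorted(range(len(pop)), key=lambda i: scores[i])
        return [pop[i] for i in order[:n_keep]]
    if scheme == "inverse":
        # Sample (with replacement) with probability proportional to 1 / Terr,
        # so worse vectors can survive and the population stays diverse.
        weights = [1.0 / s for s in scores]
        return rng.choices(pop, weights=weights, k=n_keep)
    raise ValueError(f"unknown scheme: {scheme}")


def crossover(a, b, rng):
    # Uniform crossover: each component is taken from either parent.
    return [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]


def mutate(x, rng, rate=0.1, scale=0.05):
    # Gaussian mutation applied to each component with probability `rate`.
    return [xi + rng.gauss(0.0, scale) if rng.random() < rate else xi for xi in x]


def run_ga(seed_vector, pop_size, n_trials, scheme="truncation", seed=0):
    rng = random.Random(seed)
    # The seed vector is kept unchanged as one member of the initial population.
    pop = [list(seed_vector)]
    pop += [mutate(seed_vector, rng, rate=1.0, scale=0.2) for _ in range(pop_size - 1)]
    best = min(pop, key=toy_terr)
    for _ in range(n_trials):
        scores = [toy_terr(x) for x in pop]
        parents = select_parents(pop, scores, pop_size // 2, rng, scheme)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
        candidate = min(pop, key=toy_terr)
        if toy_terr(candidate) < toy_terr(best):
            best = candidate  # best-so-far is tracked outside the population
    return best, toy_terr(best)


if __name__ == "__main__":
    start = [1.0] * 5
    best, terr = run_ga(start, pop_size=64, n_trials=50, seed=1)
    # Restart from the best vector with a larger population.
    best2, terr2 = run_ga(best, pop_size=256, n_trials=20, seed=2)
    _, terr_inv = run_ga(start, pop_size=64, n_trials=50, scheme="inverse", seed=1)
    print(f"truncation: {terr:.4f} -> {terr2:.4f}; inverse-Terr: {terr_inv:.4f}")
```

The best vector is tracked separately from the population. Probabilistic survival therefore never loses the best result found so far, while still letting the population explore away from a single minimum.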
Acknowledgment
It is a pleasure to acknowledge the help given by Dr. A.C.T. van Duin in providing the ReaxFF MD code and in helping with its use.
References
1. Rapaport DC (1995) The art of molecular dynamics simulation. Cambridge University Press, Cambridge
2. Tersoff J (1986) Phys Rev Lett 56:632
3. Brenner DW (1990) Phys Rev B 42:9458
4. Stuart SJ, Tutein AB, Harrison JA (2000) J Chem Phys 112:6472
5. van Duin A, Dasgupta S, Lorant F, Goddard WA (2001) J Phys Chem A 105:9396
6. Smeyers YG, Bellido MN (2004) Int J Quant Chem 23:507
7. Makarov DE, Metiu H (1998) J Chem Phys 108:590
8. Melandri S, Favero G, Caminati W, Favero B, Esposti AD (1997) J Chem Soc Faraday Trans 93:2131
9. Graham AP, Hofmann F, Toennies JP, Chen LY, Ying SC (1997) Phys Rev Lett 78:3900
10. Carlson AF, Madix RJ (2000) Surf Sci 470:62
11. Kryachko ES, Löwdin O, Brändas E (2004) Fundamental world of quantum chemistry: a tribute to the memory of Per-Olov Löwdin, vol II. Kluwer, Dordrecht
12. Hutson JM, Ernesti A, Law MM, Roche CF, Wheatley RJ (1996) J Chem Phys 105:9130
13. Atkins KM, Hutson JM (1996) J Chem Phys 105:440
14. Prudente FV, Acioli PH, Neto JJS (1998) J Chem Phys 109:8801
15. Saad D, Rattray M (1997) Phys Rev Lett 79:2578
16. Thompson DL, Wagner AF, Minkoff M (2006) J Phys Conf Ser 46:234
17. Xu YG, Liu GR (2003) J Micromech Microeng 13:254
18. Tersoff J (1988) Phys Rev Lett 61:2879
19. Haskins PJ, Cook M, Fellows J, Wood A (1998) In: Proc 11th Int Symp on Detonation, Aspen, CO, USA, 30 Aug–4 Sept 1998, p 897
20. Johnston HS, Parr C (1963) J Am Chem Soc 85:2544
21. Johnston HS (1963) J Am Chem Soc 85:2544
22. Root DM, Landis CR (1993) J Am Chem Soc 115:4201
23. Cleveland T, Landis CR (1996) J Am Chem Soc 118:6020
24. Brenner DW, Shenderova OA, Harrison JA, Stuart SJ, Ni B, Sinnott SB (2002) J Phys Condens Matter 14:783
25. Ni B, Lee KH, Sinnott SB (2004) J Phys Condens Matter 16:7261
26. Strachan A, van Duin A, Chakraborty D, Dasgupta S, Goddard WA III (2003) Phys Rev Lett 91:098301
27. Strachan A, Kober EM, van Duin A, Oxgaard J, Goddard WA III (2005) J Chem Phys 122:054502
28. University of Waterloo Webpage (2011) http://www.science.uwaterloo.ca/~cchieh/cact/c120/bondel.html
29. Dolgov EY, Batev VA, Godunov IA (2004) Int J Quant Chem 96:193
30. SoftWare (2011) SoftWare: 64-bit operating system (webpage). http://www.cachesoftware.com/mopac/index.shtml
31. Croxton FE, Cowden DJ, Klein S (1939) Applied general statistics. Prentice Hall Inc, New York
32. Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor
33. Deaven DM, Ho KM (1995) Phys Rev Lett 75:288


