Bayesian approach to the calculation of lateral interactions: NO/Rh(111)

(1)

Bayesian approach to the calculation of lateral interactions:

NO/Rh(111)

Citation for published version (APA):

Jansen, A. P. J., & Popa, C. (2008). Bayesian approach to the calculation of lateral interactions: NO/Rh(111). Physical Review B, 78(8), 085404-1/9. [085404]. https://doi.org/10.1103/PhysRevB.78.085404

DOI:

10.1103/PhysRevB.78.085404 Document status and date: Published: 01/01/2008

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.

• The final author version and the galley proof are versions of the publication after peer review.

• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne Take down policy

If you believe that this document breaches copyright please contact us at: openaccess@tue.nl

providing details and we will investigate your claim.

(2)

Bayesian approach to the calculation of lateral interactions: NO/Rh(111)

A. P. J. Jansen and C. Popa

Laboratory of Inorganic Chemistry and Catalysis, ST/SKA, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

共Received 9 April 2008; published 5 August 2008兲

We show how Bayesian statistics and density-functional theory can be combined to compute reliable values for the interactions in a cluster expansion for adsorbates on a surface. The method is an alternative to the leave-one-out cross-validation method. We show that it easily selects which interactions can be determined even if the total number of possible interactions is very large. We have applied the method to NO/Rh共111兲. Based on the interactions we have determined for this system we have predicted some structures, which have been confirmed by scanning-tunneling microscopy.

DOI:10.1103/PhysRevB.78.085404 PACS number共s兲: 68.43.Bc, 02.50.Tt, 34.35.⫹a

I. INTRODUCTION

Interactions between adsorbates, or lateral interactions, on transition-metal surfaces have at least been known for as long as diffraction techniques have revealed that adlayers can form very well-defined structures at low temperatures. The importance of these interactions for kinetics at higher tem-peratures has only more recently been acknowledged but forms now an active area of research. This is understandable if one realizes that even small interactions between adsor-bates can be of the same magnitude or larger than the ther-mal energy and can therefore change rate constants by an order of magnitude or more, especially at low temperatures. There are various experimental techniques one can use to determine lateral interactions. Using scanning-tunneling mi-croscopy one can determine the statistical distribution of the adsorbates over the surface. This distribution can be con-verted into a radial distribution function, which in turn can be converted into effective lateral interactions. One has to be able to image the exact position of each adsorbate to obtain the radial distribution function. This requires a low adsorbate mobility, which means that it is mainly applied to atoms on low-temperature surfaces.1–3

Low energy electron diffraction allows one to study the ordered phases of an adsorbate. One usually compares the different ordered phases found for an adsorbate and the tem-perature ranges in which they appear. This experimental re-sults can then be fitted using a lattice-gas model, thus yield-ing values for the lateral interactions.4–6 _{Several ordered}

phases are needed for this method to be of practical use. Accurate estimates of 共differences in兲 binding energies can be obtained by analyzing temperature programmed de-sorption traces.7–10 _{The differences in desorption}

tempera-tures can be directly related to differences in binding ener-gies. It is, however, in general difficult to relate these differences in binding energies to lateral interactions, since the local adsorbate configuration 共the number of adsorbates interacting with the desorbing molecule兲 is unknown. One option is to fit desorption spectra using kinetic Monte Carlo simulations.11

In single-crystal adsorption calorimetry binding energies of adsorbates can be directly measured. The differences in binding energies due to lateral interactions can therefore also

be accurately determined. The configuration of the adsorbate adlayer is unknown, however, and relating these differences in binding energies to lateral interactions is therefore—just as for temperature programmed desorption—difficult. If the local adsorbate configuration is known or can be guessed using another technique, then lateral interactions can be extracted.12–15

With computer hardware becoming faster it is possible to do quantum chemical calculations on quite realistic models of adsorbates on transition-metal surfaces especially using density-functional theory 共DFT兲. Such calculations have the advantage over experiments for the determination of lateral interactions that one knows precisely the system one is deal-ing with, and one does not have to worry about possible effects of steps, impurities, shortcomings of apparatuses, etc. When calculating lateral interactions they are most com-monly described with the cluster expansion. The adsorption energy Eadsof an adsorbate in a particular adlayer structure is

then written as

E_ads=

兺

m

cmVm, 共1兲 with Vmas the value of the interaction of type m and cmas the number of interactions of type m per adsorbate. The in-teractions Vmstand for the interaction of the adsorbate with the substrate 共adsorption energy of an isolated adsorbate兲, pair interactions between adsorbates at various distances, all possible three-particle interactions, four-particle interactions, etc. This expansion can be made to reproduce the calculated adsorption energy as accurately as one wants.16 _This

gener-ally takes however a large number of terms. Moreover, it may lead to overfitting; i.e., the cluster expansion will not only describe the interactions but also the errors one makes in the calculations of the adsorption energies. To avoid this one needs to truncate the cluster expansion.

The truncation of the cluster expansion for lateral interac-tions between adsorbates has so far mainly been done based on the desired accuracy with which the truncated expansion reproduces the calculated results, the number of acceptable terms in the expansion, the type of terms in the expansion 共pair, three particle, etc.兲, the estimated accuracy of the cal-culated results, and possibly other factors. Often these factors involve a trade-off; e.g., one prefers a short expansion, but it

(3)

should also be accurate. Researchers have usually dealt with this using their personal experience and insight, but few ob-jective criteria have been used.

A similar problem was encountered for the calculation between atoms forming an alloy. This has led to the devel-opment of the leave-one-out cross-validation 共LOO-CV兲 method.17,18 _{This is a statistical technique that uses part of}

the results of a set of calculations to determine values for the interactions between atoms and the rest to test these values. Because determination and testing of the interactions is done on independent results of calculations, one obtains an esti-mate of how well the values for interactions one calculates will predict energies of unknown structures. This method has recently also been applied to the determination of lateral interactions.19–21

In the LOO-CV method one starts with a model with few interaction parameters and determines the CV score. This is a measure for how much a prediction of the energy of an ad-layers based on the model will differ from the energy ob-tained from a calculation. Adding parameters will initially lower the CV score, which means that the model becomes better. Adding too many parameters should however increase the CV score because of overfitting. The minimum of the CV score indicates a best set of interaction parameters. The prob-lem with the method is that often the CV score becomes almost constant when more parameters are added, and it is very hard to determine the minimum of the CV score.22–24

This paper presents an alternative to the LOO-CV method. It is based on Bayesian statistics. Instead of the CV score, we assign a probability to each model of lateral inter-actions or set of terms of the cluster expansion. The best model is the one with the highest probability. We will show that this approach does not have the drawback of the LOO-CV method. The model with the maximum probability is well defined. Moreover, it seems that the approach leads to model with fewer interaction parameters. The method also lends itself well to an analysis of the importance of param-eters; it is easy to compute probabilities for individual pa-rameters and correlation between papa-rameters. We will show as an illustrative example the interactions between NO mol-ecules in adlayers on Rh共111兲.

II. THEORY

We assume that we have done calculations on various adlayer structures with the same substrate in all calculations and only one type of adsorbate in all adlayer structures. The calculations have resulted in the adsorption energy per adsor-bate E_adscalc.共n兲 with n as an index to distinguish the adlayer structures. We want to describe the energy E_adscalc.共n兲 using a cluster expansion for the lateral interactions. This means we write

E_adsfit共n兲 =

兺

m

cm共n兲Vm, 共2兲 with Vmas the value of the interaction of type m and cm共n兲 as the number of interactions of type m per adsorbate in struc-ture n. The interactions Vm stand for the interaction of the adsorbate with the substrate共adsorption energy of an isolated

adsorbate兲, pair interactions between adsorbates at various distances, all possible three-particle interactions, four-particle interactions, etc. The expression E_adsfit共n兲 should ap-proximate E_adscalc.共n兲. Note that we treat the adsorption energy of an isolated adsorbate in the same way as the lateral inter-actions. It is possible to single out adlayer structures with low coverage and therefore no lateral interactions to deter-mine the adsorption energy of isolated molecules separately. The errors that are made in such determination of adsorption energies will however skew the subsequent fit of the lateral interactions. We have shown that this increases the error in the lateral interactions by a factor of

冑

2.20 _{One also misses}

the advantage that systematic errors in the E_adscalc.共n兲’s do not affect the lateral interactions as will be shown below.

The summation in the expression for E_adsfit共n兲 runs in prin-ciple over an infinite number of interactions. The question is which of the interactions can be determined from the adsorp-tion energies E_adscalc.共n兲. This is the important difference with a straightforward multivariate linear regression. Each subset of all interactions forms a model for the lateral interactions in the system. We want to know which interaction model de-scribes the calculated adsorption energies best. We use S for a subset of all interaction parameters Vm. We use V as a shorthand for the set of all values of the interaction param-eters in S. For all calculated adsorption energies we use E. We will determine which model of the interactions is best by calculating P共S兩E兲, which stands for the probability that the calculated adsorption energies E can be described by inter-action parameters in S.

We can use Bayes’s theorem to relate the probability of S given E 关i.e., P共S兩E兲兴 to the probability of E given S 关i.e., P共E兩S兲兴.25–27

P共S兩E兲 ⬀ P共E兩S兲P共S兲. 共3兲 The proportionality constant that is missing in this expres-sion can be determined by normalizing P共S兩E兲 as a function of S. The probability P共E兩S兲 is often called the likelihood of S given E, the probability P共S兲 is called the prior 共probabil-ity兲 of S, and the probability P共S兩E兲 is called the posterior 共probability兲.

Bayes’s theorem is used as follows in the selection of models for the lateral interactions. We want to calculate P共S兩E兲. How good a model is will also depend on whether we can find good values for the lateral interactions. It is important to distinguish between S 共the parameters in the model兲 and V 共the values of these parameters兲. We can intro-duce the values by regarding P共E兩S兲 as a marginal distribu-tion of P共E,V兩S兲 via27

P共E兩S兲 =

冕

dVP共E,V兩S兲. 共4兲

The integrand can be written as

P共E,V兩S兲 = P共E兩S,V兲P共V兩S兲. 共5兲 Substitution in the Bayes’s expression for P共S兩E兲 then gives

A. P. J. JANSEN AND C. POPA PHYSICAL REVIEW B 78, 085404共2008兲

(4)

P共S兩E兲 ⬀ P共S兲

冕

dVP共E兩S,V兲P共V兩S兲. 共6兲 This allows us to compute P共S兩E兲 because we can make a good guess of what the calculated adsorption energies should be given the lateral interactions 关i.e., P共E兩S,V兲兴, and it should be possible to think of reasonable priors P共V兩S兲 and P共S兲.

Suppose we have a set of interaction parameters S with values V. A normal way to obtain such a set is via a least-squares procedure to fit E_adsfit共n兲 to E_adscalc.共n兲. Suppose that the set has all interaction parameters to describe the system and that they have the correct values. The most likely values for E_adscalc.共n兲 should then be equal to E_adsfit共n兲. Due to errors in the calculations the calculated adsorption energies will not be exactly equal. Instead it seems reasonable to assume that the difference can be described by a Gaussian probability distri-bution; i.e., P共E兩S,V兲 = exp

冋

−1 2␹1 2

册

兿

n=1 N_str 1

冑

2␲␴n2 , 共7兲 with ␹12⬅

兺

n=1 N_str

冋

E_adsfit共n兲 − E_adscalc.共n兲

␴n

册

2

, 共8兲

with N_str as the number of adlayer structures for which we have calculated adsorption energies and ␴n as an error esti-mate of the calculated adsorption energies E_adscalc.共n兲. Note that P共E兩S,V兲 is a function of the adsorption energies E_adscalc.共n兲, but the integration in Eq.共6兲 is over V. This means that we should regard ␹₁2 as a function of the interaction parameters V.

Usually one does not know much about the interaction parameters before one starts with the calculations of the ad-sorption energies nor about which interaction parameters to include in the model. In Eq. 共6兲 ideas on which model S is appropriate are split from ideas on which values V seem reasonable. The former have a prior P共S兲, the latter a prior P共V兩S兲. A reasonable expression for P共V兩S兲, which is also computationally convenient, is a Gaussian distribution,

P共V兩S兲 = exp

冋

−1 2␹2 2

册

兿

m=1 N_par 1

冑

2␲sm 2, 共9兲 with ␹2 2_⬅

兺

m=1 N_par

冋

−共Vm− Vm 共0兲_兲2 sm 2

册

, 共10兲

with Npar as the number of parameters in the interaction

model, V_m共0兲as the most likely prior value for parameter m, and sm as the standard deviation. The lack of prior knowl-edge of the values of the interactions can be implemented by choosing large values for these deviations. We find it harder to give a general expression for S. If a model contains a pair interaction for adsorbates at a certain distance, then pair in-teractions at shorter distances should be included as well. Also, if there is a three-particle interaction in a model, then

all pair interactions between these three particles having such three-particle interaction should be included too. One also want to cut off the summation in Eq. 共2兲. Apart from these considerations it seems natural to take all interaction models equally likely.

The integration in Eq.共6兲 can be done easily because the integrand is a Gaussian expression in the integration vari-ables. We define a column vector c via cn⬅Eadscalc.共n兲, a

col-umn vector e via en⬅Eads

fit_{共n兲, and a matrix A via A}

nm ⬅␴n−2␦nm. With these we can write ␹12=共e−c兲TA共e−c兲. The

fitted adsorption energies e can be written as e = Mv, with v as a column vector with the interaction parameters 共vm = Vm兲 and the matrix M contains the coefficients of the in-teraction parameters in Eq. 共2兲. Similarly we write ␹₂2=共v − v兲T_B_{共v−v兲, with v}

n⬅Vn共0兲and Bnm⬅sm

−2_␦

nm. We can com-bine␹₁2and␹₂2and write the result as a quadratic function of the interaction parameters. This gives us

␹1 2 +␹₂2=共v − v˜兲T关MTAM + B兴共v − v˜兲 +␮, 共11兲 with v ˜ =关MT AM + B兴−1共MTAc + Bv兲共12兲 and ␮= cTAc + vTBv − v˜T_关MT AM + B兴v˜. 共13兲

With this expression the integration in Eq. 共6兲 then becomes

P共S兩E兲 = P共S兲 P共E兲 1

冑

兿

n共2␲␴n 2_兲_m=1

兿

N_par

冋

˜sm sm

册

e −␮/2_, _共14兲

with 1/s˜m2 being an eigenvalue of the matrix MTAM + B. Equation共14兲 can be interpreted as follows. The probabil-ity of an interaction model is higher when␮is smaller. If we assume that the prior distribution P共V兩S兲 is very broad, so that we can set B = 0, then ␹₂2= 0, v˜ minimizes ␹₁2_{, and}_␮ _is

the least-squares sum of the difference between the calcu-lated and fitted adsorption energies. This means that the probability P共S兩E兲 becomes higher if the fit becomes better, as expected.

The probability P共S兩E兲 also becomes higher when the prior errors sm become smaller. This is to be expected too because it more or less means that we know a parameter already in advance. Less easy to interpret is the dependence on the s˜_m2_{’s. It seems that a high value for s}_˜

mwould improve the model. This however would be incorrect and also coun-terintuitive. As will be shown below the s˜m’s are error esti-mates for a set of statistically independent interaction param-eters. So a high value for s˜mwould mean a parameter that is ill-defined, which one would not expect to improve an inter-action model. The solution of this paradox is hidden in ␮ which also depends on the s˜m’s. To see this let us compare two interaction models S共1兲and S共2兲with the difference that S共2兲 has one additional parameter compared to S共1兲. Let us also assume, for simplicity, they have the same prior 兵P关S共2兲_{兴= P关S}共1兲_{兴其, that B=0, and that this additional}

param-eter is independent from the others. This means that each s˜m of S共1兲is also found for S共2兲, but S共2兲has an additional factor in the product of Eq. 共14兲 which we call s˜N. We then get

(5)

P共S共2兲兩E兲 P共S共1兲兩E兲= s ˜N sNe −共␮共2兲−␮共1兲兲/2_, _共15兲

with ␮共n兲 as the least-squares sum for S共n兲. When the addi-tional parameter in S共2兲 is independent from the others, the matrix MTAM blocks and ␮共2兲−␮共1兲= −共MTAM兲NN共v˜N兲2 holds. This leads to

P共S共2兲兩E兲 P共S共1兲兩E兲= s ˜N sNe 共v˜N/s˜N兲2/2_. 共16兲

This function is for small values of s˜Na decreasing function. Consequently, if the interaction parameters are defined better 共i.e., smaller s˜m兲, then the probability P共S兩E兲 becomes higher, again as expected. Equation 共16兲 shows also how adding a parameter can reduce the probability of an interac-tion model because the ratio s˜N/sN will usually be smaller than one and because the error estimates in the prior P共V兩S兲 will be large.

The results above depend on the error estimates␴n. This means that the posterior really has these estimates as a pa-rameter and should be written as P共S兩E,␴兲. Because we need to know the ␴n’s but have no good information on them, they are often called nuisance parameters. There are various ways to deal with this.26,27 _{We find it the most}

con-venient to set ␴n=␴ for all n and to get a value for ␴ by determining the maximum of its probability distribution P共␴兩E兲. This probability can be related to probability distri-butions that we have dealt with before.

P共␴兩E兲 ⬀ P共␴兲

兺

S

冕

dVP共E兩S,V,␴兲P共S,V兩␴兲. 共17兲 P共E兩S,V,␴兲 is given by Eq. 共7兲 and P共S,V兩␴兲 by Eq. 共9兲. The proportionality constant can be determined by normal-ization. Only P共␴兲 is new. However, if we assume that it is a uniform distribution then we only need to do the integral and sum over all models. The integral has already been done before, so we only need to add all results for the different models.

The parameters Vm共0兲and sm can also be regarded as nui-sance parameters, but we handle them differently. We do not want to use any prior information for the interaction param-eters, so we take large values for the error estimates sm. As a consequence the values that we take for the Vm共0兲’s have then only a negligible effect on the final results共see Sec. III兲.

Once we have determined S we also need the parameters V. For this we can used P共S,V兩E兲, which equals P共E兩S,V兲P共S,V兲. We have already seen the two factors in this product; the only difference with what we have done before is that we do not need to integrate out the interaction parameters. The interaction parameters themselves can be obtained from P共S,V兩E兲 by computing the expectation val-ues of V. The derivation above shows that the parameters should be chosen equal to components of the vector v˜. P共S,V兩E兲 can give us also error estimates for the parameters: the covariance matrix is given by 共MTAM + B兲−1.

We compare the Bayesian approach with the LOO-CV method.17,18,28 _{This method works as follows. We take all}

structures except structure k. We then do a multivariate linear regression for a particular interaction model. This gives us a set of interaction parameters that we can use to predict the adsorption energy for structure k. We then compare this en-ergy E_adspred共k兲 with the calculated energy E_adscalc.共k兲. We do this not just for one structure k but for all structures and define the cross-validation score or leave-one-out error,

R_CV2 = 1 Nstr

兺

k

关Eadspred共k兲 − Eadscalc.共k兲兴2. 共18兲

This error indicates how well an interaction model predicts the energy. Adding parameters will not necessarily decrease this error. An increase indicates overfitting.

III. LATERAL INTERACTIONS BETWEEN NO MOLECULES ON Rh(111)

In a previous paper we have used the LOO-CV method to determine the lateral interactions in O/Pt共111兲.19_{The results}

obtained with the Bayesian approach are essentially the same. The lateral interactions to be included in the cluster expansion are the same, and also the values of these interac-tions are the same if we assume that the prior errors of the interactions are large. This is because B in Eq.共12兲 can then be ignored and the expression reduces to the same expression we would obtain in the case of LOO-CV or a multivariate linear regression. The only difference between the Bayesian approach and LOO-CV is that they give slightly different values for the error estimates of the interactions, and the former also gives probabilities for the inclusion of param-eters in the interaction model. The advantages of the Baye-sian approach are better demonstrated for NO/Rh共111兲, which is a system with many more parameters. In O/Pt共111兲 the oxygen atoms only occupy threefold fcc sites. The NO molecules on Rh共111兲 prefer the threefold hcp site, but the difference in energy with an NO on the fcc site is small.29,30 Moreover, at coverages above 0.5 ML the repulsion between the NO molecules becomes so large that even top sites be-come occupied.31,32 _{This means that compared to O/Pt共111兲}

there are three times the number of interaction parameters between adsorbates on like sites plus parameters for the in-teraction parameters for adsorbates on different types of sites.

We have done DFT calculations of various adlayer struc-tures of NO/Rh共111兲 with the VASP code,33 which uses a

plane-wave basis set and the 共relativistic兲 ultrasoft pseudo-potentials introduced by Vanderbilt34 _{and generated by}

Kresse and Hafner.35_{The generalize gradient approximation}

of Perdew and Wang 共PW-91兲 has been used.36_All

calcula-tions on NO/Rh共111兲 were done with a surface model con-sisting of a supercell with a slab of five metal layers sepa-rated by five metal layers replaced by vacuum. Grids of size 3⫻3⫻1−9⫻9⫻1 共depending on the unit cell of the ad-layer兲 for Brillouin-zone sampling obtained via the Monckhorst-package were used and a cutoff of 400 eV. This yielded in all cases adsorption energies per NO molecule converged to within 5 kJ/mol with respect to k-point sam-pling, energy cutoff, number of slab and vacuum layers, and cell size.

(6)

The adsorption energy per NO molecule was computed for 74 different structures of the adlayer. Of these structures 19 contained adsorbates only on hcp sites, 18 adsorbates only on fcc sites, 14 adsorbates only on top sites, and 23 adsorbates on different sites. An initial set of about 65 struc-tures was created to have a good representation of all inter-actions that might possibly be present in the system. It also contained all experimentally observed structures. The tures not in this initial set were added because these struc-tures were observed in kinetic Monte Carlo simulations with interactions obtained from using a limited set of structures or related to these structures by shifting the whole adlayer so that the adsorbates became adsorbed at different sites.37–39

For each adsorption site共hcp, fcc, and top兲 we have one interaction parameter, the adsorption energy, for an isolated NO molecule at such site. Figure1shows the lateral interac-tions that we have considered as well. In all there are 91 candidate parameters for the interaction model. The pair in-teractions between NO molecules on different types of sites, a distance a/

冑

3 apart, with a as the nearest distance between two Rh atoms, were not included. The reason is that at this distance there is such a strong repulsion between the NO molecules that the molecules in the geometry optimization of the DFT calculations are pushed to other sites. In the kinetic Monte Carlo simulations we used somewhat arbitrarily a value for this interaction of 100 kJ/mol, which prevented the molecules getting so close. In the real system this approach is also avoided.

To get some idea what to choose for the priors P共V兩S兲 we did some linear regressions using only a small number of

interaction parameters. These showed that the most impor-tant interactions were the adsorption energies and the nearest-neighbor interaction for NO molecules at like sites. Based on these calculations we chose for Vm共0兲 in the prior −250 kJ/mol for all three adsorption energies and 15 kJ/mol for the interaction between nearest neighbors on like sites and next-nearest neighbors on different types of sites. For the other lateral interactions we chose 2.5 kJ/mol. As these esti-mates are rather crude, and we did not want the prior to affect the final values of the interaction parameters, we used a value of 100 kJ/mol for all error estimates sm.

Because the number of interaction parameters is so large the total number of subsets S is enormous 共291_{− 1兲. It is}

therefore not possible to generate all terms in summations over S as in Eq.共17兲. It is also not possible to consider each interaction model of course. To deal with this problem we did the following. We started with a guess for the accuracy of the DFT calculations ␴. We used ␴= 10 kJ/mol. Then we took as a first set of interaction parameters only the three adsorption energies of the isolated NO molecules. Next we added in turn each of the other interaction parameters and determined which one gave the largest increase in P共S兩E兲. This one was then added to the set. With this extended set this procedure of adding each interaction parameter in turn and determining the change in P共S兩E兲 was repeated. This led to an iterative procedure of adding again and again one pa-rameter to the set until a maximum for P共S兩E兲 was obtained. To check that the model of the interaction parameters did not correspond to a local maximum of P共S兩E兲, we also looked how the probability changed when two interaction param-eters instead of one were added, and we looked at the changes in P共S兩E兲 upon removing interaction parameters. Because we never considered an interaction model with more than about 25 parameters, we never had the problem that we did not have sufficient structures to determine the parameters.

This procedure gave us a smaller set of parameters for which we could generate all subsets S. We used this subset to determine␴by maximizing P共␴兩E兲 in Eq. 共17兲. This implies that we assumed that for interaction models with other pa-rameters P共S兩E兲=0. With this value for ␴ the procedure in the previous paragraph was repeated. Occasionally this led to the inclusion of new interaction parameters. In that case the whole procedure had to be repeated. We also repeated the whole procedure when new interaction parameters had to be included because we added new adlayer structures.

The final results gave us an error estimate for the DFT calculations of␴= 3.5 kJ/mol. This is about the same as the root-mean-square deviation of the fitted adsorption energies with respect to the calculated ones. In fact, it seems that this root-mean-square deviation can be used as an error estimate for ␴ that is somewhat easier to obtain. The value of ␴ is rather small. We have noted before that this is in part due to the fact that there is probably a systematic component in the error of the DFT adsorption energies.20 _{Such a systematic}

component does not affect␴ or the root-mean-square devia-tion. This can be seen best when we have only one type of site. In Eq. 共2兲 the adsorption energy of that site then has

cm共n兲=1 for all structures n. Consequently, a systematic er-ror then only shifts the adsorption energy but does not affect

(a)

(b)

FIG. 1. Definition of all lateral interactions. The large circles are Rh atoms. Only the first two layers of the substrate are shown.共a兲 shows the interactions between NO molecules on hcp sites. There is a similar set of interactions for fcc and top sites. 共b兲 shows the interactions between NO molecules on different types of sites. Only a selection of all of these interactions is shown. The others have the same distances between the molecules and angles between the in-termolecular vectors but have molecules at different sites.

(7)

the other interaction parameters. Because we have three ad-sorption sites in NO/Rh共111兲, the situation is a bit more com-plicated, but a systematic error will again only cause a shift of the adsorption energies, which moreover is the same for all sites. This is because the sum of the coefficients of the adsorption energies in Eq.共2兲 is equal to 1 for all structures. Note that by systematic error we mean a contribution to E_adscalc. that is numerically the same for all structures. This may, but need not, be related to some particular physical mechanism of an interaction that DFT does not properly take into ac-count.

Figure 2 shows how the probabilities P共S兩E兲 change when parameters are added to and removed from the inter-action model. Adding parameters initially increases a mod-el’s probability, but there is a well-defined maximum after which adding parameters only decreases the probability. The fact that there is such a clear maximum is an important ad-vantage of the Bayesian approach over the LOO-CV method.

The results of that method are shown in the figure too. The CV score is shown to decrease initially, but then it becomes almost constant when more parameters are added. It is very hard to determine the minimum of the CV score. Moreover, we found that the interaction model corresponding to the minimum depends quite sensitively on adding or removing structures in the fit. So it is difficult to determine the inter-action model with the LOO-CV method. This problem has been observed before for the determination of interactions in alloys.22–24

Figure 2 shows that the model with the maximum prob-ability has 13 parameters. It is also seen that the CV score is still clearly decreasing for that model. The Bayesian ap-proach thus gives a cluster expansion with fewer terms than LOO-CV. The smallest CV score is 2.7 kJ/mol, which is only a bit smaller than the 3.5 kJ/mol for the root-mean-square deviation for the 13 parameter models. As this score is achieved at the cost of almost doubling the number of inter-action parameters, one may wonder if LOO-CV really pre-vents overfitting.

Because of the difficulty of determining the minimum in the CV score the LOO-CV method takes also somewhat more time to do. Compared to a single calculation of a prob-ability of an interaction model a calculation of a single CV score takes more time. This is because for the Bayesian ap-proach Eqs. 共12兲–共14兲 have to be calculated only once, whereas for the CV score such a calculation has to be done for each calculation of E_adspred共k兲 in Eq. 共18兲. On the other hand the determination of the error estimate ␴ in Eq. 共8兲 means that the whole procedure in the Bayesian approach has to be repeated several times. The time to do any of these methods is negligible however compared to the time it takes to do the DFT calculations.

The interaction model with the maximum probability has a probability of 0.15. This may seem low, but one should realize that there are many interaction models that also have a relatively high probability 共but still much lower than the one with the maximum probability兲 and that differ in only a few parameters from the interaction model with the highest probability. There are three interaction models that can be obtained from the most likely model by leaving out one of the next-nearest pair interaction for NO molecules on differ-ent types of sites with probabilities of 0.06, 0.06, and 0.04. Adding a next-next-nearest pair interaction for one NO on an hcp and one NO on an fcc site gives a model with probability of 0.06. All other models have lower probabilities, but there are 17 models with probabilities of 0.01 or higher.

It is more insightful to look at the probabilities of the individual parameters. We can calculate these probabilities by summing the probabilities of all interaction models that contain a particular parameter. The results are shown in Table I. From that table we see that there is a clear distinc-tion between parameters that should be included in the inter-action model and those that can be left out. There are only a few exceptions: the next- and next-next-nearest pair interac-tions for NO’s on hcp sites and the next-next-nearest pair interaction for one NO on hcp and one NO on fcc. In the most likely model the first two are included, but the last one is not.

From Eq.共15兲 it seems that adding an interaction param-eter can be made to lead to a lower probability P共S兩E兲 by

1e−10 1e−08 1e−06 1e−04 0.01 1 8 10 12 14 interaction model probability number of parameters 2 4 6 8 10 5 10 15 20 25 number of parameters CV score (a) (b)

FIG. 2. The probabilities of the共a兲 interaction models and 共b兲 CV score共in kJ/mol兲 as a function of the number of parameters in the model. The lines are guides for the eyes and show what happens when a parameter is added. Models not connected by a line were obtained by removing a parameter. The fat lines indicate that adding a parameter improves the model.

(8)

simply taking a large value of the prior error estimate sN. This is indeed the case. However, the probability P共S兩E兲 varies by many orders of magnitude as can be seen in Fig.2. To change the ordering of the probabilities of the interaction models appreciably, the prior error estimates would have to be increased by an order of magnitude or more as well, giv-ing exaggerated values for them. Reducgiv-ing the values of these error estimates can lead to the acceptance of more in-teraction parameters, but then one is effectively saying that one already knows the value of a parameter in advance. We have found that there is a large range for the prior error estimates that hardly fixes the values of the interaction pa-rameters in advance, yet always leads to the same preferred interaction model.

TableIIshows the values of the interaction parameters for the most likely interaction model. All the lateral interactions are repulsive. Note that some parameters have values close to the value Vm共0兲of the prior, but some differ quite substantially. In particular the adsorption energy of the top site differs about 60 kJ/mol. The error estimates in the table are obtained from the diagonal elements of the matrix MTAM + B, which

is the inverse of the covariance matrix for the interaction parameters. The relative error can be quite large for the weaker interactions, but this is only because these interac-tions are so weak. The covariances are generally in absolute value smaller than 1 kJ2/mol2except the one for the nearest-neighbor pair interaction and the linear three-particle inter-action which are −2.2 kJ2_/mol2 _{for NO’s on hcp and}

−2.5 kJ2_/mol2_{for NO’s on fcc sites. This means that if we}

would leave out the three-particle interactions, we would find a too high nearest-neighbor pair interaction for NO’s on hcp and fcc sites. This was indeed found in a fit without these three-particle interactions.

Initially we did not use all the 74 structures. We started with a somewhat smaller set. We used this set to determine the lateral interactions and then did kinetic Monte Carlo simulations. These simulations showed some structures that were not in our initial set. So we did DFT calculations for these structures as well and calculated the lateral interactions again. This caused however only very small changes in the interaction parameters, which indicates that the method is robust; i.e., the results do not depend on details of the calcu-lations. Because these structures show up in simulations, it should be possible to observe them in experiments as well. Indeed, this has been done in the meantime with scanning-tunneling microscopy.40

Figure 3shows the calculated and fitted adsorption ener-gies as a function of coverages. The figure confirms that the difference between the calculated and fitted results is small. The exceptions can be mainly found among the less relevant structures with a high energy. These are structures that con-tain structures with NO molecules on top sites at low cover-age. Stable structures with adsorbates on top site can only be found above 0.5 ML, but then the fit is good. The tangent construct shows the stable phases at 0 K. The calculated and the fitted results show that there are phase transitions at 0.5 and 0.778 ML. The structures at which the transition takes place are the c共4⫻2兲−2NO 共Refs. 31 and 41兲 and the 3

TABLE I. Probabilities for the some important interactions. Interaction parameters that are not shown have a small probability; the largest being 0.03 for the linear four-particle interactions for hcp and fcc sites. Note that the nearest pair interaction between NO molecules on different types of site has not been deter-mined共see text兲.

hcp fcc top hcp-fcc hcp-top fcc-top Adsorption energy 1.00 1.00 1.00

Nearest pair interaction 1.00 1.00 1.00

Next-nearest pair interaction 0.67 0.01 0.01 0.87 0.90 0.91 Next-next-nearest pair interaction 0.59 0.00 0.00 0.30 0.02 0.02 Linear three-particle interaction 0.99 1.00 0.02

Triangular three-particle interaction 0.04 0.06 0.04 Bent three-particle interaction 0.01 0.01 0.01

TABLE II. Values for the interaction parameters with standard deviations in parentheses共in kJ/mol兲 for the most likely interaction model. The parameter with the largest probability共0.30兲 that has not been included in this model is the next-next-nearest pair interaction between one NO molecule on an hcp and one NO on an fcc site.

hcp Fcc top hcp-fcc hcp-top fcc-top Adsorption energy −254.4 共1.4兲 −245.6 共1.1兲 −190.4 共1.2兲

Nearest pair interaction 15.6共1.8兲 17.1共1.6兲 19.1共1.0兲

Next-nearest pair interaction 3.0共1.0兲 4.1共1.4兲 5.7 共2.0兲 5.8 共1.9兲 Next-next-nearest pair interaction 3.1共0.9兲

(9)

⫻3−7NO structures.32_{Both have been observed}

experimen-tally before. Other structures have been observed as well, but they can be shown to be entropy stabilized.40

IV. SUMMARY AND CONCLUSIONS

We have presented an alternative to the LOO-CV method for the truncation problem of the cluster expansion for lateral interactions when using DFT calculations. The method uses Bayesian statistics. The main advantage seems to be that the method clearly shows which parameters include in the inter-action model even if the set of all candidate parameters be-comes large. The Bayesian approach seems to yield a more compact interaction model with fewer terms in the cluster

expansion. The resulting model fits the DFT results only slightly worse than the model obtained with LOO-CV. As the latter yields many more terms of the expansion, it might be that we still get overfitting with LOO-CV. The Bayesian method also has the advantage that the results are easier to analyze. Apart from the values of the parameters we get probabilities that they should be included in an interaction model and correlations between these parameters. A draw-back of the Bayesian approach is that the theory is more complicated than that of LOO-CV, and it requires the deter-mination of some parameters 共prior probabilities and error estimates of the data to be fitted兲 that is not needed in LOO-CV.

We used the Bayesian approach to compute the interac-tions for NO/Rh共111兲. This is a complicated system because the NO molecules adsorb at three different sites depending on coverage. The approach yields an interaction model with 13 parameters out of a total of 91 possible parameters. Com-parison of the resulting structures using these parameters in kinetic Monte Carlo simulations shows good agreements with established experimental results. We also found struc-tures that we could later confirm using scanning-tunneling microscopy.40

We have taken a system from surface science as an illus-trative example because we are working in that field. How-ever, the Bayesian approach should also be useful for the study of alloys. In fact, it might even be more useful there because the number of terms in the cluster expansion for alloys increases more rapidly than for layers of adsorbates.

ACKNOWLEDGMENTS

This research was financially supported by the Foundation for Fundamental Research on Matter 共FOM兲, The Nether-lands, and by the Deutsche Forschungsgemeinschaft共DFG兲, Germany, in the frame of the DFG-priority program No. SPP 1091, “Bridging the gap between ideal and real systems in heterogeneous catalysis.”

1_{J. Trost, T. Zambelli, J. Wintterlin, and G. Ertl, Phys. Rev. B 54,} 17850共1996兲.

2_{L. Österlund, M. O. Pedersen, I. Stensgaard, E. Lægsgaard, and} F. Besenbacher, Phys. Rev. Lett. 83, 4812共1999兲.

3_{S. Renisch, R. Schuster, J. Wintterlin, and G. Ertl, Phys. Rev.} Lett. 82, 3839共1999兲.

4_{N. V. Petrova and I. N. Yakovkin, Surf. Sci. 519, 90}_共2002兲. 5_{C. Schwennicke and H. Pfnür, Phys. Rev. B 56, 10558}_共1997兲. 6_{P. Piercy, K. De’Bell, and H. Pfnür, Phys. Rev. B 45, 1869}

共1992兲.

7_{D. L. S. Nieskens, A. P. van Bavel, and J. W. Niemantsverdriet,} Surf. Sci. 546, 159共2003兲.

8_{L. J. Whitman and W. Ho, J. Chem. Phys. 90, 6018}_共1989兲. 9_{M. Kiskinova, A. Szabo, and J. T. Yates, Jr., J. Chem. Phys. 89,}

7599共1988兲.

10_{A. Szabo, M. Kiskinova, and J. T. Yates, Jr., J. Chem. Phys. 90,} 4604共1989兲.

11_{A. P. J. Jansen, Phys. Rev. B 69, 035414}_共2004兲.

12_{J. T. Stuckless, N. Al-Sarraf, C. Wartnaby, and D. A. King, J.} Chem. Phys. 99, 2202共1993兲.

13_{L. Vattuone, Y. Y. Yeo, and D. A. King, J. Chem. Phys. 104,} 8096共1996兲.

14_{Y. Y. Yeo, L. Vattuone, and D. A. King, J. Chem. Phys. 106,} 1990共1997兲.

15_{R. Kose, W. A. Brown, and D. A. King, J. Phys. Chem. B 103,} 8722共1999兲.

16_{J. N. Murrell, S. Carter, P. Huxley, S. C. Farantos, and A. J. C.} Varandas, Molecular Potential Energy Functions共Wiley, Chich-ester, 1984兲.

17_{A. van der Walle and G. Ceder, J. Phase Equilib. 23, 348}_共2002兲. 18_{V. Blum and A. Zunger, Phys. Rev. B 69, 020103}_{共R兲共2004兲.} 19_{A. P. J. Jansen and W. K. Offermans, in Computational Science}

and its Applications—ICCSA-2005, LNCS 3480, edited by O.

Gervasi共Springer, Berlin, 2005兲.

−240 −220 −200 −180 −160 −140 0.2 0.4 0.6 0.8 1.0 ad sorpt ion energy coverage

FIG. 3. Adsorption energy 共in kJ/mol兲 per NO molecule as a function of coverage. The closed circles are DFT results; the open circles are results from the most likely interaction model. Corre-sponding structures are connected by thin lines. The fat lines are the tangent constructs for the DFT results共solid兲 and the model results 共dashed兲.

(10)

20_{C. G. M. Hermse and A. P. J. Jansen, in Catalysis, edited by J. J.} Spivey and K. M. Dooley共Royal Society of Chemistry, London, 2006兲, Vol. 19.

21_{Y. Zhang, V. Blum, and K. Reuter, Phys. Rev. B 75, 235406} 共2007兲.

22_{N. A. Zarkevich and D. D. Johnson, Phys. Rev. Lett. 92, 255702} 共2004兲.

23_{R. Drautz and A. Díaz-Ortiz, Phys. Rev. B 73, 224207}_共2006兲. 24_{D. E. Nanu, Y. Deng, and A. J. Böttger, Phys. Rev. B 74, 014113}

共2006兲.

25_{E. T. Jaynes and G. L. Bretthorst, Probability Theory: The Logic}

of Science共Cambridge University Press, Cambridge, 2003兲.

26_{A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian}

Data Analysis 共Chapman & Hall, London/CRC, Boca Raton,

FL, 2003兲.

27_{D. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial} 共Oxford University Press, Oxford, 2006兲.

28_{D. M. Hawkins, J. Chem. Inf. Comput. Sci. 44, 1}_共2004兲. 29_{D. Loffreda, D. Simon, and P. Sautet, Chem. Phys. Lett. 291, 15}

共1998兲.

30_{D. Loffreda, F. Delbecq, D. Simon, and P. Sautet, J. Chem. Phys.}

115, 8101共2001兲.

31_{I. Zasada, M. A. V. Hove, and G. A. Somorjai, Surf. Sci. 418,} L89共1998兲.

32_{K. B. Rider, K. S. Hwang, M. Salmeron, and G. A. Somorjai,} Phys. Rev. Lett. 86, 4330共2001兲.

33_{G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169}_共1996兲. 34_{D. Vanderbilt, Phys. Rev. B 41, 7892}_共1990兲.

35_{G. Kresse and J. Hafner, J. Phys.: Condens. Matter 6, 8245} 共1994兲.

36_{J. P. Perdew, in Electronic Structure of Solids ’91, edited by P.} Ziesche and H. Eschrig共Akademie, Berlin, 1991兲, p. 11. 37_{J. J. Lukkien, J. P. L. Segers, P. A. J. Hilbers, R. J. Gelten, and A.}

P. J. Jansen, Phys. Rev. E 58, 2598共1998兲.

38_{A. P. J. Jansen, an introduction to Monte Carlo simulations of} surface reactions, arXiv:cond-mat/0303028.

39_{A. P. J. Jansen, The kinetic Monte Carlo website, http://} www.catalysis.nl/chembond/kMC/, 2006.

40_{J. H. A. Hagelaar, The Role of the Electron Trajectory in}

Scan-ning Tunneling Microscopy: Elastic and Inelastic Tunneling Through NO On Rh(111)共Eindhoven University of Technology,

Eindhoven, 2008兲.

41_{C.-T. Kao, G. S. Blackman, M. A. V. Hove, G. A. Somorjai, and} C.-M. Chan, Surf. Sci. 224, 77共1989兲.