Basis set (chemistry)


A basis set in theoretical and computational chemistry is a set of functions that is used to represent the electronic wave function in the Hartree–Fock method or density-functional theory in order to turn the partial differential equations of the model into algebraic equations suitable for efficient implementation on a computer.
The use of basis sets is equivalent to the use of an approximate resolution of the identity. The single-particle states are then expressed as linear combinations of the basis functions.
The basis set can either be composed of atomic orbitals, which is the usual choice within the quantum chemistry community, or plane waves which are typically used within the solid state community. Several types of atomic orbitals can be used: Gaussian-type orbitals, Slater-type orbitals, or numerical atomic orbitals. Out of the three, Gaussian-type orbitals are by far the most often used, as they allow efficient implementations of Post-Hartree–Fock methods.

Introduction

In modern computational chemistry, quantum chemical calculations are performed using a finite set of basis functions. When the finite basis is expanded towards an infinite, complete set of functions, calculations using such a basis set are said to approach the complete basis set limit. In this article, basis function and atomic orbital are sometimes used interchangeably, although the basis functions are usually not true atomic orbitals, because many basis functions are used to describe polarization effects in molecules.
Within the basis set, the wavefunction is represented as a vector, the components of which correspond to coefficients of the basis functions in the linear expansion. In such a basis, one-electron operators correspond to matrices, whereas two-electron operators are rank four tensors.
When molecular calculations are performed, it is common to use a basis composed of atomic orbitals, centered at each nucleus within the molecule. The physically best motivated basis set consists of Slater-type orbitals (STOs), which are solutions to the Schrödinger equation of hydrogen-like atoms and decay exponentially far away from the nucleus. It can be shown that the molecular orbitals of Hartree–Fock and density-functional theory also exhibit exponential decay. Furthermore, s-type STOs also satisfy Kato's cusp condition at the nucleus, meaning that they are able to accurately describe electron density near the nucleus. However, hydrogen-like atoms lack many-electron interactions, thus the orbitals do not accurately describe electron state correlations.
Unfortunately, calculating integrals with STOs is computationally difficult, and it was later realized by Frank Boys that STOs could be approximated as linear combinations of Gaussian-type orbitals (GTOs) instead. Because the product of two GTOs can be written as a linear combination of GTOs, integrals with Gaussian basis functions can be written in closed form, which leads to huge computational savings.
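The product property mentioned above can be checked numerically in its simplest case. The following is a minimal sketch for two one-dimensional, unnormalized s-type Gaussians, where the product collapses to a single Gaussian; functions with higher angular momentum carry polynomial prefactors and give a linear combination instead. The function names and the numerical parameters are illustrative assumptions, not taken from any particular program.

```python
import math

def gaussian(alpha, center, x):
    """Unnormalized 1D s-type Gaussian exp(-alpha * (x - center)^2)."""
    return math.exp(-alpha * (x - center) ** 2)

def gaussian_product(alpha, a, beta, b):
    """Return (prefactor, exponent, center) of the Gaussian equal to the product
    exp(-alpha * (x - a)^2) * exp(-beta * (x - b)^2)."""
    p = alpha + beta                          # exponent of the product Gaussian
    center = (alpha * a + beta * b) / p       # exponent-weighted center
    prefactor = math.exp(-alpha * beta / p * (a - b) ** 2)
    return prefactor, p, center

# Arbitrary illustrative parameters (hypothetical values).
alpha, a = 0.8, -0.5
beta, b = 1.3, 0.7
k, p, c = gaussian_product(alpha, a, beta, b)

# Compare the explicit product with the single Gaussian on a grid from -3 to 3.
grid = [i * 0.1 - 3.0 for i in range(61)]
max_error = max(
    abs(gaussian(alpha, a, x) * gaussian(beta, b, x) - k * gaussian(p, c, x))
    for x in grid
)
print(f"largest deviation on the grid: {max_error:.2e}")
```

Because the product is again a Gaussian on a single center, multi-center integrals reduce to integrals over one center, which is where the closed-form expressions come from.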
Dozens of Gaussian-type orbital basis sets have been published in the literature. Basis sets typically come in hierarchies of increasing size, giving a controlled way to obtain more accurate solutions, albeit at a higher cost.
The smallest basis sets are called minimal basis sets. A minimal basis set is one in which, on each atom in the molecule, a single basis function is used for each orbital in a Hartree–Fock calculation on the free atom. For atoms such as lithium, basis functions of p type are also added to the basis functions that correspond to the 1s and 2s orbitals of the free atom, because lithium also has a 1s²2p¹ bound state. For example, each atom in the second period of the periodic system (Li–Ne) would have a basis set of five functions (two s functions and three p functions).
The minimal basis set is close to exact for the gas-phase atom. In the next level, additional functions are added to describe polarization of the electron density of the atom in molecules. These are called polarization functions. For example, while the minimal basis set for hydrogen is one function approximating the 1s atomic orbital, a simple polarized basis set typically has two s- and one p-function. This adds flexibility to the basis set, effectively allowing molecular orbitals involving the hydrogen atom to be more asymmetric about the hydrogen nucleus. This is very important for modeling chemical bonding, because the bonds are often polarized. Similarly, d-type functions can be added to a basis set with valence p orbitals, and f-functions to a basis set with d-type orbitals, and so on.
Another common addition to basis sets is the addition of diffuse functions. These are extended Gaussian basis functions with a small exponent, which give flexibility to the "tail" portion of the atomic orbitals, far away from the nucleus. Diffuse basis functions are important for describing anions or dipole moments, but they can also be important for accurate modeling of intra- and intermolecular bonding.
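To make the link between a small exponent and a long "tail" concrete, the sketch below evaluates the root-mean-square radius of the density of a normalized s-type Gaussian, which is sqrt(3 / (4 * exponent)) in atomic units. The three exponents are hypothetical values chosen only to contrast tight, valence-like and diffuse functions.

```python
import math

def rms_radius(exponent):
    """Root-mean-square radius (bohr) of the density of a normalized s-type Gaussian.

    The density is proportional to exp(-2 * exponent * r^2),
    for which <r^2> = 3 / (4 * exponent).
    """
    return math.sqrt(3.0 / (4.0 * exponent))

# Hypothetical exponents, for illustration only.
for label, exponent in [("tight", 10.0), ("valence", 0.5), ("diffuse", 0.03)]:
    print(f"{label:8s} exponent = {exponent:6.2f}  rms radius = {rms_radius(exponent):5.2f} bohr")
```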

Minimal basis sets

The most common minimal basis set is STO-nG, where n is an integer. This n value represents the number of Gaussian primitive functions comprising a single basis function. In these basis sets, the same number of Gaussian primitives comprise core and valence orbitals. Minimal basis sets typically give rough results that are insufficient for research-quality publication, but are much cheaper than their larger counterparts. Commonly used minimal basis sets of this type include STO-3G, STO-4G and STO-6G.
There are several other minimal basis sets that have been used, such as the MidiX basis sets.
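As an illustration of what "n Gaussian primitives per basis function" means, the sketch below builds the hydrogen 1s function of STO-3G as a fixed linear combination of three normalized s-type primitives and checks that the contracted function is normalized. The exponents and coefficients are the values commonly tabulated for hydrogen; they are quoted here as an assumption and should be checked against a basis-set library before any real use.

```python
import math

# Commonly tabulated STO-3G parameters for the hydrogen 1s function
# (assumed here for illustration, not taken from the article).
EXPONENTS = [3.42525091, 0.62391373, 0.16885540]
COEFFICIENTS = [0.15432897, 0.53532814, 0.44463454]

def primitive_overlap(a, b):
    """Overlap of two normalized s-type Gaussians that share one center."""
    return (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5

def contracted_norm(exponents, coefficients):
    """Self-overlap <phi|phi> of the contracted function phi = sum_i c_i g_i."""
    return sum(
        ci * cj * primitive_overlap(ai, aj)
        for ai, ci in zip(exponents, coefficients)
        for aj, cj in zip(exponents, coefficients)
    )

norm = contracted_norm(EXPONENTS, COEFFICIENTS)
print(f"n = {len(EXPONENTS)} primitives, <phi|phi> = {norm:.4f}")
```

The coefficients stay fixed during a calculation, so the three primitives count as one basis function.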

Split-valence basis sets

During most molecular bonding, it is the valence electrons which principally take part in the bonding. In recognition of this fact, it is common to represent valence orbitals by more than one basis function. Basis sets in which there are multiple basis functions corresponding to each valence atomic orbital are called valence double, triple, quadruple-zeta, and so on, basis sets. Since the different orbitals of the split have different spatial extents, the combination allows the electron density to adjust its spatial extent appropriate to the particular molecular environment. In contrast, minimal basis sets lack the flexibility to adjust to different molecular environments.

Pople basis sets

The notation for the split-valence basis sets arising from the group of John Pople is typically X-YZg. In this case, X represents the number of primitive Gaussians comprising each core atomic orbital basis function. The Y and Z indicate that the valence orbitals are composed of two basis functions each, the first one composed of a linear combination of Y primitive Gaussian functions, the other composed of a linear combination of Z primitive Gaussian functions. In this case, the presence of two numbers after the hyphen implies that this basis set is a split-valence double-zeta basis set. Split-valence triple- and quadruple-zeta basis sets are also used, denoted as X-YZWg, X-YZWVg, etc. Commonly used split-valence basis sets of this type include 3-21G, 6-31G and 6-311G.
The 6-31G* basis set is a valence double-zeta polarized basis set that adds to the 6-31G set six d-type Cartesian Gaussian polarization functions on each of the atoms Li through Ca and ten f-type Cartesian Gaussian polarization functions on each of the atoms Sc through Zn.
Pople basis sets are somewhat outdated, as correlation-consistent or polarization-consistent basis sets typically yield better results with similar resources. Also note that some Pople basis sets have grave deficiencies that may lead to incorrect results.
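The X-YZg naming scheme described above can be decoded mechanically. The following is a small sketch that handles only plain names such as 6-31G; the function name and the returned dictionary layout are hypothetical, and suffixes for polarization or diffuse functions are deliberately not handled.

```python
import re

def parse_pople_name(name):
    """Split a plain Pople name such as '6-31G' into core and valence primitive counts."""
    match = re.fullmatch(r"(\d)-(\d+)G", name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a plain X-YZ...G name: {name!r}")
    core = int(match.group(1))                               # X: primitives per core function
    valence = [int(digit) for digit in match.group(2)]       # Y, Z, ...: primitives per valence function
    zeta = {2: "double", 3: "triple", 4: "quadruple"}.get(len(valence), "unknown")
    return {"core_primitives": core, "valence_primitives": valence, "zeta": zeta}

for name in ["3-21G", "6-31G", "6-311G"]:
    print(name, parse_pople_name(name))
```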

Correlation-consistent basis sets

Some of the most widely used basis sets are those developed by Dunning and coworkers, since they are designed for converging post-Hartree–Fock calculations systematically to the complete basis set limit using empirical extrapolation techniques.
For first- and second-row atoms, the basis sets are cc-pVNZ where N = D, T, Q, 5, 6, ... (D = double, T = triple, and so on). The 'cc-p' stands for 'correlation-consistent polarized' and the 'V' indicates that they are valence-only basis sets. They include successively larger shells of polarization functions. More recently these 'correlation-consistent polarized' basis sets have become widely used and are the current state of the art for correlated or post-Hartree–Fock calculations. Examples include cc-pVDZ, cc-pVTZ and cc-pVQZ, together with their diffuse-augmented counterparts aug-cc-pVDZ, aug-cc-pVTZ and aug-cc-pVQZ.
For period-3 atoms, additional functions have turned out to be necessary; these are the cc-pV(N+d)Z basis sets. Even larger atoms may employ pseudopotential basis sets, cc-pVNZ-PP, or relativistic-contracted Douglas–Kroll basis sets, cc-pVNZ-DK.
While the usual Dunning basis sets are for valence-only calculations, the sets can be augmented with further functions that describe core electron correlation. These core-valence sets can be used to approach the exact solution to the all-electron problem, and they are necessary for accurate geometric and nuclear property calculations.
Weighted core-valence sets have also been recently suggested. The weighted sets aim to capture core-valence correlation, while neglecting most of core-core correlation, in order to yield accurate geometries with smaller cost than the cc-pCVXZ sets.
Diffuse functions can also be added for describing anions and long-range interactions such as Van der Waals forces, or to perform electronic excited-state calculations and electric field property calculations. A recipe for constructing additional augmented functions exists; as many as five augmented functions have been used in second hyperpolarizability calculations in the literature. Because of the rigorous construction of these basis sets, extrapolation can be done for almost any energetic property. However, care must be taken when extrapolating energy differences, as the individual energy components converge at different rates: the Hartree–Fock energy converges exponentially, whereas the correlation energy converges only polynomially.
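As a sketch of how polynomial convergence can be exploited, the example below applies a two-point extrapolation to the correlation energy. It assumes the inverse-cubic model E_X = E_CBS + A / X^3, where X is the cardinal number of the basis set (3 for triple-zeta, 4 for quadruple-zeta). That model is one widely used choice and is an assumption here, not something prescribed by the text above; the energies are synthetic numbers generated from the model itself, not computed results.

```python
def extrapolate_correlation_energy(e_small, x_small, e_large, x_large):
    """Two-point estimate of E_CBS, assuming E_X = E_CBS + A / X**3."""
    w_small = x_small ** 3
    w_large = x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)

# Synthetic test data generated from the assumed model (hypothetical values, hartree).
E_CBS_TRUE = -0.300
A = 0.150

def model_energy(x):
    return E_CBS_TRUE + A / x ** 3

e_tz = model_energy(3)   # stands in for a triple-zeta correlation energy
e_qz = model_energy(4)   # stands in for a quadruple-zeta correlation energy

estimate = extrapolate_correlation_energy(e_tz, 3, e_qz, 4)
print(f"TZ: {e_tz:.6f}  QZ: {e_qz:.6f}  extrapolated: {estimate:.6f}")
```

Only the correlation energy is treated this way in the sketch; the Hartree–Fock component converges differently and would need its own treatment.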
Basis set → H-He → Li-Ne → Na-Ar
cc-pVDZ → 5 func. → 14 func. → 18 func.
cc-pVTZ → 14 func. → 30 func. → 34 func.
cc-pVQZ → 30 func. → 55 func. → 59 func.
aug-cc-pVDZ → 9 func. → 23 func. → 27 func.
aug-cc-pVTZ → 23 func. → 46 func. → 50 func.
aug-cc-pVQZ → 46 func. → 80 func. → 84 func.

To understand how to get the number of functions, take the cc-pVDZ basis set for H: there are two s orbitals and one set of p orbitals with three components (px, py and pz), giving five spatial orbitals in total. Note that each orbital can hold two electrons of opposite spin.
As another example, the Ar atom has 3 occupied s orbitals and 2 occupied sets of p orbitals. The cc-pVDZ basis for Ar contains 4 s orbitals (4 basis functions), 3 sets of p orbitals (3 × 3 = 9 basis functions), and 1 set of d orbitals (5 basis functions). Adding these up gives a total of 18 functions for Ar with the cc-pVDZ basis set.
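The counting rule can be written as a short sketch: each shell of angular momentum l contributes 2l + 1 spherical functions. The shell compositions for H and Ar with cc-pVDZ follow the text; the remaining entries are the usual contracted compositions of these sets and are included as an assumption so that more rows of the table can be reproduced. The dictionary and function names are hypothetical.

```python
# Spherical components per shell type: 2l + 1.
COMPONENTS = {"s": 1, "p": 3, "d": 5, "f": 7}

# Contracted shell compositions, given as {shell type: number of shells}.
SHELLS = {
    ("cc-pVDZ", "H"): {"s": 2, "p": 1},
    ("cc-pVDZ", "Ne"): {"s": 3, "p": 2, "d": 1},
    ("cc-pVDZ", "Ar"): {"s": 4, "p": 3, "d": 1},
    ("cc-pVTZ", "H"): {"s": 3, "p": 2, "d": 1},
    ("cc-pVTZ", "Ne"): {"s": 4, "p": 3, "d": 2, "f": 1},
    ("cc-pVTZ", "Ar"): {"s": 5, "p": 4, "d": 2, "f": 1},
}

def count_basis_functions(shells):
    """Total number of basis functions for one atom."""
    return sum(count * COMPONENTS[kind] for kind, count in shells.items())

for (basis, atom), shells in SHELLS.items():
    print(f"{basis:8s} {atom:2s} {count_basis_functions(shells):3d} functions")
```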

Polarization-consistent basis sets

Density-functional theory has recently become widely used in computational chemistry. However, the correlation-consistent basis sets described above are suboptimal for density-functional theory, because the correlation-consistent sets have been designed for post-Hartree–Fock methods, while density-functional theory exhibits much more rapid basis set convergence than wave function methods.
Adopting a similar methodology to the correlation-consistent series, Frank Jensen introduced polarization-consistent basis sets as a way to quickly converge density functional theory calculations to the complete basis set limit. Like the Dunning sets, the pc-n sets can be combined with basis set extrapolation techniques to obtain CBS values.
The pc-n sets can be augmented with diffuse functions to obtain aug-pc-n sets.

Karlsruhe basis sets

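Some of the various valence adaptations of Karlsruhe basis sets are def2-SVP (split valence polarization), def2-TZVP (valence triple-zeta polarization) and def2-QZVP (valence quadruple-zeta polarization).

Completeness-optimized basis sets
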
Gaussian-type orbital basis sets are typically optimized to reproduce the lowest possible energy for the systems used to train the basis set. However, the convergence of the energy does not imply convergence of other properties, such as nuclear magnetic shieldings, the dipole moment, or the electron momentum density, which probe different aspects of the electronic wave function.
Manninen and Vaara have proposed completeness-optimized basis sets, where the exponents are obtained by maximization of the one-electron completeness profile instead of minimization of the energy. Completeness-optimized basis sets are a way to easily approach the complete basis set limit of any property at any level of theory, and the procedure is simple to automate.
Completeness-optimized basis sets are tailored to a specific property. This way, the flexibility of the basis set can be focused on the computational demands of the chosen property, typically yielding much faster convergence to the complete basis set limit than is achievable with energy-optimized basis sets.

Plane-wave basis sets

In addition to localized basis sets, plane-wave basis sets can also be used in quantum-chemical simulations. Typically, the choice of the plane wave basis set is based on a cutoff energy. The plane waves in the simulation cell that fit below the energy criterion are then included in the calculation. These basis sets are popular in calculations involving three-dimensional periodic boundary conditions.
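The cutoff criterion can be illustrated with a minimal sketch. It assumes a cubic simulation cell of side L, sampling at the Gamma point only, and Hartree atomic units, so that a plane wave with wave vector G = (2π/L)(n1, n2, n3) has kinetic energy |G|²/2. The function name and parameter values are illustrative assumptions.

```python
import math

def count_plane_waves(box_length, cutoff):
    """Count wave vectors G of a cubic cell with |G|^2 / 2 <= cutoff (atomic units)."""
    unit = 2.0 * math.pi / box_length               # spacing of the reciprocal lattice
    n_max = int(math.sqrt(2.0 * cutoff) / unit) + 1  # safe bound on each integer index
    count = 0
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                kinetic = 0.5 * unit ** 2 * (n1 * n1 + n2 * n2 + n3 * n3)
                if kinetic <= cutoff:
                    count += 1
    return count

# Hypothetical cell sizes (bohr) and cutoffs (hartree).
for box_length in (10.0, 20.0):
    for cutoff in (5.0, 10.0):
        print(f"L = {box_length:4.1f}  E_cut = {cutoff:4.1f}  N_PW = {count_plane_waves(box_length, cutoff)}")
```

At a fixed cutoff, the number of plane waves grows with the cell volume, so the basis depends on the cell as well as on the cutoff.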
The main advantage of a plane-wave basis is that it is guaranteed to converge in a smooth, monotonic manner to the target wavefunction. In contrast, when localized basis sets are used, monotonic convergence to the basis set limit may be difficult due to problems with over-completeness: in a large basis set, functions on different atoms start to look alike, and many eigenvalues of the overlap matrix approach zero.
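The over-completeness problem can be seen in the smallest possible example: two normalized s-type Gaussians with the same exponent on centers a distance R apart. Their overlap is exp(-exponent * R² / 2), and the 2 × 2 overlap matrix has eigenvalues 1 + S and 1 - S. The sketch below, with hypothetical exponents and a hypothetical distance, shows the smaller eigenvalue approaching zero as the functions become more diffuse.

```python
import math

def overlap_same_exponent(exponent, distance):
    """Overlap of two normalized s-type Gaussians with equal exponents."""
    return math.exp(-0.5 * exponent * distance ** 2)

def smallest_overlap_eigenvalue(exponent, distance):
    """Smaller eigenvalue, 1 - S, of the overlap matrix [[1, S], [S, 1]]."""
    return 1.0 - overlap_same_exponent(exponent, distance)

DISTANCE = 2.0  # bohr, hypothetical separation between the two centers
for exponent in (1.0, 0.1, 0.01, 0.001):
    eigenvalue = smallest_overlap_eigenvalue(exponent, DISTANCE)
    print(f"exponent = {exponent:6.3f}  smallest eigenvalue = {eigenvalue:.6f}")
```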
In addition, certain integrals and operations are much easier to program and carry out with plane-wave basis functions than with their localized counterparts. For example, the kinetic energy operator is diagonal in reciprocal space. Integrals over real-space operators can be efficiently carried out using fast Fourier transforms. The properties of the Fourier transform allow a vector representing the gradient of the total energy with respect to the plane-wave coefficients to be calculated with a computational effort that scales as NPW*ln(NPW), where NPW is the number of plane waves. When this property is combined with separable pseudopotentials of the Kleinman–Bylander type and pre-conditioned conjugate gradient solution techniques, the dynamic simulation of periodic problems containing hundreds of atoms becomes possible.
In practice, plane-wave basis sets are often used in combination with an 'effective core potential' or pseudopotential, so that the plane waves are only used to describe the valence charge density. This is because core electrons tend to be concentrated very close to the atomic nuclei, resulting in large wavefunction and density gradients near the nuclei which are not easily described by a plane-wave basis set unless a very high energy cutoff, and therefore small wavelength, is used. This combined method of a plane-wave basis set with a core pseudopotential is often abbreviated as a PSPW calculation.
Furthermore, as all functions in the basis are mutually orthogonal and are not associated with any particular atom, plane-wave basis sets do not exhibit basis-set superposition error. However, the plane-wave basis set is dependent on the size of the simulation cell, complicating cell size optimization.
Due to the assumption of periodic boundary conditions, plane-wave basis sets are less well suited to gas-phase calculations than localized basis sets. Large regions of vacuum need to be added on all sides of the gas-phase molecule in order to avoid interactions between the molecule and its periodic copies. However, the plane waves describe the vacuum region with the same accuracy as the region where the molecule is, meaning that obtaining the truly noninteracting limit may be computationally costly.

Real-space basis sets

Analogous to the plane-wave basis sets, where the basis functions are eigenfunctions of the momentum operator, there are basis sets whose functions are eigenfunctions of the position operator, that is, points on a uniform mesh in real space. The actual implementation may use finite differences, finite elements, Lagrange sinc functions, or wavelets.
Sinc functions form an orthonormal, analytical, and complete basis set, so the convergence to the complete basis set limit is systematic and relatively simple. Similarly to plane-wave basis sets, the accuracy of sinc basis sets is controlled by an energy cutoff criterion.
In the case of wavelets and finite elements, it is possible to make the mesh adaptive, so that more points are used close to the nuclei. Wavelets rely on the use of localized functions that allow for the development of linear-scaling methods.

Even-tempered basis sets

In 1974 Bardo and Ruedenberg proposed a simple scheme to generate the exponents of a basis set that spans the Hilbert space evenly by following a geometric progression of the form
$\alpha_{l,i} = \alpha_l \beta_l^{i-1}, \quad i = 1, \dots, N_l$
for each angular momentum $l$, where $N_l$ is the number of primitive functions. Here, only the two parameters $\alpha_l$ and $\beta_l$ must be optimized, significantly reducing the dimension of the search space or even avoiding the exponent optimization problem. In order to properly describe electronic delocalized states, a previously optimized standard basis set can be complemented with additional delocalized Gaussian functions with small exponent values, generated by the even-tempered scheme. This approach has also been employed to generate basis sets for quantum particles other than electrons, such as quantum nuclei, negative muons or positrons.
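The even-tempered scheme amounts to a geometric progression of exponents, which takes only a few lines of code. In this sketch the starting exponent, the ratio and the number of primitives are hypothetical values chosen for illustration, not optimized parameters.

```python
def even_tempered_exponents(alpha, beta, count):
    """Exponents alpha * beta**(i - 1) for i = 1..count, for one angular momentum."""
    return [alpha * beta ** (i - 1) for i in range(1, count + 1)]

# Hypothetical parameters for an s-type series.
exponents = even_tempered_exponents(alpha=0.1, beta=3.0, count=5)
print(exponents)

# Successive exponents differ by the constant ratio beta.
ratios = [high / low for low, high in zip(exponents, exponents[1:])]
print(ratios)
```

Extending the series towards smaller exponents, that is, continuing the progression below the starting value, is one way to generate the additional diffuse functions mentioned above.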