HAB79: A New Molecular Dataset for Benchmarking DFT and DFTB Electronic Couplings Against High-Level Ab-initio Calculations

A new molecular dataset called HAB79 is introduced to provide ab-initio reference values for electronic couplings (transfer integrals) and to benchmark density functional theory (DFT) and density functional tight-binding (DFTB) calculations. The HAB79 dataset is comprised of 79 planar heterocyclic polyaromatic hydrocarbon molecules frequently encountered in organic (opto)electronics, arranged to 921 structurally diverse dimer configurations. We show that CASSCF/NEVPT2 with a minimal active space provides a robust reference method that can be applied to the relatively large molecules of the dataset. Electronic couplings are largest for cofacial dimers, in particular sulfur-containing polyaromatic hydrocarbons, with values in excess of 0.5 eV, followed by parallel displaced cofacial dimers. V-shaped dimer motifs, often encountered in the herringbone layers of organic crystals, exhibit medium-sized couplings whereas T-shaped dimers have the lowest couplings. DFT values obtained from the projector operator-based diabatization (POD) method are initially benchmarked against the smaller databases HAB11 (HAB7-) and found to systematically improve B97X. Cost effective POD in combination with a GGA functional (PBE) and very efficient DFTB calculations on the dimers of the HAB79 database give a good linear correlation with the CASSCF/NEVPT2 reference data, which, after scaling with a multiplicative constant, gives reasonably small MRUEs of 17.9% and 40.1%, respectively, bearing in mind that couplings in HAB79 vary over 4 orders of magnitude. The ab-initio reference data reported here are expected to be useful for benchmarking other DFT or semi-empirical approaches for electronic coupling calculations.

B97X. Cost effective POD in combination with a GGA functional (PBE) and very efficient DFTB calculations on the dimers of the HAB79 database give a good linear correlation with the CASSCF/NEVPT2 reference data, which, after scaling with a multiplicative constant, gives reasonably small MRUEs of 17.9% and 40.1%, respectively, bearing in mind that couplings in HAB79 vary over 4 orders of magnitude. The ab-initio reference data reported here are expected to be useful for benchmarking other DFT or semi-empirical approaches for electronic coupling calculations.

I. Introduction
One of the key parameters in molecular charge transport modeling and simulation is the electronic coupling between two diabatic charge transfer electronic states [1][2][3][4].
Although accurate ab-initio methods such as n-electron valence state perturbation theory NEVPT2 [25][26][27] and multi-reference configuration interaction [28] are considered a "golden standard" for charge transfer integral calculations, their practical implementation is hindered by their inherent complexity and the sheer amount of computational resources required in the case of extended systems or when a significant number of electronic coupling evaluations is needed. In order to overcome this obstacle, benchmark datasets are utilized as to evaluate the accuracyand calibrate if neededcomputationally less demanding methods [19,21].
In this work, we introduce the HAB79 dataset: a selection of 79 heterocyclic polyaromatic hydrocarbons, either already utilized or inspired by contemporary organic (opto)electronic applications [29], arranged to 921 structurally diverse homomolecular dimers. Having such structural information at hand, ab-initio electronic couplings are calculated at the CASSCF/NEVPT2 level of theory, forming this way the HAB79 charge transfer integral reference dataset. DFT and density functional tight binding (DFTB) electronic couplings are calculated using the same dimeric configurations, thus enabling the direct comparison with reference ab-initio coupling values.
A general linear correlation between reference and DFT and DFTB electronic coupling values is identified, hence allowing the determination of a universal multiplicative Accepted to J. Chem. Phys. 10.1063/5.0076010 3 scaling constant for each method and level of theory that can be used for a-posteriori corrections. Furthermore, the HAB11 and HAB7-datasets from the literature are revisited by means of projection operator-based diabatization (POD) DFT electronic coupling calculations in order to examine the effect of various exchange-correlation functionals to the predictive capability of the POD method with respect to reference ab-initio data.
The layout of the paper is organized follows: all computational details regarding the construction of the HAB79 dataset and the employed simulation methods are outlined in Sec. II, followed by the results for all datasets in Sec. III. The conclusions of this work are presented in Sec. IV.

II.1 HAB79 database
The HAB79 dataset is a compilation of planar heterocyclic polyaromatic hydrocarbons exhibiting either electron or hole conductivity, with N, O, F, and S heteroatoms significantly participating to the non-degenerate frontier molecular orbitals. This particular selection of molecules is based on either current or potential applications in the field or organic (opto)electronics [29][30][31][32][33]. The molecules of the HAB79 dataset are illustrated in Fig.   1, numbered in an ascending order with respect to the total number of electrons. A more comprehensive record of the HAB79 dataset including formal compound names and InChIKey values (where available) is listed in the supplementary material. The geometry of each molecule was optimized using DFT at the B3LYP/6-311g(d) [34,35] level of theory as implemented in the NWChem package [36]. Due to the planarity of all molecules in the dataset, one can define a local orthonormal system of coordinates for each molecule spanned by the normal vector to the conjugation Accepted to J. Chem. Phys. 10.1063/5.0076010 plane, ⃗ , and two coplanar vectors, long and short with respect to the conjugation plane that characterize the long and short molecular axes according to the molecular mass distribution. These vectors are derived by the diagonalization of the molecular inertia tensor, thus retrieving the normal vector and short and long molecular axes vectors in an ascending order with respect to the sorted eigenvalue spectrum. Such a local system of coordinates can be used to create a series of homo-molecular dimeric configurations. The simplest form is that of cofacial dimers: monomers are translated along ⃗ by a given distance, thus forming totally eclipsed structures, typically associated with high electronic couplings. Following prior work [19,21], the distances of 3.5 Å, 4.0 Å, 4.5 Å, and 5.0 Å are chosen, resulting in four cofacial dimers per dataset entry. The next conformation of dimers under consideration are of the parallel-displaced (PD) type. A cofacial dimer with a vertical separation of 3.5 Å is initially created and is used to create three different PD dimers via appropriate translations along long , short , and long + short , respectively, as described in the supplementary material. Finally, a series of T-shaped dimers is created by means of proper displacements and rotations: as in the case of PD dimers, an initial cofacial dimer is created, followed by rigid body rotations of the second monomer about either long or short by 45° (also referred to as V-shaped) and 90°, making sure to alleviate any steric overlap by augmenting the vertical separation. Further information on Tand V-shaped dimers is included in the supplementary material. All dimer configurations are included in the supplementary material in xyz format.

II.2.1 Ab-initio reference calculations
The reference couplings are obtained with the generalized Mulliken-Hush (GMH) approach [5,[37][38][39][40][41][42][43]. Within this method, the electronic coupling matrix element in a two-state donor-acceptor system is obtained from quantities calculated in an adiabatic (delocalized) basis using the following expression: where Δ 12 is a vertical excitation energy from the ground state 1 to the excited state 2, 11 and 22 denote the respective states' dipole moments, and 12 is the transition dipole moment between the two states. In the case of symmetric dimers that possess a mirror plane perpendicular to the ⃗ vector, the Eq. (1) reduces to half the excitation energy Δ 12 .
The quantities that enter the Eq. (1) are calculated with the n-electron valence state perturbation theory (NEVPT2) [25][26][27] using state-averaged complete active space selfconsistent field (CASSCF) reference wavefunction [44]. The averaging was over two doublet states for each dimer. Due to the size of the systems under study, the CASSCF active space was chosen as a minimal set comprising of two orbitals and three electrons (p-type carriers) or two orbitals and one electron (n-type carriers).
Our reference calculations utilize aug-cc-pVDZ basis set for heavy atoms and a smaller cc-pVDZ basis for hydrogen atoms [45,46]. Such combination was shown to balance computational cost and accuracy for electronic coupling calculations [19]. The NEVPT2 calculations are carried out with an active "frozen core" approximation so that chemically inert core electrons are not correlated. The Coulomb and exchange integrals are efficiently evaluated with the resolution-of-identity (RI) [47,48] and chain-of-spheres (COSX) approximations [49], respectively. The RI auxiliary basis set is aug-cc-pVDZ/C or cc-pVDZ/C [50].

II.2.2 DFT approaches
Projection operator-based diabatization (POD). The POD method [10,22,24] is described in detail in Ref. [22] and we only briefly summarize the method here. First a Kohn-Sham calculation is carried out on the neutral dimer, unless stated otherwise, followed by orbitals of the donor and acceptor, respectively. Note that this procedure differs from the one we have previously employed, where calculations of coupling matrix elements were carried out on cationic (anionic) dimers. The current definition is the same as the one used in Ref. [24]. We found that both definitions give very similar results at POD/PBE level, though the definition used here is computationally more efficient.
POD calculations were first carried out on two smaller databases, HAB11 and HAB7- [19,21] to test the performance of the exchange-correlation functional (XC) and basis set.

II.2.3 DFTB approach
DFTB is derived from DFT by a Taylor expansion of the total energy around a welldefined reference density and is found to be 2−3 orders of magnitude faster than DFT-GGA functionals with midsized basis sets [62,63]. Recent studies have shown that DFTB can provide electronic couplings in good agreement with other DFT approaches, and the application of a uniform scaling factor allows the accuracy comparable to high-level electronic-structure methods [19,21]. In the present work, all calculations are performed within the framework of the fragment-orbital first-order DFTB (FODFTB1) with MIO parameter set [64]. Since charge transfer typically occurs in a narrow energy window around the Fermi level, we simplify the description of the charge by considering frontier orbitals of the fragments, i.e., HOMOs for hole transfer and LUMO for electron transfer.

III.1 Ab-initio electronic couplings for HAB79
The size of systems that enter the HAB79 database constitutes a challenge for highlevel ab-initio calculations. In our previous study [19] we showed that MRCI+Q with large CASSCF active space provides results in excellent agreement with the full CI method. We also found a multireference perturbative approach such as NEVPT2 to provide couplings with a mean relative unsigned error of 6.9 % w.r.t. MRCI+Q. We also explored the CC2 method as an economical approach for larger systems and found that it has accuracy slightly lower than NEVPT2 [19,21]. We note, however, that ROHF-based CC2 calculations have unfavorable scaling behavior with the system size, in particular when the calculations are performed without the use of symmetry.
The present manuscript explores an alternative route -NEVPT2 calculations with a minimal active space CASSCF reference wavefunction instead of a complete valence active orbital set. Such significant active space reduction compared to our previous study [19] is well justified owing to the nature of the systems under investigation. The geometries of the donor and acceptor molecules are identical, and the diabatic energies of the sites are either equal (cofacial orientation) or only marginally different (PD and T-shaped dimers). We also note that the dimers selected to enter the HAB79 do not display pronounced multireference character. Moreover, the two orbitals that form the active space in each case always constitute a bonding and anti-bonding pair. Thus, the two-orbital space provides a balanced description of the two adiabatic states already at the CASSCF level, and small differential dynamic correlation contribution is well approximated with a NEVPT2 approach. Two additional elements of our protocol made the NEVPT2 calculations feasible on the entire HAB79 set: a compact aug-cc-pVDZ basis set and the frozen core approximation. carried out with frozen core) improves. Active space reduction from (11,10) to minimal (3,2) and reduction of basis set along with the release of symmetry restrictions did not deteriorate NEVPT2 couplings. These observations are further confirmed by calculations on two geometries of the dimer 1cofacial at an intermolecular separation of 4.0 Å and T-shaped structure (see Table II). Here, the active space reduction decreases the couplings by 2 meV   In an attempt to classify all HAB79 dataset molecules regarding their potential charge transfer capabilities, one can examine the relationship between the electronic coupling of a given dimeric configuration and the single molecule reorganization energy λ [3,65,66]. If the coupling value is larger than λ/2, then no activation barrier for electron transfer exists (assuming zero driving force) and, as a consequence, the charge delocalizes over donor and acceptor. If this configuration is periodically replicated in a crystal the charge transport is likely to occur via (fast) diffusion of a delocalized polaron, unless structural defects or other charge localizing effects are present. We simply refer to this case as "band-like" mechanism.
In the opposite regime, if the coupling is smaller than λ/2, an activation barrier for electron transfer exists and the mechanism would be (slow) hopping of a relatively localized polaron.
Having this categorization in mind, all cofacial dimers with a separation of 3.5 Å and the majority of 4.0 Å dimers clearly lie in the band-like charge transfer regime, as illustrated in

III.2 DFT POD electronic couplings for HAB11 and HAB7-
Ab-initio calculated electronic couplings from the HAB79 dataset are used as reference in order to evaluate the suitability of computationally less intensive methods, namely DFT and DFTB approaches. Before we present the DFT POD results for HAB79, we investigate the performance of different XC functionals and basis sets on two smaller databases of pi-conjugated organic homo-dimers for which similar ab-initio reference data are available, HAB11 and HAB7- [19,21].
A suitable metric of the correlation between electronic coupling values obtained using a specific calculation method and the reference dataset is the inverse of the slope coefficient Scaling constants and statistical error metrics for the HAB7-and HAB11 datasets are listed in Table III and Table IV. Due to the larger size of molecules belonging to the HAB7dataset, DFT calculations were restricted to DZVP and TZVP basis sets, whereas for the smaller molecules of the HAB11 dataset, the TZV2P basis set was used. Discrepancies with previously reported values on the HAB11 dataset are due to differences in dimer charge and an erroneous definition of the PBE0 XC functional in Ref. [22] which is here corrected.   This behavior is typical for POD where the charge density is optimized on the whole dimer and the HFX in LR enhances the overlap between the monomer densities in the intermolecular region due to an improved description of the asymptotic behaviour of the XC potential. This is in contrast to electronic couplings obtained from constrained density functional theory (CDFT) [9,15], which are generally too high with GGA XC and decrease to values in good agreement with ab-initio reference data upon inclusion of HFX [15, 19, 21 81-83]. The reason is that in CDFT the charge constrained states are optimized for charged donor-acceptor pairs and these states are still subject to the electron delocalization error, albeit to a lesser extent than adiabatic electronic states. For GGA XC the delocalization of the charge constrained states leads to too large overlaps and couplings, and this situation drastically improves when HFX functionals are used.
Finally, the effect of basis set size to the electronic couplings error statistics for the HAB7-dataset indicate that the utilization of a smaller basis set such as DZVP-GTH has a minimal effect. As a result, an alternative to using computationally demanding levels of theory, such as TZVP-GTH/ωB97X, is to calculate electronic couplings using a GGA XC functional and the small DZVP-GTH basis set and apply an a-posteriori scaling of the results by an appropriate multiplicative constant. As we show in the following, this approach can give very accurate numbers and at the same time allows for a very large number of coupling calculations, which could be beneficial in computational screening studies.

III.3 DFT POD, ET-FMO and DFTB electronic couplings for HAB79
DFT-and DFTB-based calculations under study exhibit strong linear correlations, both in comparison with the reference data and between them, as illustrated in the diagrams in Fig. 3. Moreover, the statistical distribution of the exponential distance decay constants for reference and DFT and DFTB calculations are depicted in Fig. 4. Scaling constants and statistical error metrics for the HAB79 dataset are listed in Table V. All electronic couplings error statistical metrics are also recalculated using the linear scaling constants, resulting in significant statistical improvements regarding the POD and DFTB calculations.  A more in-depth analysis of the electronic couplings correlations between ab-initio reference values and DFT and DFTB results is carried out by examining the data according to the carrier type (p-type and n-type for hole and electron conductors, respectively) and the dimer type. This breakdown of the original set of electronic couplings data for the HAB79 dataset is illustrated for p-type molecules in Fig. 5 and n-type molecules in Fig. 6.   Regarding possible structure-property relationships arising from a direct comparison between HAB79 electronic couplings and reorganization energy, most cofacial dimers up to a separation of 4.0 Å can potentially manifest band-like charge transport characteristicsif such motifs are periodically replicated in a molecular crystal. In the case of PD dimers, some systems fall into the aforementioned transport regime, whereas for T-shaped dimers, such a behavior is only evident for selected dimers with a rotation of 45° about the short molecular axis.
The HAB7-and HAB11 datasets are revisited using POD charge transfer integral calculations with a variety of XC functionals. All functionals tend to capture the exponential decay constant β for cofacial dimers very well, but pure GGA XC exhibit too low absolute coupling values. The agreement with ab-initio couplings improves considerably for hybrids and range-separated hybrids. The optimal level of theory with respect to POD electronic coupling error minimization is based on the LR-corrected hybrid ωB97X XC functional and a triple-zeta basis set, giving MRUEs of 5.2% and 7.5% for HAB11 and HAB7-databases, respectively (without scaling).
We anticipate the newly introduced HAB79 dataset to provide useful ab-initio reference electronic coupling values to the computational charge transport community, enabling further benchmark work using various computational methodologies not covered by this work. program and access to computing resources provided by InstitutsCluster II (Karlsruhe).

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.