Hetero-Polycyclic Aromatic Systems: A Data-Driven Analysis
Polycyclic aromatic systems (PASs) are molecules containing multiple fused aromatic rings. They occur in diverse environments, from natural coal to the interstellar medium, and have various industrial applications, including organic semiconductors, catalysis, and medicine. Our group has taken on the challenge of creating the first large-scale COMPutational database of Polycyclic Aromatic Systems (COMPAS). In this work, we present the expansion of the COMPAS database to include heteroatom-substituted PASs. The new dataset comprises 500k molecules with up to 10 cata-condensed rings and includes 11 unique heterocycles ranging in size from 4 to 6 atoms. We calculated electronic properties using the semi-empirical GFN1-xTB method and developed a correction scheme based on values obtained with the CAM-B3LYP-D3BJ/def2-SVP DFT method for a subset of 50k molecules. In this seminar, we will explore the enumeration algorithm and the subsequent workflow that we developed to access this diverse chemical space, and we will showcase the structure-property relationships present in the dataset. We believe this work will provide a solid foundation for further exploration of PAS chemical space and will enable future data-driven approaches to the design of novel, functional PASs.