Synopsis
RoboBohr is a machine learning tool for predicting the electronic structure of molecules. It operates on data collected from the PubChem database and constructs a feature vector to describe each molecule. These feature vectors can then be fed into machine learning algorithms to predict atomization energies.
Operation
RoboBohr currently has four modes of operation:
- query: Reads input SDF files and creates a list of objects, one per entry in the input, each holding the atom types and coordinates. This list is then used to create input files for the pwscf code of the Quantum ESPRESSO package.
- createFeatures: Generates a design matrix from the list of objects obtained in the query step and saves it to a file.
- cluster: Creates job submission files for running the pwscf input files in a high-performance computing (HPC) environment. The Torque and Slurm scheduling systems are supported.
- outcomes: Analyzes the output files generated by the pwscf runs, stores the relevant outcome quantities (e.g. ground state energies), and creates a log file.
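To make the two ends of this pipeline more concrete, here is a minimal Python sketch of what the query and outcomes steps do conceptually. It is not RoboBohr's own code: the names `Molecule`, `parse_sdf` and `final_energy_ry` are hypothetical, the parser assumes V2000-style SDF records terminated by `$$$$`, and the energy reader assumes the converged total energy appears on a line starting with `!` in the pwscf output.

```python
import re
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical container: atom types and coordinates of one SDF entry."""
    symbols: list
    coords: list  # (x, y, z) tuples, in the units used by the SDF file


def _parse_record(lines):
    # Line 4 of a V2000 record is the counts line; columns 1-3 hold the atom count.
    n_atoms = int(lines[3][0:3])
    symbols, coords = [], []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        symbols.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return Molecule(symbols, coords)


def parse_sdf(text):
    """Split SDF text on '$$$$' terminators and parse each record."""
    molecules, record = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if len(record) >= 4:
                molecules.append(_parse_record(record))
            record = []
        else:
            record.append(line)
    return molecules


# Assumed format of the converged-energy line in a pwscf output file:
# "!    total energy              =     -34.12345678 Ry"
_ENERGY_RE = re.compile(r"^!\s+total energy\s+=\s+(-?\d+\.\d+)\s+Ry", re.MULTILINE)


def final_energy_ry(output_text):
    """Return the last converged total energy (in Ry), or None if absent."""
    matches = _ENERGY_RE.findall(output_text)
    return float(matches[-1]) if matches else None
```

In this sketch, `parse_sdf(open("molecules.sdf").read())` would give one `Molecule` per entry (the file name is only an example), and `final_energy_ry` takes the last match so that a run printing several converged energies reports the final one.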
Utilities are also provided to export the raw data in JSON format, which allows feature engineering beyond what RoboBohr provides by default.
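As an illustration of the kind of custom feature engineering the JSON export makes possible, the sketch below builds a Coulomb-matrix feature vector from a JSON record. Everything here is an assumption made for the example: the JSON layout (`id`, `atoms`, `symbol`, `xyz`) is hypothetical and not RoboBohr's actual schema, and the Coulomb matrix is just one commonly used molecular descriptor, not necessarily the feature set RoboBohr generates.

```python
import json
import math

# Atomic numbers for a few common elements (extend as needed).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def molecule_to_json(mol_id, symbols, coords):
    """Serialize one molecule using a hypothetical JSON layout."""
    atoms = [{"symbol": s, "xyz": list(c)} for s, c in zip(symbols, coords)]
    return json.dumps({"id": mol_id, "atoms": atoms})


def coulomb_matrix(record):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / R_ij elsewhere.

    Distances are taken in whatever units the coordinates are stored in.
    """
    atoms = record["atoms"]
    z = [ATOMIC_NUMBER[a["symbol"]] for a in atoms]
    n = len(atoms)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * z[i] ** 2.4
            else:
                r_ij = math.dist(atoms[i]["xyz"], atoms[j]["xyz"])
                matrix[i][j] = z[i] * z[j] / r_ij
    return matrix


def feature_vector(record, max_atoms):
    """Flatten the upper triangle, zero-padded to max_atoms atoms.

    Padding gives every molecule a vector of the same length, so the
    vectors can be stacked as rows of a design matrix.
    """
    m = coulomb_matrix(record)
    n = len(m)
    return [
        m[i][j] if i < n and j < n else 0.0
        for i in range(max_atoms)
        for j in range(i, max_atoms)
    ]
```

A record would be loaded with `json.loads(...)` and passed to `feature_vector(record, max_atoms)`, where `max_atoms` is the size of the largest molecule in the dataset.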
Articles
- Preprint of the published paper
- A minimally technical blog post about RoboBohr
- A notebook (in R) that illustrates the use of the data generated by RoboBohr to train models
- Two Kaggle datasets: