This directory contains the scripts used to generate the data for machine learning, together with the multi-objective code 
that runs the machine learning and predicts the potential of new MOFs for atmospheric water harvesting (AWH). 

The first step is to run feature_generation.py with Python (it also calls RAC-getter.py) 
to calculate the features using MolSimplify (RAC) and zeo++ (structural). Both programs must therefore be installed in advance. 

NOTE 1: In feature_generation.py, the zeo++ directory, the directory containing the *.cif files and the output file name must be specified. 
For example: base_dir = "/scratch/ml/coremof/"
             zeo_path = "/scratch/ml/zeo/"
             final_output_path = f'{cwd}/test-MOFs.csv'

NOTE 2: MOFs with missing features will not be written to the output *.csv file. 
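
Because skipped MOFs are dropped silently, it can help to compare the input structures with the rows that were actually written. The following is only a minimal sketch and not part of the repository: the helper name find_skipped is hypothetical, and it assumes that the "MOF_name" column holds the *.cif file names (with or without the ".cif" extension), which the scripts may not guarantee.

```python
import csv
from pathlib import Path


def find_skipped(cif_dir, csv_path, name_column="MOF_name"):
    """List *.cif structures that have no row in the feature CSV.

    Assumption (not confirmed by the scripts): the values in `name_column`
    match the *.cif file names, with or without the ".cif" extension.
    """
    # Names of all input structures, without the extension.
    cif_names = {path.stem for path in Path(cif_dir).glob("*.cif")}

    # Names that made it into the output file.
    with open(csv_path, newline="") as handle:
        written = {row[name_column].strip() for row in csv.DictReader(handle)}

    # Normalise names that still carry the ".cif" extension.
    written = {name[:-4] if name.endswith(".cif") else name for name in written}

    return sorted(cif_names - written)
```

For example, find_skipped("/scratch/ml/coremof/", "test-MOFs.csv") would return the structures that are missing from the output, provided the naming assumption above holds.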

NOTE 3: In the datasets used in our study, some features and targets were calculated with RASPA for both the CoreMOF and QMOF databases. 
These include the water uptake and selectivity targets, as well as the KHW feature used to predict water stability.

Warning 1: Make sure the header of the MOF name column in the test-MOFs.csv file is "MOF_name". 
Warning 2: Use MOF names that do not contain ","; otherwise the program will split the columns incorrectly, resulting in an error.
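
Both warnings can be checked before running the predictor. The sketch below is an illustration under stated assumptions, not part of the repository: the function name check_mof_csv is hypothetical, and it assumes the file is split on plain commas, so a comma inside a MOF name shows up as an extra field on that line.

```python
from pathlib import Path


def check_mof_csv(path):
    """Return a list of problems found in a new-MOF CSV file (empty if none)."""
    raw_lines = Path(path).read_text().splitlines()
    # Keep the original line numbers while ignoring blank lines.
    rows = [(number, line) for number, line in enumerate(raw_lines, start=1)
            if line.strip()]
    if not rows:
        return ["file is empty"]

    problems = []
    header = [field.strip() for field in rows[0][1].split(",")]
    if "MOF_name" not in header:
        problems.append('no column named "MOF_name" in the header')

    # A comma inside a MOF name adds a field, so the count no longer matches.
    for number, line in rows[1:]:
        n_fields = len(line.split(","))
        if n_fields != len(header):
            problems.append(
                f"line {number}: {n_fields} fields, expected {len(header)}"
            )
    return problems
```

Calling check_mof_csv("test-MOFs.csv") returns an empty list when the header and the field counts look consistent.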

Once the new data file (for example test-MOFs.csv) has been prepared, run the multi-objective-AWH-predictor.ipynb notebook in Jupyter Notebook. 
Alternatively, run the multi-objective-AWH-predictor.py script with Python. 
The code, the Wstability-data.csv and WUSdata-mod.csv datasets, and the file containing the data for the new MOFs (for example test-MOFs.csv) 
must all be placed in the same directory. 
When run, the code asks for the name of the new data file (for example test-MOFs.csv), through a pop-up input window in Jupyter Notebook or a prompt on the command line when run with Python. 
It then predicts the suitability of the structures for AWH. 
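
Since all input files must sit in the same directory, a quick check before launching the predictor can save a failed run. This is a minimal sketch, not part of the repository: the helper name missing_inputs is hypothetical, and the file names are the ones mentioned above.

```python
from pathlib import Path

# Training datasets named in this README.
REQUIRED_DATASETS = ("Wstability-data.csv", "WUSdata-mod.csv")


def missing_inputs(directory=".", new_data="test-MOFs.csv"):
    """Return the names of the required files that are absent from `directory`."""
    folder = Path(directory)
    wanted = list(REQUIRED_DATASETS) + [new_data]
    return [name for name in wanted if not (folder / name).is_file()]


if __name__ == "__main__":
    absent = missing_inputs()
    if absent:
        print("Missing files:", ", ".join(absent))
    else:
        print("All input files found.")
```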
