Mattergen & CHGNet

Overview

Mattergen is a generative AI model for materials discovery developed by Microsoft Research. It was trained on large crystal-structure databases (the Materials Project and Alexandria). It generates novel crystal structures from scratch that are predicted to be stable, which greatly expands the space of candidate materials that can be explored.

CHGNet (Crystal Hamiltonian Graph Neural Network) is a pretrained, universal graph neural network interatomic potential. It predicts the energy, forces, stress, and magnetic moments of atomic structures. Its accuracy approaches that of density functional theory (DFT) at a small fraction of the computational cost, which enables rapid structural relaxation and property evaluation.

This workflow combines the two tools into an automated pipeline. Mattergen acts as the generator and produces a diverse pool of candidate structures. CHGNet acts as the evaluator and refiner: it relaxes each candidate and ranks the results by energy. Together they enable high-throughput screening for stable, promising materials.

Installation

1. Prerequisites: uv and DeepH-dock

First, install uv and DeepH-dock by following the official installation guide: https://deeph-dock.readthedocs.io/en/latest/installation_and_setup.html.

2. Installing Mattergen-ever

DeepH-dock provides the scripts that orchestrate the workflow. Mattergen and CHGNet must be installed separately, each in its own virtual environment. The steps below use Python 3.10 for Mattergen and Python 3.13 for CHGNet.

Note on compatibility: the official Mattergen release has minor incompatibilities with CHGNet in this workflow. Install the modified version, Mattergen-ever (kYangLi/mattergen-ever), instead.

Follow these steps to set up the Mattergen environment:

# Create a dedicated directory for virtual environments (optional but recommended)
mkdir -p ~/.uvenv
cd ~/.uvenv

# Create a virtual environment for Mattergen using Python 3.10
uv venv mattergen --python=3.10

# Activate the environment
source ~/.uvenv/mattergen/bin/activate

# Clone and install the modified Mattergen-ever package
git clone https://github.com/kYangLi/mattergen-ever.git
cd mattergen-ever
uv pip install -e .

The Mattergen-ever environment is now ready.

3. Installing CHGNet

Next, set up a separate virtual environment for CHGNet:

# Navigate back to your virtual environment directory
cd ~/.uvenv

# Create a virtual environment for CHGNet using Python 3.13
uv venv chgnet --python=3.13

# Activate the environment
source ~/.uvenv/chgnet/bin/activate

# Install the CHGNet package
uv pip install chgnet

With both the mattergen and chgnet virtual environments configured, you can run the structure discovery pipeline provided by DeepH-dock. The workflow scripts activate the appropriate environment for each step, using the activation-script paths you set in config.sh.
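config.sh needs the absolute paths of the two activate scripts. If you used the ~/.uvenv layout from the installation steps, a small helper can print them in config.sh syntax, ready to paste. The function name print_act_sources is hypothetical and not part of DeepH-dock. This is a sketch that assumes the environments are named mattergen and chgnet under a common root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepH-dock): print the two activate-script
# paths in config.sh syntax. Assumes the environments were created as
# <root>/mattergen and <root>/chgnet; <root> defaults to ~/.uvenv.
print_act_sources() {
  local root="${1:-$HOME/.uvenv}"
  local name act
  for name in mattergen chgnet; do
    act="$root/$name/bin/activate"
    # Warn on stderr (but keep going) if the environment does not exist yet.
    [[ -f "$act" ]] || echo "warning: $act not found" >&2
  done
  printf 'readonly UV_MATTERGEN_ACT_SOURCE="%s"\n' "$root/mattergen/bin/activate"
  printf 'readonly UV_CHGNET_ACT_SOURCE="%s"\n' "$root/chgnet/bin/activate"
}

print_act_sources
```

If you placed the environments elsewhere, pass that directory as an absolute path, for example `print_act_sources /path/to/envs`.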

Structure Generation Workflow Template

DeepH-dock provides a built-in command to generate a standard workflow template for structure generation using Mattergen and CHGNet.

Creating the Template

To see the options for creating the template, run:

dock design mattergen-chgnet create -h

This will display the help information:

Usage: dock design mattergen-chgnet create [OPTIONS] TARGET_DIR

  Establish the searching template for the structure generator.

Options:
  -h, --help  Show this message and exit.

Then run the command without -h and pass the target directory (here the current directory, .):

dock design mattergen-chgnet create .

This creates two files in the target directory: config.sh, which holds the configuration parameters for the workflow, and run.sh, which runs it.

Configuration File (config.sh)

Below is an example of the generated config.sh file with detailed explanations for each parameter.

# Path to the activation script for the Mattergen virtual environment
readonly UV_MATTERGEN_ACT_SOURCE="/home/deeph/py_env/mattergen/bin/activate"
# Path to the activation script for the CHGNet virtual environment
readonly UV_CHGNET_ACT_SOURCE="/home/deeph/py_env/chgnet/bin/activate"

# Directory paths for different stages of the structure pool
readonly STRUCTURE_POOL_BUFFER_PATH="POOL/BUFFER"   # Temporary storage for generated structures
readonly STRUCTURE_POOL_ORIGIN_PATH="POOL/ORIGIN"   # Original structures from Mattergen
readonly STRUCTURE_POOL_RELAXED_PATH="POOL/RELAXED" # Structures after CHGNet relaxation
readonly STRUCTURE_POOL_ERROR_PATH="POOL/ERROR"     # Structures that failed during processing

# Mattergen model selection (e.g., "chemical_system")
readonly MATTERGEN_MODEL_NAME="chemical_system"
# Condition for conditional generation (refer to Mattergen documentation for details)
readonly MATTERGEN_CONDITION='{"chemical_system":"B-**!La,Ce,Pr,Nd,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu"}'
# Diffusion factor controlling the generation diversity
readonly MATTERGEN_DIFFUSION_FACTOR=2.0
# Number of structures generated per batch
readonly MATTERGEN_BATCH_SIZE=32
# Number of batches to generate
readonly MATTERGEN_NUM_BATCH=512
# Distribution of the number of atoms per structure (e.g., "ALEX_MP_20")
readonly MATTERGEN_NUM_ATOMS_DISTRIBUTION="ALEX_MP_20"

# CHGNet relaxation parameters
readonly CHGNET_F_MAX=0.02      # Force convergence threshold (in eV/Å)
readonly CHGNET_N_MAX=500       # Maximum number of relaxation steps
# Additional options for CHGNet (e.g., saving trajectory information)
readonly CHGNET_EXTRA_OPTIONS="--save_trajectory_info"
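The exclusion list in MATTERGEN_CONDITION is long and easy to mistype. config.sh is ordinary bash, so one option is to assemble the string from an array. This is a sketch under that assumption. The helper build_condition is hypothetical, and the output reproduces only the syntax of the example above. It does not document every form Mattergen-ever accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the MATTERGEN_CONDITION JSON string from a
# system pattern (e.g. "B-**") and a list of elements to exclude.
build_condition() {
  local system="$1"; shift
  local excluded
  # Join the remaining arguments with commas (IFS change stays in the subshell).
  excluded="$(IFS=,; printf '%s' "$*")"
  if [[ -n "$excluded" ]]; then
    # Single-quoted '!' avoids history expansion in interactive shells.
    system+='!'"$excluded"
  fi
  printf '{"chemical_system":"%s"}' "$system"
}

# The lanthanides excluded in the example (Pm is not in the original list).
lanthanides=(La Ce Pr Nd Sm Eu Gd Tb Dy Ho Er Tm Yb Lu)
condition="$(build_condition 'B-**' "${lanthanides[@]}")"
echo "$condition"
```

In config.sh this would become `readonly MATTERGEN_CONDITION="$(build_condition 'B-**' "${lanthanides[@]}")"`, placed after the function and array definitions.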

Key Configuration Notes

  1. Virtual Environment Paths:

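    • UV_MATTERGEN_ACT_SOURCE: Set this to the absolute path of the activate script for your Mattergen virtual environment. If you followed the installation steps above, this is /home/<user>/.uvenv/mattergen/bin/activate. Write the full path, because ~ is not expanded inside double quotes.

    • UV_CHGNET_ACT_SOURCE: Set this to the absolute path of the activate script for your CHGNet virtual environment (for example /home/<user>/.uvenv/chgnet/bin/activate).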

  2. Mattergen Parameters:

    • MATTERGEN_MODEL_NAME: The specific Mattergen model to use. The example uses "chemical_system".

    • MATTERGEN_CONDITION: A JSON string defining the target chemical system and its constraints. In the example, "B-**!La,Ce,Pr,Nd,Sm,Eu,Gd,Tb,Dy,Ho,Er,Tm,Yb,Lu" requests boron (B) combined with other elements (the * wildcards). The elements after ! are excluded: the lanthanides from La to Lu (Pm is not in the list). Adjust this to your target materials.

    • MATTERGEN_DIFFUSION_FACTOR: Controls the diversity of generated structures. Higher values lead to more diverse structures but may reduce stability.

    • MATTERGEN_BATCH_SIZE and MATTERGEN_NUM_BATCH: The total number of generated structures is MATTERGEN_BATCH_SIZE × MATTERGEN_NUM_BATCH; with the example values, 32 × 512 = 16,384 structures. Adjust these values based on your computational resources and search scope.

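    • MATTERGEN_NUM_ATOMS_DISTRIBUTION: Defines the distribution from which the number of atoms per structure is drawn. The example uses "ALEX_MP_20", the distribution of the Alex-MP-20 dataset on which Mattergen was trained. Alex-MP-20 contains structures from the Alexandria and Materials Project databases with at most 20 atoms per unit cell.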

  3. CHGNet Parameters:

    • CHGNET_F_MAX: The force convergence threshold for relaxation (in eV/Å). Lower values lead to stricter convergence.

    • CHGNET_N_MAX: The maximum number of ionic steps during relaxation.

    • CHGNET_EXTRA_OPTIONS: Additional command-line options passed to the CHGNet relaxation step. The example option, --save_trajectory_info, saves trajectory information from the intermediate relaxation steps.
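Before launching a long run, it can help to check the two activation paths and the size of the job. The sketch below assumes it is run from the template directory and that config.sh contains plain bash assignments like those above. The function name preflight is hypothetical and not part of DeepH-dock.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a Mattergen/CHGNet template directory.
# config.sh is sourced in a subshell so its readonly variables do not leak
# into (or clash with) the calling shell.
preflight() {
  local config="${1:-./config.sh}"
  (
    # shellcheck source=/dev/null
    source "$config" || exit 1
    status=0
    for act in "$UV_MATTERGEN_ACT_SOURCE" "$UV_CHGNET_ACT_SOURCE"; do
      if [[ -f "$act" ]]; then
        echo "found:   $act"
      else
        echo "missing: $act" >&2
        status=1
      fi
    done
    # Total number of generated structures = batch size x number of batches.
    echo "total structures: $(( MATTERGEN_BATCH_SIZE * MATTERGEN_NUM_BATCH ))"
    exit "$status"
  )
}
```

Usage: `preflight ./config.sh`. With the example values it reports 16384 structures, and it returns a non-zero status if either activate script is missing.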

Running the Workflow

After configuring config.sh according to your needs, execute the workflow by running:

bash run.sh

This script will sequentially:

  1. Activate the Mattergen environment and generate structures.

  2. Activate the CHGNet environment and perform structural relaxation.

  3. Organize the generated and relaxed structures into the specified directory structure (POOL/ORIGIN, POOL/RELAXED, etc.).
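The environment switching in steps 1 and 2 follows a common pattern. Source an activate script inside a subshell, run one command, and let the subshell exit so the next step starts from a clean environment. The sketch below illustrates this with a hypothetical run_in_env function. It is not the actual contents of run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of per-step environment activation.
# Usage: run_in_env ACTIVATE_SCRIPT CMD [ARGS...]
run_in_env() {
  local act="$1"; shift
  (
    # Activation (PATH, VIRTUAL_ENV, ...) only affects this subshell.
    # shellcheck source=/dev/null
    source "$act" || exit 1
    "$@"
  )
}

# Example with the paths from config.sh (assumes Mattergen-ever is
# importable as "mattergen"):
#   run_in_env "$UV_MATTERGEN_ACT_SOURCE" python -c 'import mattergen'
#   run_in_env "$UV_CHGNET_ACT_SOURCE"    python -c 'import chgnet'
```

Because each step runs in its own subshell, the Python 3.10 and Python 3.13 environments never end up on the same PATH.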

Monitor progress through the POOL subdirectories and the log files in run/*.log. The POOL/RELAXED directory contains the final relaxed structures, which you can analyze further or use in downstream tasks.
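To follow progress from the command line, you can count the entries in each pool directory. This sketch assumes each structure is stored as one file directly inside the POOL subdirectories (the file format is not specified here). It also assumes the default paths from config.sh. pool_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical progress summary: count regular files in each pool stage.
# Stage names match the default POOL/* paths in config.sh.
pool_status() {
  local base="${1:-.}" stage dir n
  for stage in BUFFER ORIGIN RELAXED ERROR; do
    dir="$base/POOL/$stage"
    if [[ -d "$dir" ]]; then
      # Arithmetic expansion strips any padding that wc adds on some systems.
      n=$(( $(find "$dir" -maxdepth 1 -type f | wc -l) ))
    else
      n=0
    fi
    printf '%-8s %d\n' "$stage" "$n"
  done
}

pool_status .
```

For a live view, run this periodically (for example with `watch`) alongside `tail -f run/*.log`.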