ABACUS Interface#

Overview#

The ABACUS Interface module converts ABACUS (Atomic-orbital Based Ab-initio Computation at USTC) calculation outputs into the DeepH DFT data format. ABACUS is a modern, high-performance, open-source DFT package known for its computational efficiency and for a design that targets AI-driven research workflows. Developed at Peking University, it represents a new generation of electronic structure codes that natively embraces both high-performance computing and machine learning paradigms.

A key strength of ABACUS lies in its dual-basis-set architecture, which supports both plane-wave and numerical atomic orbital (NAO) bases within the same framework, coupled with various pseudopotentials. This flexibility allows researchers to choose the balance between accuracy and efficiency that suits each materials system. Furthermore, ABACUS is engineered for strong parallel scalability and a low memory footprint, which makes it well suited to large-scale simulations and therefore a good source of the extensive, high-quality training data required by DeepH.

Beyond raw performance, ABACUS is at the forefront of the AI for Science movement. Its modular, data-aware architecture facilitates seamless integration with machine learning pipelines, making it an ideal partner for DeepH. The Hamiltonian and overlap matrices produced by ABACUS’s NAO mode are intrinsically compatible with DeepH’s localized orbital representation, ensuring a theoretically coherent and efficient data conversion process. By bridging ABACUS’s efficient, large-scale DFT engine with DeepH’s neural network models, this interface unlocks a powerful synergy: the rapid generation of precise training data meets scalable, ultra-fast property prediction, dramatically accelerating the cycle of computational materials discovery.

This conversion module empowers researchers to leverage the growing ecosystem of ABACUS datasets, harnessing their inherent compatibility with AI workflows to train robust DeepH models for accelerated electronic structure calculations while maintaining rigorous first-principles accuracy.

Note: The current interface is developed for ABACUS version 3.10 LTS. Output file structures may differ in other versions.

Preparing ABACUS Calculations#

Required ABACUS Settings#

To obtain the raw data required by DeepH-pack (Hamiltonian, overlap matrix, etc.) from ABACUS calculations, you must configure your ABACUS input files appropriately. The following settings are essential in the INPUT file:

basis_type     lcao      # Required: Use atomic orbital basis
out_mat_hs2    1         # Required: Enable sparse matrix output for Hamiltonian and Overlap
calculation    scf       # Optional: Default is 'scf', use 'get_S' for overlap matrix only

For inference, which needs only the overlap matrix, set calculation to get_S; ABACUS then exits after writing the SR.csr file.
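The required settings above can be checked before launching a batch of calculations. The following is a minimal sketch, assuming the INPUT file uses whitespace-separated `key value` pairs with `#` comments and an optional `INPUT_PARAMETERS` header line; the helper names `parse_abacus_input` and `missing_settings` are hypothetical and are not part of ABACUS or DeepH-dock.

```python
# Settings this page lists as required for exporting H and S in LCAO mode.
REQUIRED = {"basis_type": "lcao", "out_mat_hs2": "1"}


def parse_abacus_input(text: str) -> dict:
    """Parse 'key value  # comment' lines into a dict with lower-cased keys."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.upper() == "INPUT_PARAMETERS":
            continue  # skip blank lines and the optional header
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0].lower()] = parts[1].strip()
    return params


def missing_settings(text: str) -> dict:
    """Return the required settings that are absent or set to another value."""
    params = parse_abacus_input(text)
    return {k: v for k, v in REQUIRED.items() if params.get(k, "").lower() != v}


sample = """INPUT_PARAMETERS
basis_type     lcao      # atomic orbital basis
out_mat_hs2    1         # sparse H and S output
calculation    scf
"""
print(missing_settings(sample))  # empty dict when both settings are present
```

For a real calculation, the text would come from reading the INPUT file in the run directory; an empty result means both required settings are present.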

Important Considerations for ABACUS Calculations#

Atomic Species Labeling#

ABACUS allows different settings for atoms of the same element (e.g., different basis sets and pseudopotentials for bulk and surface silicon atoms), but DeepH-pack cannot distinguish these variants and treats them as the same species. To avoid this problem, use standard element symbols rather than user-defined labels in the ATOMIC_SPECIES block (e.g., silicon atoms must be labeled Si; a label such as "Si1" is not allowed).
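The labeling rule can be checked automatically. The sketch below assumes the usual STRU layout, in which each line of the ATOMIC_SPECIES block starts with the species label and the block ends at the next single-word keyword line; `species_labels` and `nonstandard_labels` are hypothetical helper names, not part of any package.

```python
# Standard element symbols, H through Og.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es "
    "Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)


def species_labels(stru_text: str) -> list:
    """Collect the first column of every line in the ATOMIC_SPECIES block."""
    labels = []
    in_block = False
    for raw in stru_text.splitlines():
        # Strip '#' and '//' comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].split("//", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == "ATOMIC_SPECIES":
            in_block = True
            continue
        if in_block:
            if len(tokens) == 1:  # next block keyword, e.g. NUMERICAL_ORBITAL
                break
            labels.append(tokens[0])
    return labels


def nonstandard_labels(stru_text: str) -> list:
    """Return the labels that are not plain element symbols."""
    return [label for label in species_labels(stru_text) if label not in ELEMENTS]


stru = """ATOMIC_SPECIES
Si  28.086 Si.upf
Si1 28.086 Si.upf

NUMERICAL_ORBITAL
Si.orb
"""
print(nonstandard_labels(stru))
```

Any label reported by `nonstandard_labels` should be renamed to its element symbol before the calculation is run.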

File Structure Organization#

This section assumes you are familiar with the ABACUS documentation and with running DFT calculations in ABACUS. To prepare ABACUS data for DeepH training, organize the output for each material structure according to the following convention:

ABACUS Data Structure#

abacus_datasets/
├── structure_1/
│   └── OUT.{suffix}/
│       ├── running_scf.log
│       ├── data-HR-sparse_SPIN0.csr
│       ├── data-SR-sparse_SPIN0.csr
│       ├── data-rR-sparse.csr             # Optional
│       └── data-DMR-sparse_SPIN0.csr      # Optional
├── structure_2/
│   └── OUT.{suffix}/
│       └── ...
└── ...

Here, structure_1, structure_2, etc., are the names of individual datasets and can be any combination of characters. Structure information is stored in the running_*.log file and physical properties are stored in the data-*.csr files, all of which are written to the OUT.{suffix} directory.
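Before conversion, it can help to confirm that every structure directory follows this layout. The following sketch treats the three files not marked optional in the tree above as required, which is an assumption suited to training data (a dataset produced with get_S for inference would not contain the Hamiltonian file). The function name `check_dataset` is hypothetical.

```python
from pathlib import Path

# Files shown in the layout above without an "Optional" marker.
REQUIRED_FILES = (
    "running_scf.log",
    "data-HR-sparse_SPIN0.csr",
    "data-SR-sparse_SPIN0.csr",
)


def check_dataset(root, suffix="ABACUS"):
    """Map each structure directory name to the required files it lacks."""
    report = {}
    for struct_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        out_dir = struct_dir / f"OUT.{suffix}"
        report[struct_dir.name] = [
            name for name in REQUIRED_FILES if not (out_dir / name).is_file()
        ]
    return report
```

Calling `check_dataset("abacus_datasets")` returns an empty list for every structure that is complete, so any non-empty entry points to a calculation that needs to be rerun or reconfigured.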

The DeepH-dock data conversion tool automatically transforms this organized data into the format recognized by DeepH-pack. The converted data has the following structure:

Converted DeepH Data Structure#

deeph_datasets/
├── structure_1/
│   ├── info.json
│   ├── POSCAR
│   ├── hamiltonian.h5      # Exported by default
│   ├── overlap.h5          # Exported by default
│   ├── density_matrix.h5   # Optional - requires --export-rho flag
│   └── position_matrix.h5  # Optional - requires --export-r flag
├── structure_2/
│   └── ...
└── ...

Command Line Interface#

Basic Conversion Command#

Use the command line interface to convert ABACUS output to the DeepH format.

dock convert abacus to-deeph ./abacus_data /tmp/deeph_data -p 2

Expected output:

Data: 2it [00:00, 400.72it/s]
[done] Translation completed successfully!

Complete Command Line Options#

For detailed parameter information, use the help command:

dock convert abacus to-deeph -h

Usage: dock convert abacus to-deeph [OPTIONS] ABACUS_DIR DEEPH_DIR

  Translate ABACUS output data to DeepH DFT data training set format

Options:
  -s, --abacus-suffix, --suffix TEXT
                                  Only look for OUT.suffix in
                                  abacus_output_dir.
  --ignore-S                      Do not export overlap.h5
  --ignore-H                      Do not export hamiltonian.h5
  --export-rho                    Export density_matrix.h5
  --export-r                      Export position_matrix.h5
  -p, --parallel-num INTEGER      The parallel processing number, -1 for using
                                  all of the cores.  [default: -1]
  -t, --tier-num INTEGER          The tier number of the ABACUS source data,
                                  -1 for [abacus_dir], 0 for
                                  <abacus_dir>/<data_dirs>, 1 for
                                  <abacus_dir>/<tier1>/<data_dirs>, etc.
                                  [default: 0]
  --force                         Force to overwrite the existing files.
  -h, --help                      Show this message and exit.
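When conversions are launched from a script, the command line can be assembled from the options listed in the help text. This is a sketch only: `build_convert_command` is a hypothetical helper, and it uses nothing beyond the flags shown above. Options left at `None` or `False` are omitted so that the tool's own defaults apply.

```python
import shlex


def build_convert_command(abacus_dir, deeph_dir, *, suffix=None,
                          parallel_num=None, tier_num=None,
                          ignore_s=False, ignore_h=False,
                          export_rho=False, export_r=False, force=False):
    """Assemble the argv list for `dock convert abacus to-deeph`."""
    cmd = ["dock", "convert", "abacus", "to-deeph"]
    if suffix is not None:
        cmd += ["-s", str(suffix)]
    if parallel_num is not None:
        cmd += ["-p", str(parallel_num)]
    if tier_num is not None:
        cmd += ["-t", str(tier_num)]
    switches = {
        "--ignore-S": ignore_s,
        "--ignore-H": ignore_h,
        "--export-rho": export_rho,
        "--export-r": export_r,
        "--force": force,
    }
    cmd += [flag for flag, enabled in switches.items() if enabled]
    cmd += [str(abacus_dir), str(deeph_dir)]  # positional arguments go last
    return cmd


cmd = build_convert_command("./abacus_data", "/tmp/deeph_data",
                            parallel_num=2, export_rho=True)
print(shlex.join(cmd))  # shell-ready form of the same command
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `dock` executable is installed.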

Parameter Details#

Export Options#

Default exports: hamiltonian.h5 and overlap.h5
Optional exports:
- --export-rho: Exports density matrices (density_matrix.h5)
- --export-r: Exports position matrices (position_matrix.h5)
Disabling default exports:
- --ignore-S: Skips the overlap matrix (overlap.h5)
- --ignore-H: Skips the Hamiltonian matrix (hamiltonian.h5)
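How the four flags combine can be summarized in a few lines. The sketch below is an illustration of the rules above, not DeepH-dock code; it assumes that info.json and POSCAR are always written, since the converted layout shows them without any flag. `expected_outputs` is a hypothetical name.

```python
def expected_outputs(*, ignore_s=False, ignore_h=False,
                     export_rho=False, export_r=False):
    """List the files expected in each converted structure directory."""
    files = ["info.json", "POSCAR"]  # assumed to be written unconditionally
    if not ignore_h:
        files.append("hamiltonian.h5")
    if not ignore_s:
        files.append("overlap.h5")
    if export_rho:
        files.append("density_matrix.h5")
    if export_r:
        files.append("position_matrix.h5")
    return files


# Overlap-only data (for example from a get_S run) converted with --ignore-H:
print(expected_outputs(ignore_h=True))
```

Comparing this list with the contents of a converted directory is a quick way to confirm that a conversion ran with the intended flags.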

Parallel Processing#

-p, --parallel-num: Sets the parallel processing number (default: -1, uses all cores)
- Performance scales with core count, but memory usage increases proportionally
- Use lower values for memory-constrained environments

Tier Number Specification#

-t, --tier-num: Defines the directory hierarchy level at which the data directories are located
- -1: <abacus_dir> itself is the data directory
- 0: <abacus_dir>/<data_dirs> (default)
- 1: <abacus_dir>/<tier1>/<data_dirs>
- Higher numbers for deeper nesting
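The tier number can be read as the count of intermediate directory levels between the root and the data directories. The sketch below shows that reading with a glob pattern; it is an interpretation of the help text rather than DeepH-dock's actual traversal code, and `candidate_data_dirs` is a hypothetical name.

```python
from pathlib import Path


def candidate_data_dirs(abacus_dir, tier_num=0):
    """Return the directories a given tier number would treat as data dirs."""
    root = Path(abacus_dir)
    if tier_num < 0:
        return [root]  # the root itself is the single data directory
    # tier 0 -> "*", tier 1 -> "*/*", tier 2 -> "*/*/*", and so on
    pattern = "/".join(["*"] * (tier_num + 1))
    return sorted(p for p in root.glob(pattern) if p.is_dir())
```

For a layout such as `abacus_data/batch_a/structure_1`, tier 0 would select `batch_a`, while tier 1 would select `structure_1` and its siblings, which is the level that contains the OUT.{suffix} directories.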

Advanced Usage#

For programmatic control, DeepH-dock also provides a class-based API:

from deepx_dock.convert.abacus.translate_abacus_to_deeph import AbacusDatasetTranslator

translator = AbacusDatasetTranslator(
    abacus_data_dir="abacus_data",
    deeph_data_dir="/tmp/deeph_data",
    abacus_suffix="ABACUS",
    n_jobs=2,
)
translator.transfer_all_abacus_to_deeph()

Expected output:

Data: 2it [00:00, 183.59it/s]