SIESTA Interface

Overview

The SIESTA Interface converts SIESTA (Spanish Initiative for Electronic Simulations with Thousands of Atoms) output files into the DeepH training data format; conversion in the reverse direction is not yet implemented. SIESTA is both a method and its program implementation for efficient electronic structure calculations, and one of the most influential codes among numerical atomic orbital (NAO) methods. First released in 1996, it helped establish strictly localized, finite-support atomic orbitals as a practical and efficient basis set for large-scale density functional theory (DFT) calculations, and it influenced later NAO codes such as OpenMX.

SIESTA’s efficiency rests on its basis set of strictly localized atomic orbitals, which matches the localized orbital representation that DeepH is built on. This makes SIESTA a natural data source for training DeepH models. The code is known for its robustness and scalability, and it is widely used in academic and industrial research to simulate systems ranging from molecules and nanotubes to surfaces and bulk materials.

This conversion module lets researchers reuse their existing SIESTA datasets for DeepH training. By combining SIESTA’s scalable DFT framework with DeepH’s machine learning acceleration, it supports fast electronic structure predictions for systems larger than direct DFT can comfortably handle, while aiming to retain first-principles accuracy.

Note: This converter is designed for SIESTA version 5.x, with automatic version detection from SIESTA.log headers. We welcome community contributions to expand support to additional SIESTA versions.

Preparing SIESTA Calculations

Required SIESTA Settings

To obtain the raw data required by DeepH-pack (Hamiltonian, overlap matrix, etc.) from SIESTA calculations, the following input flags are essential in the SIESTA input file (*.fdf):

  • SaveHS .true.: Enables output of Hamiltonian and overlap matrices to the .HSX file.

  • ForceAuxCell .true.: Required for the .HSX file to contain the correct Hamiltonian and overlap matrices, especially for Gamma-only calculations of extended systems. Without this flag, matrix elements that belong to lattice vectors other than R=[0,0,0] may be folded into the R=[0,0,0] block, which yields wrong atom-pair information.

For inference workflows that only require the overlap matrix, the flag HSetupOnly .true. can be used. In this mode, SIESTA exits after generating the .HSX file (or a .0.HSX file, which can be renamed to .HSX for compatibility).
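As a pre-flight check, the flags above can be verified before launching a batch of calculations. The following is a minimal sketch, not part of DeepH-dock: the helper names are hypothetical, and the parsing assumes the usual fdf conventions (labels are case-insensitive and ignore `-`, `_` and `.`; `#` starts a comment; the first occurrence of a label wins; a bare label counts as true). It does not follow `%include` directives or `<` file redirections.

```python
# Spellings the fdf parser is assumed to accept as "true".
TRUE_VALUES = {"t", "true", ".true.", "yes"}


def normalize_label(label: str) -> str:
    """Lower-case a label and drop '-', '_' and '.', mirroring fdf label matching."""
    return label.lower().replace("-", "").replace("_", "").replace(".", "")


def read_fdf_flags(fdf_text: str) -> dict[str, str]:
    """Collect simple 'label value' lines, skipping comments and %block sections."""
    flags: dict[str, str] = {}
    in_block = False
    for raw in fdf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comments
        if not line:
            continue
        lowered = line.lower()
        if lowered.startswith("%block"):
            in_block = True
            continue
        if lowered.startswith("%endblock"):
            in_block = False
            continue
        if in_block or line.startswith("%"):
            continue
        parts = line.split()
        value = parts[1] if len(parts) > 1 else "true"  # bare label means true
        flags.setdefault(normalize_label(parts[0]), value)  # first occurrence wins
    return flags


def missing_required_flags(fdf_text: str, required=("SaveHS", "ForceAuxCell")) -> list[str]:
    """Return the required flags that are absent or not set to a true value."""
    flags = read_fdf_flags(fdf_text)
    return [
        name
        for name in required
        if flags.get(normalize_label(name), "false").lower() not in TRUE_VALUES
    ]
```

For a real input file, the text can be loaded with `pathlib.Path("SYSTEM.fdf").read_text()` and passed to `missing_required_flags`; an empty list means both flags are enabled.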

File Structure Organization

This section assumes that you are familiar with running DFT calculations in SIESTA. To prepare SIESTA data for DeepH training, organize the output for each material structure according to the following convention:

SIESTA Data Structure

siesta_data/
├── structure_1/
│   ├── SYSTEM.fdf
│   ├── SYSTEM.HSX
│   ├── SYSTEM.XV
│   ├── SYSTEM.ORB_INDX
│   ├── SYSTEM.DM          # Density Matrix (Optional)
│   ├── SYSTEM.EIG         # Eigenvalues (Optional)
│   └── SIESTA.log
├── structure_2/
│   └── ...
└── ...

Here, structure_1, structure_2, etc. are the names of individual datasets and can be chosen freely. Structure information is stored in the .XV file, and physical quantities are stored in the .HSX, .DM, and related files. Note that the standard output of SIESTA must be redirected to SIESTA.log, which is read by the conversion tool.
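Before running the conversion, it can help to confirm that every structure directory follows this layout. The sketch below is illustrative and not part of DeepH-dock; the function names are hypothetical, and the list of required suffixes is an assumption taken from the entries in the tree above that are not marked optional.

```python
from pathlib import Path

# Assumed to be required: every entry in the layout not marked "Optional".
REQUIRED_SUFFIXES = (".fdf", ".HSX", ".XV", ".ORB_INDX")
LOG_NAME = "SIESTA.log"


def missing_files(structure_dir: Path) -> list[str]:
    """Return the required suffixes or file names absent from one structure directory."""
    names = [p.name for p in structure_dir.iterdir() if p.is_file()]
    missing = [s for s in REQUIRED_SUFFIXES if not any(n.endswith(s) for n in names)]
    if LOG_NAME not in names:
        missing.append(LOG_NAME)
    return missing


def check_dataset(root: Path) -> dict[str, list[str]]:
    """Map each incomplete structure directory name to its missing entries."""
    report: dict[str, list[str]] = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        gaps = missing_files(sub)
        if gaps:
            report[sub.name] = gaps
    return report
```

Calling `check_dataset(Path("./siesta_data"))` returns an empty dictionary when every subdirectory contains the files assumed to be required.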

The DeepH-dock data conversion tool transforms this organized data into the format recognized by DeepH-pack. The converted data has the following layout:

Converted DeepH Data Structure

deeph_datasets/
├── structure_1/
│   ├── info.json
│   ├── POSCAR
│   ├── hamiltonian.h5      # Exported by default
│   ├── overlap.h5          # Exported by default
│   ├── density_matrix.h5   # Optional - requires --export-rho flag
│   └── position_matrix.h5  # Optional - requires --export-r flag
├── structure_2/
│   └── ...
└── ...

Command Line Interface

Basic Conversion Command

You can convert SIESTA format data to DeepH format using the command line interface:

dock convert siesta to-deeph ./siesta_data /tmp/deeph_data -p 1

Expected output:

Data: 1it [00:00, 12.50it/s]
[done] Translation completed successfully!

Complete Command Line Options

For detailed parameter information, use the help command:

dock convert siesta to-deeph -h
Usage: dock convert siesta to-deeph [OPTIONS] SIESTA_DIR DEEPH_DIR

  Translate SIESTA output data to DeepH DFT data training set format.

Options:
  --ignore-S                  Do not export overlap.h5
  --ignore-H                  Do not export hamiltonian.h5
  --export-rho                Export density_matrix.h5
  --export-r                  Export position_matrix.h5
  -p, --parallel-num INTEGER  The parallel processing number, -1 for using all
                              of the cores.  [default: -1]
  -t, --tier-num INTEGER      The tier number of the SIESTA source data, -1
                              for [siesta_dir], 0 for
                              <siesta_dir>/[data_dirs], 1 for
                              <siesta_dir>/<tier1>/[data_dirs], etc.
                              [default: 0]
  --force                     Force to overwrite the existing files.
  -h, --help                  Show this message and exit.
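When the conversion is one step in a larger Python workflow, the command can be launched with `subprocess`. This is a minimal sketch: the wrapper functions are hypothetical, only options listed in the help text above are used, and it assumes that the `dock` executable is on `PATH` and signals failure with a non-zero exit code.

```python
import subprocess


def build_convert_command(siesta_dir, deeph_dir, parallel_num=1,
                          export_rho=False, export_r=False):
    """Assemble the argument list for `dock convert siesta to-deeph`."""
    cmd = [
        "dock", "convert", "siesta", "to-deeph",
        str(siesta_dir), str(deeph_dir),
        "-p", str(parallel_num),
    ]
    if export_rho:
        cmd.append("--export-rho")  # also write density_matrix.h5
    if export_r:
        cmd.append("--export-r")    # also write position_matrix.h5
    return cmd


def run_conversion(siesta_dir, deeph_dir, **options):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_convert_command(siesta_dir, deeph_dir, **options), check=True
    )
```

For example, `run_conversion("./siesta_data", "/tmp/deeph_data", parallel_num=1)` issues the same command as the basic example above.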

Parameter Details

Export Options

  • Default exports: hamiltonian.h5 and overlap.h5 are exported by default

  • Flags that change the exported files:

    • --export-rho: Exports density matrices (density_matrix.h5)

    • --export-r: Exports position matrices (position_matrix.h5)

    • --ignore-S: Disables overlap matrix export

    • --ignore-H: Disables Hamiltonian matrix export
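The combined effect of these flags on the output directory can be summarized in a few lines. The helper below is a hypothetical illustration rather than DeepH-dock code; it assumes that info.json and POSCAR are always written, as the converted layout shown earlier suggests.

```python
def expected_outputs(ignore_S=False, ignore_H=False,
                     export_rho=False, export_r=False):
    """List the files expected in each converted structure directory."""
    files = ["info.json", "POSCAR"]  # assumed to be written for every structure
    if not ignore_H:
        files.append("hamiltonian.h5")      # default export, disabled by --ignore-H
    if not ignore_S:
        files.append("overlap.h5")          # default export, disabled by --ignore-S
    if export_rho:
        files.append("density_matrix.h5")   # enabled by --export-rho
    if export_r:
        files.append("position_matrix.h5")  # enabled by --export-r
    return files
```

For an overlap-only inference dataset, `expected_outputs(ignore_H=True)` lists info.json, POSCAR, and overlap.h5.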

Parallel Processing

  • -p, --parallel-num: Controls parallel processing (default: -1, uses all cores)

  • Performance scales with core count, but memory usage increases proportionally

  • Use lower values for memory-constrained environments

Tier Number Specification

  • -t, --tier-num: Defines the directory hierarchy level for data access

    • -1: <siesta_dir> itself is the data directory

    • 0: Access to <siesta_dir>/<data_dirs> (default)

    • 1: Access to <siesta_dir>/<tier1>/<data_dirs>

    • Higher numbers for deeper nesting
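The tier semantics can be illustrated with a short directory walk. This sketch only mirrors the description in the help text and is not how DeepH-dock necessarily locates its data; the function name is hypothetical.

```python
from pathlib import Path


def data_dirs_for_tier(root: Path, tier_num: int) -> list[Path]:
    """Return the directories treated as data directories for a given tier number."""
    if tier_num < -1:
        raise ValueError("tier_num must be -1 or larger")
    if tier_num == -1:
        return [root]  # the root itself holds the SIESTA output files
    level = [root]
    # Tier 0 descends one level, tier 1 descends two levels, and so on.
    for _ in range(tier_num + 1):
        level = [
            child
            for parent in level
            for child in sorted(parent.iterdir())
            if child.is_dir()
        ]
    return level
```

With the layout from the File Structure Organization section, `data_dirs_for_tier(Path("./siesta_data"), 0)` yields structure_1, structure_2, and so on.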

Advanced Usage

For advanced users, DeepH-dock provides the SIESTADatasetTranslator class, which can be called directly from Python:

from deepx_dock.convert.siesta.translate_siesta_to_deeph import SIESTADatasetTranslator

translator = SIESTADatasetTranslator(
    siesta_data_dir="./siesta_data",
    deeph_data_dir="/tmp/deeph_data",
    n_jobs=2,
)
translator.transfer_all_siesta_to_deeph()

Expected output:

Data: 1it [00:00, 381.30it/s]
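When several dataset roots need converting, the class can be driven from a small loop. The sketch below uses only the constructor arguments and the method shown above; the `convert_roots` helper is hypothetical, and it assumes that the translator raises an exception when a conversion fails. The translator class is passed in as a parameter so that the helper can be exercised without DeepH-dock installed.

```python
def convert_roots(pairs, translator_cls, n_jobs=2):
    """Convert each (siesta_dir, deeph_dir) pair in turn.

    Returns a list of (siesta_dir, error) tuples for the pairs that failed.
    """
    failures = []
    for siesta_dir, deeph_dir in pairs:
        try:
            translator = translator_cls(
                siesta_data_dir=siesta_dir,
                deeph_data_dir=deeph_dir,
                n_jobs=n_jobs,
            )
            translator.transfer_all_siesta_to_deeph()
        except Exception as exc:  # broad on purpose: record the error and continue
            failures.append((siesta_dir, repr(exc)))
    return failures
```

In practice, `translator_cls` would be the `SIESTADatasetTranslator` class imported as shown above, for example `convert_roots([("./siesta_data", "/tmp/deeph_data")], SIESTADatasetTranslator)`.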

Conversion from the DeepH format back to the SIESTA format is not yet available and will be added later.