SIESTA Interface#
Overview#
The SIESTA Interface converts SIESTA (Spanish Initiative for Electronic Simulations with Thousands of Atoms) output files into the DeepH training data format. Conversion in the opposite direction, from DeepH format to SIESTA format, is not yet implemented. SIESTA is both a method and its program implementation for efficient electronic structure calculations, and it is one of the most influential early codes built on numerical atomic orbitals (NAOs). First released in 1996, SIESTA helped establish strictly localized, finite-support atomic orbitals as a practical basis set for large-scale density functional theory (DFT) calculations and inspired later software, including OpenMX.
SIESTA's efficiency comes from its basis set of strictly localized atomic orbitals, the same kind of localized orbital representation that DeepH is built on. This shared representation makes SIESTA a natural data source for training DeepH models. The code is known for its robustness and scalability, and it is widely used in academic and industrial research to simulate systems ranging from molecules and nanotubes to surfaces and bulk materials.
This conversion module lets researchers reuse their existing SIESTA datasets for DeepH training. Combining SIESTA's scalable DFT framework with DeepH's machine learning acceleration enables fast electronic structure predictions for very large systems while retaining first-principles accuracy.
Note: This converter is designed for SIESTA version 5.x, with automatic version detection from SIESTA.log headers. We welcome community contributions to expand support to additional SIESTA versions.
Preparing SIESTA Calculations#
Required SIESTA Settings#
To obtain the raw data required by DeepH-pack (Hamiltonian, overlap matrix, etc.) from SIESTA calculations, the following input flags are essential in the SIESTA input file (*.fdf):
SaveHS .true.: Enables output of Hamiltonian and overlap matrices to the .HSX file.
ForceAuxCell .true.: Crucial for outputting the correct Hamiltonian and overlap matrix in the .HSX file, especially for gamma-only calculations of extended systems. Without this flag, matrix elements for lattice vectors other than R=[0,0,0] may be incorrectly folded into the R=[0,0,0] block, resulting in wrong atom-pair information.
For inference processes that only require the overlap matrix, the flag HSetupOnly .true. can be used. In this mode, the program will exit after generating the .HSX file (or .0.HSX file, which can be renamed to .HSX for compatibility).
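The flag requirements above can be checked before a batch of calculations is launched. The following is a minimal sketch, not part of DeepH-dock: the function names are hypothetical, and the parsing rules (case-insensitive labels that ignore `-`, `_` and `.`, the accepted spellings of "true", and first-occurrence precedence) are assumptions about the fdf format that should be verified against the SIESTA manual.

```python
import re

# Assumed spellings of a logical "true" in fdf files (verify in the SIESTA manual).
TRUE_VALUES = {"t", "true", ".true.", "yes"}

# Flags this page lists as essential for DeepH training data.
REQUIRED_FLAGS = ("SaveHS", "ForceAuxCell")


def normalize_label(label: str) -> str:
    """Lower-case a label and drop '-', '_' and '.' (assumed fdf label rules)."""
    return re.sub(r"[-_.]", "", label).lower()


def read_fdf_flags(text: str) -> dict[str, str]:
    """Collect simple 'label value' pairs, skipping comments and %block bodies."""
    flags: dict[str, str] = {}
    in_block = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comments
        if not line:
            continue
        lower = line.lower()
        if lower.startswith("%block"):
            # '%block Name < file' has no body in this file, so no %endblock follows.
            in_block = "<" not in line
            continue
        if lower.startswith("%endblock"):
            in_block = False
            continue
        if in_block or line.startswith("%"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keep the first occurrence of each label (assumed precedence rule).
        flags.setdefault(normalize_label(parts[0]), value)
    return flags


def missing_required_flags(text: str) -> list[str]:
    """Return the required flags that are absent or not set to true."""
    flags = read_fdf_flags(text)
    return [
        name
        for name in REQUIRED_FLAGS
        if flags.get(normalize_label(name), "").lower() not in TRUE_VALUES
    ]
```

Calling `missing_required_flags(open("SYSTEM.fdf").read())` returns an empty list when both flags are enabled. The sketch does not follow `%include` directives, so flags set in included files are reported as missing.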
File Structure Organization#
This section assumes you are familiar with running DFT calculations in SIESTA. To prepare SIESTA data for DeepH training, organize the output for each material structure according to the following convention:
SIESTA Data Structure#
siesta_data/
├── structure_1/
│ ├── SYSTEM.fdf
│ ├── SYSTEM.HSX
│ ├── SYSTEM.XV
│ ├── SYSTEM.ORB_INDX
│ ├── SYSTEM.DM # Density matrix (optional)
│ ├── SYSTEM.EIG # Eigenvalues (optional)
│ └── SIESTA.log
├── structure_2/
│ └── ...
└── ...
Here, structure_1, structure_2, etc., are the names of individual datasets and can be any valid directory names. Structure information is stored in the .XV file, and physical properties are stored in the .HSX, .DM, and related files. The standard output of the SIESTA run must be redirected into SIESTA.log, because the conversion tool reads this file.
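Before running the conversion, it can help to confirm that every structure directory follows the layout described above. The sketch below is an illustrative pre-flight check, not a DeepH-dock feature. The function names are hypothetical, and the list of required files is taken from the tree on this page, with .DM and .EIG treated as optional.

```python
from pathlib import Path

# Files marked as non-optional in the layout above, matched by suffix.
REQUIRED_SUFFIXES = (".fdf", ".HSX", ".XV", ".ORB_INDX")
LOG_NAME = "SIESTA.log"


def check_structure_dir(path: Path) -> list[str]:
    """Return the required items that are missing from one structure directory."""
    names = [p.name for p in path.iterdir() if p.is_file()]
    missing = [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]
    if LOG_NAME not in names:
        missing.append(LOG_NAME)
    return missing


def check_dataset(root: str) -> dict[str, list[str]]:
    """Map each incomplete structure directory name to its missing items."""
    report: dict[str, list[str]] = {}
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            missing = check_structure_dir(sub)
            if missing:
                report[sub.name] = missing
    return report
```

An empty dictionary from `check_dataset("./siesta_data")` means every subdirectory contains the expected files. The check only looks at file names, so it does not detect truncated or corrupted output.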
The DeepH-dock data conversion tool transforms this organized data into the format recognized by DeepH-pack. The converted dataset has the following layout:
Converted DeepH Data Structure#
deeph_datasets/
├── structure_1/
│ ├── info.json
│ ├── POSCAR
│ ├── hamiltonian.h5 # Exported by default
│ ├── overlap.h5 # Exported by default
│ ├── density_matrix.h5 # Optional - requires --export-rho flag
│ └── position_matrix.h5 # Optional - requires --export-r flag
├── structure_2/
│ └── ...
└── ...
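The relationship between export flags and output files shown in the tree above can be written down as a small lookup. This is a hedged sketch with hypothetical function names. It assumes that info.json and POSCAR are always written and that the .h5 files follow exactly the flag behavior annotated in the tree.

```python
from pathlib import Path


def expected_outputs(ignore_s=False, ignore_h=False, export_rho=False, export_r=False):
    """Return the file names a converted structure directory should contain."""
    files = {"info.json", "POSCAR"}  # assumed to be written unconditionally
    if not ignore_h:
        files.add("hamiltonian.h5")
    if not ignore_s:
        files.add("overlap.h5")
    if export_rho:
        files.add("density_matrix.h5")
    if export_r:
        files.add("position_matrix.h5")
    return files


def missing_outputs(structure_dir, **flags):
    """List expected files that are absent from one converted directory."""
    present = {p.name for p in Path(structure_dir).iterdir()}
    return sorted(expected_outputs(**flags) - present)
```

For example, `missing_outputs("deeph_datasets/structure_1", export_rho=True)` returns `["density_matrix.h5"]` if the conversion was run without `--export-rho`.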
Command Line Interface#
Basic Conversion Command#
You can convert SIESTA format data to DeepH format using the command line interface:
dock convert siesta to-deeph ./siesta_data /tmp/deeph_data -p 1
Expected output:
Data: 1it [00:00, 12.50it/s]
[done] Translation completed successfully!
Complete Command Line Options#
For detailed parameter information, use the help command:
dock convert siesta to-deeph -h
Usage: dock convert siesta to-deeph [OPTIONS] SIESTA_DIR DEEPH_DIR

  Translate SIESTA output data to DeepH DFT data training set format.

Options:
  --ignore-S                  Do not export overlap.h5
  --ignore-H                  Do not export hamiltonian.h5
  --export-rho                Export density_matrix.h5
  --export-r                  Export position_matrix.h5
  -p, --parallel-num INTEGER  The parallel processing number, -1 for using all
                              of the cores.  [default: -1]
  -t, --tier-num INTEGER      The tier number of the SIESTA source data, -1
                              for [siesta_dir], 0 for
                              <siesta_dir>/[data_dirs], 1 for
                              <siesta_dir>/<tier1>/[data_dirs], etc.
                              [default: 0]
  --force                     Force to overwrite the existing files.
  -h, --help                  Show this message and exit.
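When the conversion is driven from a script, the options in the help output can be assembled into an argument list. The sketch below uses only the flags shown in that help text. The function `build_convert_command` is a hypothetical helper, not part of DeepH-dock, and it assumes the `dock` executable is available on the PATH.

```python
import shlex


def build_convert_command(
    siesta_dir,
    deeph_dir,
    *,
    parallel_num=-1,
    tier_num=0,
    ignore_s=False,
    ignore_h=False,
    export_rho=False,
    export_r=False,
    force=False,
):
    """Build the argument list for `dock convert siesta to-deeph`."""
    cmd = ["dock", "convert", "siesta", "to-deeph"]
    if ignore_s:
        cmd.append("--ignore-S")
    if ignore_h:
        cmd.append("--ignore-H")
    if export_rho:
        cmd.append("--export-rho")
    if export_r:
        cmd.append("--export-r")
    cmd += ["-p", str(parallel_num), "-t", str(tier_num)]
    if force:
        cmd.append("--force")
    # Positional arguments: source directory first, then destination.
    cmd += [str(siesta_dir), str(deeph_dir)]
    return cmd


if __name__ == "__main__":
    cmd = build_convert_command("./siesta_data", "/tmp/deeph_data", parallel_num=1)
    print(shlex.join(cmd))  # inspect the command before running it
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems with directory names that contain spaces.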
Parameter Details#
Export Options#
Default exports:
  hamiltonian.h5 and overlap.h5 are exported by default
Optional exports:
  --export-rho: Exports density matrices (density_matrix.h5)
  --export-r: Exports position matrices (position_matrix.h5)
  --ignore-S: Disables overlap matrix export
  --ignore-H: Disables Hamiltonian matrix export
Parallel Processing#
-p, --parallel-num: Controls parallel processing (default: -1, uses all cores)
  Performance scales with core count, but memory usage increases proportionally
  Use lower values for memory-constrained environments
Tier Number Specification#
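-t, --tier-num: Defines the directory hierarchy level at which the data directories are located
  -1: <siesta_dir> itself is the data directory
  0: Data directories are at <siesta_dir>/<data_dirs> (default)
  1: Data directories are at <siesta_dir>/<tier1>/<data_dirs>
  Higher numbers for deeper nesting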
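The tier number can be read as the count of intermediate directory levels between the source directory and the data directories. The sketch below illustrates that reading with a hypothetical helper, `list_data_dirs`. It mirrors the documented semantics and is not the tool's actual implementation.

```python
from pathlib import Path


def list_data_dirs(siesta_dir, tier_num=0):
    """Return the directories that would be treated as datasets for a tier number."""
    root = Path(siesta_dir)
    if tier_num < -1:
        raise ValueError("tier_num must be -1 or greater")
    if tier_num == -1:
        # The source directory itself holds the SIESTA output files.
        return [root]
    # tier 0 -> "*", tier 1 -> "*/*", and so on.
    pattern = "/".join(["*"] * (tier_num + 1))
    return sorted(p for p in root.glob(pattern) if p.is_dir())
```

With a layout such as `siesta_data/batch_a/structure_1`, `list_data_dirs("siesta_data", 1)` returns the `structure_*` directories, while tier 0 would return the `batch_*` directories instead.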
Advanced Usage#
For advanced users, DeepH-dock provides the SIESTADatasetTranslator class, which can be called directly from Python:
from deepx_dock.convert.siesta.translate_siesta_to_deeph import SIESTADatasetTranslator
translator = SIESTADatasetTranslator(
    siesta_data_dir="./siesta_data",
    deeph_data_dir="/tmp/deeph_data",
    n_jobs=2,
)
translator.transfer_all_siesta_to_deeph()
Data: 1it [00:00, 381.30it/s]
Conversion from DeepH format to SIESTA format is to be added.