Training TOML#
Parameters of the training process are configured through a TOML-formatted file. Each key governs a specific aspect of the training workflow. An example TOML file is shown below:
train.toml:
# ---------------------------------- SYSTEM ----------------------------------
[system]
note = "Enjoy DeepH-pack! ;-)"
device = "gpu*8:0"
float_type = "fp32"
random_seed = 137
log_level = "info"
jax_memory_preallocate = true
show_train_process_bar = true
# ----------------------------------- DATA ------------------------------------
[data]
inputs_dir = "./user/should/set/this/inputs"
outputs_dir = "./user/should/set/this/outputs"
[data.dft]
data_dir_depth = 0
[data.graph]
dataset_name = "DATASET-DEMO"
graph_type = "H"
storage_type = "memory"
common_orbital_types = ""
parallel_num = -1
only_save_graph = false
[data.model_save]
best = true
latest = true
latest_interval = 100
latest_num = 10
# ----------------------------- MODEL -----------------------------------------
[model]
net_type = "normal"
target_type = "H"
loss_type = "mse"
[model.advanced]
gaussian_basis_rmax = 7.5
net_irreps = ""
num_blocks = 3
consider_parity = true
standardize_gauge = false
# ------------------------------ PROCESS --------------------------------------
[process.train]
max_epoch = 10000
multi_way_jit_num = 1
ahead_of_time_compile = false
[process.train.dataloader]
batch_size = 1
train_size = 1
validate_size = 0
test_size = 0
dataset_split_json = ""
only_use_train_loss = false
[process.train.optimizer]
type = "adamw"
init_learning_rate = 2E-3
clip_norm_factor = -1.0
# sgd
momentum = 0.8
# adam(w)
betas = [0.9, 0.999]
weight = 0.001
eps = 1E-8
[process.train.scheduler]
min_learning_rate_scale = 1E-4
type = "reduce_on_plateau"
# Reduce on plateau
factor = 0.5
patience = 500
rtol = 0.05
cooldown = 100
accum_size = -1
# Warmup cosine decay
init_scale = 0.1
warmup_steps = 10
decay_steps = -1
end_scale = -1.0
[process.train.continued]
enable = false
new_training_data = false
new_optimizer = false
previous_output_dir = ""
load_model_type = "latest"
load_model_epoch = -1
Next, we will go through the semantics of these parameters in the TOML file in detail.
System#
[system.note]
Description: Name of this training project.
Default: "Enjoy DeepH-pack! ;-)"
Type: <STRING>
[system.device]
Description: The device configuration follows the syntax `<type>*<num>:<id>`, where `<type>` specifies the hardware type (`cpu`, `gpu`, `rocm`, `dcu`, or `cuda`), `<num>` denotes either the total devices per node (for accelerators like GPUs) or the number of CPU partitions (when using `cpu`), and `<id>` defines the target device indices (e.g., `gpu*8:1-4,7` selects 5 GPUs from an 8-device node using indices 1, 2, 3, 4, 7). Note: for CPU configurations, `<id>` is ignored while `<num>` controls thread partitioning.
Default: "gpu*8:0"
Type: <STRING>
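For illustration, a few device strings consistent with this syntax (the node sizes and indices here are placeholders, not recommendations):
[system]
device = "gpu*8:1-4,7"   # 5 GPUs (indices 1, 2, 3, 4, 7) on an 8-GPU node
# device = "gpu*8:0"     # a single GPU (index 0) on an 8-GPU node
# device = "cpu*16:0"    # 16 CPU partitions; the <id> part is ignored for cpu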
[system.float_type]
Description: The float type for building the graph and training the model. For most Hamiltonian tasks, "fp32" is accurate enough, with floating-point error around \(0.001\sim 0.01\) meV.
Default: "fp32"
Type: ["bf16", "tf32", "fp32", "fp64"]
[system.random_seed]
Description: The seed for generating random numbers.
Default: 137
Type: <INT>
[system.log_level]
Description: The logging severity level for `deepx.log`, increasing in the order of debug, info, warning, error, and critical.
Default: "info"
Type: ["debug", "info", "warning", "critical"]
[system.jax_memory_preallocate]
Description: Whether to pre-allocate \(75\%\) of the remaining memory before running. Use `true` for formal training; use `false` for debugging to observe memory usage on the GPU. Using `false` during training may cause the process to crash unexpectedly mid-way or slow down training.
Default: true
Type: <BOOL>
[system.show_train_process_bar]
Description: Whether to display a graphical progress bar in the command line.
Default: true
Type: <BOOL>
Data#
[data.inputs_dir]
Description: Specify the root directory containing structured input data with the required sub-folders `dft/` or `graph/` (see Section sec::analysis_data_features). Accepts both relative paths (resolved from the execution context) and absolute paths (in system-native format).
Default: <Invalid-Input>
Type: <STRING>
[data.outputs_dir]
Description: The output directory designates the parent location for storing training artifacts (log files and serialized models). By default, the system automatically generates timestamped subdirectories (format %Y-%m-%d_%H-%M-%S) within the user-specified path, isolating each training session’s outputs. The full directory structure is detailed in Section sec::after_training (Post-Training Workflows).
Default: <Invalid-Input>
Type: <STRING>
[data.dft.data_dir_depth]
Description: When the number of structural configurations in the DFT training data becomes very large (e.g., 100,000 structures), a flat directory layout under `inputs_dir/dft/*` becomes impractical. In such cases, a hierarchical folder layout such as `dft/<t1>/<t2>/*` is recommended; this corresponds to a directory depth of 2 (`data_dir_depth = 2`), providing a scalable storage layout for massive structural datasets while keeping them systematically accessible.
Default: 0
Type: <INT>
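As a sketch, a two-level layout like `dft/<t1>/<t2>/*` would be declared as follows (the folder names are hypothetical):
[data.dft]
data_dir_depth = 2
# Hypothetical matching layout on disk:
#   <inputs_dir>/dft/group_A/struct_0001/
#   <inputs_dir>/dft/group_A/struct_0002/
#   <inputs_dir>/dft/group_B/struct_0001/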
[data.graph.dataset_name]
Description: Name of your dataset.
Default: "DATASET-DEMO"
Type: <STRING>
[data.graph.graph_type]
Description: The physical quantity used for training. DeepH will build a corresponding graph file. One can choose from `H`, `S`, `Rho`, `HS`, and `Sap`: `H` for Hamiltonian, `Rho` for density matrix, `S` for overlap, `HS` for both Hamiltonian and overlap, and `Sap` for shape-only overlap. The vertices of the graph are atoms, and there is an edge between two atoms if they are close.
Default: "H"
Type: ["H", "HS", "Rho", "Sap", "S"]
[data.graph.storage_type]
Description: Where to store the graph data. Choose "memory" to store it in memory. For very large datasets, choose "disk" to store it on disk.
Default: "memory"
Type: ["memory", "disk"]
[data.graph.common_orbital_types]
Description: The list of one atom’s orbitals, arranged according to the value of \(l\) (the angular quantum number), e.g., `s2p2d1`. The default (set to "") is the union of the orbital types of all different atoms in the dataset. For example, for the basis sets `Mo-s3p2d1` and `S-s2p2d1` in an OpenMX calculation, the union of orbital types is [0, 0, 0, 1, 1, 2], which corresponds to `s3p2d1`.
Default: ""
Type: <STRING>
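For the OpenMX example above, the union of the Mo and S orbital types could be written out explicitly as:
[data.graph]
common_orbital_types = "s3p2d1"  # union of Mo (s3p2d1) and S (s2p2d1), i.e. l = [0, 0, 0, 1, 1, 2]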
[data.graph.parallel_num]
Description: The maximum number of concurrent processes used for graph construction. When set to a non-positive integer, or to a value exceeding the available compute resources (i.e., more than the host’s physical CPU cores or accelerator devices), the parallelism is scaled automatically based on hardware availability, using the greater of the detected CPU core count and accelerator (GPU) device count.
Default: -1
Type: <INT>
[data.graph.only_save_graph]
Description: If set to `true`, the program will only generate the graph file, save it to the file system, and quit.
Default: false
Type: <BOOL>
[data.model_save.best]
Description: Whether to save the model with the lowest loss.
Default: true
Type: <BOOL>
[data.model_save.latest]
Description: Whether to save the model in the latest training epoch.
Default: true
Type: <BOOL>
[data.model_save.latest_interval]
Description: Only functional when `data.model_save.latest` is `true`. This parameter governs how frequently model checkpoints are saved during training: the system saves a snapshot at every epoch that is a multiple of the specified integer. For example, setting `latest_interval = 10` generates checkpoints at epochs 10, 20, 30, etc., each preserving the complete model state (weights, optimizer parameters, and metadata).
Default: 100
Type: <INT>
[data.model_save.latest_num]
Description: Only functional when `data.model_save.latest` is `true`. The number of latest checkpoints to keep.
Default: 10
Type: <INT>
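As a sketch, the two latest-checkpoint parameters work together; with the values below (matching the defaults), a checkpoint is written every 100 epochs and only the 10 most recent are kept:
[data.model_save]
best = true             # also keep the lowest-loss model
latest = true
latest_interval = 100   # save a checkpoint at epochs 100, 200, 300, ...
latest_num = 10         # retain only the 10 most recent checkpoints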
Model#
[model.net_type]
Description: The neural network architecture.
`sparrow` (also named `normal`) is a lightweight architecture (typically \(<\) 1M parameters) with both node and edge features, suitable for small DFT Hamiltonian learning tasks. `eagle` (also named `accurate`) is an advanced architecture (typically \(\sim\) 5M parameters) with both node and edge features, suitable for DFT Hamiltonian learning tasks that require high accuracy.
Default: "normal"
Type: ["sparrow", "normal", "eagle", "accurate"]
[model.target_type]
Description: The physical quantity to learn.
`H` for Hamiltonian, and `Rho` for density matrix.
Default: "H"
Type: ["H", "Rho"]
[model.loss_type]
Description: The loss function used during training. The loss type supports not only supervised losses (`mse`, `mae`, etc.) but also on-the-fly unsupervised losses (such as `ai2dft`, `ai2dft_node`, `hopad`, and `aims`).
Default: "mse"
Type: ["mae", "mse", "wmae", "huber", "ai2dft", "ai2dft_node", "hopad", "aims"]
Model: Advanced#
[model.advanced.gaussian_basis_rmax]
Description: The cutoff radius used for Gaussian basis sampling, in angstrom. We suggest setting this cutoff to 2\(\times\) the maximum cutoff radius of your orbital basis. Refer to the DeepH-E3 paper for details.
Default: 7.5
Type: <FLOAT>
[model.advanced.net_irreps]
Description: Irreducible representations of the neural network features, which ensure the equivariance of the network. For `sparrow`, the irreps can be set to "\(\cdots\)x0e+\(\cdots\)x1o+\(\cdots\)x2e+\(\cdots\)x3o+\(\cdots\)x4e"; the channel with angular momentum \(l\) has parity \((-1)^{l}\). For `eagle` and `owl`, set to "\(\cdots\)x0e+\(\cdots\)x1e+\(\cdots\)x2e+\(\cdots\)x3e+\(\cdots\)x4e"; all channels must have even parity. Note: "64x1o" means that the feature is a 64-channel tensor, where each channel has odd ("o") parity and carries the \(l=1\) representation of the \(SO(3)\) group; "o"/"e" refers to odd/even parity. The string is set in the form of `e3nn.Irreps` ("irreducible representations"), which describes the symmetry of the input features, i.e., of the form \([(\mathrm{mul}_l,\ (l,\ p_{\mathrm{val}} \cdot p_{\mathrm{arg}}^{\,l}))\ \text{for } l \in [0, \dots, l_{\max}]]\).
Default: <Invalid-Input>
Type: <STRING>
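As an illustration only (the channel multiplicities below are arbitrary placeholders, not recommended values), a sparrow-style irreps string with alternating parities and an eagle/owl-style string with all even parities might look like:
[model.advanced]
# sparrow: the channel with angular momentum l carries parity (-1)^l
net_irreps = "64x0e+32x1o+16x2e+8x3o+4x4e"
# eagle / owl: all channels must have even parity, e.g.
# net_irreps = "64x0e+32x1e+16x2e+8x3e+4x4e"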
[model.advanced.num_blocks]
Description: The number of neural network layers in the model. For typical tasks, we recommend 3 for solids and 4 for small molecules.
Default: 3
Type: <INT>
[model.advanced.consider_parity]
Description: Whether the network is equivariant under parity. If set to `false`, `net_irreps` cannot contain odd representations (e.g., `2x3o`). For the `eagle` and `owl` nets, `consider_parity` must be `false`.
Default: true
Type: <BOOL>
[model.advanced.standardize_gauge]
Description: Whether to account for the arbitrariness of the gauge (the zero-energy point in DFT) when training the model. If set to `true`, `graph_type` must be `HS`. We suggest `true` for a general-purpose dataset (or whenever the training data has a zero-energy arbitrariness problem, e.g., a dataset with 2D slabs of different thicknesses), and `false` for a special-purpose dataset.
Default: false
Type: <BOOL>
Process: Train#
[process.train.max_epoch]
Description: The maximum number of epochs. Training will automatically stop when
`epoch_number` reaches `max_epoch`.
Default: 10000
Type: <INT>
[process.train.multi_way_jit_num]
Description: This helps accelerate the training process when the number of edges between different structures in the dataset varies greatly. The recommended value is 10-20. Note that this may make the first epoch of training slow and may cause an out-of-memory error.
Default: 1
Type: <INT>
[process.train.ahead_of_time_compile]
Description: Whether to use
ahead-of-time (AOT) compilation to accelerate the JIT (or, more precisely, compilation) process. With AOT, multi-way JIT compilation can speed up from 2 hours to 10 minutes.
Default: false
Type: <BOOL>
Process: Train: Dataloader#
[process.train.dataloader.batch_size]
Description: Batch size, number of structures in a batch.
Default: 1
Type: <INT>
[process.train.dataloader.train_size]
Description: Number of structures in the training dataset.
Default: 1
Type: <INT>
[process.train.dataloader.validate_size]
Description: Number of structures in the validation dataset.
Default: 0
Type: <INT>
[process.train.dataloader.test_size]
Description: Number of structures in the test dataset.
Default: 0
Type: <INT>
[process.train.dataloader.dataset_split_json]
Description: Path to a JSON file that defines a custom dataset split. To use the framework’s default splitting mechanism, set this parameter to an empty string (""). The JSON file must follow this schema: {"train": ["1", "3", "5"], "validate": ["2"], "test": ["4", "6"]}
Default: ""
Type: <STRING>
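A minimal sketch, assuming a hypothetical split file named ./split.json placed next to the TOML:
[process.train.dataloader]
dataset_split_json = "./split.json"
# Contents of the hypothetical split.json, following the schema above:
#   {"train": ["1", "3", "5"], "validate": ["2"], "test": ["4", "6"]}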
[process.train.dataloader.only_use_train_loss]
Description: Whether to adjust the learning rate based only on the train loss rather than considering both train and validation loss. If the validation set is empty (uncommon), the validation loss is meaningless, so this should be set to `true` to consider only the train loss.
Default: false
Type: <BOOL>
Process: Train: Optimizer#
[process.train.optimizer.type]
Description: The optimizer core type. One can choose from `adamw`, `adam`, or `sgd`.
Default: "adamw"
Type: ["sgd", "adam", "adamw"]
[process.train.optimizer.init_learning_rate]
Description: Initial learning rate. We suggest `2E-3` for `net_type = "sparrow"` and `1E-3` for `net_type = "eagle"` or `"owl"`. A larger learning rate can speed up convergence but may lead to instabilities.
Default: 2E-3
Type: <FLOAT>
[process.train.optimizer.clip_norm_factor]
Description: Enables gradient-norm clipping in the optimizer to prevent large gradients that can cause neuron death. This feature is disabled if set to a negative value.
Default: -1.0
Type: <FLOAT>
[process.train.optimizer.momentum]
Description: For
the `sgd` optimizer only. Controls the influence of previous gradients on the current gradient update.
Default: 0.8
Type: <FLOAT>
[process.train.optimizer.betas]
Description: For
the `adam` and `adamw` optimizers. `betas = [beta1, beta2]`, where `beta1` controls the exponential decay rate of the first-moment estimate (i.e., the momentum), and `beta2` controls the exponential decay rate of the second-moment estimate (i.e., the uncentered variance), usually used for calculating the moving average of squared gradients.
Default: [0.9, 0.999]
Type: <LIST-OF-FLOAT>
[process.train.optimizer.eps]
Description: For
the `adam` and `adamw` optimizers. A small constant to improve numerical stability. In some cases it is useful to decrease it to `1E-10` or lower.
Default: 1E-8
Type: <FLOAT>
[process.train.optimizer.weight]
Description: For
the `adamw` optimizer only. The weight decay rate in `adamw`, used to avoid overfitting.
Default: 0.001
Type: <FLOAT>
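Pulling the optimizer keys together, a minimal sketch of an `adamw` configuration (values follow the defaults and suggestions above):
[process.train.optimizer]
type = "adamw"
init_learning_rate = 2E-3   # suggested for net_type = "sparrow"; use 1E-3 for "eagle"/"owl"
clip_norm_factor = -1.0     # negative value disables gradient-norm clipping
betas = [0.9, 0.999]        # adam/adamw only
eps = 1E-8                  # adam/adamw only
weight = 0.001              # adamw only: weight decay rate
# momentum = 0.8            # sgd only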
Process: Train: Scheduler#
Within the optax optimization framework, learning rate decay is controlled through a separate scale parameter. The effective learning rate used for neural network updates is the product learning_rate (lr) = init_lr \(\times\) scale. A dedicated scheduler module adjusts this scaling factor throughout training, enabling various decay strategies (step-wise, exponential, or cosine decay) while keeping the initial value and the decay dynamics independent of each other.
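A minimal sketch of this arithmetic, using values from the example TOML above:
[process.train.optimizer]
init_learning_rate = 2E-3
[process.train.scheduler]
type = "reduce_on_plateau"
factor = 0.5
min_learning_rate_scale = 1E-4
# Effective lr = init_lr x scale:
#   scale = 1.0   ->  lr = 2E-3
#   scale = 0.5   ->  lr = 1E-3   (after one reduction with factor = 0.5)
#   scale = 1E-4  ->  lr = 2E-7   (training stops once the scale reaches this floor)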
[process.train.scheduler.min_learning_rate_scale]
Description: The minimum learning rate scale. Training will automatically stop when the learning rate scale reaches
`min_learning_rate_scale`.
Default: 1E-4
Type: <FLOAT>
[process.train.scheduler.type]
Description: One can choose from:
"ReduceOnPlateau", and"WarmupCosineDecay".Default:
"reduce_on_plateau"Type: [
"reduce_on_plateau","warmup_cosine_decay"]
Reduce LR On Plateau#
[process.train.scheduler.factor]
Description: For
"reduce_on_plateau"only. Every time the learning rate adjustment is triggered, learning rate scale will be updated tofactor\(\times\)scale.Default:
0.5Type:
<FLOAT>
[process.train.scheduler.patience]
Description: For
"reduce_on_plateau"only. The number of evaluation steps to wait before reducing the learning rate. This helps determine when there is no further improvement in model performance. Be careful that by default, the number of patience step is defined based on the train steps (each batch), rather than epoch (each train-set). In most cases, we suggest to define this patience according to your own dataset scale, computing resources, and expected time cost. Usuallypatience=120epochs gives a well-converged model, andpatience=60epochs or less gives a relatively quick result.Default:
500Type:
<INT>
[process.train.scheduler.rtol]
Description: For
"reduce_on_plateau"only. Relative tolerance. A loss is considered no longer improving if its relative improvement over the previous best validation loss is less thanrtol.Default:
0.05Type:
<FLOAT>
[process.train.scheduler.cooldown]
Description: For
"reduce_on_plateau"only. The minimum number of evaluation cycles between two learning rate adjustments. This prevents frequent changes to the learning rate. Be careful that by default, the number of cooldown step is defined based on the train steps (each batch), rather than epoch (each train-set). In most cases,cooldown=20$\sim$50epochs is enough.Default:
100Type:
<INT>
[process.train.scheduler.accum_size]
Description: For
"reduce_on_plateau"only. Thepatienceandcooldownparameters operate on gradient update steps rather than epochs, with their baseline values calculated as training batches per epoch \(\times\) target epochs. For instance, given 100 batches in each epoch:To monitor validation loss for 20 epochs without gradient accumulation (
accum_size=1), set patience=100\(\times\)20=2000.With
accum_size=100 (effectively creating macro-batches of 100\(\times\)batch_size), each “step” becomes equivalent to 1 epoch, thuspatience=20 suffices.When patience=-1, the system auto-configures it as total training batches to implement full-epoch monitoring cycles. Cooldown steps follow equivalent computational logic based on gradient update granularity.
Default:
-1Type:
<INT>
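The two configurations described above, written out as a sketch (assuming 100 training batches per epoch):
[process.train.scheduler]
# Variant A: no accumulation, patience counted in batches
# accum_size = 1
# patience = 2000    # 100 batches/epoch x 20 epochs

# Variant B: accumulate one epoch's worth of batches, so one step ~ one epoch
accum_size = 100
patience = 20        # now equivalent to 20 epochs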
Warmup Cosine Decay#
By combining warmup and cosine decay, the scheduler helps deep learning models converge faster and improves their performance.
[process.train.scheduler.init_scale]
Description: For
"warmup_cosine_decay"only. The initial scaling factor for learning rate scale.Default:
0.1Type:
<FLOAT>
[process.train.scheduler.warmup_steps]
Description: For
"warmup_cosine_decay"only. The number of steps for the learning rate scale to linearly increase frominit_scaleto1.0, just like “warmup” a model. This helps stabilize the model’s behavior during the early stages of training.Default:
1000Type:
<INT>
[process.train.scheduler.decay_steps]
Description: For
"warmup_cosine_decay"only. The number of steps for the learning rate scale to decay from1.0toend_scale. Note that training will stop early when the number of epochs reaches themax_epoch.Default:
2E5Type:
<INT>
[process.train.scheduler.end_scale]
Description: For
"warmup_cosine_decay"only. The scaling factor at the end of the learning rate scheduling. Set to -1.0 for the learning rate to decay to 0. Note that training will stop early when the learning rate scale is lower than themin_learning_rate_scale.Default:
-1.0Type:
<FLOAT>
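A minimal warmup-cosine-decay block as a sketch (the step counts are illustrative, not recommendations):
[process.train.scheduler]
type = "warmup_cosine_decay"
min_learning_rate_scale = 1E-4
init_scale = 0.1        # scale starts at 0.1 ...
warmup_steps = 1000     # ... and rises linearly to 1.0 over the first 1000 steps
decay_steps = 200000    # then follows a cosine decay toward end_scale
end_scale = -1.0        # -1.0 lets the scale decay to 0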
Process: Train: Continued#
This section is for fine-tuning or continuing training on an existing model.
[process.train.continued.enable]
Description: Set to
`true` for continued training or fine-tuning from an existing model, or `false` to start from scratch.
Default: false
Type: <BOOL>
[process.train.continued.new_training_data]
Description: Whether to use new training data. Set to
`true` for a fine-tuning-like task, and `false` when running on the same dataset as the previous training.
Default: false
Type: <BOOL>
[process.train.continued.new_optimizer]
Description: Whether to use a new optimizer. Set to
`true` for a fine-tuning-like task, and `false` when continuing with the same optimizer and scheduler as the previous training.
Default: false
Type: <BOOL>
[process.train.continued.previous_output_dir]
Description: Previous output directory with time stamp, which contains the
`deepx.log` file and the `model` folder.
Default: <Invalid-Input>
Type: <STRING>
[process.train.continued.load_model_type]
Description: Continue training from the
`best` or `latest` model, as needed.
Default: "latest"
Type: ["best", "latest"]
[process.train.continued.load_model_epoch]
Description: For
load_model_type="latest"only. Specify a number for particular epoch exist saved in latest model folder. Use-1for the most latest epoch.Default:
"-1"Type:
<INT>
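A sketch of a fine-tuning setup on new data; the output directory name below is a hypothetical timestamped path, not a real one:
[process.train.continued]
enable = true
new_training_data = true    # fine-tuning on a new dataset
new_optimizer = true        # start with a fresh optimizer and scheduler
previous_output_dir = "./outputs/2024-01-01_00-00-00"   # hypothetical timestamped run directory
load_model_type = "latest"
load_model_epoch = -1       # -1 loads the most recent saved epoch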