gliner.training.trainer module¶

Custom Trainer implementation with enhanced loss functions and optimizer configuration.

This module extends the Hugging Face Transformers Trainer class to support custom loss functions (focal loss, label smoothing), flexible learning rates for different parameter groups, and robust error handling during training.

gliner.training.trainer.seed_worker(_)[source]¶

Set worker seed during DataLoader initialization.

Helper function to ensure reproducibility by seeding each DataLoader worker process with a unique but deterministic seed based on PyTorch’s initial seed.

Parameters: _ – Worker ID (unused, but required by the DataLoader worker_init_fn signature).
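The seeding idea can be sketched without PyTorch. This is a stdlib-only illustration, not the module's code: PyTorch documents that each DataLoader worker is seeded with base_seed + worker_id, and a worker_init_fn commonly reduces that value to 32 bits before seeding other random number generators. The helper names derive_worker_seed and seed_python_rng are hypothetical.

```python
import random


def derive_worker_seed(base_seed: int, worker_id: int) -> int:
    """Mimic PyTorch's per-worker seed (base_seed + worker_id), reduced to 32 bits."""
    return (base_seed + worker_id) % 2**32


def seed_python_rng(base_seed: int, worker_id: int) -> float:
    """Seed Python's RNG for one worker and return its first draw."""
    random.seed(derive_worker_seed(base_seed, worker_id))
    return random.random()


# Same base seed and worker: identical random stream across runs.
first = seed_python_rng(base_seed=1234, worker_id=0)
again = seed_python_rng(base_seed=1234, worker_id=0)

# A different worker receives a different, but still deterministic, seed.
other_seed = derive_worker_seed(1234, 1)
```

Each worker therefore draws a distinct stream, while a rerun with the same base seed reproduces every stream.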

class gliner.training.trainer.TrainingArguments(output_dir=None, overwrite_output_dir=False, do_train=False, do_eval=False, do_predict=False, eval_strategy='no', prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type='linear', lr_scheduler_kwargs=None, warmup_ratio=0.0, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir=None, logging_strategy='steps', logging_first_step=False, logging_steps=500, logging_nan_inf_filter=True, save_strategy='steps', save_steps=500, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=None, jit_mode_eval=False, bf16=False, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=-1, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug='', dataloader_drop_last=False, eval_steps=None, dataloader_num_workers=0, dataloader_prefetch_factor=None, past_index=-1, run_name=None, disable_tqdm=None, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, fsdp=None, fsdp_min_num_params=0, fsdp_config=None, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=None, parallelism_config=None, deepspeed=None, label_smoothing_factor=0.0, optim='adamw_torch', optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=None, project='huggingface', 
trackio_space_id='trackio', ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy='every_save', hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=<factory>, eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=False, include_num_input_tokens_seen=False, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=True, cache_dir=None, others_lr=None, others_weight_decay=0.0, focal_loss_alpha=-1, focal_loss_gamma=0, focal_loss_prob_margin=0, label_smoothing=0, loss_reduction='sum', negatives=1.0, masking='global')[source]¶

Bases: TrainingArguments

Extended training arguments with custom loss and optimization parameters.

Extends the standard Hugging Face TrainingArguments with additional parameters for focal loss, label smoothing, differential learning rates, and custom negative sampling strategies.

cache_dir¶

Directory to cache downloaded models and datasets.

Type: str | None

optim¶

Optimizer to use. Defaults to “adamw_torch”.

Type: str

others_lr¶

Optional separate learning rate for non-encoder parameters (e.g., classification heads). If None, uses the main learning rate.

Type: float | None

others_weight_decay¶

Weight decay for non-encoder parameters when using others_lr. Defaults to 0.0.

Type: float | None

focal_loss_alpha¶

Alpha parameter for focal loss. Values < 0 disable focal loss weighting. Defaults to -1.

Type: float | None

focal_loss_gamma¶

Gamma (focusing parameter) for focal loss. Higher values increase focus on hard examples. Defaults to 0.

Type: float | None

focal_loss_prob_margin¶

Probability margin for focal loss computation. Defaults to 0.

Type: float | None

label_smoothing¶

Label smoothing factor. 0.0 means no smoothing. Defaults to 0.

Type: float | None

loss_reduction¶

Reduction method for loss (‘sum’, ‘mean’, or ‘none’). Defaults to ‘sum’.

Type: str | None

negatives¶

Ratio of negative samples to use. Defaults to 1.0.

Type: float | None

masking¶

Masking strategy for training (‘global’ or other strategies). Defaults to ‘global’.

Type: str | None
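To make the focal loss attributes concrete, here is a minimal scalar sketch. It assumes the common binary focal loss convention (as used by torchvision.ops.sigmoid_focal_loss), in which a negative alpha turns off class weighting and gamma = 0 reduces to plain binary cross-entropy. It does not model focal_loss_prob_margin or label_smoothing, applies no numerical stabilisation, and may differ from the library's implementation. The function names focal_bce and reduce_losses are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def focal_bce(logit: float, target: float, alpha: float = -1, gamma: float = 0) -> float:
    """Binary focal loss for a single logit/target pair (illustrative only)."""
    p = sigmoid(logit)
    # Plain binary cross-entropy.
    ce = -(target * math.log(p) + (1 - target) * math.log(1 - p))
    # p_t is the probability the model assigns to the true class.
    p_t = p * target + (1 - p) * (1 - target)
    # The modulating factor down-weights well-classified examples.
    loss = ce * (1 - p_t) ** gamma
    if alpha >= 0:  # a negative alpha disables class weighting
        loss *= alpha * target + (1 - alpha) * (1 - target)
    return loss


def reduce_losses(losses, reduction="sum"):
    """Apply the 'sum', 'mean', or 'none' reduction to per-element losses."""
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    return list(losses)  # 'none'


# With the documented defaults (alpha=-1, gamma=0) the result is ordinary BCE.
default_loss = focal_bce(0.0, 1.0)
focused_loss = focal_bce(0.0, 1.0, gamma=2)
weighted_loss = focal_bce(0.0, 1.0, alpha=0.25)
```

With a logit of 0 the predicted probability is 0.5, so gamma = 2 scales the loss by (1 - 0.5)² = 0.25, and alpha = 0.25 scales a positive example by 0.25.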

cache_dir: str | None = None¶

optim: str = 'adamw_torch'¶

others_lr: float | None = None¶

others_weight_decay: float | None = 0.0¶

focal_loss_alpha: float | None = -1¶

focal_loss_gamma: float | None = 0¶

focal_loss_prob_margin: float | None = 0¶

label_smoothing: float | None = 0¶

loss_reduction: str | None = 'sum'¶

negatives: float | None = 1.0¶

masking: str | None = 'global'¶
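As an illustration of how the extended fields combine with the standard ones, the following sketch collects a set of keyword arguments. Every key comes from the documented signature; the values, including the output path, are arbitrary examples rather than recommendations. The constructor call is left as a comment so that the snippet runs without gliner installed.

```python
# Keyword arguments taken from the documented signature; values are illustrative only.
training_kwargs = {
    "output_dir": "models",        # hypothetical path
    "learning_rate": 1e-5,         # main learning rate
    "others_lr": 5e-5,             # separate rate for non-encoder parameters
    "others_weight_decay": 0.01,   # weight decay for non-encoder parameters
    "focal_loss_alpha": 0.75,      # values < 0 would disable focal loss weighting
    "focal_loss_gamma": 2,         # higher values focus more on hard examples
    "loss_reduction": "sum",
    "negatives": 1.0,
    "masking": "global",
}

# With gliner installed, these would be passed as:
#   from gliner.training.trainer import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```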

__init__(output_dir=None, overwrite_output_dir=False, do_train=False, do_eval=False, do_predict=False, eval_strategy='no', prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, eval_delay=0, torch_empty_cache_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, lr_scheduler_type='linear', lr_scheduler_kwargs=None, warmup_ratio=0.0, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir=None, logging_strategy='steps', logging_first_step=False, logging_steps=500, logging_nan_inf_filter=True, save_strategy='steps', save_steps=500, save_total_limit=None, save_safetensors=True, save_on_each_node=False, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=None, jit_mode_eval=False, bf16=False, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=-1, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug='', dataloader_drop_last=False, eval_steps=None, dataloader_num_workers=0, dataloader_prefetch_factor=None, past_index=-1, run_name=None, disable_tqdm=None, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, fsdp=None, fsdp_min_num_params=0, fsdp_config=None, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=None, parallelism_config=None, deepspeed=None, label_smoothing_factor=0.0, optim='adamw_torch', optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=None, project='huggingface', trackio_space_id='trackio', 
ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=True, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy='every_save', hub_token=None, hub_private_repo=None, hub_always_push=False, hub_revision=None, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, include_for_metrics=<factory>, eval_do_concat_batches=True, fp16_backend='auto', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, include_tokens_per_second=False, include_num_input_tokens_seen=False, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, eval_on_start=False, use_liger_kernel=False, liger_kernel_config=None, eval_use_gather_object=False, average_tokens_across_devices=True, cache_dir=None, others_lr=None, others_weight_decay=0.0, focal_loss_alpha=-1, focal_loss_gamma=0, focal_loss_prob_margin=0, label_smoothing=0, loss_reduction='sum', negatives=1.0, masking='global')¶

class gliner.training.trainer.Trainer(model=None, args=None, data_collator=None, train_dataset=None, eval_dataset=None, processing_class=None, model_init=None, compute_loss_func=None, compute_metrics=None, callbacks=None, optimizers=(None, None), optimizer_cls_and_kwargs=None, preprocess_logits_for_metrics=None)[source]¶

Bases: Trainer

Custom Trainer compatible with Transformers v4 and v5:

- v5-safe method signatures (num_items_in_batch)
- no hard dependency on self.use_apex
- skips only OOM errors by default (other exceptions are raised so you don’t silently get 0 loss)

save_model(output_dir=None, _internal_call=False)[source]¶

Saves the model so that it can be reloaded with from_pretrained().

Only the main process saves.

property use_apex: bool¶

compute_loss(model, inputs, return_outputs=False, num_items_in_batch=None)[source]¶

How the loss is computed by Trainer. By default, all models return the loss in the first element.

Parameters:

model (nn.Module) – The model to compute the loss for.
inputs (dict[str, Union[torch.Tensor, Any]]) – The input data for the model.
return_outputs (bool, optional, defaults to False) – Whether to return the model outputs along with the loss.
num_items_in_batch (Optional[torch.Tensor], optional) – The number of items in the batch.

Returns:

The loss of the model, along with its outputs if return_outputs is True.

Subclass and override for custom behavior. If you do not use num_items_in_batch when computing your loss, set self.model_accepts_loss_kwargs to False. Otherwise, the loss calculation may be slightly inaccurate when performing gradient accumulation.
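The gradient accumulation caveat can be seen with plain arithmetic. The sketch below uses made-up per-item losses for two micro-batches of different sizes. It compares averaging each micro-batch separately against normalizing the summed loss by the total item count, which is the role num_items_in_batch plays.

```python
# Hypothetical per-item losses for two accumulated micro-batches of unequal size.
micro_batches = [[1.0, 3.0], [5.0] * 6]

# Averaging each micro-batch, then averaging the averages, over-weights small batches.
mean_of_means = sum(sum(b) / len(b) for b in micro_batches) / len(micro_batches)

# Normalizing the total loss by the total item count weights every item equally.
num_items_in_batch = sum(len(b) for b in micro_batches)
global_mean = sum(sum(b) for b in micro_batches) / num_items_in_batch
```

Here mean_of_means is 3.5 and global_mean is 4.25. The two agree only when all micro-batches contain the same number of items.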

training_step(model, inputs, num_items_in_batch=None)[source]¶

Perform a training step on a batch of inputs.

Subclass and override to inject custom behavior.

Parameters:

model (nn.Module) – The model to train.
inputs (dict[str, Union[torch.Tensor, Any]]) –
The inputs and targets of the model.

The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument labels. Check your model’s documentation for all accepted arguments.

Returns:

The tensor with training loss on this batch.

Return type:

torch.Tensor
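The class summary states that only out-of-memory errors are skipped by default, while other exceptions are raised. A minimal sketch of that pattern follows, assuming OOM conditions surface as a RuntimeError whose message mentions "out of memory" (as CUDA OOM errors in PyTorch do). The helpers is_oom_error and guarded_step are hypothetical and are not the trainer's actual methods.

```python
def is_oom_error(exc: BaseException) -> bool:
    """Heuristic: treat errors whose message mentions 'out of memory' as OOM."""
    return "out of memory" in str(exc).lower()


def guarded_step(step_fn, *args):
    """Run one step; return None on OOM and re-raise every other error."""
    try:
        return step_fn(*args)
    except RuntimeError as exc:
        if is_oom_error(exc):
            return None  # the caller skips this batch
        raise  # anything else must surface instead of silently yielding 0 loss


def oom_step():
    raise RuntimeError("CUDA out of memory. Tried to allocate ...")


def broken_step():
    raise RuntimeError("shape mismatch")


skipped = guarded_step(oom_step)       # OOM is swallowed, the step is skipped
normal = guarded_step(lambda: 0.5)     # a healthy step returns its loss
```

Calling guarded_step(broken_step) raises the RuntimeError, which keeps genuine bugs visible.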

create_optimizer()[source]¶

Set up the optimizer.

We provide a reasonable default that works well. If you want to use something else, you can pass a tuple in the Trainer’s init through optimizers, or subclass and override this method.
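The others_lr and others_weight_decay arguments imply two optimizer parameter groups. The sketch below shows one way such groups could be built, using parameter names in place of tensors so that it runs without PyTorch. The dictionaries follow the per-group option layout that torch.optim optimizers accept. The function build_param_groups and the "encoder." name prefix are assumptions for illustration; the library may identify encoder parameters differently.

```python
def build_param_groups(param_names, lr, weight_decay, others_lr=None,
                       others_weight_decay=0.0, encoder_prefix="encoder."):
    """Split parameters into encoder and non-encoder groups (illustrative only)."""
    if others_lr is None:
        # No separate rate requested: every parameter uses the main learning rate.
        return [{"params": list(param_names), "lr": lr, "weight_decay": weight_decay}]
    encoder = [n for n in param_names if n.startswith(encoder_prefix)]
    others = [n for n in param_names if not n.startswith(encoder_prefix)]
    return [
        {"params": encoder, "lr": lr, "weight_decay": weight_decay},
        {"params": others, "lr": others_lr, "weight_decay": others_weight_decay},
    ]


names = ["encoder.layer.0.weight", "head.weight"]  # hypothetical parameter names
groups = build_param_groups(names, lr=1e-5, weight_decay=0.01, others_lr=5e-5)
single = build_param_groups(names, lr=1e-5, weight_decay=0.01)
```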

prediction_step(model, inputs, prediction_loss_only, ignore_keys=None)[source]¶

Perform an evaluation step on model using inputs.

Subclass and override to inject custom behavior.

Parameters:

model (nn.Module) – The model to evaluate.
inputs (dict[str, Union[torch.Tensor, Any]]) –
The inputs and targets of the model.

The dictionary will be unpacked before being fed to the model. Most models expect the targets under the argument labels. Check your model’s documentation for all accepted arguments.
prediction_loss_only (bool) – Whether or not to return the loss only.
ignore_keys (list[str], optional) – A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions.

Returns:

A tuple with the loss, logits and labels (each being optional).

Return type:

tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]
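The following sketch shows one plausible reading of how prediction_loss_only and ignore_keys shape the returned tuple, assuming the model output is a dictionary that may contain a "loss" entry. The helpers select_logits and prediction_result are hypothetical and use plain Python values in place of tensors.

```python
def select_logits(outputs: dict, ignore_keys=None) -> tuple:
    """Keep every output value except the loss and the explicitly ignored keys."""
    ignore = set(ignore_keys or []) | {"loss"}
    return tuple(v for k, v in outputs.items() if k not in ignore)


def prediction_result(outputs: dict, labels, prediction_loss_only: bool, ignore_keys=None):
    """Build the (loss, logits, labels) tuple; each element may be None."""
    loss = outputs.get("loss")
    if prediction_loss_only:
        return (loss, None, None)
    return (loss, select_logits(outputs, ignore_keys), labels)


outputs = {"loss": 0.7, "logits": [0.1, 0.9], "hidden_states": "large tensor"}
full = prediction_result(outputs, labels=[0, 1], prediction_loss_only=False,
                         ignore_keys=["hidden_states"])
loss_only = prediction_result(outputs, labels=[0, 1], prediction_loss_only=True)
```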

get_train_dataloader()[source]¶

Returns the training torch.utils.data.DataLoader.

Uses no sampler if train_dataset does not implement __len__; otherwise uses a random sampler (adapted to distributed training if necessary).

Subclass and override this method if you want to inject some custom behavior.
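The sampler rule above depends only on whether the dataset has a length. A minimal sketch of that check follows, with the hypothetical helper choose_sampler returning a label rather than a real sampler object.

```python
from collections.abc import Sized


def choose_sampler(dataset):
    """Return 'random' for datasets with __len__, or None for unsized (iterable) ones."""
    return "random" if isinstance(dataset, Sized) else None


sized_choice = choose_sampler([{"text": "a"}, {"text": "b"}])   # a list has __len__
stream_choice = choose_sampler(x for x in range(3))             # a generator does not
```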

get_eval_dataloader(eval_dataset=None)[source]¶

Returns the evaluation torch.utils.data.DataLoader.

Subclass and override this method if you want to inject some custom behavior.

Parameters: eval_dataset (str or torch.utils.data.Dataset, optional) – If a str, self.eval_dataset[eval_dataset] is used as the evaluation dataset. If a Dataset, it overrides self.eval_dataset and must implement __len__. If it is a datasets.Dataset, columns not accepted by the model.forward() method are automatically removed.
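The three cases for eval_dataset (omitted, a string key, or a dataset object) can be sketched as a small resolution function. The name resolve_eval_dataset is hypothetical, and lists stand in for real datasets.

```python
def resolve_eval_dataset(default, eval_dataset=None):
    """Pick the evaluation dataset following the three documented cases."""
    if eval_dataset is None:
        return default                  # fall back to the trainer's own eval dataset
    if isinstance(eval_dataset, str):
        return default[eval_dataset]    # look up a named split
    return eval_dataset                 # an explicit dataset overrides the default


named_splits = {"dev": [1, 2, 3], "test": [4, 5]}   # hypothetical named eval sets
by_name = resolve_eval_dataset(named_splits, "dev")
override = resolve_eval_dataset(named_splits, [9, 9])
fallback = resolve_eval_dataset(named_splits)
```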