gliner.modeling.decoder module

Decoder modules for autoregressive text generation with optional constraints.

This module provides decoder architectures built on causal language models, supporting both standard generation and prefix-constrained decoding using trie structures. It includes custom generation implementations and numerical stability improvements.

class gliner.modeling.decoder.NumericalStabilityProcessor(epsilon=1e-06)

Bases: LogitsProcessor

Logits processor that ensures numerical stability during generation.

This processor handles edge cases in logit values by replacing negative infinity values with the minimum representable value for the dtype, clamping extreme values, and adding a small epsilon for stability.

epsilon: Small constant added to logits for numerical stability.

__init__(epsilon=1e-06)

Initializes the numerical stability processor.

Parameters:

epsilon (float) – Small constant to add to logits. Defaults to 1e-6.
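
The three steps described for this processor (replace negative infinity, clamp extremes, add epsilon) can be illustrated without a deep-learning framework. The following is a minimal sketch, assuming logits arrive as a plain list of floats. The function name `stabilize_logits` and the `clamp` bound are hypothetical and are not part of gliner, whose processor works on tensors and uses the minimum value of the tensor's dtype.

```python
import math
import sys


def stabilize_logits(logits, epsilon=1e-6, clamp=1e4):
    """Return a cleaned copy of `logits` (a list of floats).

    Hypothetical helper for illustration; `clamp` is an assumed bound.
    """
    floor = -sys.float_info.max  # most negative finite Python float
    cleaned = []
    for value in logits:
        # Step 1: replace -inf with the smallest finite value.
        if value == -math.inf:
            value = floor
        # Step 2: clamp extreme values into [-clamp, clamp].
        value = max(-clamp, min(clamp, value))
        # Step 3: add a small constant.
        cleaned.append(value + epsilon)
    return cleaned
```

With these assumed defaults, a masked token (negative infinity) ends up at the lower clamp bound, so a later softmax still assigns it a negligible probability without producing NaN values.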

class gliner.modeling.decoder.DecoderTransformer(model_name, config, from_pretrained=False, cache_dir=None)

Bases: Module

Wrapper for causal language model decoders with adapter support.

This class provides a unified interface for autoregressive decoder models, supporting loading from pretrained weights or initialization from config. It also handles PEFT/LoRA adapter loading when available.

model: The underlying causal language model instance.

config: Configuration object containing model hyperparameters.

__init__(model_name, config, from_pretrained=False, cache_dir=None)

Initializes the decoder transformer.

Parameters:

model_name (str) – Name or path of the pretrained model to load.
config (Any) – Configuration object containing model hyperparameters. Must have a labels_decoder_config attribute.
from_pretrained (bool) – If True, loads pretrained weights. If False, initializes from config only. Defaults to False.
cache_dir (str | Path | None) – Optional directory for caching downloaded models. Defaults to None.

Raises:

Warning – If adapter config is found but PEFT package is not installed.
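
The adapter check described above can be sketched as follows. This is an illustration under stated assumptions, not gliner's implementation: the helper name `should_load_adapter` is hypothetical, the warning text is invented for the example, and the sketch assumes the adapter is detected through the `adapter_config.json` file that PEFT writes next to adapter weights.

```python
import importlib.util
import warnings
from pathlib import Path


def should_load_adapter(model_dir):
    """Return True if an adapter config exists and the peft package is importable.

    Hypothetical helper; the detection rule is an assumption.
    """
    has_adapter = (Path(model_dir) / "adapter_config.json").is_file()
    if not has_adapter:
        return False
    if importlib.util.find_spec("peft") is None:
        # Adapter present but PEFT missing: warn and fall back to the base model.
        warnings.warn("Adapter config found, but the peft package is not installed.")
        return False
    return True
```

Emitting a warning instead of failing lets the base model load even when the optional dependency is absent.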

forward(*args, **kwargs)

Forward pass through the decoder model.

Parameters:

*args (Any) – Variable positional arguments passed to the model.
**kwargs (Any) – Variable keyword arguments passed to the model.

Returns:

Logits tensor of shape (batch_size, seq_len, vocab_size).

Return type:

Tensor

class gliner.modeling.decoder.Decoder(config, from_pretrained=False, cache_dir=None)

Bases: Module

High-level decoder interface for autoregressive generation.

This class provides a unified interface for text generation from embeddings, supporting both standard generation and constrained decoding using trie structures. It includes custom generation implementations and integrates with Hugging Face’s generation API.

decoder_layer: The underlying DecoderTransformer instance.

decoder_hidden_size: Hidden dimension size of the decoder model.

__init__(config, from_pretrained=False, cache_dir=None)

Initializes the decoder.

Parameters:

config (Any) – Configuration object containing model hyperparameters including labels_decoder (model name) and decoder-specific settings.
from_pretrained (bool) – If True, loads pretrained weights for the decoder. Defaults to False.
cache_dir (str | Path | None) – Optional directory for caching downloaded models. Defaults to None.

ids_to_embeds(input_ids)

Converts token IDs to their corresponding embeddings.

Parameters:

input_ids (LongTensor) – Token IDs of shape (batch_size, seq_len).

Returns:

Token embeddings of shape (batch_size, seq_len, hidden_size).

Return type:

FloatTensor

generate_from_embeds_custom(inputs_embeds, attention_mask=None, max_new_tokens=32, eos_token_id=None, pad_token_id=None, temperature=1.0, do_sample=False, labels_trie=None, **kwargs)

Custom generation implementation from embeddings with optional trie constraints.

This method implements token-by-token generation with KV caching and support for trie-based constrained decoding. Unlike the standard generate method, this implementation provides more control over the generation process and handles trie constraints at each step.

Parameters:

inputs_embeds (Tensor) – Input embeddings of shape (batch_size, prefix_len, hidden_size) serving as the generation prefix.
attention_mask (Tensor | None) – Optional attention mask of shape (batch_size, prefix_len). If None, assumes all prefix tokens are valid. Defaults to None.
max_new_tokens (int) – Maximum number of new tokens to generate. Defaults to 32.
eos_token_id (int | None) – Token ID marking end of sequence. If None, uses model’s default. Defaults to None.
pad_token_id (int | None) – Token ID for padding. If None, uses model’s default or eos_token_id. Defaults to None.
temperature (float) – Sampling temperature for controlling randomness. Values below 1 sharpen the distribution; values above 1 make it more uniform. Defaults to 1.0.
do_sample (bool) – If True, uses multinomial sampling. If False, uses greedy decoding (argmax). Defaults to False.
labels_trie (LabelsTrie | None) – Optional trie structure for constrained decoding. At each step, only tokens that follow valid trie paths are allowed. Defaults to None.
**kwargs (Any) – Additional keyword arguments (currently unused).

Returns:

Generated token IDs of shape (batch_size, generated_len), where generated_len is the length of the longest generated sequence and is at most max_new_tokens. Sequences that reach EOS earlier are padded to that length with pad_token_id.

Return type:

LongTensor
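
The step-by-step loop described for this method can be illustrated with a framework-free sketch of greedy, trie-constrained decoding. Everything here is an assumption made for illustration: the nested-dict trie, the helper names `build_trie`, `allowed_next`, and `constrained_greedy`, and the `score_fn` callback that stands in for the model's logits. The interface of gliner's `LabelsTrie` is not shown in this documentation, and KV caching and sampling are omitted.

```python
def build_trie(sequences, eos_token_id):
    """Build a nested-dict trie; each label sequence is terminated by EOS."""
    root = {}
    for seq in sequences:
        node = root
        for tok in list(seq) + [eos_token_id]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie, generated, eos_token_id):
    """Tokens that keep `generated` on a valid trie path (EOS at a dead end)."""
    node = trie
    for tok in generated:
        node = node.get(tok)
        if node is None:
            return [eos_token_id]
    return list(node) or [eos_token_id]


def constrained_greedy(score_fn, trie, max_new_tokens, eos_token_id,
                       pad_token_id, batch_size):
    """Greedy decoding where each step is restricted to trie-allowed tokens."""
    outputs = [[] for _ in range(batch_size)]
    finished = [False] * batch_size
    for _ in range(max_new_tokens):
        for b in range(batch_size):
            if finished[b]:
                continue
            scores = score_fn(b, outputs[b])  # one score per vocabulary entry
            candidates = allowed_next(trie, outputs[b], eos_token_id)
            best = max(candidates, key=lambda t: scores[t])  # masked argmax
            outputs[b].append(best)
            finished[b] = best == eos_token_id
        if all(finished):
            break
    # Pad every sequence to the length of the longest one.
    width = max(len(seq) for seq in outputs)
    return [seq + [pad_token_id] * (width - len(seq)) for seq in outputs]


# Toy vocabulary: 0 = pad, 1 = EOS; labels are [2, 3] and [2, 4, 5].
table = [[0.0, 0.1, 0.2, 0.9, 0.3, 0.0], [0.0, 0.1, 0.2, 0.3, 0.9, 0.5]]
trie = build_trie([[2, 3], [2, 4, 5]], eos_token_id=1)
result = constrained_greedy(lambda b, prefix: table[b], trie, max_new_tokens=8,
                            eos_token_id=1, pad_token_id=0, batch_size=2)
# result == [[2, 3, 1, 0], [2, 4, 5, 1]]
```

In the toy run, token 3 has the highest score for the first sequence at every step, yet the first emitted token is 2 because the trie allows nothing else at the root. The shorter sequence is padded after its EOS token.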

generate_from_embeds(inputs_embeds, attention_mask=None, max_new_tokens=32, eos_token_id=None, pad_token_id=None, temperature=1.0, do_sample=False, num_return_sequences=1, labels_trie=None, **kwargs)

Generation from embeddings using Hugging Face’s generate API.

This method wraps the Hugging Face generate() function to support generation from embeddings with optional trie-based prefix constraints. It provides a more feature-complete interface than generate_from_embeds_custom but may be less flexible for custom generation logic.

Parameters:

inputs_embeds (Tensor) – Input embeddings of shape (batch_size, prefix_len, hidden_size) serving as the generation prefix.
attention_mask (Tensor | None) – Optional attention mask of shape (batch_size, prefix_len). If None, creates a mask of all ones. Defaults to None.
max_new_tokens (int) – Maximum number of new tokens to generate. Defaults to 32.
eos_token_id (int | None) – Token ID marking end of sequence. If None, uses model’s default. Defaults to None.
pad_token_id (int | None) – Token ID for padding. If None, uses model’s default or eos_token_id. Defaults to None.
temperature (float) – Sampling temperature for controlling randomness. Defaults to 1.0.
do_sample (bool) – If True, uses sampling. If False, uses greedy decoding, or beam search when num_return_sequences > 1. Defaults to False.
num_return_sequences (int) – Number of sequences to generate per input. Also sets num_beams when > 1. Defaults to 1.
labels_trie (LabelsTrie | None) – Optional trie structure for constrained decoding via prefix_allowed_tokens_fn. Defaults to None.
**kwargs (Any) – Additional keyword arguments passed to model.generate().

Returns:

Generated token IDs of shape (batch_size * num_return_sequences, total_len) where total_len = prefix_len + generated_len. Includes both the input prefix and newly generated tokens.

Return type:

LongTensor
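
Hugging Face's `generate()` accepts a `prefix_allowed_tokens_fn(batch_id, input_ids)` callback that returns the token IDs permitted at the next step. The sketch below shows one way such a callback could be built from a set of label token sequences. It is not gliner's implementation: the factory name `make_prefix_allowed_tokens_fn` and the nested-dict trie are hypothetical, and the sketch assumes that `input_ids` holds only the tokens generated so far, so any prefix tokens would need to be stripped first.

```python
def make_prefix_allowed_tokens_fn(label_sequences, eos_token_id):
    """Return a callback usable as `prefix_allowed_tokens_fn` (hypothetical factory)."""
    trie = {}
    for seq in label_sequences:
        node = trie
        for tok in list(seq) + [eos_token_id]:
            node = node.setdefault(tok, {})

    def prefix_allowed_tokens_fn(batch_id, input_ids):
        # Accept a tensor (which has .tolist()) or a plain sequence of ints.
        tokens = input_ids.tolist() if hasattr(input_ids, "tolist") else list(input_ids)
        node = trie
        for tok in tokens:
            node = node.get(tok)
            if node is None:
                # Off the trie: only allow the sequence to end.
                return [eos_token_id]
        # Children of the current node, or EOS once a label is complete.
        return list(node) or [eos_token_id]

    return prefix_allowed_tokens_fn
```

Returning only the EOS token once a label is complete, or once the prefix has left the trie, forces generation to stop instead of leaving the allowed set empty.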

generate(*args, **kwargs)

Flexible generation method supporting both embeddings and token IDs.

This method routes to the appropriate generation function based on whether inputs_embeds is provided. If inputs_embeds is in kwargs, uses generate_from_embeds(). Otherwise, delegates to the model’s native generate() method.

Parameters:

*args (Any) – Variable positional arguments passed to the generation method.
**kwargs (Any) – Variable keyword arguments. If ‘inputs_embeds’ is present, routes to generate_from_embeds(), otherwise routes to model.generate().

Returns:

Generated token IDs. Shape depends on the specific generation method used.

Return type:

LongTensor
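
The routing rule can be sketched with a small stand-in class. `RoutingSketch` and its `native_generate` method are hypothetical names that only mirror the dispatch on the `inputs_embeds` keyword; they do not reproduce gliner's `Decoder`.

```python
class RoutingSketch:
    """Hypothetical stand-in that mirrors the routing rule only."""

    def generate_from_embeds(self, *args, **kwargs):
        return ("generate_from_embeds", sorted(kwargs))

    def native_generate(self, *args, **kwargs):
        # Stands in for the underlying model's own generate() method.
        return ("model.generate", sorted(kwargs))

    def generate(self, *args, **kwargs):
        # Route on the presence of the keyword, not on the argument's type.
        if "inputs_embeds" in kwargs:
            return self.generate_from_embeds(*args, **kwargs)
        return self.native_generate(*args, **kwargs)
```

Under this rule, embeddings passed positionally would not be recognized, so callers who want the embedding path should pass `inputs_embeds` by keyword.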

forward(*args, **kwargs)

Forward pass through the decoder.

Computes logits for the input sequence without generation.

Parameters:

*args (Any) – Variable positional arguments passed to the decoder layer.
**kwargs (Any) – Variable keyword arguments passed to the decoder layer.

Returns:

Logits tensor of shape (batch_size, seq_len, vocab_size).

Return type:

Tensor