gliner.modeling.layers module

class gliner.modeling.layers.LstmSeq2SeqEncoder(config, num_layers=1, dropout=0.0, bidirectional=True)

Bases: Module

LSTM encoder for sequence-to-sequence models, bidirectional by default.

This encoder runs input sequences through an LSTM and returns the encoded representations. It handles variable-length sequences by packing them before the LSTM and padding the output afterwards.

lstm: The LSTM layer used to encode sequences (bidirectional unless bidirectional=False).

Initializes the LSTM encoder.

Parameters:

config – Configuration object containing model hyperparameters. Must have a hidden_size attribute.
num_layers (int) – Number of recurrent layers. Defaults to 1.
dropout (float) – Dropout probability between LSTM layers. Defaults to 0.
bidirectional (bool) – If True, becomes a bidirectional LSTM. Defaults to True.

forward(x, mask, hidden=None)

Encodes input sequences through the LSTM.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).
mask (Tensor) – Binary mask tensor of shape (batch_size, seq_len) where 1 indicates valid positions and 0 indicates padding.
hidden (Tuple[Tensor, Tensor] | None) – Optional initial hidden state tuple (h_0, c_0). Defaults to None.

Returns:

Encoded output tensor of shape (batch_size, seq_len, hidden_size).

Return type:

Tensor
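
The packing step can be sketched with PyTorch's own RNN utilities. This is a minimal illustration, not the GLiNER source: it assumes the sequence lengths are derived by summing the mask, and that each LSTM direction uses hidden_size // 2 units so the concatenated output has width hidden_size again.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, seq_len, hidden_size = 2, 5, 8

# Assumption: half the width per direction, so forward + backward = hidden_size.
lstm = nn.LSTM(hidden_size, hidden_size // 2, batch_first=True, bidirectional=True)

x = torch.randn(batch_size, seq_len, hidden_size)
mask = torch.tensor([[1, 1, 1, 1, 1],
                     [1, 1, 1, 0, 0]])  # second sequence has 2 padded positions

# Assumption: lengths come from the mask; packing requires them on the CPU.
lengths = mask.sum(dim=1).cpu()
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)

# Pad back to the original sequence length; padded positions come back as zeros.
output, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=seq_len)
print(output.shape)  # torch.Size([2, 5, 8])
```

Packing keeps the LSTM from reading padded positions, so the final states of short sequences are not contaminated by padding.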

gliner.modeling.layers.create_projection_layer(hidden_size, dropout, out_dim=None)

Creates a two-layer projection network with ReLU activation and dropout.

The projection layer expands the input by 4x in the hidden layer before projecting to the output dimension.

Parameters:

hidden_size (int) – Size of the input hidden dimension.
dropout (float) – Dropout probability applied after the first layer.
out_dim (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.

Returns:

A Sequential module containing the projection layers.

Return type:

Sequential
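
A network matching this description could look like the sketch below. The function name projection_sketch is hypothetical, and the exact layer order (Linear, ReLU, Dropout, Linear) and the choice to size the hidden layer as 4 x hidden_size are assumptions read off the description above rather than the library source.

```python
import torch
from torch import nn


def projection_sketch(hidden_size, dropout, out_dim=None):
    """Hypothetical stand-in for a two-layer projection with 4x expansion."""
    if out_dim is None:
        out_dim = hidden_size
    return nn.Sequential(
        nn.Linear(hidden_size, hidden_size * 4),  # expand by 4x
        nn.ReLU(),
        nn.Dropout(dropout),                      # dropout after the first layer
        nn.Linear(hidden_size * 4, out_dim),      # project to the output size
    )


proj = projection_sketch(hidden_size=8, dropout=0.1, out_dim=3)
y = proj(torch.randn(2, 5, 8))
print(y.shape)  # torch.Size([2, 5, 3])
```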

class gliner.modeling.layers.MultiheadAttention(hidden_size, num_heads, dropout)

Bases: Module

Multi-head scaled dot-product attention mechanism.

Implements multi-head attention where the hidden dimension is split across multiple attention heads. Uses PyTorch’s scaled_dot_product_attention for efficient computation.

hidden_size: Total hidden dimension size.

num_heads: Number of attention heads.

attention_head_size: Dimension of each attention head.

attention_probs_dropout_prob: Dropout probability for attention weights.

query_layer: Linear projection for query vectors.

key_layer: Linear projection for key vectors.

value_layer: Linear projection for value vectors.

Initializes the multi-head attention module.

Parameters:

hidden_size (int) – Size of the hidden dimension. Must be divisible by num_heads.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability for attention weights.

transpose_for_scores(x)

Reshapes tensor for multi-head attention computation.

Transforms from (batch, seq_len, hidden) to (batch, num_heads, seq_len, head_dim).

Parameters:

x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).

Returns:

Reshaped tensor of shape (batch_size, num_heads, seq_len, attention_head_size).

Return type:

Tensor
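
The reshape described above can be reproduced with a view followed by a permute. This is a sketch of the shape transformation only; the variable names are illustrative and the method's actual body may differ.

```python
import torch

batch_size, seq_len, hidden_size, num_heads = 2, 5, 8, 4
head_dim = hidden_size // num_heads  # hidden_size must be divisible by num_heads

x = torch.arange(batch_size * seq_len * hidden_size, dtype=torch.float32)
x = x.view(batch_size, seq_len, hidden_size)

# Split the last dimension into (num_heads, head_dim), then move heads forward.
heads = x.view(batch_size, seq_len, num_heads, head_dim).permute(0, 2, 1, 3)
print(heads.shape)  # torch.Size([2, 4, 5, 2])
```

Head i of a token holds the slice [i * head_dim, (i + 1) * head_dim) of that token's hidden vector.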

forward(query, key=None, value=None, head_mask=None, attn_mask=None)

Computes multi-head attention.

Parameters:

query (Tensor) – Query tensor of shape (batch_size, seq_len, hidden_size).
key (Tensor | None) – Optional key tensor. If None, uses query. Defaults to None.
value (Tensor | None) – Optional value tensor. If None, uses key when it is provided, otherwise query. Defaults to None.
head_mask (Tensor | None) – Optional mask for attention heads. Defaults to None.
attn_mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

A tuple containing:

context_layer: Attention output of shape (batch_size, seq_len, hidden_size).
None: Placeholder in the position where attention weights would be; the weights themselves are not returned.

Return type:

Tuple[Tensor, None]
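
The overall computation can be sketched as follows for the self-attention case (key and value left as None). This is an illustration under assumptions, not the library's code: the projection names are hypothetical, no masks are applied, and dropout is set to 0.0 for determinism. It requires a PyTorch version that provides scaled_dot_product_attention (2.0 or later).

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, hidden_size, num_heads = 2, 5, 16, 4
head_dim = hidden_size // num_heads

# Hypothetical projections standing in for query_layer, key_layer, value_layer.
q_proj = nn.Linear(hidden_size, hidden_size)
k_proj = nn.Linear(hidden_size, hidden_size)
v_proj = nn.Linear(hidden_size, hidden_size)


def split_heads(t):
    """(batch, seq, hidden) -> (batch, heads, seq, head_dim)."""
    b, n, _ = t.shape
    return t.view(b, n, num_heads, head_dim).permute(0, 2, 1, 3)


query = torch.randn(batch_size, seq_len, hidden_size)

# With key and value omitted, all three projections read from the query.
q = split_heads(q_proj(query))
k = split_heads(k_proj(query))
v = split_heads(v_proj(query))

context = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0)

# Merge the heads back into a single hidden dimension.
context_layer = context.permute(0, 2, 1, 3).reshape(batch_size, seq_len, hidden_size)
result = (context_layer, None)  # second slot mirrors the documented placeholder
print(context_layer.shape)  # torch.Size([2, 5, 16])
```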

class gliner.modeling.layers.SelfAttentionBlock(d_model, num_heads, dropout=0.1)

Bases: Module

Self-attention block with pre-normalization and residual connection.

Implements a standard transformer-style self-attention block with layer normalization before and after the attention operation.

self_attn: Multi-head self-attention module.

pre_norm: Layer normalization applied before attention.

post_norm: Layer normalization applied after residual connection.

dropout: Dropout layer for attention output.

q_proj: Linear projection for queries.

k_proj: Linear projection for keys.

v_proj: Linear projection for values.

Initializes the self-attention block.

Parameters:

d_model (int) – Model dimension size.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability. Defaults to 0.1.

forward(x, mask=None)

Applies self-attention to input tensor.

Parameters:

x (Tensor) – Input tensor of shape (batch_size, seq_len, d_model).
mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

Output tensor of shape (batch_size, seq_len, d_model).

Return type:

Tensor
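
The pre-norm, attention, residual, post-norm pattern can be sketched as below. SelfAttentionSketch is a hypothetical class that swaps in PyTorch's built-in nn.MultiheadAttention for the module's own attention and separate q/k/v projections, and it omits the mask. Which tensor the residual is added to is also an assumption here.

```python
import torch
from torch import nn


class SelfAttentionSketch(nn.Module):
    """Hypothetical stand-in; uses torch's nn.MultiheadAttention, not GLiNER's."""

    def __init__(self, d_model, num_heads, dropout=0.1):
        super().__init__()
        self.pre_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads,
                                          dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.post_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.pre_norm(x)              # normalize before attention
        attn_out, _ = self.attn(h, h, h)  # self-attention: q, k, v share a source
        x = x + self.dropout(attn_out)    # residual (assumed to use the raw input)
        return self.post_norm(x)          # normalize after the residual


block = SelfAttentionSketch(d_model=16, num_heads=4).eval()
x = torch.randn(2, 5, 16)
out = block(x)
print(out.shape)  # torch.Size([2, 5, 16])
```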

class gliner.modeling.layers.CrossAttentionBlock(d_model, num_heads, dropout=0.1)

Bases: Module

Cross-attention block with pre-normalization and residual connection.

Implements cross-attention between query and key-value pairs, typically used for attending from one sequence to another.

cross_attn: Multi-head cross-attention module.

pre_norm: Layer normalization applied to query before attention.

post_norm: Layer normalization applied after residual connection.

dropout: Dropout layer for attention output.

v_proj: Linear projection for values.

Initializes the cross-attention block.

Parameters:

d_model (int) – Model dimension size.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability. Defaults to 0.1.

forward(query, key, value=None, mask=None)

Applies cross-attention from query to key-value pairs.

Parameters:

query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).
key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).
value (Tensor | None) – Optional value tensor. If None, derived from key. Defaults to None.
mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

Output tensor of shape (batch_size, query_len, d_model).

Return type:

Tensor

class gliner.modeling.layers.CrossFuser(d_model, query_dim, num_heads=8, num_layers=1, dropout=0.1, schema='l2l-l2t')

Bases: Module

Flexible cross-attention fusion module with configurable attention patterns.

Fuses two sequences using a configurable schema of self-attention and cross-attention operations. The schema defines the order and type of attention operations to apply.

Schema notation:

‘l2l’: Self-attention on label sequence
‘t2t’: Self-attention on text sequence
‘l2t’: Cross-attention from label to text
‘t2l’: Cross-attention from text to label
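
The schema notation above can be read mechanically: the string is a hyphen-separated list of operations applied in order. The helpers below are hypothetical and only illustrate that reading. Rejecting unknown tokens is an assumption of this sketch, not documented behavior.

```python
from typing import List

VALID_OPS = {"l2l", "t2t", "l2t", "t2l"}
NAMES = {"l": "label", "t": "text"}


def parse_schema(schema: str) -> List[str]:
    """Split a schema string such as 'l2l-l2t' into its ordered operations."""
    ops = schema.split("-")
    unknown = [op for op in ops if op not in VALID_OPS]
    if unknown:
        # Assumption: this sketch rejects tokens outside the four listed above.
        raise ValueError(f"unknown schema operations: {unknown}")
    return ops


def describe(op: str) -> str:
    """Spell out one operation using the notation listed above."""
    src, dst = op[0], op[2]
    if src == dst:
        return f"self-attention on {NAMES[src]}"
    return f"cross-attention from {NAMES[src]} to {NAMES[dst]}"


for op in parse_schema("l2l-l2t-t2t"):
    print(op, "->", describe(op))
```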

d_model: Model dimension size.

schema: List of attention operation types parsed from schema string.

layers: ModuleList of attention layers organized by depth.

Initializes the cross-fusion module.

Parameters:

d_model (int) – Model dimension size.
query_dim (int) – Dimension of query input (currently unused).
num_heads (int) – Number of attention heads. Defaults to 8.
num_layers (int) – Number of attention layers. Defaults to 1.
dropout (float) – Dropout probability. Defaults to 0.1.
schema (str) – String defining attention pattern (e.g., ‘l2l-l2t-t2t’). Defaults to ‘l2l-l2t’.

forward(query, key, query_mask=None, key_mask=None)

Applies cross-fusion between query and key sequences.

Parameters:

query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).
key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).
query_mask (Tensor | None) – Optional binary mask for query (1 = valid, 0 = padding). Shape (batch_size, query_len). Defaults to None.
key_mask (Tensor | None) – Optional binary mask for key (1 = valid, 0 = padding). Shape (batch_size, key_len). Defaults to None.

Returns:

A tuple containing:

query: Fused query tensor of shape (batch_size, query_len, d_model).
key: Fused key tensor of shape (batch_size, key_len, d_model).

Return type:

Tuple[Tensor, Tensor]
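
One common way to turn the two per-sequence padding masks into a pairwise attention mask is an outer product. Whether this module combines them exactly this way is an assumption; the sketch only shows how shapes (batch_size, query_len) and (batch_size, key_len) yield a (batch_size, query_len, key_len) mask.

```python
import torch

query_mask = torch.tensor([[1, 1, 0]])     # (batch_size=1, query_len=3)
key_mask = torch.tensor([[1, 1, 1, 0]])    # (batch_size=1, key_len=4)

# Entry [b, i, j] is 1 only when query position i and key position j are both valid.
cross_mask = query_mask.unsqueeze(-1) * key_mask.unsqueeze(1)
print(cross_mask.shape)  # torch.Size([1, 3, 4])
print(cross_mask[0].tolist())
# [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
```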

class gliner.modeling.layers.LayersFuser(num_layers, hidden_size, output_size=None)

Bases: Module

Fuses multiple encoder layer outputs using a squeeze-and-excitation mechanism.

Combines outputs from different encoder layers by learning adaptive weights for each layer using a squeeze-and-excitation style attention mechanism. The first layer in encoder_outputs is skipped during fusion.

num_layers: Number of encoder layers to fuse.

hidden_size: Hidden dimension size of encoder outputs.

output_size: Size of the final output projection.

squeeze: Linear layer for squeeze operation.

W1: First linear layer of excitation network.

W2: Second linear layer of excitation network.

output_projection: Final projection to output dimension.

Initializes the layer fusion module.

Parameters:

num_layers (int) – Number of encoder layers to fuse.
hidden_size (int) – Hidden dimension size.
output_size (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.

forward(encoder_outputs)

Fuses multiple encoder layer outputs into a single representation.

Parameters:

encoder_outputs (List[Tensor]) – List of encoder output tensors, each of shape (batch_size, seq_len, hidden_size). The first element is skipped.

Returns:

Fused output tensor of shape (batch_size, seq_len, output_size).

Return type:

Tensor
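
A squeeze-and-excitation style fusion over layers can be sketched as follows. LayersFuserSketch is a hypothetical class that reuses the attribute names listed above. The bottleneck width (num_layers // 2), the mean over sequence positions, and the sigmoid gating are assumptions typical of squeeze-and-excitation, not details confirmed by this page.

```python
import torch
import torch.nn.functional as F
from torch import nn


class LayersFuserSketch(nn.Module):
    """Hypothetical sketch of layer fusion with learned per-layer weights."""

    def __init__(self, num_layers, hidden_size, output_size=None):
        super().__init__()
        output_size = hidden_size if output_size is None else output_size
        self.squeeze = nn.Linear(hidden_size, 1)
        self.W1 = nn.Linear(num_layers, num_layers // 2)  # assumed bottleneck
        self.W2 = nn.Linear(num_layers // 2, num_layers)
        self.output_projection = nn.Linear(hidden_size, output_size)

    def forward(self, encoder_outputs):
        # Skip the first element, then stack: (batch, layers, seq, hidden).
        stacked = torch.stack(encoder_outputs[1:], dim=1)
        # Squeeze: one scalar per layer, averaged over sequence positions.
        squeezed = self.squeeze(stacked).squeeze(-1).mean(dim=2)  # (batch, layers)
        # Excitation: per-layer gates in (0, 1).
        weights = torch.sigmoid(self.W2(F.relu(self.W1(squeezed))))
        # Weighted sum over layers, then project to the output size.
        fused = (stacked * weights[:, :, None, None]).sum(dim=1)
        return self.output_projection(fused)


fuser = LayersFuserSketch(num_layers=4, hidden_size=8, output_size=6)
encoder_outputs = [torch.randn(2, 5, 8) for _ in range(5)]  # 1 skipped + 4 fused
fused = fuser(encoder_outputs)
print(fused.shape)  # torch.Size([2, 5, 6])
```

Because the first element is skipped, the list passed in holds num_layers + 1 tensors in this sketch.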