gliner.modeling.layers module

class gliner.modeling.layers.LstmSeq2SeqEncoder(config, num_layers=1, dropout=0.0, bidirectional=True)

Bases: Module

Bidirectional LSTM encoder for sequence-to-sequence models.

This encoder processes input sequences using a bidirectional LSTM and returns the encoded representations. It handles variable-length sequences by packing them with per-example lengths taken from the mask, so padded positions are skipped by the recurrence.

lstm

The bidirectional LSTM layer for encoding sequences.


__init__(config, num_layers=1, dropout=0.0, bidirectional=True)

Initializes the LSTM encoder.

Parameters:
  • config – Configuration object containing model hyperparameters. Must have a hidden_size attribute.

  • num_layers (int) – Number of recurrent layers. Defaults to 1.

  • dropout (float) – Dropout probability applied between stacked LSTM layers, so it has no effect when num_layers is 1. Defaults to 0.0.

  • bidirectional (bool) – If True, becomes a bidirectional LSTM. Defaults to True.

forward(x, mask, hidden=None)

Encodes input sequences through the LSTM.

Parameters:
  • x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).

  • mask (Tensor) – Binary mask tensor of shape (batch_size, seq_len) where 1 indicates valid positions and 0 indicates padding.

  • hidden (Tuple[Tensor, Tensor] | None) – Optional initial hidden state tuple (h_0, c_0). Defaults to None.

Returns:

Encoded output tensor of shape (batch_size, seq_len, hidden_size).

Return type:

Tensor
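The packing behaviour can be sketched with standard PyTorch utilities. This is an illustrative approximation, not the GLiNER source: the class name `PackedBiLSTM` is hypothetical, and giving each direction `hidden_size // 2` units is an assumption inferred from the documented output width of `hidden_size`. Passing `total_length` to `pad_packed_sequence` is also a choice made here, so the output keeps the input's `seq_len` even if every sequence in the batch ends in padding.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class PackedBiLSTM(nn.Module):
    """Hypothetical stand-in for a packed bidirectional LSTM encoder."""

    def __init__(self, hidden_size: int, num_layers: int = 1, dropout: float = 0.0):
        super().__init__()
        # Assumption: each direction gets hidden_size // 2 units, so the
        # concatenated forward/backward output is hidden_size wide.
        self.lstm = nn.LSTM(
            input_size=hidden_size,
            hidden_size=hidden_size // 2,
            num_layers=num_layers,
            dropout=dropout,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, x: torch.Tensor, mask: torch.Tensor, hidden=None) -> torch.Tensor:
        # True lengths come from the binary mask; pack_padded_sequence wants them on CPU.
        lengths = mask.sum(dim=1).cpu()
        packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
        packed_out, _ = self.lstm(packed, hidden)
        # Re-pad to a dense tensor; total_length keeps the original seq_len.
        output, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=x.size(1))
        return output


encoder = PackedBiLSTM(hidden_size=8)
x = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
out = encoder(x, mask)
print(out.shape)  # torch.Size([2, 5, 8])
```

In PyTorch's `nn.LSTM`, `dropout` is applied only between stacked layers, so it has no effect when `num_layers=1`.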

gliner.modeling.layers.create_projection_layer(hidden_size, dropout, out_dim=None)

Creates a two-layer projection network with ReLU activation and dropout.

The projection layer expands the input by 4x in the hidden layer before projecting to the output dimension.

Parameters:
  • hidden_size (int) – Size of the input hidden dimension.

  • dropout (float) – Dropout probability applied after the first layer.

  • out_dim (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.

Returns:

A Sequential module containing the projection layers.

Return type:

Sequential
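A minimal sketch of such a projection network, for illustration only. The name `projection_sketch` is hypothetical, and sizing the intermediate layer as `hidden_size * 4` is an assumption read from the wording above; the library may compute the expanded width differently.

```python
from typing import Optional

import torch
from torch import nn


def projection_sketch(hidden_size: int, dropout: float, out_dim: Optional[int] = None) -> nn.Sequential:
    """Hypothetical two-layer projection: Linear -> ReLU -> Dropout -> Linear."""
    if out_dim is None:
        out_dim = hidden_size  # default output width
    inner = hidden_size * 4  # assumed 4x expansion of the input width
    return nn.Sequential(
        nn.Linear(hidden_size, inner),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(inner, out_dim),
    )


proj = projection_sketch(hidden_size=16, dropout=0.1, out_dim=4)
y = proj(torch.randn(3, 7, 16))
print(y.shape)  # torch.Size([3, 7, 4])
```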

class gliner.modeling.layers.MultiheadAttention(hidden_size, num_heads, dropout)

Bases: Module

Multi-head scaled dot-product attention mechanism.

Implements multi-head attention where the hidden dimension is split across multiple attention heads. Uses PyTorch’s scaled_dot_product_attention for efficient computation.

hidden_size

Total hidden dimension size.

num_heads

Number of attention heads.

attention_head_size

Dimension of each attention head.

attention_probs_dropout_prob

Dropout probability for attention weights.

query_layer

Linear projection for query vectors.

key_layer

Linear projection for key vectors.

value_layer

Linear projection for value vectors.


__init__(hidden_size, num_heads, dropout)

Initializes the multi-head attention module.

Parameters:
  • hidden_size (int) – Size of the hidden dimension. Must be divisible by num_heads.

  • num_heads (int) – Number of attention heads.

  • dropout (float) – Dropout probability for attention weights.

transpose_for_scores(x)

Reshapes tensor for multi-head attention computation.

Transforms from (batch, seq_len, hidden) to (batch, num_heads, seq_len, head_dim).

Parameters:

x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).

Returns:

Reshaped tensor of shape (batch_size, num_heads, seq_len, attention_head_size).

Return type:

Tensor
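The reshape can be reproduced with plain tensor operations. This sketch assumes each head owns a contiguous slice of the hidden dimension (the usual convention); `transpose_for_scores_sketch` is a hypothetical free function, not the method itself.

```python
import torch


def transpose_for_scores_sketch(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Hypothetical free-function version of the head split."""
    batch_size, seq_len, hidden_size = x.shape
    head_dim = hidden_size // num_heads  # assumes hidden_size % num_heads == 0
    # Split the hidden axis into (num_heads, head_dim) ...
    x = x.view(batch_size, seq_len, num_heads, head_dim)
    # ... then move the head axis in front of the sequence axis.
    return x.permute(0, 2, 1, 3)


x = torch.arange(2 * 3 * 8, dtype=torch.float32).reshape(2, 3, 8)
heads = transpose_for_scores_sketch(x, num_heads=4)
print(heads.shape)  # torch.Size([2, 4, 3, 2])
print(heads[0, 1, 0])  # tensor([2., 3.])
```

Head 1 of the first token holds hidden features 2 and 3, which is the contiguous-slice layout the sketch assumes.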

forward(query, key=None, value=None, head_mask=None, attn_mask=None)

Computes multi-head attention.

Parameters:
  • query (Tensor) – Query tensor of shape (batch_size, seq_len, hidden_size).

  • key (Tensor | None) – Optional key tensor. If None, uses query. Defaults to None.

  • value (Tensor | None) – Optional value tensor. If None, uses key when it is given, otherwise query. Defaults to None.

  • head_mask (Tensor | None) – Optional mask for attention heads. Defaults to None.

  • attn_mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

A tuple containing

  • context_layer: Attention output of shape (batch_size, seq_len, hidden_size).

  • None: Placeholder for attention weights (not returned).

Return type:

Tuple[Tensor, None]
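A compact sketch of the computation, assuming `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0 or later) does the heavy lifting. `MHASketch` is a hypothetical class: it follows the key/value fallback documented above, omits `head_mask`, and applies attention dropout only in training mode, which is an assumption about the real module.

```python
from typing import Tuple

import torch
import torch.nn.functional as F
from torch import nn


class MHASketch(nn.Module):
    """Hypothetical multi-head attention built on scaled_dot_product_attention."""

    def __init__(self, hidden_size: int, num_heads: int, dropout: float):
        super().__init__()
        if hidden_size % num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.attention_head_size = hidden_size // num_heads
        self.attention_probs_dropout_prob = dropout
        self.query_layer = nn.Linear(hidden_size, hidden_size)
        self.key_layer = nn.Linear(hidden_size, hidden_size)
        self.value_layer = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        b, n, _ = x.shape
        return x.view(b, n, self.num_heads, self.attention_head_size).transpose(1, 2)

    def forward(self, query, key=None, value=None, attn_mask=None) -> Tuple[torch.Tensor, None]:
        # Fallbacks: key defaults to query; value defaults to key (hence to query).
        key = query if key is None else key
        value = key if value is None else value
        q = self._split_heads(self.query_layer(query))
        k = self._split_heads(self.key_layer(key))
        v = self._split_heads(self.value_layer(value))
        p = self.attention_probs_dropout_prob if self.training else 0.0
        ctx = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask, dropout_p=p)
        # Merge heads: (batch, heads, seq, head_dim) -> (batch, seq, hidden).
        b, _, n, _ = ctx.shape
        return ctx.transpose(1, 2).reshape(b, n, -1), None


attn = MHASketch(hidden_size=16, num_heads=4, dropout=0.0)
out, weights = attn(torch.randn(2, 5, 16), torch.randn(2, 7, 16))
print(out.shape, weights)  # torch.Size([2, 5, 16]) None
```

The output length follows the query (5), not the key (7), which is what makes the same module usable for both self- and cross-attention.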

class gliner.modeling.layers.SelfAttentionBlock(d_model, num_heads, dropout=0.1)

Bases: Module

Self-attention block with pre-normalization and residual connection.

Implements a standard transformer-style self-attention block: layer normalization is applied before attention (pre_norm) and again after the residual connection (post_norm).

self_attn

Multi-head self-attention module.

pre_norm

Layer normalization applied before attention.

post_norm

Layer normalization applied after residual connection.

dropout

Dropout layer for attention output.

q_proj

Linear projection for queries.

k_proj

Linear projection for keys.

v_proj

Linear projection for values.


__init__(d_model, num_heads, dropout=0.1)

Initializes the self-attention block.

Parameters:
  • d_model (int) – Model dimension size.

  • num_heads (int) – Number of attention heads.

  • dropout (float) – Dropout probability. Defaults to 0.1.

forward(x, mask=None)

Applies self-attention to input tensor.

Parameters:
  • x (Tensor) – Input tensor of shape (batch_size, seq_len, d_model).

  • mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

Output tensor of shape (batch_size, seq_len, d_model).

Return type:

Tensor
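The block structure can be sketched as follows. `SelfAttentionBlockSketch` is hypothetical: it substitutes `torch.nn.MultiheadAttention` for the `self_attn` module, adds the residual to the un-normalized input, and treats `mask` as a boolean key-padding mask in which `True` marks positions to ignore (the `nn.MultiheadAttention` convention). All three are assumptions about the real block.

```python
from typing import Optional

import torch
from torch import nn


class SelfAttentionBlockSketch(nn.Module):
    """Hypothetical pre-norm self-attention block (torch.nn.MultiheadAttention as self_attn)."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.pre_norm = nn.LayerNorm(d_model)
        self.post_norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Pre-normalization, then separate query/key/value projections.
        h = self.pre_norm(x)
        attn_out, _ = self.self_attn(self.q_proj(h), self.k_proj(h), self.v_proj(h),
                                     key_padding_mask=mask)
        # Residual connection around attention, followed by the post-norm.
        return self.post_norm(x + self.dropout(attn_out))


block = SelfAttentionBlockSketch(d_model=16, num_heads=4, dropout=0.0).eval()
x = torch.randn(2, 6, 16)
pad = torch.tensor([[False] * 6, [False] * 4 + [True] * 2])  # True = ignore
y = block(x, mask=pad)
print(y.shape)  # torch.Size([2, 6, 16])
```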

class gliner.modeling.layers.CrossAttentionBlock(d_model, num_heads, dropout=0.1)

Bases: Module

Cross-attention block with pre-normalization and residual connection.

Implements cross-attention between query and key-value pairs, typically used for attending from one sequence to another.

cross_attn

Multi-head cross-attention module.

pre_norm

Layer normalization applied to query before attention.

post_norm

Layer normalization applied after residual connection.

dropout

Dropout layer for attention output.

v_proj

Linear projection for values.


__init__(d_model, num_heads, dropout=0.1)

Initializes the cross-attention block.

Parameters:
  • d_model (int) – Model dimension size.

  • num_heads (int) – Number of attention heads.

  • dropout (float) – Dropout probability. Defaults to 0.1.

forward(query, key, value=None, mask=None)

Applies cross-attention from query to key-value pairs.

Parameters:
  • query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).

  • key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).

  • value (Tensor | None) – Optional value tensor. If None, derived from key. Defaults to None.

  • mask (Tensor | None) – Optional attention mask. Defaults to None.

Returns:

Output tensor of shape (batch_size, query_len, d_model).

Return type:

Tensor
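A corresponding sketch for cross-attention. `CrossAttentionBlockSketch` is hypothetical; it uses `torch.nn.MultiheadAttention` as the `cross_attn` module, normalizes only the query, derives values as `v_proj(key)` when `value` is omitted, and reads `mask` as a boolean key-padding mask over the key sequence (`True` = ignore). These details are assumptions, not the library's confirmed behaviour.

```python
from typing import Optional

import torch
from torch import nn


class CrossAttentionBlockSketch(nn.Module):
    """Hypothetical pre-norm cross-attention block (torch.nn.MultiheadAttention as cross_attn)."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.pre_norm = nn.LayerNorm(d_model)
        self.post_norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, query: torch.Tensor, key: torch.Tensor,
                value: Optional[torch.Tensor] = None,
                mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Only the query side is normalized before attention.
        q = self.pre_norm(query)
        # When no values are given, derive them from the keys.
        if value is None:
            value = self.v_proj(key)
        attn_out, _ = self.cross_attn(q, key, value, key_padding_mask=mask)
        # Residual on the original query, then the post-norm.
        return self.post_norm(query + self.dropout(attn_out))


block = CrossAttentionBlockSketch(d_model=16, num_heads=4, dropout=0.0).eval()
labels = torch.randn(2, 3, 16)
text = torch.randn(2, 10, 16)
text_pad = torch.zeros(2, 10, dtype=torch.bool)
text_pad[1, 7:] = True  # last three text tokens of example 1 are padding
out = block(labels, text, mask=text_pad)
print(out.shape)  # torch.Size([2, 3, 16])
```

Because padded keys receive zero attention weight, changing their contents leaves this sketch's output unchanged.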

class gliner.modeling.layers.CrossFuser(d_model, query_dim, num_heads=8, num_layers=1, dropout=0.1, schema='l2l-l2t')

Bases: Module

Flexible cross-attention fusion module with configurable attention patterns.

Fuses two sequences using a configurable schema of self-attention and cross-attention operations. The schema defines the order and type of attention operations to apply.

Schema notation:
  • ‘l2l’: Self-attention on label sequence

  • ‘t2t’: Self-attention on text sequence

  • ‘l2t’: Cross-attention from label to text

  • ‘t2l’: Cross-attention from text to label
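A small sketch of how a schema string could be turned into the operation list stored in schema. Splitting on `-` and validating against the four documented operations are assumptions based on examples such as ‘l2l-l2t-t2t’; `parse_schema` and `VALID_OPS` are hypothetical names.

```python
from typing import List

VALID_OPS = {"l2l", "t2t", "l2t", "t2l"}


def parse_schema(schema: str) -> List[str]:
    """Split a schema such as 'l2l-l2t' into its ordered operations (hypothetical helper)."""
    ops = [op.strip() for op in schema.split("-") if op.strip()]
    unknown = [op for op in ops if op not in VALID_OPS]
    if unknown:
        raise ValueError(f"unknown attention operation(s): {unknown}")
    return ops


print(parse_schema("l2l-l2t-t2t"))  # ['l2l', 'l2t', 't2t']
```

Order matters: the list is applied left to right, so ‘l2l-l2t’ refines the labels among themselves before they attend to the text.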

d_model

Model dimension size.

schema

List of attention operation types parsed from schema string.

layers

ModuleList of attention layers organized by depth.


__init__(d_model, query_dim, num_heads=8, num_layers=1, dropout=0.1, schema='l2l-l2t')

Initializes the cross-fusion module.

Parameters:
  • d_model (int) – Model dimension size.

  • query_dim (int) – Dimension of query input (currently unused).

  • num_heads (int) – Number of attention heads. Defaults to 8.

  • num_layers (int) – Number of attention layers. Defaults to 1.

  • dropout (float) – Dropout probability. Defaults to 0.1.

  • schema (str) – String defining attention pattern (e.g., ‘l2l-l2t-t2t’). Defaults to ‘l2l-l2t’.

forward(query, key, query_mask=None, key_mask=None)

Applies cross-fusion between query and key sequences.

Parameters:
  • query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).

  • key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).

  • query_mask (Tensor | None) – Optional binary mask for query (1 = valid, 0 = padding). Shape (batch_size, query_len). Defaults to None.

  • key_mask (Tensor | None) – Optional binary mask for key (1 = valid, 0 = padding). Shape (batch_size, key_len). Defaults to None.

Returns:

A tuple containing

  • query: Fused query tensor of shape (batch_size, query_len, d_model).

  • key: Fused key tensor of shape (batch_size, key_len, d_model).

Return type:

Tuple[Tensor, Tensor]
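One way the schema could drive the fusion loop is sketched below. `CrossFuserSketch` and `_Residual` are hypothetical; the sketch assumes ‘l’ refers to `query` (labels) and ‘t’ to `key` (text), builds one block per schema entry at each depth, uses a single residual `nn.MultiheadAttention` block for every operation, and converts the 1/0 masks into `key_padding_mask` form. None of this is confirmed to match the library.

```python
from typing import Optional

import torch
from torch import nn


class _Residual(nn.Module):
    """Residual attention followed by LayerNorm (hypothetical stand-in block)."""

    def __init__(self, d_model: int, num_heads: int, dropout: float):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor,
                context_pad: Optional[torch.Tensor] = None) -> torch.Tensor:
        # x attends to context; context_pad marks ignored context positions with True.
        out, _ = self.attn(x, context, context, key_padding_mask=context_pad)
        return self.norm(x + out)


class CrossFuserSketch(nn.Module):
    """Hypothetical schema-driven fuser: 'l' = query (labels), 't' = key (text)."""

    def __init__(self, d_model: int, num_heads: int = 8, num_layers: int = 1,
                 dropout: float = 0.1, schema: str = "l2l-l2t"):
        super().__init__()
        self.d_model = d_model
        self.schema = schema.split("-")
        # One block per schema entry, repeated for each depth level.
        self.layers = nn.ModuleList([
            nn.ModuleList([_Residual(d_model, num_heads, dropout) for _ in self.schema])
            for _ in range(num_layers)
        ])

    def forward(self, query, key, query_mask=None, key_mask=None):
        # 1 = valid / 0 = padding  ->  key_padding_mask convention (True = ignore).
        q_pad = None if query_mask is None else query_mask == 0
        k_pad = None if key_mask is None else key_mask == 0
        for depth in self.layers:
            for op, block in zip(self.schema, depth):
                if op == "l2l":
                    query = block(query, query, q_pad)
                elif op == "t2t":
                    key = block(key, key, k_pad)
                elif op == "l2t":
                    query = block(query, key, k_pad)
                elif op == "t2l":
                    key = block(key, query, q_pad)
                else:
                    raise ValueError(f"unknown schema op: {op}")
        return query, key


fuser = CrossFuserSketch(d_model=16, num_heads=4, num_layers=2, dropout=0.0,
                         schema="l2l-l2t-t2t").eval()
labels = torch.randn(2, 3, 16)
text = torch.randn(2, 9, 16)
text_mask = torch.ones(2, 9, dtype=torch.long)
text_mask[1, 6:] = 0  # last three text tokens of example 1 are padding
fused_labels, fused_text = fuser(labels, text, key_mask=text_mask)
print(fused_labels.shape, fused_text.shape)  # torch.Size([2, 3, 16]) torch.Size([2, 9, 16])
```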

class gliner.modeling.layers.LayersFuser(num_layers, hidden_size, output_size=None)

Bases: Module

Fuses multiple encoder layer outputs using a squeeze-and-excitation mechanism.

Combines outputs from different encoder layers by learning an adaptive weight for each layer in the style of squeeze-and-excitation attention. The first element of encoder_outputs is skipped during fusion.

num_layers

Number of encoder layers to fuse.

hidden_size

Hidden dimension size of encoder outputs.

output_size

Size of the final output projection.

squeeze

Linear layer for squeeze operation.

W1

First linear layer of excitation network.

W2

Second linear layer of excitation network.

output_projection

Final projection to output dimension.


__init__(num_layers, hidden_size, output_size=None)

Initializes the layer fusion module.

Parameters:
  • num_layers (int) – Number of encoder layers to fuse.

  • hidden_size (int) – Hidden dimension size.

  • output_size (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.

forward(encoder_outputs)

Fuses multiple encoder layer outputs into a single representation.

Parameters:

encoder_outputs (List[Tensor]) – List of encoder output tensors, each of shape (batch_size, seq_len, hidden_size). The first element is skipped.

Returns:

Fused output tensor of shape (batch_size, seq_len, output_size).

Return type:

Tensor
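A squeeze-and-excitation fusion consistent with the attributes listed above can be sketched as follows. `LayersFuserSketch` is hypothetical; it assumes `num_layers` counts the fused layers (that is, `len(encoder_outputs) - 1`), that `squeeze` scores each token and the scores are averaged over the sequence, and that the excitation bottleneck is `num_layers // 2` wide. These are illustrative choices, not confirmed library internals.

```python
from typing import List, Optional

import torch
import torch.nn.functional as F
from torch import nn


class LayersFuserSketch(nn.Module):
    """Hypothetical squeeze-and-excitation fusion over encoder layers (first output skipped)."""

    def __init__(self, num_layers: int, hidden_size: int, output_size: Optional[int] = None):
        super().__init__()
        output_size = hidden_size if output_size is None else output_size
        reduced = max(1, num_layers // 2)  # assumed bottleneck width
        self.squeeze = nn.Linear(hidden_size, 1)  # one score per token per layer
        self.W1 = nn.Linear(num_layers, reduced)
        self.W2 = nn.Linear(reduced, num_layers)
        self.output_projection = nn.Linear(hidden_size, output_size)

    def forward(self, encoder_outputs: List[torch.Tensor]) -> torch.Tensor:
        # Stack every layer except the first: (batch, K, seq_len, hidden).
        U = torch.stack(encoder_outputs[1:], dim=1)
        # Squeeze: score each token, then average over the sequence -> (batch, K).
        z = self.squeeze(U).squeeze(-1).mean(dim=2)
        # Excitation: bottleneck MLP + sigmoid gives one weight per layer.
        s = torch.sigmoid(self.W2(F.relu(self.W1(z))))
        # Weight each layer, sum over layers, then project to output_size.
        fused = (U * s[:, :, None, None]).sum(dim=1)
        return self.output_projection(fused)


# Five hidden-state tensors; the first (for instance an embedding output) is skipped.
outputs = [torch.randn(2, 6, 32) for _ in range(5)]
fuser = LayersFuserSketch(num_layers=4, hidden_size=32, output_size=16)
fused = fuser(outputs)
print(fused.shape)  # torch.Size([2, 6, 16])
```

The sigmoid gates are computed per example, so the relative contribution of each layer can change from input to input rather than being a fixed learned mixture.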