gliner.modeling.layers module
- class gliner.modeling.layers.LstmSeq2SeqEncoder(config, num_layers=1, dropout=0.0, bidirectional=True)
Bases: Module
Bidirectional LSTM encoder for sequence-to-sequence models.
This encoder processes input sequences using a bidirectional LSTM and returns the encoded representations. It handles variable-length sequences through packing.
- lstm
The bidirectional LSTM layer for encoding sequences.
- __init__(config, num_layers=1, dropout=0.0, bidirectional=True)
Initializes the LSTM encoder.
- Parameters:
config – Configuration object containing model hyperparameters. Must have a hidden_size attribute.
num_layers (int) – Number of recurrent layers. Defaults to 1.
dropout (float) – Dropout probability between LSTM layers; it has no effect when num_layers is 1. Defaults to 0.0.
bidirectional (bool) – If True, the LSTM is bidirectional. Defaults to True.
- forward(x, mask, hidden=None)
Encodes input sequences through the LSTM.
- Parameters:
x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).
mask (Tensor) – Binary mask tensor of shape (batch_size, seq_len) where 1 indicates valid positions and 0 indicates padding.
hidden (Tuple[Tensor, Tensor] | None) – Optional initial hidden state tuple (h_0, c_0). Defaults to None.
- Returns:
Encoded output tensor of shape (batch_size, seq_len, hidden_size).
- Return type:
Tensor
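The packing step mentioned above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name encode_with_packing is hypothetical, and the choice of hidden_size // 2 units per direction is an assumption made so that the concatenated bidirectional output matches the documented (batch_size, seq_len, hidden_size) shape.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def encode_with_packing(lstm: nn.LSTM, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Run a batch-first LSTM over padded input, skipping padded positions (hypothetical helper)."""
    # Sequence lengths come from the binary mask; packing expects them on the CPU.
    lengths = mask.sum(dim=1).cpu()
    packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
    packed_out, _ = lstm(packed)
    # Pad back to the original seq_len so the output lines up with the input.
    out, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=x.size(1))
    return out


hidden_size = 8
# Assumption: each direction gets hidden_size // 2 units, so the two directions
# concatenate back to hidden_size.
lstm = nn.LSTM(hidden_size, hidden_size // 2, batch_first=True, bidirectional=True)
x = torch.randn(2, 5, hidden_size)
mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
out = encode_with_packing(lstm, x, mask)
```

With packing, positions marked 0 in the mask never enter the recurrence, and pad_packed_sequence fills them with zeros in the output.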
- gliner.modeling.layers.create_projection_layer(hidden_size, dropout, out_dim=None)
Creates a two-layer projection network with ReLU activation and dropout.
The projection layer expands the input by 4x in the hidden layer before projecting to the output dimension.
- Parameters:
hidden_size (int) – Size of the input hidden dimension.
dropout (float) – Dropout probability applied after the first layer.
out_dim (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.
- Returns:
A Sequential module containing the projection layers.
- Return type:
Sequential
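A projection network of this kind might look like the sketch below. The function name make_projection is hypothetical, and the exact width of the intermediate layer (here hidden_size * 4) is an assumption based on the "expands the input by 4x" description, not a copy of the library code.

```python
from typing import Optional

import torch
from torch import nn


def make_projection(hidden_size: int, dropout: float, out_dim: Optional[int] = None) -> nn.Sequential:
    """Two-layer projection: expand 4x, ReLU, dropout, project to out_dim (illustrative sketch)."""
    if out_dim is None:
        out_dim = hidden_size
    return nn.Sequential(
        nn.Linear(hidden_size, hidden_size * 4),  # assumed 4x expansion of the input width
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_size * 4, out_dim),
    )


proj = make_projection(hidden_size=16, dropout=0.1, out_dim=32)
projected = proj(torch.randn(2, 3, 16))  # linear layers act on the last dimension only
```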
- class gliner.modeling.layers.MultiheadAttention(hidden_size, num_heads, dropout)
Bases: Module
Multi-head scaled dot-product attention mechanism.
Implements multi-head attention where the hidden dimension is split across multiple attention heads. Uses PyTorch’s scaled_dot_product_attention for efficient computation.
- hidden_size
Total hidden dimension size.
- num_heads
Number of attention heads.
- attention_head_size
Dimension of each attention head.
- attention_probs_dropout_prob
Dropout probability for attention weights.
- query_layer
Linear projection for query vectors.
- key_layer
Linear projection for key vectors.
- value_layer
Linear projection for value vectors.
- __init__(hidden_size, num_heads, dropout)
Initializes the multi-head attention module.
- Parameters:
hidden_size (int) – Size of the hidden dimension. Must be divisible by num_heads.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability for attention weights.
- transpose_for_scores(x)
Reshapes tensor for multi-head attention computation.
Transforms from (batch, seq_len, hidden) to (batch, num_heads, seq_len, head_dim).
- Parameters:
x (Tensor) – Input tensor of shape (batch_size, seq_len, hidden_size).
- Returns:
Reshaped tensor of shape (batch_size, num_heads, seq_len, attention_head_size).
- Return type:
Tensor
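The reshape described above is a view followed by a permute. The sketch below uses a hypothetical standalone function, split_heads, to show the shape change; it is an illustration rather than the method's actual body.

```python
import torch


def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """(batch, seq_len, hidden) -> (batch, num_heads, seq_len, head_dim)."""
    batch_size, seq_len, hidden_size = x.shape
    head_dim = hidden_size // num_heads  # hidden_size must be divisible by num_heads
    # Split the hidden dimension into (num_heads, head_dim) ...
    x = x.view(batch_size, seq_len, num_heads, head_dim)
    # ... then move the head axis in front of the sequence axis.
    return x.permute(0, 2, 1, 3)


x = torch.arange(2 * 3 * 8, dtype=torch.float32).view(2, 3, 8)
heads = split_heads(x, num_heads=4)
```

Each head sees a contiguous slice of the hidden dimension: head 1 at position 0 holds elements 2:4 of the original hidden vector.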
- forward(query, key=None, value=None, head_mask=None, attn_mask=None)
Computes multi-head attention.
- Parameters:
query (Tensor) – Query tensor of shape (batch_size, seq_len, hidden_size).
key (Tensor | None) – Optional key tensor. If None, uses query. Defaults to None.
value (Tensor | None) – Optional value tensor. If None, uses key when it is provided, otherwise query. Defaults to None.
head_mask (Tensor | None) – Optional mask for attention heads. Defaults to None.
attn_mask (Tensor | None) – Optional attention mask. Defaults to None.
- Returns:
A tuple containing:
context_layer: Attention output of shape (batch_size, seq_len, hidden_size).
None: Placeholder for attention weights (not returned).
- Return type:
Tuple[Tensor, None]
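To make the data flow concrete, here is a minimal sketch of multi-head attention built on torch.nn.functional.scaled_dot_product_attention (available in PyTorch 2.0 and later). The class name TinyMultiheadAttention is hypothetical, head_mask is omitted, and the fallback logic for key and value follows the parameter descriptions above; treat it as an approximation of the documented behavior.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyMultiheadAttention(nn.Module):
    """Illustrative multi-head attention; not the library class."""

    def __init__(self, hidden_size: int, num_heads: int, dropout: float):
        super().__init__()
        if hidden_size % num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.dropout = dropout
        self.query_layer = nn.Linear(hidden_size, hidden_size)
        self.key_layer = nn.Linear(hidden_size, hidden_size)
        self.value_layer = nn.Linear(hidden_size, hidden_size)

    def _split(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        return x.view(b, n, self.num_heads, self.head_dim).permute(0, 2, 1, 3)

    def forward(self, query, key=None, value=None, attn_mask=None):
        key = query if key is None else key    # self-attention when no key is given
        value = key if value is None else value
        q = self._split(self.query_layer(query))
        k = self._split(self.key_layer(key))
        v = self._split(self.value_layer(value))
        ctx = F.scaled_dot_product_attention(
            q, k, v, attn_mask=attn_mask,
            dropout_p=self.dropout if self.training else 0.0,
        )
        # Merge heads back: (batch, heads, seq, head_dim) -> (batch, seq, hidden).
        b, _, n, _ = ctx.shape
        ctx = ctx.permute(0, 2, 1, 3).reshape(b, n, self.num_heads * self.head_dim)
        return ctx, None  # attention weights are not materialized


attn = TinyMultiheadAttention(hidden_size=16, num_heads=4, dropout=0.0).eval()
context, weights = attn(torch.randn(2, 5, 16), key=torch.randn(2, 7, 16))
```

The output keeps the query's sequence length even when the key sequence is longer, which is why the documented return shape depends only on the query.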
- class gliner.modeling.layers.SelfAttentionBlock(d_model, num_heads, dropout=0.1)
Bases: Module
Self-attention block with pre-normalization and residual connection.
Implements a standard transformer-style self-attention block with layer normalization before and after the attention operation.
- self_attn
Multi-head self-attention module.
- pre_norm
Layer normalization applied before attention.
- post_norm
Layer normalization applied after residual connection.
- dropout
Dropout layer for attention output.
- q_proj
Linear projection for queries.
- k_proj
Linear projection for keys.
- v_proj
Linear projection for values.
- __init__(d_model, num_heads, dropout=0.1)
Initializes the self-attention block.
- Parameters:
d_model (int) – Model dimension size.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability. Defaults to 0.1.
- forward(x, mask=None)
Applies self-attention to input tensor.
- Parameters:
x (Tensor) – Input tensor of shape (batch_size, seq_len, d_model).
mask (Tensor | None) – Optional attention mask. Defaults to None.
- Returns:
Output tensor of shape (batch_size, seq_len, d_model).
- Return type:
Tensor
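The normalize, attend, residual, normalize pattern can be sketched as follows. This is an assumption-laden illustration: the class name PreNormSelfAttention is hypothetical, torch.nn.MultiheadAttention stands in for the module's own attention class (it carries its own query, key and value projections, so separate q_proj, k_proj and v_proj layers are not shown), and the exact placement of the residual in the library may differ.

```python
import torch
from torch import nn


class PreNormSelfAttention(nn.Module):
    """Illustrative pre-norm self-attention block; not the library class."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        # Stand-in attention module with built-in q/k/v projections.
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, dropout=dropout, batch_first=True)
        self.pre_norm = nn.LayerNorm(d_model)
        self.post_norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, key_padding_mask=None) -> torch.Tensor:
        normed = self.pre_norm(x)
        # key_padding_mask follows the torch convention: True marks positions to ignore.
        attn_out, _ = self.self_attn(
            normed, normed, normed,
            key_padding_mask=key_padding_mask, need_weights=False,
        )
        # Residual connection, then the second layer norm.
        return self.post_norm(x + self.dropout(attn_out))


block = PreNormSelfAttention(d_model=16, num_heads=4).eval()
block_out = block(torch.randn(2, 5, 16))
```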
- class gliner.modeling.layers.CrossAttentionBlock(d_model, num_heads, dropout=0.1)
Bases: Module
Cross-attention block with pre-normalization and residual connection.
Implements cross-attention between query and key-value pairs, typically used for attending from one sequence to another.
- cross_attn
Multi-head cross-attention module.
- pre_norm
Layer normalization applied to query before attention.
- post_norm
Layer normalization applied after residual connection.
- dropout
Dropout layer for attention output.
- v_proj
Linear projection for values.
- __init__(d_model, num_heads, dropout=0.1)
Initializes the cross-attention block.
- Parameters:
d_model (int) – Model dimension size.
num_heads (int) – Number of attention heads.
dropout (float) – Dropout probability. Defaults to 0.1.
- forward(query, key, value=None, mask=None)
Applies cross-attention from query to key-value pairs.
- Parameters:
query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).
key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).
value (Tensor | None) – Optional value tensor. If None, derived from key. Defaults to None.
mask (Tensor | None) – Optional attention mask. Defaults to None.
- Returns:
Output tensor of shape (batch_size, query_len, d_model).
- Return type:
Tensor
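The shape contract of cross-attention (output length follows the query, not the key) can be seen in a stripped-down, single-head sketch. The function cross_attend and its key_mask argument are hypothetical; normalization, projections and the residual connection of the documented block are deliberately left out.

```python
import torch
import torch.nn.functional as F


def cross_attend(query, key, value=None, key_mask=None):
    """Single-head cross-attention core (illustrative only)."""
    value = key if value is None else value  # derive values from keys when absent
    attn_mask = None
    if key_mask is not None:
        # (batch, key_len) -> (batch, 1, 1, key_len); True means "may be attended to".
        attn_mask = key_mask.bool()[:, None, None, :]
    # Add a singleton head axis so tensors are (batch, heads, seq, dim).
    out = F.scaled_dot_product_attention(
        query.unsqueeze(1), key.unsqueeze(1), value.unsqueeze(1), attn_mask=attn_mask
    )
    return out.squeeze(1)


query = torch.randn(2, 3, 8)  # (batch_size, query_len, d_model)
key = torch.randn(2, 6, 8)    # (batch_size, key_len, d_model)
key_mask = torch.tensor([[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]])
attended = cross_attend(query, key, key_mask=key_mask)
```

Masked key positions receive zero attention weight, so changing their content leaves the output unchanged.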
- class gliner.modeling.layers.CrossFuser(d_model, query_dim, num_heads=8, num_layers=1, dropout=0.1, schema='l2l-l2t')
Bases: Module
Flexible cross-attention fusion module with configurable attention patterns.
Fuses two sequences using a configurable schema of self-attention and cross-attention operations. The schema defines the order and type of attention operations to apply.
- Schema notation:
‘l2l’: Self-attention on label sequence
‘t2t’: Self-attention on text sequence
‘l2t’: Cross-attention from label to text
‘t2l’: Cross-attention from text to label
- d_model
Model dimension size.
- schema
List of attention operation types parsed from schema string.
- layers
ModuleList of attention layers organized by depth.
- __init__(d_model, query_dim, num_heads=8, num_layers=1, dropout=0.1, schema='l2l-l2t')
Initializes the cross-fusion module.
- Parameters:
d_model (int) – Model dimension size.
query_dim (int) – Dimension of query input (currently unused).
num_heads (int) – Number of attention heads. Defaults to 8.
num_layers (int) – Number of attention layers. Defaults to 1.
dropout (float) – Dropout probability. Defaults to 0.1.
schema (str) – String defining attention pattern (e.g., ‘l2l-l2t-t2t’). Defaults to ‘l2l-l2t’.
- forward(query, key, query_mask=None, key_mask=None)
Applies cross-fusion between query and key sequences.
- Parameters:
query (Tensor) – Query tensor of shape (batch_size, query_len, d_model).
key (Tensor) – Key tensor of shape (batch_size, key_len, d_model).
query_mask (Tensor | None) – Optional binary mask for query (1 = valid, 0 = padding). Shape (batch_size, query_len). Defaults to None.
key_mask (Tensor | None) – Optional binary mask for key (1 = valid, 0 = padding). Shape (batch_size, key_len). Defaults to None.
- Returns:
A tuple containing:
query: Fused query tensor of shape (batch_size, query_len, d_model).
key: Fused key tensor of shape (batch_size, key_len, d_model).
- Return type:
Tuple[Tensor, Tensor]
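Two small pieces of the fusion logic lend themselves to a sketch: splitting the schema string into operations, and turning the per-sequence padding masks into a pairwise attention mask. Both helpers below (parse_schema and pairwise_mask) are hypothetical names, and the assumption that query carries the label sequence while key carries the text sequence is not confirmed by this page.

```python
import torch

ALLOWED_OPS = {"l2l", "t2t", "l2t", "t2l"}


def parse_schema(schema: str) -> list:
    """Split a schema such as 'l2l-l2t' into an ordered list of operations."""
    ops = schema.split("-")
    unknown = [op for op in ops if op not in ALLOWED_OPS]
    if unknown:
        raise ValueError(f"Unknown schema operations: {unknown}")
    return ops


def pairwise_mask(row_mask: torch.Tensor, col_mask: torch.Tensor) -> torch.Tensor:
    """(batch, rows) and (batch, cols) -> (batch, rows, cols); 1 only where both are valid."""
    return row_mask.unsqueeze(-1) * col_mask.unsqueeze(1)


ops = parse_schema("l2l-l2t")
query_mask = torch.tensor([[1, 1, 0]])    # assumed: label positions
key_mask = torch.tensor([[1, 1, 1, 0]])   # assumed: text positions
l2t_mask = pairwise_mask(query_mask, key_mask)  # labels attending to text
l2l_mask = pairwise_mask(query_mask, query_mask)  # labels attending to labels
```

Under these assumptions, a self-attention step ('l2l' or 't2t') would use a mask built from one sequence paired with itself, and a cross-attention step ('l2t' or 't2l') a mask built from both.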
- class gliner.modeling.layers.LayersFuser(num_layers, hidden_size, output_size=None)
Bases: Module
Fuses multiple encoder layer outputs using squeeze-and-excitation mechanism.
Combines outputs from different encoder layers by learning adaptive weights for each layer using a squeeze-and-excitation style attention mechanism. The first layer in encoder_outputs is skipped during fusion.
- num_layers
Number of encoder layers to fuse.
- hidden_size
Hidden dimension size of encoder outputs.
- output_size
Size of the final output projection.
- squeeze
Linear layer for squeeze operation.
- W1
First linear layer of excitation network.
- W2
Second linear layer of excitation network.
- output_projection
Final projection to output dimension.
- __init__(num_layers, hidden_size, output_size=None)
Initializes the layer fusion module.
- Parameters:
num_layers (int) – Number of encoder layers to fuse.
hidden_size (int) – Hidden dimension size.
output_size (int | None) – Output dimension size. If None, uses hidden_size. Defaults to None.
- forward(encoder_outputs)
Fuses multiple encoder layer outputs into a single representation.
- Parameters:
encoder_outputs (List[Tensor]) – List of encoder output tensors, each of shape (batch_size, seq_len, hidden_size). The first element is skipped.
- Returns:
Fused output tensor of shape (batch_size, seq_len, output_size).
- Return type:
Tensor
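A squeeze-and-excitation style fusion over layers could be sketched as below. The class name TinyLayersFuser is hypothetical, and details such as the bottleneck width (num_layers // 2) and averaging the squeezed scores over the sequence are assumptions chosen to match the attribute list above, not a transcription of the library code.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyLayersFuser(nn.Module):
    """Illustrative squeeze-and-excitation fusion of encoder layers; not the library class."""

    def __init__(self, num_layers: int, hidden_size: int, output_size=None):
        super().__init__()
        output_size = hidden_size if output_size is None else output_size
        bottleneck = max(num_layers // 2, 1)  # assumed bottleneck width
        self.squeeze = nn.Linear(hidden_size, 1)
        self.W1 = nn.Linear(num_layers, bottleneck)
        self.W2 = nn.Linear(bottleneck, num_layers)
        self.output_projection = nn.Linear(hidden_size, output_size)

    def forward(self, encoder_outputs):
        # Skip the first element, then stack: (batch, layers, seq_len, hidden).
        stacked = torch.stack(encoder_outputs[1:], dim=1)
        # Squeeze: one scalar per layer, averaged over the sequence -> (batch, layers).
        squeezed = self.squeeze(stacked).squeeze(-1).mean(dim=2)
        # Excitation: per-layer gates in (0, 1).
        gates = torch.sigmoid(self.W2(F.relu(self.W1(squeezed))))
        # Weighted sum over layers -> (batch, seq_len, hidden).
        fused = (stacked * gates[:, :, None, None]).sum(dim=1)
        return self.output_projection(fused)


num_layers = 4
fuser = TinyLayersFuser(num_layers=num_layers, hidden_size=16, output_size=8)
# num_layers + 1 tensors: the first one is skipped by the fuser.
encoder_outputs = [torch.randn(2, 5, 16) for _ in range(num_layers + 1)]
fused_out = fuser(encoder_outputs)
```

Because the first element is skipped, the input list needs one more tensor than num_layers for the stacked layer axis to match the excitation network.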