gliner.onnx.model module

ONNX Runtime inference models for GLiNER.

This module provides ONNX Runtime implementations of various GLiNER model architectures, including uni-encoder and bi-encoder variants for both span-level and token-level named entity recognition, as well as relation extraction models.

class gliner.onnx.model.BaseORTModel(session)[source]

Bases: ABC

Base class for ONNX Runtime inference models.

Provides common functionality for preparing inputs, running inference, and managing ONNX session I/O. All concrete ORT model implementations should inherit from this class.

sessionΒΆ

ONNX Runtime inference session.

input_namesΒΆ

Dictionary mapping input names to their indices.

output_namesΒΆ

Dictionary mapping output names to their indices.

__init__(session)[source]

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

prepare_inputs(inputs)[source]

Prepare inputs for ONNX model inference.

Converts PyTorch tensors to numpy arrays and filters out inputs that are not expected by the ONNX model.

Parameters:

inputs (Dict[str, Tensor]) – Dictionary of input names and PyTorch tensors.

Returns:

Dictionary of input names and numpy arrays ready for ONNX inference.

Raises:

ValueError – If inputs is not a dictionary.

Return type:

Dict[str, ndarray]
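The conversion-and-filtering step described above can be sketched as follows. This is a hedged re-implementation for illustration, not the module's actual code; `input_names` stands in for the instance attribute of the same name, and the `.cpu().numpy()` path assumes PyTorch tensor semantics:

```python
import numpy as np

def prepare_inputs(inputs, input_names):
    # Sketch of the documented behaviour: reject non-dict input, drop keys
    # the ONNX graph does not declare, and convert each value to a numpy
    # array. torch.Tensor exposes .cpu().numpy(); anything else array-like
    # falls back to np.asarray.
    if not isinstance(inputs, dict):
        raise ValueError("inputs must be a dictionary")
    prepared = {}
    for name, value in inputs.items():
        if name not in input_names:
            continue  # not an input of this ONNX graph
        if hasattr(value, "cpu"):  # torch.Tensor path
            prepared[name] = value.cpu().numpy()
        else:  # already numpy / array-like
            prepared[name] = np.asarray(value)
    return prepared
```

Inputs that the exported graph never declared (e.g. training-only keys such as labels) are silently dropped rather than raising, so the same input dict can be shared between the PyTorch and ONNX paths.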

run_inference(inputs)[source]

Run the ONNX model inference.

Parameters:

inputs (Dict[str, ndarray]) – Prepared inputs for the model as numpy arrays.

Returns:

Dictionary mapping output names to their corresponding numpy arrays.

Return type:

Dict[str, ndarray]
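The name-to-array mapping can be sketched like this. `FakeSession` is a stand-in for `onnxruntime.InferenceSession`, assuming the real session exposes `get_outputs()` and `run()` with these shapes (which matches the public ONNX Runtime API):

```python
import numpy as np

class FakeSession:
    """Minimal stand-in for onnxruntime.InferenceSession."""
    def get_outputs(self):
        class Output:  # mimics the .name attribute of onnxruntime NodeArg
            name = "logits"
        return [Output()]

    def run(self, output_names, input_feed):
        # A real session would execute the graph; here we return zeros.
        return [np.zeros((1, 4, 2))]

def run_inference(session, inputs):
    # Sketch of the documented behaviour: execute the session, then zip
    # the declared output names with the returned arrays into a dict.
    output_names = [output.name for output in session.get_outputs()]
    return dict(zip(output_names, session.run(output_names, inputs)))
```

Returning a dict keyed by output name (rather than the positional list `session.run` yields) is what lets callers access results without knowing the graph's output ordering.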

abstract forward(input_ids, attention_mask, **kwargs)[source]

Perform forward pass through the model.

Abstract method that must be implemented by subclasses to define model-specific forward pass logic.

Parameters:
  • input_ids – Input token IDs.

  • attention_mask – Attention mask for input tokens.

  • **kwargs – Additional model-specific arguments.

Returns:

Dictionary containing model outputs.

Return type:

Dict[str, Any]
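The subclassing contract can be shown with a simplified stand-in for the base class (this is an illustration of the ABC pattern, not the real `BaseORTModel`; `MinimalORTModel` is hypothetical):

```python
from abc import ABC, abstractmethod

class BaseORTModel(ABC):
    # Simplified stand-in, kept only to show the abstract-method contract.
    @abstractmethod
    def forward(self, input_ids, attention_mask, **kwargs):
        ...

class MinimalORTModel(BaseORTModel):
    # Hypothetical subclass: forward must return a dict of model outputs.
    def forward(self, input_ids, attention_mask, **kwargs):
        return {"logits": [[0.0] * len(input_ids[0])]}
```

Because `forward` is abstract, instantiating the base class directly raises `TypeError`; only concrete subclasses such as the span/token models below can be constructed.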

class gliner.onnx.model.UniEncoderSpanORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder span-level NER.

Uses a single encoder to process both text and entity labels, performing span-level entity recognition.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, **kwargs)[source]ΒΆ

Forward pass for span model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for span classification.

Return type:

Dict[str, Any]
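A `span_idx` / `span_mask` pair can be built by enumerating candidate spans per sequence. The exact enumeration scheme below (fixed maximum width, clamped end indices, one mask bit per candidate) is an assumption for illustration and `build_span_inputs` is a hypothetical helper, not part of this module:

```python
import numpy as np

def build_span_inputs(text_length, max_width):
    # Enumerate every (start, end) span up to max_width tokens wide.
    # Spans that would run past the sequence are clamped and masked out,
    # mirroring the "valid vs. padding" role span_mask plays in forward().
    span_idx, span_mask = [], []
    for start in range(text_length):
        for width in range(max_width):
            end = start + width
            span_idx.append((start, min(end, text_length - 1)))
            span_mask.append(end < text_length)
    return np.array(span_idx), np.array(span_mask)
```

For a 4-token text with width up to 2 this yields 8 candidates, of which one (the width-2 span starting at the last token) is masked as invalid.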

class gliner.onnx.model.BiEncoderSpanORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for bi-encoder span-level NER.

Uses separate encoders for text and entity labels, performing span-level entity recognition with bi-encoder architecture.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, labels_embeds=None, labels_input_ids=None, labels_attention_mask=None, **kwargs)[source]

Forward pass for bi-encoder span model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • labels_embeds (Tensor | None) – Optional pre-computed embeddings for entity labels. If provided, labels_input_ids and labels_attention_mask are ignored.

  • labels_input_ids (FloatTensor | None) – Optional tensor containing token IDs for entity labels. Used when labels_embeds is not provided.

  • labels_attention_mask (LongTensor | None) – Optional attention mask for entity label tokens. Used when labels_embeds is not provided.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for span classification.

Return type:

Dict[str, Any]
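The precedence among the three optional label arguments can be sketched as a small dispatcher. `select_label_inputs` is a hypothetical helper that mirrors the rule stated above (precomputed embeddings win; otherwise the raw label tokens and mask are fed):

```python
def select_label_inputs(labels_embeds=None, labels_input_ids=None,
                        labels_attention_mask=None):
    # If precomputed label embeddings are supplied, the label token inputs
    # are ignored entirely; otherwise both token IDs and mask are passed.
    if labels_embeds is not None:
        return {"labels_embeds": labels_embeds}
    return {
        "labels_input_ids": labels_input_ids,
        "labels_attention_mask": labels_attention_mask,
    }
```

Precomputing `labels_embeds` once and reusing them is the main efficiency argument for the bi-encoder: the label encoder only has to run when the label set changes, not per text batch.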

class gliner.onnx.model.UniEncoderTokenORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder token-level NER.

Uses a single encoder to process both text and entity labels, performing token-level entity recognition.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, **kwargs)[source]

Forward pass for token model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for token classification.

Return type:

Dict[str, Any]

class gliner.onnx.model.BiEncoderTokenORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for bi-encoder token-level NER.

Uses separate encoders for text and entity labels, performing token-level entity recognition with bi-encoder architecture.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, labels_embeds=None, labels_input_ids=None, labels_attention_mask=None, **kwargs)[source]

Forward pass for bi-encoder token model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • labels_embeds (Tensor | None) – Optional pre-computed embeddings for entity labels. If provided, labels_input_ids and labels_attention_mask are ignored.

  • labels_input_ids (FloatTensor | None) – Optional tensor containing token IDs for entity labels. Used when labels_embeds is not provided.

  • labels_attention_mask (LongTensor | None) – Optional attention mask for entity label tokens. Used when labels_embeds is not provided.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for token classification.

Return type:

Dict[str, Any]

class gliner.onnx.model.UniEncoderSpanRelexORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder span-level relation extraction.

Uses a single encoder to process text and perform both entity recognition and relation extraction at the span level.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, **kwargs)[source]ΒΆ

Forward pass for span relation extraction model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERRelexOutput containing logits for span classification, relation indices, relation logits, and relation mask.

Return type:

Dict[str, Any]