gliner.onnx.model module

ONNX Runtime inference models for GLiNER.

This module provides ONNX Runtime implementations of various GLiNER model architectures, including uni-encoder and bi-encoder variants for both span-level and token-level named entity recognition, as well as relation extraction models.

class gliner.onnx.model.BaseORTModel(session)[source]

Bases: ABC

Base class for ONNX Runtime inference models.

Provides common functionality for preparing inputs, running inference, and managing ONNX session I/O. All concrete ORT model implementations should inherit from this class.

sessionΒΆ

ONNX Runtime inference session.

input_namesΒΆ

Dictionary mapping input names to their indices.

output_namesΒΆ

Dictionary mapping output names to their indices.

__init__(session)[source]

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

prepare_inputs(inputs)[source]

Prepare inputs for ONNX model inference.

Converts PyTorch tensors to numpy arrays and filters out inputs that are not expected by the ONNX model.

Parameters:

inputs (Dict[str, Tensor]) – Dictionary of input names and PyTorch tensors.

Returns:

Dictionary of input names and numpy arrays ready for ONNX inference.

Raises:

ValueError – If inputs is not a dictionary.

Return type:

Dict[str, ndarray]
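The conversion-and-filtering step described above can be sketched as follows. This is a hedged re-implementation for illustration, not the module's actual code; `input_names` stands in for the instance attribute of the same name, and the `.cpu().numpy()` path assumes PyTorch tensor semantics:

```python
import numpy as np

def prepare_inputs(inputs, input_names):
    # Sketch of the documented behaviour: reject non-dict input, drop keys
    # the ONNX graph does not declare, and convert each value to a numpy
    # array. torch.Tensor exposes .cpu().numpy(); anything else array-like
    # falls back to np.asarray.
    if not isinstance(inputs, dict):
        raise ValueError("inputs must be a dictionary")
    prepared = {}
    for name, value in inputs.items():
        if name not in input_names:
            continue  # not an input of this ONNX graph
        if hasattr(value, "cpu"):  # torch.Tensor path
            prepared[name] = value.cpu().numpy()
        else:  # already numpy / array-like
            prepared[name] = np.asarray(value)
    return prepared
```

Inputs that the exported graph never declared (e.g. training-only keys such as labels) are silently dropped rather than raising, so the same input dict can be shared between the PyTorch and ONNX paths.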

run_inference(inputs)[source]

Run the ONNX model inference.

Parameters:

inputs (Dict[str, ndarray]) – Prepared inputs for the model as numpy arrays.

Returns:

Dictionary mapping output names to their corresponding numpy arrays.

Return type:

Dict[str, ndarray]
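The name-to-array mapping can be sketched like this. `FakeSession` is a stand-in for `onnxruntime.InferenceSession`, assuming the real session exposes `get_outputs()` and `run()` with these shapes (which matches the public ONNX Runtime API):

```python
import numpy as np

class FakeSession:
    """Minimal stand-in for onnxruntime.InferenceSession."""
    def get_outputs(self):
        class Output:  # mimics the .name attribute of onnxruntime NodeArg
            name = "logits"
        return [Output()]

    def run(self, output_names, input_feed):
        # A real session would execute the graph; here we return zeros.
        return [np.zeros((1, 4, 2))]

def run_inference(session, inputs):
    # Sketch of the documented behaviour: execute the session, then zip
    # the declared output names with the returned arrays into a dict.
    output_names = [output.name for output in session.get_outputs()]
    return dict(zip(output_names, session.run(output_names, inputs)))
```

Returning a dict keyed by output name (rather than the positional list `session.run` yields) is what lets callers access results without knowing the graph's output ordering.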

abstract forward(input_ids, attention_mask, **kwargs)[source]

Perform forward pass through the model.

Abstract method that must be implemented by subclasses to define model-specific forward pass logic.

Parameters:
  • input_ids – Input token IDs.

  • attention_mask – Attention mask for input tokens.

  • **kwargs – Additional model-specific arguments.

Returns:

Dictionary containing model outputs.

Return type:

Dict[str, Any]
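The subclassing contract can be shown with a simplified stand-in for the base class (this is an illustration of the ABC pattern, not the real `BaseORTModel`; `MinimalORTModel` is hypothetical):

```python
from abc import ABC, abstractmethod

class BaseORTModel(ABC):
    # Simplified stand-in, kept only to show the abstract-method contract.
    @abstractmethod
    def forward(self, input_ids, attention_mask, **kwargs):
        ...

class MinimalORTModel(BaseORTModel):
    # Hypothetical subclass: forward must return a dict of model outputs.
    def forward(self, input_ids, attention_mask, **kwargs):
        return {"logits": [[0.0] * len(input_ids[0])]}
```

Because `forward` is abstract, instantiating the base class directly raises `TypeError`; only concrete subclasses such as the span/token models below can be constructed.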

class gliner.onnx.model.UniEncoderSpanORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder span-level NER.

Uses a single encoder to process both text and entity labels, performing span-level entity recognition.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, **kwargs)[source]ΒΆ

Forward pass for span model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for span classification.

Return type:

Dict[str, Any]
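A `span_idx` / `span_mask` pair can be built by enumerating candidate spans per sequence. The exact enumeration scheme below (fixed maximum width, clamped end indices, one mask bit per candidate) is an assumption for illustration and `build_span_inputs` is a hypothetical helper, not part of this module:

```python
import numpy as np

def build_span_inputs(text_length, max_width):
    # Enumerate every (start, end) span up to max_width tokens wide.
    # Spans that would run past the sequence are clamped and masked out,
    # mirroring the "valid vs. padding" role span_mask plays in forward().
    span_idx, span_mask = [], []
    for start in range(text_length):
        for width in range(max_width):
            end = start + width
            span_idx.append((start, min(end, text_length - 1)))
            span_mask.append(end < text_length)
    return np.array(span_idx), np.array(span_mask)
```

For a 4-token text with width up to 2 this yields 8 candidates, of which one (the width-2 span starting at the last token) is masked as invalid.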

class gliner.onnx.model.BiEncoderSpanORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for bi-encoder span-level NER.

Uses separate encoders for text and entity labels, performing span-level entity recognition with bi-encoder architecture.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, labels_embeds=None, labels_input_ids=None, labels_attention_mask=None, **kwargs)[source]

Forward pass for bi-encoder span model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • labels_embeds (Tensor | None) – Optional pre-computed embeddings for entity labels. If provided, labels_input_ids and labels_attention_mask are ignored.

  • labels_input_ids (FloatTensor | None) – Optional tensor containing token IDs for entity labels. Used when labels_embeds is not provided.

  • labels_attention_mask (LongTensor | None) – Optional attention mask for entity label tokens. Used when labels_embeds is not provided.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for span classification.

Return type:

Dict[str, Any]
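The precedence among the three optional label arguments can be sketched as a small dispatcher. `select_label_inputs` is a hypothetical helper that mirrors the rule stated above (precomputed embeddings win; otherwise the raw label tokens and mask are fed):

```python
def select_label_inputs(labels_embeds=None, labels_input_ids=None,
                        labels_attention_mask=None):
    # If precomputed label embeddings are supplied, the label token inputs
    # are ignored entirely; otherwise both token IDs and mask are passed.
    if labels_embeds is not None:
        return {"labels_embeds": labels_embeds}
    return {
        "labels_input_ids": labels_input_ids,
        "labels_attention_mask": labels_attention_mask,
    }
```

Precomputing `labels_embeds` once and reusing them is the main efficiency argument for the bi-encoder: the label encoder only has to run when the label set changes, not per text batch.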

class gliner.onnx.model.UniEncoderTokenORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder token-level NER.

Uses a single encoder to process both text and entity labels, performing token-level entity recognition.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, **kwargs)[source]

Forward pass for token model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for token classification.

Return type:

Dict[str, Any]

class gliner.onnx.model.BiEncoderTokenORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for bi-encoder token-level NER.

Uses separate encoders for text and entity labels, performing token-level entity recognition with bi-encoder architecture.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, labels_embeds=None, labels_input_ids=None, labels_attention_mask=None, **kwargs)[source]

Forward pass for bi-encoder token model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • labels_embeds (Tensor | None) – Optional pre-computed embeddings for entity labels. If provided, labels_input_ids and labels_attention_mask are ignored.

  • labels_input_ids (FloatTensor | None) – Optional tensor containing token IDs for entity labels. Used when labels_embeds is not provided.

  • labels_attention_mask (LongTensor | None) – Optional attention mask for entity label tokens. Used when labels_embeds is not provided.

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERBaseOutput containing logits for token classification.

Return type:

Dict[str, Any]

class gliner.onnx.model.UniEncoderSpanRelexORTModel(session)[source]

Bases: BaseORTModel

ONNX Runtime model for uni-encoder span-level relation extraction.

Uses a single encoder to process text and perform both entity recognition and relation extraction at the span level.

Initialize the ONNX Runtime model.

Parameters:

session (InferenceSession) – ONNX Runtime inference session.

forward(input_ids, attention_mask, words_mask, text_lengths, span_idx, span_mask, **kwargs)[source]ΒΆ

Forward pass for span relation extraction model using ONNX inference.

Parameters:
  • input_ids (Tensor) – Tensor of shape (batch_size, seq_len) containing input token IDs.

  • attention_mask (Tensor) – Tensor of shape (batch_size, seq_len) with 1s for real tokens and 0s for padding.

  • words_mask (Tensor) – Tensor of shape (batch_size, seq_len) indicating word boundaries.

  • text_lengths (Tensor) – Tensor of shape (batch_size,) containing the actual length of each text sequence.

  • span_idx (Tensor) – Tensor containing indices of spans to classify.

  • span_mask (Tensor) – Tensor indicating which spans are valid (not padding).

  • **kwargs – Additional arguments (ignored).

Returns:

GLiNERRelexOutput containing logits for span classification, relation indices, relation logits, and relation mask.

Return type:

Dict[str, Any]