gliner package

class gliner.GLiNER(*args, **kwargs)[source]

Bases: Module, PyTorchModelHubMixin

Meta GLiNER class that automatically instantiates the appropriate GLiNER variant.

This class provides a unified interface for all GLiNER models, automatically dispatching to the appropriate specialized class based on the model configuration. It supports several NER architectures, including uni-encoder, bi-encoder, decoder-based, and joint entity-relation extraction models.

The class automatically detects the model type based on:
  • span_mode: Token-level vs span-level

  • labels_encoder: Uni-encoder vs bi-encoder

  • labels_decoder: Standard vs decoder-based

  • relations_layer: NER-only vs joint entity-relation extraction
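
The precedence of these checks matters: relation extraction and decoder-based configurations take priority over the span/token distinction. As a rough, hypothetical sketch of the dispatch logic (the actual mapping lives in GLiNER.model_map and may differ in detail; plain dicts stand in for a real GLiNERConfig here):

```python
# Hypothetical sketch of GLiNER's variant dispatch -- illustration only,
# not the library's actual code.
def detect_variant(config: dict) -> str:
    """Map a config dict to a GLiNER variant name."""
    if config.get("relations_layer"):      # joint entity-relation model
        return "relation-extraction"
    if config.get("labels_decoder"):       # decoder generates the labels
        return "decoder-based"
    if config.get("labels_encoder"):       # separate encoder for labels
        return "bi-encoder"
    # Plain uni-encoder: token-level vs span-level scoring
    if config.get("span_mode") == "token_level":
        return "uni-encoder-token"
    return "uni-encoder-span"

print(detect_variant({"labels_encoder": "BAAI/bge-small-en-v1.5"}))  # bi-encoder
print(detect_variant({}))                                            # uni-encoder-span
```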

model

The loaded GLiNER model instance (automatically typed).

config

Model configuration.

data_processor

Data processor for the model.

decoder

Decoder for predictions.

Examples

Load a pretrained uni-encoder span model:

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

Load a bi-encoder model:

>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")

Load from local configuration:

>>> config = GLiNERConfig.from_pretrained("config.json")
>>> model = GLiNER.from_config(config)

Initialize from scratch:

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER(config)

__init__(config, **kwargs)[source]

Initialize a GLiNER model with automatic type detection.

This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • **kwargs – Additional arguments passed to the specific GLiNER variant.

Examples

>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")
classmethod from_pretrained(model_id, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, load_onnx_model=False, onnx_model_file='model.onnx', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Load a pretrained GLiNER model with automatic type detection.

This method loads the configuration, determines the appropriate GLiNER variant, and delegates to that variant’s from_pretrained method.

Parameters:
  • model_id (str) – Model identifier or local path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force redownload.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – HF token for private repos.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to ONNX model file.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Appropriate GLiNER model instance.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
>>> model = GLiNER.from_pretrained("path/to/local/model")
classmethod from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Create a GLiNER model from configuration.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Initialized GLiNER model instance.

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.from_config(config)
>>> model = GLiNER.from_config("path/to/gliner_config.json")
property model_map: dict[str, dict[str, Any]]

Map configuration patterns to their corresponding GLiNER classes.

Returns:

Dictionary mapping model types to their classes and descriptions.

get_model_type()[source]

Get the type of the current model instance.

Returns:

String identifier of the model type.

Return type:

str

class gliner.GLiNERConfig(labels_encoder=None, labels_decoder=None, relations_layer=None, **kwargs)[source]

Bases: BaseGLiNERConfig

Legacy configuration class that auto-detects model type.

This class provides backward compatibility by automatically determining the appropriate model type based on the provided configuration parameters.

labels_encoder

Name of the encoder for entity labels (bi-encoder).

Type:

str

labels_decoder

Name of the decoder for label generation.

Type:

str

relations_layer

Layer configuration for relation extraction.

Type:

str

__init__(labels_encoder=None, labels_decoder=None, relations_layer=None, **kwargs)[source]

Initialize GLiNERConfig.

Parameters:
  • labels_encoder (str, optional) – Labels encoder for bi-encoder models. Defaults to None.

  • labels_decoder (str, optional) – Decoder for label generation. Defaults to None.

  • relations_layer (str, optional) – Relations layer for relation extraction. Defaults to None.

  • **kwargs – Additional keyword arguments passed to BaseGLiNERConfig.

property model_type

Auto-detect model type based on configuration.

class gliner.InferencePackingConfig(max_length, sep_token_id=None, streams_per_batch=1)[source]

Bases: object

Configuration describing how sequences should be packed.

max_length

Maximum number of tokens allowed in a packed stream.

Type:

int

sep_token_id

Optional separator token ID to insert between sequences. Currently not used in the implementation.

Type:

int | None

streams_per_batch

Number of streams to create per batch. Must be >= 1.

Type:

int

max_length: int
sep_token_id: int | None = None
streams_per_batch: int = 1
__init__(max_length, sep_token_id=None, streams_per_batch=1)
class gliner.PackedBatch(input_ids, attention_mask, pair_attention_mask, segment_ids, map_out, offsets, lengths)[source]

Bases: object

Container describing a packed collection of requests.

input_ids

Tensor of shape (num_streams, max_len) containing packed token IDs.

Type:

torch.LongTensor

attention_mask

Tensor of shape (num_streams, max_len) with 1s for valid tokens and 0s for padding.

Type:

torch.LongTensor

pair_attention_mask

Boolean tensor of shape (num_streams, max_len, max_len) representing block-diagonal attention mask.

Type:

torch.BoolTensor

segment_ids

Tensor of shape (num_streams, max_len) with unique IDs for each packed segment within a stream.

Type:

torch.LongTensor

map_out

List of lists mapping each segment in each stream back to its original request index.

Type:

List[List[int]]

offsets

List of lists containing the starting offset of each segment within each stream.

Type:

List[List[int]]

lengths

List of lists containing the length of each segment within each stream.

Type:

List[List[int]]

input_ids: LongTensor
attention_mask: LongTensor
pair_attention_mask: BoolTensor
segment_ids: LongTensor
map_out: List[List[int]]
offsets: List[List[int]]
lengths: List[List[int]]
__init__(input_ids, attention_mask, pair_attention_mask, segment_ids, map_out, offsets, lengths)
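
The pair_attention_mask can be derived from segment_ids and attention_mask alone: two positions in a stream may attend to each other only when both are valid and carry the same segment ID. A NumPy sketch of that construction (an illustration of the layout described above, not the library's implementation):

```python
import numpy as np

# Sketch: derive a block-diagonal pair attention mask from segment IDs.
def block_diagonal_mask(segment_ids: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Positions i and j may attend to each other only if both are valid
    # (attention_mask == 1) and belong to the same segment.
    same_segment = segment_ids[:, :, None] == segment_ids[:, None, :]
    valid_pair = attention_mask[:, :, None].astype(bool) & attention_mask[:, None, :].astype(bool)
    return same_segment & valid_pair

# One stream of length 6: a 3-token segment, a 2-token segment, one pad slot.
seg = np.array([[0, 0, 0, 1, 1, 0]])
att = np.array([[1, 1, 1, 1, 1, 0]])
mask = block_diagonal_mask(seg, att)
print(mask[0].astype(int))
```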
gliner.pack_requests(requests, cfg, pad_token_id)[source]

Pack a collection of requests into one or more streams.

Groups multiple short sequences into contiguous token streams to reduce padding overhead. Each request’s tokens are placed into streams using a first-fit strategy. A block-diagonal attention mask ensures tokens from different requests cannot attend to each other.

Parameters:
  • requests (List[Dict[str, Any]]) – List of request dictionaries. Each must contain an ‘input_ids’ key with a sequence of token IDs.

  • cfg (InferencePackingConfig) – Configuration specifying packing parameters (max_length, etc.).

  • pad_token_id (int) – Token ID to use for padding positions.

Returns:

PackedBatch object containing packed tensors and metadata needed to unpack results back to original request ordering.

Raises:
  • ValueError – If requests list is empty or configuration is invalid.

  • KeyError – If any request is missing required ‘input_ids’ key.

Return type:

PackedBatch

Example

>>> requests = [
...     {"input_ids": [1, 2, 3]},
...     {"input_ids": [4, 5]},
... ]
>>> cfg = InferencePackingConfig(max_length=10)
>>> batch = pack_requests(requests, cfg, pad_token_id=0)
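
The first-fit placement described above can be sketched in plain Python. This is a simplified illustration only, not the library's implementation: it ignores sep_token_id, streams_per_batch, and tensor construction, and only shows how requests are assigned to streams.

```python
# Simplified first-fit packing sketch (illustration only; the real
# pack_requests also builds tensors, segment IDs, and attention masks).
def first_fit_pack(requests, max_length):
    """Assign each request to the first stream with enough free space."""
    streams = []       # token lists, one per stream
    placements = []    # (stream_idx, offset) per request, in input order
    for req in requests:
        ids = req["input_ids"]
        for s, stream in enumerate(streams):
            if len(stream) + len(ids) <= max_length:
                placements.append((s, len(stream)))
                stream.extend(ids)
                break
        else:  # no existing stream fits: open a new one
            streams.append(list(ids))
            placements.append((len(streams) - 1, 0))
    return streams, placements

streams, placements = first_fit_pack(
    [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}, {"input_ids": [6] * 6}],
    max_length=6,
)
print(streams)      # [[1, 2, 3, 4, 5], [6, 6, 6, 6, 6, 6]]
print(placements)   # [(0, 0), (0, 3), (1, 0)]
```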
gliner.unpack_spans(per_token_outputs, packed)[source]

Unpack encoder outputs back to the original request layout.

Takes per-token outputs from a packed batch and redistributes them back to match the original request ordering. Handles requests that were split across multiple streams by concatenating their segments.

Parameters:
  • per_token_outputs (Any) – Tensor or array of shape (num_streams, max_len, …) containing per-token outputs from the encoder.

  • packed (PackedBatch) – PackedBatch object containing metadata about how requests were packed (from pack_requests).

Returns:

List of tensors or arrays (one per original request) containing the unpacked outputs. If input was a NumPy array, outputs will be NumPy arrays; if PyTorch tensor, outputs will be PyTorch tensors.

Raises:
  • ValueError – If per_token_outputs is not at least 2-dimensional.

  • TypeError – If per_token_outputs is neither a PyTorch tensor nor NumPy array.

Return type:

List[Any]
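
Conceptually, unpacking slices each stream at the recorded offsets and lengths, then routes the pieces back to their requests via map_out, concatenating segments of requests that were split across streams. A NumPy sketch of that idea (illustration only; field names follow the PackedBatch layout described above):

```python
import numpy as np

# Conceptual sketch of unpacking per-token outputs (illustration only).
def unpack(per_token, offsets, lengths, map_out, num_requests):
    pieces = [[] for _ in range(num_requests)]
    for s in range(per_token.shape[0]):          # each packed stream
        for off, ln, req in zip(offsets[s], lengths[s], map_out[s]):
            pieces[req].append(per_token[s, off:off + ln])
    # Concatenate segments of requests that were split across streams.
    return [np.concatenate(p, axis=0) for p in pieces]

# Two streams of length 4, one scalar output per token position.
out = np.arange(8).reshape(2, 4)
results = unpack(out, offsets=[[0, 2], [0]], lengths=[[2, 2], [3]],
                 map_out=[[0, 1], [2]], num_requests=3)
print([r.tolist() for r in results])  # [[0, 1], [2, 3], [4, 5, 6]]
```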

Subpackages

Submodules