gliner package

class gliner.GLiNER(*args, **kwargs)[source]

Bases: Module, PyTorchModelHubMixin

Meta GLiNER class that automatically instantiates the appropriate GLiNER variant.

This class provides a unified interface for all GLiNER models, automatically dispatching to the appropriate specialized class based on the model configuration. It supports several NER architectures, including uni-encoder, bi-encoder, decoder-based, and joint entity-relation extraction models.

The class automatically detects the model type based on:
  • span_mode: Token-level vs span-level

  • labels_encoder: Uni-encoder vs bi-encoder

  • labels_decoder: Standard vs decoder-based

  • relations_layer: NER-only vs joint entity-relation extraction
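
The precedence of these checks matters: relation extraction and decoder-based configurations take priority over the span/token distinction. As a rough, hypothetical sketch of the dispatch logic (the actual mapping lives in GLiNER.model_map and may differ in detail; plain dicts stand in for a real GLiNERConfig here):

```python
# Hypothetical sketch of GLiNER's variant dispatch -- illustration only,
# not the library's actual code.
def detect_variant(config: dict) -> str:
    """Map a config dict to a GLiNER variant name."""
    if config.get("relations_layer"):      # joint entity-relation model
        return "relation-extraction"
    if config.get("labels_decoder"):       # decoder generates the labels
        return "decoder-based"
    if config.get("labels_encoder"):       # separate encoder for labels
        return "bi-encoder"
    # Plain uni-encoder: token-level vs span-level scoring
    if config.get("span_mode") == "token_level":
        return "uni-encoder-token"
    return "uni-encoder-span"

print(detect_variant({"labels_encoder": "BAAI/bge-small-en-v1.5"}))  # bi-encoder
print(detect_variant({}))                                            # uni-encoder-span
```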

model

The loaded GLiNER model instance (automatically typed).

config

Model configuration.

data_processor

Data processor for the model.

decoder

Decoder for predictions.

Examples

Load a pretrained uni-encoder span model:

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

Load a bi-encoder model:

>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")

Load from local configuration:

>>> config = GLiNERConfig.from_pretrained("config.json")
>>> model = GLiNER.from_config(config)

Initialize from scratch:

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER(config)

__init__(config, **kwargs)[source]

Initialize a GLiNER model with automatic type detection.

This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • **kwargs – Additional arguments passed to the specific GLiNER variant.

Examples

>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")
classmethod from_pretrained(model_id, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, load_onnx_model=False, onnx_model_file='model.onnx', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Load a pretrained GLiNER model with automatic type detection.

This method loads the configuration, determines the appropriate GLiNER variant, and delegates to that variant’s from_pretrained method.

Parameters:
  • model_id (str) – Model identifier or local path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force redownload.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – HF token for private repos.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to ONNX model file.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Appropriate GLiNER model instance.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
>>> model = GLiNER.from_pretrained("path/to/local/model")
classmethod from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Create a GLiNER model from configuration.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Initialized GLiNER model instance.

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.from_config(config)
>>> model = GLiNER.from_config("path/to/gliner_config.json")
property model_map: dict[str, dict[str, Any]]

Map configuration patterns to their corresponding GLiNER classes.

Returns:

Dictionary mapping model types to their classes and descriptions.

get_model_type()[source]

Get the type of the current model instance.

Returns:

String identifier of the model type.

Return type:

str

class gliner.GLiNERConfig(labels_encoder=None, labels_decoder=None, relations_layer=None, **kwargs)[source]

Bases: BaseGLiNERConfig

Legacy configuration class that auto-detects model type.

This class provides backward compatibility by automatically determining the appropriate model type based on the provided configuration parameters.

labels_encoder

Name of the encoder for entity labels (bi-encoder).

Type:

str

labels_decoder

Name of the decoder for label generation.

Type:

str

relations_layer

Layer configuration for relation extraction.

Type:

str

__init__(labels_encoder=None, labels_decoder=None, relations_layer=None, **kwargs)[source]

Initialize GLiNERConfig.

Parameters:
  • labels_encoder (str, optional) – Labels encoder for bi-encoder models. Defaults to None.

  • labels_decoder (str, optional) – Decoder for label generation. Defaults to None.

  • relations_layer (str, optional) – Relations layer for relation extraction. Defaults to None.

  • **kwargs – Additional keyword arguments passed to BaseGLiNERConfig.

property model_type

Auto-detect model type based on configuration.

class gliner.InferencePackingConfig(max_length, sep_token_id=None, streams_per_batch=1)[source]

Bases: object

Configuration describing how sequences should be packed.

max_length

Maximum number of tokens allowed in a packed stream.

Type:

int

sep_token_id

Optional separator token ID to insert between sequences. Currently not used in the implementation.

Type:

int | None

streams_per_batch

Number of streams to create per batch. Must be >= 1.

Type:

int

max_length: int
sep_token_id: int | None = None
streams_per_batch: int = 1
__init__(max_length, sep_token_id=None, streams_per_batch=1)
class gliner.PackedBatch(input_ids, attention_mask, pair_attention_mask, segment_ids, map_out, offsets, lengths)[source]

Bases: object

Container describing a packed collection of requests.

input_ids

Tensor of shape (num_streams, max_len) containing packed token IDs.

Type:

torch.LongTensor

attention_mask

Tensor of shape (num_streams, max_len) with 1s for valid tokens and 0s for padding.

Type:

torch.LongTensor

pair_attention_mask

Boolean tensor of shape (num_streams, max_len, max_len) representing block-diagonal attention mask.

Type:

torch.BoolTensor

segment_ids

Tensor of shape (num_streams, max_len) with unique IDs for each packed segment within a stream.

Type:

torch.LongTensor

map_out

List of lists mapping each segment in each stream back to its original request index.

Type:

List[List[int]]

offsets

List of lists containing the starting offset of each segment within each stream.

Type:

List[List[int]]

lengths

List of lists containing the length of each segment within each stream.

Type:

List[List[int]]

input_ids: LongTensor
attention_mask: LongTensor
pair_attention_mask: BoolTensor
segment_ids: LongTensor
map_out: List[List[int]]
offsets: List[List[int]]
lengths: List[List[int]]
__init__(input_ids, attention_mask, pair_attention_mask, segment_ids, map_out, offsets, lengths)
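
The pair_attention_mask can be derived from segment_ids and attention_mask alone: two positions in a stream may attend to each other only when both are valid and carry the same segment ID. A NumPy sketch of that construction (an illustration of the layout described above, not the library's implementation):

```python
import numpy as np

# Sketch: derive a block-diagonal pair attention mask from segment IDs.
def block_diagonal_mask(segment_ids: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Positions i and j may attend to each other only if both are valid
    # (attention_mask == 1) and belong to the same segment.
    same_segment = segment_ids[:, :, None] == segment_ids[:, None, :]
    valid_pair = attention_mask[:, :, None].astype(bool) & attention_mask[:, None, :].astype(bool)
    return same_segment & valid_pair

# One stream of length 6: a 3-token segment, a 2-token segment, one pad slot.
seg = np.array([[0, 0, 0, 1, 1, 0]])
att = np.array([[1, 1, 1, 1, 1, 0]])
mask = block_diagonal_mask(seg, att)
print(mask[0].astype(int))
```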
gliner.pack_requests(requests, cfg, pad_token_id)[source]

Pack a collection of requests into one or more streams.

Groups multiple short sequences into contiguous token streams to reduce padding overhead. Each request’s tokens are placed into streams using a first-fit strategy. A block-diagonal attention mask ensures tokens from different requests cannot attend to each other.

Parameters:
  • requests (List[Dict[str, Any]]) – List of request dictionaries. Each must contain an ‘input_ids’ key with a sequence of token IDs.

  • cfg (InferencePackingConfig) – Configuration specifying packing parameters (max_length, etc.).

  • pad_token_id (int) – Token ID to use for padding positions.

Returns:

PackedBatch object containing packed tensors and metadata needed to unpack results back to original request ordering.

Raises:
  • ValueError – If requests list is empty or configuration is invalid.

  • KeyError – If any request is missing required ‘input_ids’ key.

Return type:

PackedBatch

Example

>>> requests = [
...     {"input_ids": [1, 2, 3]},
...     {"input_ids": [4, 5]},
... ]
>>> cfg = InferencePackingConfig(max_length=10)
>>> batch = pack_requests(requests, cfg, pad_token_id=0)
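
The first-fit placement described above can be sketched in plain Python. This is a simplified illustration only, not the library's implementation: it ignores sep_token_id, streams_per_batch, and tensor construction, and only shows how requests are assigned to streams.

```python
# Simplified first-fit packing sketch (illustration only; the real
# pack_requests also builds tensors, segment IDs, and attention masks).
def first_fit_pack(requests, max_length):
    """Assign each request to the first stream with enough free space."""
    streams = []       # token lists, one per stream
    placements = []    # (stream_idx, offset) per request, in input order
    for req in requests:
        ids = req["input_ids"]
        for s, stream in enumerate(streams):
            if len(stream) + len(ids) <= max_length:
                placements.append((s, len(stream)))
                stream.extend(ids)
                break
        else:  # no existing stream fits: open a new one
            streams.append(list(ids))
            placements.append((len(streams) - 1, 0))
    return streams, placements

streams, placements = first_fit_pack(
    [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}, {"input_ids": [6] * 6}],
    max_length=6,
)
print(streams)      # [[1, 2, 3, 4, 5], [6, 6, 6, 6, 6, 6]]
print(placements)   # [(0, 0), (0, 3), (1, 0)]
```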
gliner.unpack_spans(per_token_outputs, packed)[source]

Unpack encoder outputs back to the original request layout.

Takes per-token outputs from a packed batch and redistributes them back to match the original request ordering. Handles requests that were split across multiple streams by concatenating their segments.

Parameters:
  • per_token_outputs (Any) – Tensor or array of shape (num_streams, max_len, …) containing per-token outputs from the encoder.

  • packed (PackedBatch) – PackedBatch object containing metadata about how requests were packed (from pack_requests).

Returns:

List of tensors or arrays (one per original request) containing the unpacked outputs. If input was a NumPy array, outputs will be NumPy arrays; if PyTorch tensor, outputs will be PyTorch tensors.

Raises:
  • ValueError – If per_token_outputs is not at least 2-dimensional.

  • TypeError – If per_token_outputs is neither a PyTorch tensor nor NumPy array.

Return type:

List[Any]
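
Conceptually, unpacking slices each stream at the recorded offsets and lengths, then routes the pieces back to their requests via map_out, concatenating segments of requests that were split across streams. A NumPy sketch of that idea (illustration only; field names follow the PackedBatch layout described above):

```python
import numpy as np

# Conceptual sketch of unpacking per-token outputs (illustration only).
def unpack(per_token, offsets, lengths, map_out, num_requests):
    pieces = [[] for _ in range(num_requests)]
    for s in range(per_token.shape[0]):          # each packed stream
        for off, ln, req in zip(offsets[s], lengths[s], map_out[s]):
            pieces[req].append(per_token[s, off:off + ln])
    # Concatenate segments of requests that were split across streams.
    return [np.concatenate(p, axis=0) for p in pieces]

# Two streams of length 4, one scalar output per token position.
out = np.arange(8).reshape(2, 4)
results = unpack(out, offsets=[[0, 2], [0]], lengths=[[2, 2], [3]],
                 map_out=[[0, 1], [2]], num_requests=3)
print([r.tolist() for r in results])  # [[0, 1], [2, 3], [4, 5, 6]]
```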

Subpackages

Submodules