gliner.model module

class gliner.model.BaseGLiNER(*args, **kwargs)

Bases: ABC, Module, PyTorchModelHubMixin

Initialize a BaseGLiNER model.

Parameters:
  • config (BaseGLiNERConfig) – Model configuration object.

  • model (BaseModel | None) – Pre-initialized model instance. If None, creates a new model.

  • tokenizer (BaseModel | None) – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor (BaseProcessor | None) – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained (bool | None) – Whether to load the backbone from pretrained weights.

  • cache_dir (str | Path | None) – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class: type = None
model_class: type = None
ort_model_class: type = None
data_processor_class: type = None
data_collator_class: type = None
decoder_class: type = None
__init__(config, model=None, tokenizer=None, data_processor=None, backbone_from_pretrained=False, cache_dir=None, **kwargs)

abstract resize_embeddings()
abstract inference()
abstract evaluate()
forward(*args, **kwargs)

Forward pass through the model.

Parameters:
  • *args – Positional arguments passed to the model.

  • **kwargs – Keyword arguments passed to the model.

Returns:

Model output from the forward pass.

property device

Get the device where the model is located.

Returns:

Torch device object (CPU or CUDA).

configure_inference_packing(config)

Configure default packing behavior for inference calls.

Passing None disables packing by default. Individual inference methods accept a packing_config argument to override this setting on a per-call basis.

Parameters:

config (InferencePackingConfig | None) – Inference packing configuration or None to disable packing.
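A minimal sketch of the override rule described above, assuming the instance-level default starts out as None. PackingDefaults, resolve_packing and the PackingConfig field are hypothetical stand-ins, not GLiNER's API; they only show how a per-call packing_config takes precedence over the configured default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackingConfig:
    """Hypothetical stand-in for InferencePackingConfig (the field is a placeholder)."""
    max_tokens: int = 512


class PackingDefaults:
    """Hypothetical class showing an instance-level default with per-call overrides."""

    def __init__(self) -> None:
        # Assumption: packing is disabled until configure_inference_packing is called.
        self._default_packing: Optional[PackingConfig] = None

    def configure_inference_packing(self, config: Optional[PackingConfig]) -> None:
        # Passing None disables packing by default; a config object enables it.
        self._default_packing = config

    def resolve_packing(self, packing_config: Optional[PackingConfig] = None) -> Optional[PackingConfig]:
        # An explicit per-call config wins; None falls back to the instance default.
        if packing_config is not None:
            return packing_config
        return self._default_packing
```

Note that in this sketch, as in the docs, passing None per call means "use the default", not "disable packing for this call".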

compile()

Compile the model using torch.compile for optimization.

Uses dynamic=True to generate shape-generic kernels, which avoids recompilation on variable-length NER inputs. Also enables capture_scalar_outputs to trace through data-dependent shape operations (e.g., computing max number of entity types per batch).

Best combined with quantize() for maximum throughput (~1.9x over fp32).

When FlashDeBERTa is active, its custom Triton kernels are incompatible with torch.compile tracing. The encoder forward is automatically wrapped with torch.compiler.disable so the rest of the model (span representation, scoring, etc.) still benefits from compilation.

quantize(dtype='fp16')

Apply quantization to the model.

Parameters:

dtype (str) – Quantization type. Options:

  • "fp16" (default): float16 half-precision. On GPU, uses Tensor Core acceleration for a ~1.4x speedup. On CPU, applies dynamic quantization (reduces memory, no speed benefit).

  • "bf16": bfloat16 half-precision. Better numerical stability than fp16, with a slightly smaller speedup (~1.2x).

  • "int8": int8 quantization (GPU and CPU). On CPU, uses PyTorch’s built-in dynamic quantization with FBGEMM int8 kernels (~1.6x speedup). On GPU, uses torchao int8 weight-only quantization (~50% memory reduction, no speed gain; requires the torchao package). Stock DeBERTa-based models lose accuracy with int8, so use it with models that have been fine-tuned with quantization-aware training (QAT).

Raises:
  • RuntimeError – If the model is an ONNX model (use ONNX quantization instead).

  • ValueError – If dtype is not a recognized quantization type.

  • ImportError – If torchao is not installed and int8 on GPU is requested.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", map_location="cuda")
>>> # The calls below are alternatives; apply one of them.
>>> model.quantize()           # fp16 half-precision on GPU, ~1.4x faster
>>> model.quantize("bf16")     # bfloat16 on GPU, ~1.2x faster
>>> model.quantize("int8")     # int8 quantization (torchao on GPU, FBGEMM on CPU)
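The accepted dtype values and the CPU/GPU split for int8 can be summarized as a small dispatch sketch. normalize_quantize_arg and int8_backend are hypothetical helpers: the True/False handling mirrors the quantize argument of from_pretrained and load_from_config, and neither function calls PyTorch or torchao.

```python
from typing import Optional, Union

SUPPORTED_DTYPES = ("fp16", "bf16", "int8")


def normalize_quantize_arg(quantize: Union[bool, str]) -> Optional[str]:
    """Hypothetical helper: map a `quantize=` value to a dtype string, or None if disabled."""
    if quantize is False:
        return None  # quantization disabled
    if quantize is True:
        return "fp16"  # True is documented as an alias for "fp16"
    if isinstance(quantize, str) and quantize in SUPPORTED_DTYPES:
        return quantize
    # Mirrors the documented ValueError for unrecognized quantization types.
    raise ValueError(f"Unrecognized quantization type: {quantize!r}")


def int8_backend(device_type: str) -> str:
    """Hypothetical helper: name the int8 strategy the docs describe for each device."""
    if device_type == "cpu":
        return "dynamic quantization (FBGEMM kernels)"
    # Anything other than CPU is treated as GPU in this sketch.
    return "weight-only quantization (torchao)"
```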

prepare_state_dict(state_dict)

Prepare state dict for saving, handling torch.compile artifacts.

Parameters:

state_dict – Original state dictionary from the model.

Returns:

Cleaned state dictionary with torch.compile prefixes removed.
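torch.compile wraps a module in an OptimizedModule that keeps the original module under the attribute _orig_mod, so state_dict keys of a compiled model gain an "_orig_mod." segment. A sketch of the cleanup, assuming that segment is the only artifact to remove; strip_compile_prefix is hypothetical and may differ from what prepare_state_dict does.

```python
from typing import Any, Dict, Mapping

COMPILE_PREFIX = "_orig_mod."  # segment added by torch.compile's OptimizedModule wrapper


def strip_compile_prefix(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with torch.compile key prefixes removed."""
    cleaned: Dict[str, Any] = {}
    for key, value in state_dict.items():
        # A compiled submodule puts the segment mid-key (e.g. "model._orig_mod.x"),
        # so every occurrence is removed, not only a leading one.
        cleaned[key.replace(COMPILE_PREFIX, "")] = value
    return cleaned
```

Removing the segment lets weights saved from a compiled model load into an uncompiled one.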

save_pretrained(save_directory, *, config=None, repo_id=None, push_to_hub=False, safe_serialization=False, **push_to_hub_kwargs)

Save model weights and configuration to a local directory, optionally pushing them to the Hugging Face Hub.

Parameters:
  • save_directory (str | Path) – Path to directory for saving.

  • config (BaseGLiNERConfig | None) – Model configuration. Uses self.config if None.

  • repo_id (str | None) – Repository ID for hub upload.

  • push_to_hub (bool) – Whether to push to HuggingFace Hub.

  • safe_serialization (bool) – Whether to use safetensors format.

  • **push_to_hub_kwargs – Additional arguments for push_to_hub.

Returns:

Repository URL if pushed to hub, None otherwise.

Return type:

str | None

classmethod load_from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)

Initialize a model from configuration without loading pretrained weights.

This method creates a new model instance from scratch using the provided configuration. The backbone encoder can optionally be loaded from pretrained weights, but the GLiNER-specific layers are always randomly initialized.

Parameters:
  • config (str | Path | GLiNERConfig | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 quantization (dynamic on CPU; weight-only via torchao on GPU, which requires the torchao package). False to disable.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Initialized model instance with randomly initialized weights (except backbone if specified).

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.load_from_config(config)
>>> model = GLiNER.load_from_config("path/to/gliner_config.json")
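>>> # Pretrained backbone with randomly initialized GLiNER layers (the default)
>>> model = GLiNER.load_from_config(config, backbone_from_pretrained=True)
>>> # Fully random initialization, including the backbone
>>> model = GLiNER.load_from_config(config, backbone_from_pretrained=False)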
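load_from_config accepts several config forms plus a few keyword overrides. A sketch of that resolution step, assuming a JSON config file and treating None as "keep the stored value". ToyConfig and resolve_config are hypothetical stand-ins, and the field defaults are illustrative, not GLiNER's.

```python
import json
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for GLiNERConfig; field defaults are illustrative only."""
    model_name: str = "microsoft/deberta-v3-small"
    max_length: int = 384
    max_width: int = 12
    post_fusion_schema: Optional[str] = None


def resolve_config(config: Union[str, Path, dict, ToyConfig], **overrides) -> ToyConfig:
    """Accept a config object, a dict, or a JSON file path, then apply non-None overrides."""
    if isinstance(config, (str, Path)):
        config = json.loads(Path(config).read_text())  # path to a JSON config file
    if isinstance(config, dict):
        config = ToyConfig(**config)
    # Arguments left as None (their default in the signature) keep the stored value.
    updates = {key: value for key, value in overrides.items() if value is not None}
    return replace(config, **updates)  # returns a new object; the input is not mutated
```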

classmethod from_pretrained(model_id, model_dir=None, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=False, load_onnx_model=False, onnx_model_file='model.onnx', session_options=None, max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)

Load a pretrained model from the Hugging Face Hub or a local directory.

Parameters:
  • model_id (str) – Model identifier or local path.

  • model_dir (str | None) – Override model directory path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force a re-download even if the files are already cached.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – HF token for private repos.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 quantization (dynamic on CPU; weight-only via torchao on GPU, which requires the torchao package). False to disable.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to ONNX model file.

  • session_options – ONNX runtime session options.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Loaded model instance.

export_to_onnx(save_dir, onnx_filename='model.onnx', quantized_filename='model_quantized.onnx', quantize=False, opset=19, **export_kwargs)

Export the model to ONNX. This single method is shared by all model types; each subclass supplies its own export specification.

Parameters:
  • save_dir (str | Path) – Directory to save ONNX files.

  • onnx_filename (str) – Name of the ONNX model file.

  • quantized_filename (str) – Name of the quantized model file.

  • quantize (bool) – Whether to create a quantized version.

  • opset (int) – ONNX opset version.

  • **export_kwargs – Additional export arguments (model-specific).

Returns:

Dictionary with paths to exported models:

  • onnx_path: Path to the standard ONNX model

  • quantized_path: Path to the quantized model (if quantize=True)

freeze_component(component_name)

Freeze a specific component of the model.

Parameters:

component_name (str) – Name of the component to freeze (e.g., ‘text_encoder’, ‘labels_encoder’, ‘decoder’).

unfreeze_component(component_name)

Unfreeze a specific component of the model.

Parameters:

component_name (str) – Name of the component to unfreeze (e.g., ‘text_encoder’, ‘labels_encoder’, ‘decoder’).

classmethod create_training_args(output_dir, learning_rate=5e-05, weight_decay=0.01, others_lr=None, others_weight_decay=None, focal_loss_alpha=-1, focal_loss_gamma=0.0, rel_focal_loss_alpha=None, rel_focal_loss_gamma=None, focal_loss_prob_margin=0.0, loss_reduction='sum', negatives=1.0, masking='none', lr_scheduler_type='linear', warmup_ratio=0.1, per_device_train_batch_size=8, per_device_eval_batch_size=8, max_grad_norm=1.0, max_steps=10000, save_steps=1000, save_total_limit=10, logging_steps=10, use_cpu=False, bf16=False, dataloader_num_workers=1, report_to='none', **kwargs)

Create training arguments with sensible defaults.

Parameters:
  • output_dir (str | Path) – Directory to save model checkpoints.

  • learning_rate (float) – Learning rate for main parameters.

  • weight_decay (float) – Weight decay for main parameters.

  • others_lr (float | None) – Learning rate for other parameters.

  • others_weight_decay (float | None) – Weight decay for other parameters.

  • focal_loss_alpha (float) – Alpha for focal loss.

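  • focal_loss_gamma (float) – Gamma for focal loss.

  • rel_focal_loss_alpha (float | None) – Alpha for the focal loss on relation predictions.

  • rel_focal_loss_gamma (float | None) – Gamma for the focal loss on relation predictions.

  • focal_loss_prob_margin (float) – Probability margin for focal loss.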

  • loss_reduction (str) – Loss reduction method.

  • negatives (float) – Negative sampling ratio.

  • masking (str) – Masking strategy.

  • lr_scheduler_type (str) – Learning rate scheduler type.

  • warmup_ratio (float) – Warmup ratio.

  • per_device_train_batch_size (int) – Training batch size.

  • per_device_eval_batch_size (int) – Evaluation batch size.

  • max_grad_norm (float) – Maximum gradient norm.

  • max_steps (int) – Maximum training steps.

  • save_steps (int) – Save checkpoint every N steps.

  • save_total_limit (int) – Maximum number of checkpoints to keep.

  • logging_steps (int) – Log every N steps.

  • use_cpu (bool) – Whether to use CPU.

  • bf16 (bool) – Whether to use bfloat16.

  • dataloader_num_workers (int) – Number of dataloader workers.

  • report_to (str) – Where to report metrics.

  • **kwargs – Additional training arguments.

Returns:

TrainingArguments instance.

Return type:

TrainingArguments
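focal_loss_alpha and focal_loss_gamma parameterize a focal loss. A per-element sketch, assuming GLiNER follows the torchvision sigmoid_focal_loss convention in which a negative alpha disables class weighting; under that assumption the defaults (alpha=-1, gamma=0.0) reduce to plain binary cross-entropy. focal_loss_prob_margin and loss_reduction are not modeled here.

```python
import math


def sigmoid_focal_loss(logit: float, target: float, alpha: float = -1.0, gamma: float = 0.0) -> float:
    """Per-element sigmoid focal loss; alpha < 0 disables weighting, gamma == 0 gives BCE."""
    p = 1.0 / (1.0 + math.exp(-logit))
    # Binary cross-entropy with logits, in a numerically stable form.
    ce = max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
    # p_t is the probability the model assigns to the true class.
    p_t = p * target + (1.0 - p) * (1.0 - target)
    loss = ce * (1.0 - p_t) ** gamma  # down-weights easy, confident examples
    if alpha >= 0:
        alpha_t = alpha * target + (1.0 - alpha) * (1.0 - target)
        loss = alpha_t * loss
    return loss
```

Raising gamma shrinks the loss on well-classified examples, which shifts training toward hard ones.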

train_model(train_dataset, eval_dataset, training_args=None, freeze_components=None, compile_model=False, output_dir=None, **training_kwargs)

Train the model.

Parameters:
  • train_dataset – Training dataset.

  • eval_dataset – Evaluation dataset.

  • training_args (TrainingArguments | None) – Training arguments (created with defaults if None).

  • freeze_components (list[str] | None) – List of component names to freeze (e.g., [‘text_encoder’, ‘decoder’]).

  • compile_model (bool) – Whether to compile model with torch.compile.

  • output_dir (str | Path | None) – Output directory (required if training_args is None).

  • **training_kwargs – Additional kwargs for creating training args.

Returns:

Trained Trainer instance.

Return type:

Trainer

class gliner.model.BaseEncoderGLiNER(*args, **kwargs)

Bases: BaseGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

set_class_indices()

Set the class token index in the configuration based on tokenizer vocabulary.

resize_embeddings(set_class_token_index=True)

Resize token embeddings to match tokenizer vocabulary size.

Parameters:

set_class_token_index – Whether to update the class token index.

prepare_inputs(texts)

Prepare inputs for the model by tokenizing and creating index mappings.

Parameters:

texts (List[str]) – The input texts to process.

Returns:

Tuple containing:

  • all_tokens: List of tokenized texts

  • all_start_token_idx_to_text_idx: Start position mappings

  • all_end_token_idx_to_text_idx: End position mappings
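The token-to-character mappings returned above can be illustrated with a regex splitter that records offsets as it goes. The pattern (words optionally joined by - or _, otherwise single non-space characters) and the exclusive end offsets are assumptions for this sketch, not a statement of GLiNER's tokenizer.

```python
import re
from typing import List, Tuple

# Assumed pattern: word runs (optionally hyphen/underscore-joined) or any single non-space char.
TOKEN_PATTERN = re.compile(r"\w+(?:[-_]\w+)*|\S")


def split_with_offsets(text: str) -> Tuple[List[str], List[int], List[int]]:
    """Split text into tokens and record each token's start/end character offsets."""
    tokens: List[str] = []
    starts: List[int] = []
    ends: List[int] = []
    for match in TOKEN_PATTERN.finditer(text):
        tokens.append(match.group())
        starts.append(match.start())  # token index -> start character offset
        ends.append(match.end())      # token index -> end character offset (exclusive)
    return tokens, starts, ends
```

With these lists, a predicted token span i..j maps back to text[starts[i]:ends[j]].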

prepare_base_input(all_tokens)

Prepare base input format for data collation.

Parameters:

all_tokens (List[List[str]]) – List of tokenized texts.

Returns:

List of input dictionaries ready for collation.

Return type:

List[Dict[str, Any]]

inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False, **external_inputs)

Predict entities for a batch of texts.

Parameters:
  • texts (str | List[str]) – A list of input texts to predict entities for or a single text string.

  • labels (List[str]) – A list of labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.

  • batch_size (int) – Batch size for processing. Defaults to 8.

  • packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None, the instance-level configuration set via configure_inference_packing is used.

  • input_spans (List[List[Dict]]) – Input entity spans for the model to classify. Each span is a dict with ‘start’ and ‘end’ character positions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **external_inputs – Additional inputs to pass to the model.

Returns:

List of lists of predicted entities, where each entity is a dictionary containing:

  • start: Start character position

  • end: End character position

  • text: Entity text

  • label: Entity type

  • score: Confidence score

  • class_probs: (optional) Dictionary mapping class names to probabilities (top 5)
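Before dictionaries like these are built, scored candidate spans have to be filtered and de-overlapped. A simplified sketch of greedy decoding under stated assumptions: candidates are hypothetical (start_token, end_token, label, score) tuples, scores at or above threshold survive, flat_ner rejects any overlap, nested mode allows containment but not partial overlap, and multi_label lets one span carry several labels. GLiNER's SpanDecoder may differ in details.

```python
from typing import List, Tuple

# Hypothetical candidate format: (start_token, end_token_inclusive, label, score)
Span = Tuple[int, int, str, float]


def _conflicts(a: Span, b: Span, flat_ner: bool, multi_label: bool) -> bool:
    """True if span `a` may not be kept alongside already-kept span `b`."""
    if (a[0], a[1]) == (b[0], b[1]):
        return not multi_label  # same boundaries: allowed only with multi_label
    if a[0] > b[1] or b[0] > a[1]:
        return False  # disjoint spans never conflict
    if flat_ner:
        return True  # flat NER rejects any overlap
    nested = (b[0] <= a[0] and a[1] <= b[1]) or (a[0] <= b[0] and b[1] <= a[1])
    return not nested  # nested mode allows containment, rejects partial overlap


def decode_spans(candidates: List[Span], threshold: float = 0.5,
                 flat_ner: bool = True, multi_label: bool = False) -> List[Span]:
    """Greedy decoding: drop low scores, then keep spans in descending score order."""
    kept: List[Span] = []
    above = [c for c in candidates if c[3] >= threshold]
    for span in sorted(above, key=lambda c: c[3], reverse=True):
        if not any(_conflicts(span, k, flat_ner, multi_label) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: (s[0], s[1]))  # report in document order
```

Visiting spans from highest to lowest score means a confident span always wins over a weaker one it overlaps.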

predict_entities(text, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)

Predict entities for a single text input.

Parameters:
  • text (str) – The input text to predict entities for.

  • labels (List[str]) – The labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **kwargs – Additional arguments passed to inference.

Returns:

List of entity predictions as dictionaries.

Return type:

List[Dict[str, Any]]

batch_predict_entities(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, **kwargs)

Predict entities for multiple texts.

DEPRECATED: Use inference instead.

This method will be removed in a future release. It now forwards to GLiNER.inference(…) to perform inference.

Parameters:
  • texts (List[str]) – Input texts.

  • labels (List[str]) – Labels to predict.

  • flat_ner (bool) – Use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold. Defaults to 0.5.

  • multi_label (bool) – Allow multiple labels per token/entity. Defaults to False.

  • **kwargs – Extra arguments forwarded to inference (e.g., batch_size).

Returns:

List of entity predictions for each text.

Return type:

List[List[Dict[str, Any]]]

evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, batch_size=12)

Evaluate the model on a given test dataset.

Parameters:
  • test_data (List[Dict[str, Any]]) – The test data containing text and entity annotations.

  • flat_ner (bool) – Whether to use flat NER. Defaults to False.

  • multi_label (bool) – Whether to use multi-label classification. Defaults to False.

  • threshold (float) – The threshold for predictions. Defaults to 0.5.

  • batch_size (int) – The batch size for evaluation. Defaults to 12.

Returns:

Tuple containing the evaluation output and the F1 score.

Return type:

Tuple[Any, float]
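evaluate reports an F1 score. A generic sketch of exact-match micro-averaged F1, assuming an entity counts as correct only when its start, end, and label all match a gold entity; GLiNER's evaluator may use different matching rules, and micro_f1 is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple

Entity = Tuple[int, int, str]  # hypothetical (start, end, label) triple


def micro_f1(gold: List[Iterable[Entity]], pred: List[Iterable[Entity]]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall, and F1 accumulated over all documents."""
    tp = n_pred = n_gold = 0
    for gold_doc, pred_doc in zip(gold, pred):
        gold_set: Set[Entity] = set(gold_doc)
        pred_set: Set[Entity] = set(pred_doc)
        tp += len(gold_set & pred_set)  # start, end, and label must all match
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro-averaging pools counts across documents before dividing, so long documents weigh more than short ones.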

class gliner.model.BaseBiEncoderGLiNER(*args, **kwargs)

Bases: BaseEncoderGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

resize_embeddings(**kwargs)

Resize token embeddings to match tokenizer vocabulary size.

Parameters:

set_class_token_index – Whether to update the class token index.

encode_labels(labels, batch_size=8)

Compute embeddings for labels using the label encoder.

Parameters:
  • labels (List[str]) – A list of labels to encode.

  • batch_size (int) – Batch size for processing labels.

Returns:

Tensor containing label embeddings with shape (num_labels, hidden_size).

Raises:

NotImplementedError – If the model doesn’t have a label encoder.

Return type:

FloatTensor

batch_predict_with_embeds(texts, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False)

Predict entities for a batch of texts using pre-computed label embeddings.

Parameters:
  • texts (List[str]) – A list of input texts to predict entities for.

  • labels_embeddings (Tensor) – Pre-computed embeddings for the labels.

  • labels (List[str]) – List of label strings corresponding to the embeddings.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.

  • batch_size (int) – Batch size for processing. Defaults to 8.

  • packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None, the instance-level configuration set via configure_inference_packing is used.

  • input_spans (List[List[Dict]]) – Input entity spans to limit predictions to. Each span is a dict with ‘start’ and ‘end’ character positions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

Returns:

List of lists with predicted entities.

Return type:

List[List[Dict[str, Any]]]

predict_with_embeds(text, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)

Predict entities for a single text input using pre-computed label embeddings.

Parameters:
  • text – The input text to predict entities for.

  • labels_embeddings – Pre-computed embeddings for the labels.

  • labels – List of label strings corresponding to the embeddings.

  • flat_ner – Whether to use flat NER. Defaults to True.

  • threshold – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label – Whether to allow multiple labels per entity. Defaults to False.

  • return_class_probs – Whether to include class probabilities in output. Defaults to False.

  • **kwargs – Additional arguments passed to batch_predict_with_embeds.

Returns:

List of entity predictions.
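Because labels_embeddings are passed in, a bi-encoder can encode the label set once (for example with encode_labels) and reuse it for every text. A toy sketch of that caching pattern: encode_batch stands in for the label encoder, labels are encoded in chunks of batch_size, and spans are scored with a sigmoid over a dot product. The scoring function is an assumption for illustration, not a description of the bi-encoder model classes.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def encode_labels_batched(labels: Sequence[str],
                          encode_batch: Callable[[List[str]], List[Vector]],
                          batch_size: int = 8) -> Dict[str, Vector]:
    """Encode every label once, batch_size labels at a time, and cache by label."""
    cache: Dict[str, Vector] = {}
    for i in range(0, len(labels), batch_size):
        chunk = list(labels[i:i + batch_size])
        for label, vector in zip(chunk, encode_batch(chunk)):
            cache[label] = vector
    return cache


def score_span(span_vector: Vector, label_cache: Dict[str, Vector]) -> Dict[str, float]:
    """Score one span embedding against each cached label: sigmoid(dot product)."""
    scores: Dict[str, float] = {}
    for label, label_vector in label_cache.items():
        logit = sum(a * b for a, b in zip(span_vector, label_vector))
        scores[label] = 1.0 / (1.0 + math.exp(-logit))
    return scores
```

In this sketch the label encoder runs once per label set, no matter how many texts are scored afterwards.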

class gliner.model.UniEncoderSpanGLiNER(*args, **kwargs)

Bases: BaseEncoderGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderSpanConfig

model_class

alias of UniEncoderSpanModel

ort_model_class

alias of UniEncoderSpanORTModel

data_processor_class

alias of UniEncoderSpanProcessor

data_collator_class

alias of UniEncoderSpanDataCollator

decoder_class

alias of SpanDecoder

class gliner.model.UniEncoderTokenGLiNER(*args, **kwargs)

Bases: BaseEncoderGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderTokenConfig

model_class

alias of UniEncoderTokenModel

ort_model_class

alias of UniEncoderTokenORTModel

data_processor_class

alias of UniEncoderTokenProcessor

data_collator_class

alias of UniEncoderTokenDataCollator

decoder_class

alias of TokenDecoder

class gliner.model.BiEncoderSpanGLiNER(*args, **kwargs)

Bases: BaseBiEncoderGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of BiEncoderSpanConfig

model_class

alias of BiEncoderSpanModel

ort_model_class

alias of BiEncoderSpanORTModel

data_processor_class

alias of BiEncoderSpanProcessor

data_collator_class

alias of BiEncoderSpanDataCollator

decoder_class

alias of SpanDecoder

class gliner.model.BiEncoderTokenGLiNER(*args, **kwargs)

Bases: BaseBiEncoderGLiNER

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of BiEncoderTokenConfig

model_class

alias of BiEncoderTokenModel

ort_model_class

alias of BiEncoderTokenORTModel

data_processor_class

alias of BiEncoderTokenProcessor

data_collator_class

alias of BiEncoderTokenDataCollator

decoder_class

alias of TokenDecoder

class gliner.model.UniEncoderSpanDecoderGLiNER(*args, **kwargs)

Bases: BaseEncoderGLiNER

GLiNER model with span-based encoding and label decoding capabilities.

Supports generating textual labels for entities.

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderSpanDecoderConfig

model_class

alias of UniEncoderSpanDecoderModel

ort_model_class: type = None
data_processor_class

alias of UniEncoderSpanDecoderProcessor

data_collator_class

alias of UniEncoderSpanDecoderDataCollator

decoder_class

alias of SpanGenerativeDecoder

set_labels_trie(labels)

Initialize the labels trie for constrained generation.

Parameters:

labels (List[str]) – Labels that will be used for constrained generation.

Returns:

Trie structure for constrained beam search.

Raises:

NotImplementedError – If the model doesn’t have a decoder.
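Constrained generation with a labels trie can be sketched as a prefix tree over tokenized labels: at each step, only tokens that extend some label's prefix are allowed. The token ids and eos_id below are hypothetical (a real tokenizer would produce them), and LabelTrie is not GLiNER's trie class.

```python
from typing import Dict, List, Sequence


class LabelTrie:
    """Prefix tree over tokenized labels, used to restrict which token may come next."""

    def __init__(self, tokenized_labels: Sequence[Sequence[int]], eos_id: int) -> None:
        self.root: Dict[int, dict] = {}
        for ids in tokenized_labels:
            node = self.root
            for tok in list(ids) + [eos_id]:  # each label ends with an end-of-sequence id
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Token ids that can legally follow `prefix`; empty if the prefix is off-trie."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node.keys())
```

One common way to apply such a function during decoding is the prefix_allowed_tokens_fn argument of Hugging Face's generate(); GLiNER's own integration may differ.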

generate_labels(model_output, **gen_kwargs)

Generate textual class labels for each entity span.

Parameters:
  • model_output – Model output containing decoder_embedding and decoder_embedding_mask.

  • **gen_kwargs – Generation parameters (max_new_tokens, temperature, etc.).

Returns:

List of generated label strings.

inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, gen_constraints=None, num_gen_sequences=1, packing_config=None, input_spans=None, return_class_probs=False, **gen_kwargs)

Predict entities with optional label generation.

Parameters:
  • texts (str | List[str]) – Input texts (string or list of strings).

  • labels (List[str]) – Entity type labels.

  • flat_ner (bool) – Whether to use flat NER.

  • threshold (float) – Confidence threshold.

  • multi_label (bool) – Allow multiple labels per span.

  • batch_size (int) – Batch size for processing.

  • gen_constraints (List[str] | None) – Labels to constrain generation.

  • num_gen_sequences (int) – Number of label sequences to generate per span.

  • packing_config (InferencePackingConfig | None) – Inference packing configuration.

  • input_spans (List[List[Dict]]) – Input entity spans to limit predictions to. Each span is a dict with ‘start’ and ‘end’ character positions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **gen_kwargs – Additional generation parameters.

Returns:

List of entity predictions with optional generated labels.

Return type:

List[List[Dict[str, Any]]]

export_to_onnx(save_dir, onnx_filename='model.onnx', quantized_filename='model_quantized.onnx', quantize=False, opset=19)

ONNX export is not supported for encoder-decoder models.

Raises:

NotImplementedError – Always raised, as this model type cannot be exported to ONNX.

class gliner.model.UniEncoderTokenDecoderGLiNER(*args, **kwargs)

Bases: UniEncoderSpanDecoderGLiNER

GLiNER model with token-based encoding and label decoding capabilities.

Combines token-level BIO tagging with a decoder that generates entity type labels autoregressively.
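Token-level BIO tagging ends with a step that turns per-token tags into entity spans. A generic sketch of that step: tag strings such as "B-PER" are hypothetical, and the model may work with per-token scores rather than string tags, so this is an illustration of BIO decoding rather than TokenGenerativeDecoder's algorithm.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags (e.g. B-PER, I-PER, O) into (start, end, type) spans, end inclusive."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes a span at the end
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and label == kind
        if start is not None and not continues:
            spans.append((start, i - 1, label))  # close the open span before token i
            start, label = None, None
        if prefix == "B" or (prefix == "I" and start is None):
            start, label = i, kind  # a stray I- tag is treated as a span start
    return spans
```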

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderTokenDecoderConfig

model_class

alias of UniEncoderTokenDecoderModel

ort_model_class: type = None
data_processor_class

alias of UniEncoderTokenDecoderProcessor

data_collator_class

alias of UniEncoderTokenDecoderDataCollator

decoder_class

alias of TokenGenerativeDecoder

class gliner.model.UniEncoderSpanRelexGLiNER(*args, **kwargs)

Bases: BaseEncoderGLiNER

GLiNER model for both entity recognition and relation extraction.

Performs joint entity and relation prediction, detecting entities and the relationships between them in a single forward pass.

Initialize the model; the constructor parameters are the same as for BaseGLiNER.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderSpanRelexConfig

model_class

alias of UniEncoderSpanRelexModel

ort_model_class

alias of UniEncoderSpanRelexORTModel

data_processor_class

alias of RelationExtractionSpanProcessor

data_collator_class

alias of RelationExtractionSpanDataCollator

decoder_class

alias of SpanRelexDecoder

set_class_indices()

Set the class token indices for entities and relations in the configuration.

inference(texts, labels, relations=[], flat_ner=True, threshold=0.5, adjacency_threshold=None, relation_threshold=None, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_relations=True, return_class_probs=False)

Predict entities and relations.

Parameters:
  • texts (str | List[str]) – A single input text or a list of texts.

  • labels (List[str]) – Entity type labels.

  • relations (List[str]) – Relation type labels.

  • flat_ner (bool) – Whether to use flat NER (no nested entities).

  • threshold (float) – Confidence threshold for entities.

  • adjacency_threshold (float | None) – Confidence threshold for adjacency matrix reconstruction (defaults to threshold).

  • relation_threshold (float | None) – Confidence threshold for relations (defaults to threshold).

  • multi_label (bool) – Allow multiple labels per span.

  • batch_size (int) – Batch size for processing.

  • packing_config (InferencePackingConfig | None) – Inference packing configuration.

  • input_spans (List[List[Dict]]) – Entity spans to restrict predictions to, one list per input text. Each span is a dict with 'start' and 'end' character positions.

  • return_relations (bool) – Whether to return relation predictions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
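input_spans is a nested list of {'start', 'end'} dicts with character positions. A minimal sketch of building that structure from known mention strings; the helper spans_for_mentions is hypothetical, and treating 'end' as exclusive (Python slice convention) is an assumption worth checking against the library before relying on it.

```python
from typing import Dict, List


def spans_for_mentions(
    texts: List[str], mentions: List[List[str]]
) -> List[List[Dict[str, int]]]:
    """Build one list of {'start', 'end'} span dicts per input text.

    Assumption: 'end' is exclusive, following Python slice conventions.
    Only the first occurrence of each mention is used.
    """
    all_spans = []
    for text, wanted in zip(texts, mentions):
        spans = []
        for mention in wanted:
            start = text.find(mention)
            if start == -1:
                continue  # mention does not occur in this text; skip it
            spans.append({"start": start, "end": start + len(mention)})
        all_spans.append(spans)
    return all_spans


texts = ["Marie Curie was born in Warsaw."]
input_spans = spans_for_mentions(texts, [["Marie Curie", "Warsaw", "Paris"]])
print(input_spans)
# [[{'start': 0, 'end': 11}, {'start': 24, 'end': 30}]]
```

With a loaded model, the result could then be passed as model.inference(texts, labels, input_spans=input_spans).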

Returns:

Tuple of (entities, relations) if return_relations=True, else just entities.

Return type:

List[List[Dict[str, Any]]] | Tuple[List[List[Dict[str, Any]]], List[List[Dict[str, Any]]]]
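Because the return type depends on return_relations, calling code that toggles the flag has to handle both shapes. A minimal sketch of normalizing the output to (entities, relations or None); split_output is a hypothetical helper, and the dict keys in the placeholder predictions are assumptions, not the library's output schema.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Predictions = List[List[Dict[str, Any]]]


def split_output(
    output: Union[Predictions, Tuple[Predictions, Predictions]],
    return_relations: bool,
) -> Tuple[Predictions, Optional[Predictions]]:
    """Normalize inference() output to (entities, relations or None)."""
    if return_relations:
        entities, relations = output  # a 2-tuple in this mode
        return entities, relations
    return output, None  # only the entity lists in this mode


# Placeholder predictions for one input text; the dict keys are hypothetical.
entities = [[{"text": "Marie Curie", "label": "person"}]]
relations = [[{"label": "born_in"}]]
print(split_output((entities, relations), return_relations=True)[1])
print(split_output(entities, return_relations=False)[1])
```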

evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, adjacency_threshold=None, relation_threshold=None, batch_size=12)[source]

Evaluate the model on both NER and relation extraction tasks.

Parameters:
  • test_data (List[Dict[str, Any]]) – The test data containing text, entity, and relation annotations.

  • flat_ner (bool) – Whether to use flat NER. Defaults to False.

  • multi_label (bool) – Whether to use multi-label classification. Defaults to False.

  • threshold (float) – The threshold for entity predictions. Defaults to 0.5.

  • adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction. Defaults to threshold.

  • relation_threshold (float | None) – The threshold for relation predictions. Defaults to threshold.

  • batch_size (int) – The batch size for evaluation. Defaults to 12.
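Both inference() and evaluate() document adjacency_threshold and relation_threshold as defaulting to threshold. A minimal sketch of that fallback rule, assuming None means "use threshold"; resolve_thresholds is a hypothetical helper, not part of the gliner API.

```python
from typing import Optional, Tuple


def resolve_thresholds(
    threshold: float = 0.5,
    adjacency_threshold: Optional[float] = None,
    relation_threshold: Optional[float] = None,
) -> Tuple[float, float, float]:
    """Return (entity, adjacency, relation) thresholds with None falling back."""
    # Compare against None explicitly so an intentional 0.0 is not replaced.
    adjacency = threshold if adjacency_threshold is None else adjacency_threshold
    relation = threshold if relation_threshold is None else relation_threshold
    return threshold, adjacency, relation


print(resolve_thresholds(0.4))                          # (0.4, 0.4, 0.4)
print(resolve_thresholds(0.4, relation_threshold=0.0))  # (0.4, 0.4, 0.0)
```

Using `x if y is None else y` rather than `y or x` keeps an explicit 0.0 threshold from silently reverting to the default.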

Returns:

Tuple of ((ner_output, ner_f1), (rel_output, rel_f1)) containing:

  • ner_output: Formatted string with NER precision, recall, and F1

  • ner_f1: NER F1 score

  • rel_output: Formatted string with relation extraction precision, recall, and F1

  • rel_f1: Relation extraction F1 score
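Each F1 score combines precision P and recall R as their harmonic mean, F1 = 2PR / (P + R). The sketch below shows that formula and how the nested return tuple unpacks. The numbers and report strings are made-up placeholders, and f1_score is a hypothetical helper, not a gliner function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical evaluate() result: the report strings and scores are placeholders.
result = (
    ("<NER report>", f1_score(0.8, 0.6)),
    ("<relation report>", f1_score(0.5, 0.5)),
)
(ner_output, ner_f1), (rel_output, rel_f1) = result
print(round(ner_f1, 4), rel_f1)  # 0.6857 0.5
```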

class gliner.model.UniEncoderTokenRelexGLiNER(*args, **kwargs)[source]

Bases: UniEncoderSpanRelexGLiNER

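GLiNER model for both entity recognition and relation extraction.

Token-level counterpart of UniEncoderSpanRelexGLiNER: it performs the same joint entity and relation prediction in a single forward pass, but uses token-level components (UniEncoderTokenRelexModel, RelationExtractionTokenProcessor, TokenRelexDecoder) in place of the span-level ones.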

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderTokenRelexConfig

model_class

alias of UniEncoderTokenRelexModel

ort_model_class

alias of UniEncoderTokenRelexORTModel

data_processor_class

alias of RelationExtractionTokenProcessor

data_collator_class

alias of RelationExtractionTokenDataCollator

decoder_class

alias of TokenRelexDecoder

class gliner.model.GLiNER(*args, **kwargs)[source]

Bases: Module, PyTorchModelHubMixin

Meta GLiNER class that automatically instantiates the appropriate GLiNER variant.

This class provides a unified interface for all GLiNER models, automatically switching to specialized model types based on the model configuration. It supports various NER architectures including uni-encoder, bi-encoder, decoder-based, and relation extraction models.

The class automatically detects the model type based on:
  • span_mode: Token-level vs span-level

  • labels_encoder: Uni-encoder vs bi-encoder

  • labels_decoder: Standard vs decoder-based

  • relations_layer: NER-only vs joint entity-relation extraction
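The four criteria above can be read as independent switches. A minimal sketch that summarizes them for a configuration given as a plain dict; describe_variant is hypothetical, and it assumes that span_mode == "token_level" marks a token-level model and that a non-empty labels_encoder, labels_decoder, or relations_layer value enables the corresponding feature. The actual dispatch in GLiNER may check these fields differently.

```python
from typing import Any, Dict


def describe_variant(config: Dict[str, Any]) -> str:
    """Summarize the four detection criteria for a config given as a plain dict.

    Assumptions: span_mode == "token_level" marks a token-level model, and a
    non-empty labels_encoder / labels_decoder / relations_layer value enables
    the corresponding feature.
    """
    granularity = "token" if config.get("span_mode") == "token_level" else "span"
    encoder = "bi-encoder" if config.get("labels_encoder") else "uni-encoder"
    decoder = "decoder-based" if config.get("labels_decoder") else "standard"
    task = "joint entity-relation" if config.get("relations_layer") else "NER-only"
    return f"{encoder}, {granularity}-level, {decoder}, {task}"


print(describe_variant({"span_mode": "markerV0"}))
# uni-encoder, span-level, standard, NER-only
print(describe_variant({"span_mode": "token_level", "relations_layer": "placeholder"}))
# uni-encoder, token-level, standard, joint entity-relation
```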

model

The loaded GLiNER model instance (automatically typed).

config

Model configuration.

data_processor

Data processor for the model.

decoder

Decoder for predictions.

Examples

Load a pretrained uni-encoder span model:

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

Load a bi-encoder model:

>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")

Load from a local configuration:

>>> config = GLiNERConfig.from_pretrained("config.json")
>>> model = GLiNER.from_config(config)

Initialize from scratch:

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER(config)

Initialize a GLiNER model with automatic type detection.

This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.

Parameters:
  • config (str | Path | GLiNERConfig) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • **kwargs – Additional arguments passed to the specific GLiNER variant.

Examples

>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")
__init__(config, **kwargs)[source]

classmethod from_pretrained(model_id, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=False, load_onnx_model=False, onnx_model_file='model.onnx', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Load a pretrained GLiNER model with automatic type detection.

This method loads the configuration, determines the appropriate GLiNER variant, and delegates to that variant’s from_pretrained method.

Parameters:
  • model_id (str) – Model identifier or local path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force a re-download of the model files, even if they are already cached.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – Hugging Face access token for private repositories.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to the ONNX model file (default 'model.onnx').

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.
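The quantize option accepts several spellings for the same setting. A hypothetical normalization of the documented values to a single dtype name, written without importing torch so it runs anywhere. normalize_quantize, and its choice to raise ValueError for unknown values, are assumptions, not the library's behavior.

```python
from typing import Optional, Union


def normalize_quantize(quantize: Union[bool, str]) -> Optional[str]:
    """Map a documented quantize value to a dtype name, or None when disabled."""
    if quantize is False:
        return None  # quantization disabled
    if quantize is True or quantize == "fp16":
        return "float16"
    if quantize == "bf16":
        return "bfloat16"
    if quantize == "int8":
        return "int8"  # dynamic int8 quantization; documented as requiring torchao
    raise ValueError(f"unsupported quantize value: {quantize!r}")


for option in (False, True, "fp16", "bf16", "int8"):
    print(option, "->", normalize_quantize(option))
```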

Returns:

Appropriate GLiNER model instance.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
>>> model = GLiNER.from_pretrained("path/to/local/model", quantize=True)
classmethod from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Create a GLiNER model from configuration.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.
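max_length, max_width, post_fusion_schema, and _attn_implementation are documented as overrides of the corresponding config values, and all default to None. A sketch of that pattern, assuming None means "keep the config value" and modeling the config as a plain dict; apply_overrides and the numbers below are illustrative only.

```python
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of config with every non-None override applied."""
    updated = dict(config)  # leave the caller's config untouched
    for key, value in overrides.items():
        if value is not None:
            updated[key] = value
    return updated


base = {"max_length": 384, "max_width": 12}  # illustrative values
print(apply_overrides(base, max_length=512, max_width=None))
# {'max_length': 512, 'max_width': 12}
```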

Returns:

Initialized GLiNER model instance.

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.from_config(config)
>>> model = GLiNER.from_config("path/to/gliner_config.json")
property model_map: dict[str, dict[str, Any]]

Map configuration patterns to their corresponding GLiNER classes.

Returns:

Dictionary mapping model types to their classes and descriptions.

get_model_type()[source]

Get the type of the current model instance.

Returns:

String identifier of the model type

Return type:

str
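model_map pairs each model type with its class and a description, and get_model_type() returns a string identifier. One way such a lookup could work is sketched below. This is an assumption, not the library's implementation; the stub classes, keys, and the "class"/"description" field names are hypothetical.

```python
from typing import Any, Dict


class SpanRelexStub:
    """Stand-in for a span-level relation-extraction class."""


class TokenRelexStub(SpanRelexStub):
    """Stand-in mirroring UniEncoderTokenRelexGLiNER subclassing the span variant."""


# Hypothetical map in the documented dict[str, dict[str, Any]] shape;
# the keys and the "class"/"description" field names are placeholders.
MODEL_MAP: Dict[str, Dict[str, Any]] = {
    "span_relex": {"class": SpanRelexStub, "description": "span-level NER + relations"},
    "token_relex": {"class": TokenRelexStub, "description": "token-level NER + relations"},
}


def model_type_of(instance: Any, model_map: Dict[str, Dict[str, Any]]) -> str:
    """Return the key whose class is exactly the instance's class."""
    for key, entry in model_map.items():
        # Exact match: isinstance() would also accept TokenRelexStub as span_relex.
        if type(instance) is entry["class"]:
            return key
    return "unknown"


print(model_type_of(TokenRelexStub(), MODEL_MAP))  # token_relex
```

Exact type comparison matters here because UniEncoderTokenRelexGLiNER subclasses UniEncoderSpanRelexGLiNER, so an isinstance() check against the span class would also match token-level instances.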