gliner.model module

class gliner.model.BaseGLiNER(*args, **kwargs)[source]

Bases: ABC, Module, PyTorchModelHubMixin

Initialize a BaseGLiNER model.

Parameters:
  • config (BaseGLiNERConfig) – Model configuration object.

  • model (BaseModel | None) – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor (BaseProcessor | None) – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained (bool | None) – Whether to load the backbone from pretrained weights.

  • cache_dir (str | Path | None) – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class: type = None
model_class: type = None
ort_model_class: type = None
data_processor_class: type = None
data_collator_class: type = None
decoder_class: type = None
__init__(config, model=None, tokenizer=None, data_processor=None, backbone_from_pretrained=False, cache_dir=None, **kwargs)[source]

Initialize a BaseGLiNER model.

Parameters:
  • config (BaseGLiNERConfig) – Model configuration object.

  • model (BaseModel | None) – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor (BaseProcessor | None) – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained (bool | None) – Whether to load the backbone from pretrained weights.

  • cache_dir (str | Path | None) – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

abstract resize_embeddings()[source]
abstract inference()[source]
abstract evaluate()[source]
forward(*args, **kwargs)[source]

Forward pass through the model.

Parameters:
  • *args – Positional arguments passed to the model.

  • **kwargs – Keyword arguments passed to the model.

Returns:

Model output from the forward pass.

property device

Get the device where the model is located.

Returns:

Torch device object (CPU or CUDA).

configure_inference_packing(config)[source]

Configure default packing behavior for inference calls.

Passing None disables packing by default. Individual inference methods accept a packing_config argument to override this setting on a per-call basis.

Parameters:

config (InferencePackingConfig | None) – Inference packing configuration or None to disable packing.
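
Example (illustrative sketch; model is a loaded instance and my_packing_config a pre-built InferencePackingConfig):

>>> model.configure_inference_packing(None)  # disable packing for all subsequent inference calls
>>> model.inference(texts, labels, packing_config=my_packing_config)  # per-call override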

compile()[source]

Compile the model using torch.compile for optimization.

Uses dynamic=True to generate shape-generic kernels, which avoids recompilation on variable-length NER inputs. Also enables capture_scalar_outputs to trace through data-dependent shape operations (e.g., computing max number of entity types per batch).

Best combined with quantize() for maximum throughput (~1.9x over fp32).

When FlashDeBERTa is active, its custom Triton kernels are incompatible with torch.compile tracing. The encoder forward is automatically wrapped with torch.compiler.disable so the rest of the model (span representation, scoring, etc.) still benefits from compilation.
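
Example (model id is illustrative):

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model.compile()          # shape-generic kernels; no recompilation on variable-length inputs
>>> model.quantize("int8")   # optional; combine for maximum throughput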

quantize(dtype='int8')[source]

Apply int8 quantization to the model.

Only "int8" is accepted; for precision changes (fp16/bf16), use dtype= on GLiNER.from_pretrained() or model.to(torch_dtype). Those are downcasts, not quantization, and were removed from this API.

Parameters:

dtype (str) – Must be "int8". On CPU, uses PyTorch's built-in dynamic quantization with FBGEMM int8 kernels (~1.6x speedup). On GPU, uses torchao int8 weight-only quantization (~50% memory reduction, no speed gain; requires the torchao package). Stock DeBERTa-based models lose accuracy with int8; use this with models fine-tuned with quantization-aware training (QAT).

Raises:
  • RuntimeError – If the model is an ONNX model (use ONNX quantization instead).

  • ValueError – If dtype is not "int8". Precision aliases (fp16/bf16) raise with a migration message pointing at dtype= / model.to(...).

  • ImportError – If torchao is not installed and int8 on GPU is requested.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", map_location="cuda")
>>> model.quantize("int8")  # int8 (torchao on GPU, FBGEMM on CPU)
>>> # For precision-only changes, prefer:
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", dtype="bf16")
prepare_state_dict(state_dict)[source]

Prepare state dict for saving, handling torch.compile artifacts.

Parameters:

state_dict – Original state dictionary from the model.

Returns:

Cleaned state dictionary with torch.compile prefixes removed.

save_pretrained(save_directory, *, config=None, repo_id=None, push_to_hub=False, safe_serialization=False, **push_to_hub_kwargs)[source]

Save model weights and configuration to local directory.

Parameters:
  • save_directory (str | Path) – Path to directory for saving.

  • config (BaseGLiNERConfig | None) – Model configuration. Uses self.config if None.

  • repo_id (str | None) – Repository ID for hub upload.

  • push_to_hub (bool) – Whether to push to HuggingFace Hub.

  • safe_serialization (bool) – Whether to use safetensors format.

  • **push_to_hub_kwargs – Additional arguments for push_to_hub.

Returns:

Repository URL if pushed to hub, None otherwise.

Return type:

str | None
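
Example (illustrative sketch; directory and repo id are placeholders):

>>> model.save_pretrained("./my_gliner", safe_serialization=True)
>>> # push to the Hub; returns the repository URL
>>> url = model.save_pretrained("./my_gliner", repo_id="my-org/my-gliner", push_to_hub=True)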

classmethod load_from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=None, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Initialize a model from configuration without loading pretrained weights.

This method creates a new model instance from scratch using the provided configuration. The backbone encoder can optionally be loaded from pretrained weights, but the GLiNER-specific layers are always randomly initialized.

Parameters:
  • config (str | Path | GLiNERConfig | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • quantize (str | None) – Only "int8" is accepted (int8 dynamic quantization: torchao on GPU, FBGEMM on CPU). For precision-only changes (fp16/bf16), use dtype=. None to disable.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Initialized model instance with randomly initialized weights (except backbone if specified).

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.load_from_config(config)
>>> model = GLiNER.load_from_config("path/to/gliner_config.json")
>>> # Load with pretrained backbone but random GLiNER layers
>>> model = GLiNER.load_from_config(config, backbone_from_pretrained=True)
classmethod from_pretrained(model_id, model_dir=None, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=None, dtype=None, low_cpu_mem_usage=False, variant=None, load_onnx_model=False, onnx_model_file='model.onnx', session_options=None, max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]

Load pretrained model from HuggingFace Hub or local directory.

Parameters:
  • model_id (str) – Model identifier or local path.

  • model_dir (str | None) – Override model directory path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force redownload.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – HF token for private repos.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • quantize (str | None) – Only "int8" is accepted (int8 dynamic quantization: torchao on GPU, FBGEMM on CPU). For precision-only changes (fp16/bf16), use dtype=. None to disable.

  • dtype (str | dtype | None) – Target floating-point dtype for the loaded weights (e.g. torch.bfloat16, "bf16", "fp16"). When set, the model shell is pre-cast and each state-dict tensor is cast during reading, so the full fp32 copy is never materialized; peak host memory is roughly half of the default path for bf16/fp16. Prefer this over quantize for plain precision changes.

  • low_cpu_mem_usage (bool) – If True, build the model under torch.device("meta") and use load_state_dict(assign=True) to swap loaded tensors into place. Skips the random-init compute, the fp32 random-init shell, and the post-init cast pass: the model goes from "shape descriptor" to "loaded weights" in one shot. Non-persistent buffers (e.g. DeBERTa's position_ids) are re-materialized after the load. Default False for now (opt-in); enable for cold-start / serverless deployments where every 100ms matters.

  • variant (str | None) – If set ("fp16" or "bf16"), prefer model.{variant}.safetensors over the default fp32 file. Best-effort: the loader probes the Hub (or local path) for the variant file before downloading. If it is published, only the variant file is fetched (~half the bytes vs fp32) and loaded directly. If it is not published, a UserWarning is emitted and the loader falls back to the default fp32 file plus an in-memory cast; the outcome is the same as passing the corresponding dtype alone, with no I/O win and no error. dtype is inferred from variant when not set; passing both with mismatched precisions raises. None (default) preserves the prior behavior verbatim.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to ONNX model file.

  • session_options – ONNX runtime session options.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Loaded model instance.
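
Examples (model id is illustrative; see the dtype, low_cpu_mem_usage, and variant parameters above):

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> # memory-lean bf16 load: pre-cast shell plus meta-device init
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", dtype="bf16", low_cpu_mem_usage=True)
>>> # prefer a published half-precision file when one exists (falls back with a warning otherwise)
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", variant="bf16")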

export_to_onnx(save_dir, onnx_filename='model.onnx', quantized_filename='model_quantized.onnx', quantize=False, opset=19, **export_kwargs)[source]

Unified ONNX export method using specifications from child classes.

Parameters:
  • save_dir (str | Path) – Directory to save ONNX files.

  • onnx_filename (str) – Name of the ONNX model file.

  • quantized_filename (str) – Name of the quantized model file.

  • quantize (bool) – Whether to create a quantized version.

  • opset (int) – ONNX opset version.

  • **export_kwargs – Additional export arguments (model-specific).

Returns:

  • onnx_path: Path to standard ONNX model

  • quantized_path: Path to quantized model (if quantize=True)

Return type:

Dictionary with paths to exported models
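
Example (illustrative sketch; output directory is a placeholder):

>>> paths = model.export_to_onnx("./onnx_export", quantize=True)
>>> paths["onnx_path"]       # standard ONNX model
>>> paths["quantized_path"]  # present because quantize=True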

freeze_component(component_name)[source]

Freeze a specific component of the model.

Parameters:

component_name (str) – Name of component to freeze (e.g., 'text_encoder', 'labels_encoder', 'decoder')

unfreeze_component(component_name)[source]

Unfreeze a specific component of the model.

Parameters:

component_name (str) – Name of component to unfreeze
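
Example (illustrative; component names as listed for freeze_component):

>>> model.freeze_component("text_encoder")    # keep encoder weights fixed
>>> # ... fine-tune the remaining layers ...
>>> model.unfreeze_component("text_encoder")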

classmethod create_training_args(output_dir, learning_rate=5e-05, weight_decay=0.01, others_lr=None, others_weight_decay=None, focal_loss_alpha=-1, focal_loss_gamma=0.0, rel_focal_loss_alpha=None, rel_focal_loss_gamma=None, focal_loss_prob_margin=0.0, loss_reduction='sum', negatives=1.0, masking='none', lr_scheduler_type='linear', warmup_ratio=0.1, per_device_train_batch_size=8, per_device_eval_batch_size=8, max_grad_norm=1.0, max_steps=10000, save_steps=1000, save_total_limit=10, logging_steps=10, use_cpu=False, bf16=False, dataloader_num_workers=1, report_to='none', **kwargs)[source]

Create training arguments with sensible defaults.

Parameters:
  • output_dir (str | Path) – Directory to save model checkpoints.

  • learning_rate (float) – Learning rate for main parameters.

  • weight_decay (float) – Weight decay for main parameters.

  • others_lr (float | None) – Learning rate for other parameters.

  • others_weight_decay (float | None) – Weight decay for other parameters.

  • focal_loss_alpha (float) – Alpha for focal loss.

  • focal_loss_gamma (float) – Gamma for focal loss.

  • rel_focal_loss_alpha (float | None) – Alpha for relation focal loss. Defaults to entity alpha.

  • rel_focal_loss_gamma (float | None) – Gamma for relation focal loss. Defaults to entity gamma.

  • focal_loss_prob_margin (float) – Probability margin for focal loss.

  • loss_reduction (str) – Loss reduction method.

  • negatives (float) – Negative sampling ratio.

  • masking (str) – Masking strategy.

  • lr_scheduler_type (str) – Learning rate scheduler type.

  • warmup_ratio (float) – Warmup ratio.

  • per_device_train_batch_size (int) – Training batch size.

  • per_device_eval_batch_size (int) – Evaluation batch size.

  • max_grad_norm (float) – Maximum gradient norm.

  • max_steps (int) – Maximum training steps.

  • save_steps (int) – Save checkpoint every N steps.

  • save_total_limit (int) – Maximum number of checkpoints to keep.

  • logging_steps (int) – Log every N steps.

  • use_cpu (bool) – Whether to use CPU.

  • bf16 (bool) – Whether to use bfloat16.

  • dataloader_num_workers (int) – Number of dataloader workers.

  • report_to (str) – Where to report metrics.

  • **kwargs – Additional training arguments.

Returns:

TrainingArguments instance.

Return type:

TrainingArguments

train_model(train_dataset, eval_dataset, training_args=None, freeze_components=None, compile_model=False, output_dir=None, **training_kwargs)[source]

Train the model.

Parameters:
  • train_dataset – Training dataset.

  • eval_dataset – Evaluation dataset.

  • training_args (TrainingArguments | None) – Training arguments (created with defaults if None).

  • freeze_components (list[str] | None) – List of component names to freeze (e.g., ['text_encoder', 'decoder']).

  • compile_model (bool) – Whether to compile model with torch.compile.

  • output_dir (str | Path | None) – Output directory (required if training_args is None).

  • **training_kwargs – Additional kwargs for creating training args.

Returns:

Trained Trainer instance.

Return type:

Trainer
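
Example (illustrative sketch; train_dataset and eval_dataset are assumed to be prepared GLiNER-format datasets):

>>> args = GLiNER.create_training_args("./checkpoints", learning_rate=5e-5, max_steps=1000, bf16=True)
>>> trainer = model.train_model(train_dataset, eval_dataset, training_args=args,
...                             freeze_components=["text_encoder"])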

class gliner.model.BaseEncoderGLiNER(*args, **kwargs)[source]

Bases: BaseGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

set_class_indices()[source]

Set the class token index in the configuration based on tokenizer vocabulary.

resize_embeddings(set_class_token_index=True)[source]

Resize token embeddings to match tokenizer vocabulary size.

Parameters:

set_class_token_index – Whether to update the class token index.

prepare_inputs(texts)[source]

Prepare inputs for the model by tokenizing and creating index mappings.

Parameters:

texts (List[str]) – The input texts to process.

Returns:

  • all_tokens: List of tokenized texts

  • all_start_token_idx_to_text_idx: Start position mappings

  • all_end_token_idx_to_text_idx: End position mappings

Return type:

Tuple containing

prepare_base_input(all_tokens)[source]

Prepare base input format for data collation.

Parameters:

all_tokens (List[List[str]]) – List of tokenized texts.

Returns:

List of input dictionaries ready for collation.

Return type:

List[Dict[str, Any]]

prepare_batch(texts, labels, input_spans=None, **kwargs)[source]

Prepare raw inputs for inference (tokenization and normalization).

This method handles text normalization, tokenization, and span conversion. Use this as the first step in the inference pipeline.

Parameters:
  • texts (str | List[str]) – Single text string or list of texts.

  • labels (str | List[str] | List[List[str]]) – Entity labels - string, list of strings, or per-text label lists.

  • input_spans (List[List[Dict]] | None) – Optional pre-defined spans to classify (character positions).

  • **kwargs – Additional keyword arguments passed to the data processor.

Returns:

  • input_x: List of input dicts ready for collation

  • tokens: Tokenized texts

  • start_token_map: Per-text mapping from token idx to char start

  • end_token_map: Per-text mapping from token idx to char end

  • word_input_spans: Spans converted to word indices (or None)

  • entity_types: Normalized entity types

  • valid_texts: Non-empty texts that will be processed

  • valid_to_orig_idx: Mapping from valid indices to original indices

  • num_original: Total number of original texts

Return type:

Dictionary containing

collate_batch(input_x, entity_types, collator=None)[source]

Collate prepared inputs into a tensor batch.

Parameters:
  • input_x (List[Dict[str, Any]]) – List of input dicts from prepare_batch.

  • entity_types (List[str] | List[List[str]]) – Entity type labels.

  • collator (Any | None) – Optional pre-created collator instance. If None, creates one.

Returns:

Collated batch dictionary with tensors ready for the model.

Return type:

Dict[str, Any]

run_batch(batch, threshold=0.5, packing_config=None, move_to_device=True, **external_inputs)[source]

Run model forward pass on a collated batch.

Parameters:
  • batch (Dict[str, Any]) – Collated batch from collate_batch.

  • threshold (float) – Confidence threshold for predictions.

  • packing_config (InferencePackingConfig | None) – Optional inference packing configuration.

  • move_to_device (bool) – Whether to move tensors to model device.

  • **external_inputs – Additional inputs to pass to the model.

Returns:

Model output containing logits and span information.

Return type:

Any

decode_batch(model_output, batch, threshold=0.5, flat_ner=True, multi_label=False, return_class_probs=False, input_spans=None)[source]

Decode model output into entity predictions.

Parameters:
  • model_output (Any) – Output from run_batch.

  • batch (Dict[str, Any]) – The collated batch (needs 'tokens' and 'id_to_classes').

  • threshold (float) – Confidence threshold for predictions.

  • flat_ner (bool) – Whether to use flat NER (no overlapping entities).

  • multi_label (bool) – Whether to allow multiple labels per span.

  • return_class_probs (bool) – Whether to include class probabilities.

  • input_spans (List[List[Tuple[int, int]]] | None) – Optional word-level input spans to classify.

Returns:

List of entity lists (one per text in batch).

Return type:

List[List[Any]]

map_entities_to_text(decoded, valid_texts, valid_to_orig_idx, start_token_map, end_token_map, num_original)[source]

Map decoded entities back to character positions in original texts.

Parameters:
  • decoded (List[List[Any]]) – Decoded entity spans from decode_batch.

  • valid_texts (List[str]) – List of valid (non-empty) texts.

  • valid_to_orig_idx (List[int]) – Mapping from valid indices to original indices.

  • start_token_map (List[List[int]]) – Per-text token-to-char-start mapping.

  • end_token_map (List[List[int]]) – Per-text token-to-char-end mapping.

  • num_original (int) – Total number of original texts.

Returns:

List of entity dicts aligned with original input texts.

Return type:

List[List[Dict[str, Any]]]
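
The staged pipeline (prepare_batch, collate_batch, run_batch, decode_batch, map_entities_to_text) can be sketched as follows (illustrative; the dictionary keys are taken from the Returns sections of each method):

>>> prep = model.prepare_batch(["Bill Gates founded Microsoft."], ["person", "company"])
>>> batch = model.collate_batch(prep["input_x"], prep["entity_types"])
>>> out = model.run_batch(batch, threshold=0.5)
>>> decoded = model.decode_batch(out, batch, threshold=0.5)
>>> entities = model.map_entities_to_text(
...     decoded, prep["valid_texts"], prep["valid_to_orig_idx"],
...     prep["start_token_map"], prep["end_token_map"], prep["num_original"])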

create_collator()[source]

Create a data collator instance for batch collation.

Useful for serve.py to create a reusable collator.

Returns:

Configured data collator instance.

Return type:

Any

inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False, **external_inputs)[source]

Predict entities for a batch of texts.

Parameters:
  • texts (str | List[str]) – A list of input texts to predict entities for or a single text string.

  • labels (List[str]) – A list of labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.

  • batch_size (int) – Batch size for processing. Defaults to 8.

  • packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None the instance-level configuration set via configure_inference_packing is used.

  • input_spans (List[List[Dict]] | None) – Input entity spans that should be classified by the model.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **external_inputs – Additional inputs to pass to the model.

Returns:

  • start: Start character position

  • end: End character position

  • text: Entity text

  • label: Entity type

  • score: Confidence score

  • class_probs: (optional) Dictionary mapping class names to probabilities (top 5)

Return type:

List of lists with predicted entities, where each entity is a dictionary containing
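
Example (illustrative sketch; assumes a loaded model):

>>> texts = ["Apple was founded by Steve Jobs.", "Paris is the capital of France."]
>>> results = model.inference(texts, ["person", "organization", "location"], threshold=0.5)
>>> results[0][0]["text"], results[0][0]["label"], results[0][0]["score"]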

predict_entities(text, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)[source]

Predict entities for a single text input.

Parameters:
  • text (str) – The input text to predict entities for.

  • labels (List[str]) – The labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **kwargs – Additional arguments passed to inference.

Returns:

List of entity predictions as dictionaries.

Return type:

List[Dict[str, Any]]
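
Example (illustrative sketch):

>>> entities = model.predict_entities(
...     "Bill Gates founded Microsoft in 1975.", ["person", "company", "date"])
>>> for e in entities:
...     print(e["text"], e["label"], round(e["score"], 2))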

batch_predict_entities(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, **kwargs)[source]

Predict entities for multiple texts.

DEPRECATED: Use inference instead.

This method will be removed in a future release; it now forwards to GLiNER.inference(…).

Parameters:
  • texts (List[str]) – Input texts.

  • labels (List[str]) – Labels to predict.

  • flat_ner (bool) – Use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold. Defaults to 0.5.

  • multi_label (bool) – Allow multiple labels per token/entity. Defaults to False.

  • **kwargs – Extra arguments forwarded to inference (e.g., batch_size).

Returns:

List of entity predictions for each text.

Return type:

List[List[Dict[str, Any]]]

evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, batch_size=12, entity_types=None)[source]

Evaluate the model on a given test dataset.

Parameters:
  • test_data (List[Dict[str, Any]]) – The test data containing text and entity annotations.

  • flat_ner (bool) – Whether to use flat NER. Defaults to False.

  • multi_label (bool) – Whether to use multi-label classification. Defaults to False.

  • threshold (float) – The threshold for predictions. Defaults to 0.5.

  • batch_size (int) – The batch size for evaluation. Defaults to 12.

  • entity_types (List[str] | None) – Optional list of entity types to evaluate. If None, extracts from test data. Defaults to None.

Returns:

Tuple containing the evaluation output and the F1 score.

Return type:

Tuple[Any, float]
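
Example (illustrative sketch; the tokenized_text/ner annotation format shown here is an assumption to check against your data):

>>> test_data = [{"tokenized_text": ["Bill", "Gates", "founded", "Microsoft"],
...               "ner": [[0, 1, "person"], [3, 3, "company"]]}]
>>> output, f1 = model.evaluate(test_data, threshold=0.5)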

compress_prompt_embeddings(texts, labels, rel_labels=None, batch_size=8, distill=False, distill_threshold=0.3, distill_epochs=3, distill_lr=1e-05, distill_batch_size=None, distill_output_dir='./distill_ckpt', distill_train_kwargs=None)[source]

Precompute averaged prompt embeddings for each label.

Runs the normal forward pass over (texts, labels) pairs, extracts the per-label prompt embedding from each example, and stores the mean per label on the underlying model. Sets config.precomputed_prompts_mode to True so subsequent inference/training will skip label-prepending and look up the stored embeddings instead. Relation labels are supported for relation-extraction models via rel_labels.

When distill=True, the raw (pre-compression) model first generates pseudo-labels over texts; the method then compresses prompt embeddings and fine-tunes the compressed model on those pseudo-labels so quality recovers end-to-end in a single call.

Parameters:
  • texts (List[str]) – List of raw input texts used as contexts for averaging.

  • labels (List[str]) – Entity labels to compress.

  • rel_labels (List[str] | None) – Optional relation labels (relex models only).

  • batch_size (int) – Batch size used while running the model.

  • distill (bool) – If True, generate pseudo-labels with the raw model over texts and fine-tune the compressed model on them.

  • distill_threshold (float) – Confidence threshold for pseudo-label generation.

  • distill_epochs (int) – Number of fine-tuning epochs.

  • distill_lr (float) – Fine-tuning learning rate.

  • distill_batch_size (int | None) – Batch size for fine-tuning (defaults to batch_size).

  • distill_output_dir (str) – Output directory passed to train_model.

  • distill_train_kwargs (Dict[str, Any] | None) – Extra kwargs forwarded to train_model.
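
Example (illustrative sketch; corpus_texts is an assumed list of raw context strings):

>>> model.compress_prompt_embeddings(corpus_texts, labels=["person", "company"])
>>> # or recover quality end-to-end via pseudo-label distillation:
>>> model.compress_prompt_embeddings(corpus_texts, labels=["person", "company"],
...                                  distill=True, distill_epochs=3)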

class gliner.model.BaseBiEncoderGLiNER(*args, **kwargs)[source]

Bases: BaseEncoderGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

resize_embeddings(**kwargs)[source]

Resize token embeddings to match tokenizer vocabulary size.

Parameters:

set_class_token_index – Whether to update the class token index.

encode_labels(labels, batch_size=8)[source]

Compute embeddings for labels using the label encoder.

Parameters:
  • labels (List[str]) – A list of labels to encode.

  • batch_size (int) – Batch size for processing labels.

Returns:

Tensor containing label embeddings with shape (num_labels, hidden_size).

Raises:

NotImplementedError – If the model doesn't have a label encoder.

Return type:

FloatTensor

batch_predict_with_embeds(texts, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False)[source]

Predict entities for a batch of texts using pre-computed label embeddings.

Parameters:
  • texts (List[str]) – A list of input texts to predict entities for.

  • labels_embeddings (Tensor) – Pre-computed embeddings for the labels.

  • labels (List[str]) – List of label strings corresponding to the embeddings.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.

  • batch_size (int) – Batch size for processing. Defaults to 8.

  • packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None the instance-level configuration set via configure_inference_packing is used.

  • input_spans (List[List[Dict]] | None) – Input entity spans to limit predictions to. Each span is a dict with β€˜start’ and β€˜end’ character positions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

Returns:

List of lists with predicted entities.

Return type:

List[List[Dict[str, Any]]]
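
Example (illustrative sketch; pairs encode_labels with batch_predict_with_embeds so the label embeddings are computed once and reused across calls):

>>> labels = ["person", "organization"]
>>> label_embeds = model.encode_labels(labels, batch_size=8)
>>> results = model.batch_predict_with_embeds(texts, label_embeds, labels, threshold=0.5)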

predict_with_embeds(text, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)[source]

Predict entities for a single text input using pre-computed label embeddings.

Parameters:
  • text – The input text to predict entities for.

  • labels_embeddings – Pre-computed embeddings for the labels.

  • labels – List of label strings corresponding to the embeddings.

  • flat_ner – Whether to use flat NER. Defaults to True.

  • threshold – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label – Whether to allow multiple labels per entity. Defaults to False.

  • return_class_probs – Whether to include class probabilities in output. Defaults to False.

  • **kwargs – Additional arguments passed to batch_predict_with_embeds.

Returns:

List of entity predictions.

class gliner.model.UniEncoderSpanGLiNER(*args, **kwargs)[source]

Bases: BaseEncoderGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderSpanConfig

model_class

alias of UniEncoderSpanModel

ort_model_class

alias of UniEncoderSpanORTModel

data_processor_class

alias of UniEncoderSpanProcessor

data_collator_class

alias of UniEncoderSpanDataCollator

decoder_class

alias of SpanDecoder

class gliner.model.UniEncoderTokenGLiNER(*args, **kwargs)[source]

Bases: BaseEncoderGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_class

alias of UniEncoderTokenConfig

model_class

alias of UniEncoderTokenModel

ort_model_class

alias of UniEncoderTokenORTModel

data_processor_class

alias of UniEncoderTokenProcessor

data_collator_class

alias of UniEncoderTokenDataCollator

decoder_class

alias of TokenDecoder

class gliner.model.BiEncoderSpanGLiNER(*args, **kwargs)[source]ΒΆ

Bases: BaseBiEncoderGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of BiEncoderSpanConfig

model_classΒΆ

alias of BiEncoderSpanModel

ort_model_classΒΆ

alias of BiEncoderSpanORTModel

data_processor_classΒΆ

alias of BiEncoderSpanProcessor

data_collator_classΒΆ

alias of BiEncoderSpanDataCollator

decoder_classΒΆ

alias of SpanDecoder

class gliner.model.BiEncoderTokenGLiNER(*args, **kwargs)[source]ΒΆ

Bases: BaseBiEncoderGLiNER

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of BiEncoderTokenConfig

model_classΒΆ

alias of BiEncoderTokenModel

ort_model_classΒΆ

alias of BiEncoderTokenORTModel

data_processor_classΒΆ

alias of BiEncoderTokenProcessor

data_collator_classΒΆ

alias of BiEncoderTokenDataCollator

decoder_classΒΆ

alias of TokenDecoder

class gliner.model.UniEncoderSpanDecoderGLiNER(*args, **kwargs)[source]ΒΆ

Bases: BaseEncoderGLiNER

GLiNER model with span-based encoding and label decoding capabilities.

Supports generating textual labels for entities.

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of UniEncoderSpanDecoderConfig

model_classΒΆ

alias of UniEncoderSpanDecoderModel

ort_model_class: type = NoneΒΆ

data_processor_classΒΆ

alias of UniEncoderSpanDecoderProcessor

data_collator_classΒΆ

alias of UniEncoderSpanDecoderDataCollator

decoder_classΒΆ

alias of SpanGenerativeDecoder

set_labels_trie(labels)[source]ΒΆ

Initialize the labels trie for constrained generation.

Parameters:

labels (List[str]) – Labels that will be used for constrained generation.

Returns:

Trie structure for constrained beam search.

Raises:

NotImplementedError – If the model doesn't have a decoder.
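
The trie lets constrained generation admit, at each decoding step, only token ids that extend some valid label. A minimal sketch of such a prefix trie over token-id sequences (the real trie implementation and tokenization are internal to gliner and may differ):

```python
# Minimal prefix trie over token-id sequences for constrained generation.
# Illustrative only; gliner's actual trie structure may differ.

class Trie:
    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix):
        """Token ids that may follow the given prefix of a label."""
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []
        return sorted(node)

# Suppose "person" tokenizes to [11, 42] and "place" to [11, 7].
trie = Trie([[11, 42], [11, 7]])
print(trie.allowed_next([]))    # [11]
print(trie.allowed_next([11]))  # [7, 42]
```

During beam search, `allowed_next` would mask the decoder's vocabulary so only these ids remain scorable.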

generate_labels(model_output, **gen_kwargs)[source]ΒΆ

Generate textual class labels for each entity span.

Parameters:
  • model_output – Model output containing decoder_embedding and decoder_embedding_mask.

  • **gen_kwargs – Generation parameters (max_new_tokens, temperature, etc.).

Returns:

List of generated label strings.

run_batch(batch, threshold=0.5, packing_config=None, move_to_device=True, gen_constraints=None, num_gen_sequences=1, **gen_kwargs)[source]ΒΆ

Run model forward pass on a collated batch with label generation.

Parameters:
  • batch (Dict[str, Any]) – Collated batch from collate_batch.

  • threshold (float) – Confidence threshold for predictions.

  • packing_config (InferencePackingConfig | None) – Optional inference packing configuration.

  • move_to_device (bool) – Whether to move tensors to model device.

  • gen_constraints (List[str] | None) – Labels to constrain generation.

  • num_gen_sequences (int) – Number of label sequences to generate per span.

  • **gen_kwargs – Additional generation parameters.

Returns:

Model output with generated labels attached.

Return type:

Any

decode_batch(model_output, batch, threshold=0.5, flat_ner=True, multi_label=False, return_class_probs=False, input_spans=None)[source]ΒΆ

Decode model output into entity predictions with generated labels.

Parameters:
  • model_output (Any) – Output from run_batch (includes gen_labels).

  • batch (Dict[str, Any]) – The collated batch (needs 'tokens' and 'id_to_classes').

  • threshold (float) – Confidence threshold for predictions.

  • flat_ner (bool) – Whether to use flat NER (no overlapping entities).

  • multi_label (bool) – Whether to allow multiple labels per span.

  • return_class_probs (bool) – Whether to include class probabilities.

  • input_spans (List[List[Tuple[int, int]]] | None) – Optional word-level input spans to classify.

Returns:

List of entity lists (one per text in batch).

Return type:

List[List[Any]]
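
The flat_ner flag controls overlap resolution: with flat_ner=True, each token belongs to at most one predicted entity. A common greedy scheme for this, shown as an illustrative sketch rather than gliner's exact algorithm, keeps the highest-scoring spans first and drops any candidate that overlaps a kept span:

```python
def greedy_flat_spans(candidates):
    """candidates: list of (start, end, label, score) with inclusive ends.
    Keep highest-scoring spans first; drop overlapping candidates."""
    kept = []
    for start, end, label, score in sorted(candidates, key=lambda c: -c[3]):
        if all(end < s or start > e for s, e, _, _ in kept):
            kept.append((start, end, label, score))
    return sorted(kept)

spans = [(0, 2, "org", 0.9), (1, 3, "person", 0.8), (5, 6, "place", 0.7)]
print(greedy_flat_spans(spans))
# [(0, 2, 'org', 0.9), (5, 6, 'place', 0.7)]
```

With flat_ner=False, the overlapping (1, 3, "person") candidate would instead be kept as a nested entity.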

map_entities_to_text(decoded, valid_texts, valid_to_orig_idx, start_token_map, end_token_map, num_original)[source]ΒΆ

Map decoded entities back to character positions with generated labels.

Parameters:
  • decoded (List[List[Any]]) – Decoded entity spans from decode_batch.

  • valid_texts (List[str]) – List of valid (non-empty) texts.

  • valid_to_orig_idx (List[int]) – Mapping from valid indices to original indices.

  • start_token_map (List[List[int]]) – Per-text token-to-char-start mapping.

  • end_token_map (List[List[int]]) – Per-text token-to-char-end mapping.

  • num_original (int) – Total number of original texts.

Returns:

List of entity dicts aligned with original input texts.

Return type:

List[List[Dict[str, Any]]]
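
The token-to-character maps let decoded word-level spans be projected back onto the original string. A self-contained sketch of that projection (the real method additionally handles the valid-to-original index mapping and generated labels):

```python
def spans_to_char_offsets(entities, start_map, end_map, text):
    """entities: (start_tok, end_tok, label) with inclusive token ends.
    start_map/end_map give each token's char start/end in `text`."""
    out = []
    for start_tok, end_tok, label in entities:
        start, end = start_map[start_tok], end_map[end_tok]
        out.append({"start": start, "end": end,
                    "text": text[start:end], "label": label})
    return out

text = "Marie Curie won"
start_map = [0, 6, 12]   # char start of each word token
end_map = [5, 11, 15]    # char end of each word token
print(spans_to_char_offsets([(0, 1, "person")], start_map, end_map, text))
# [{'start': 0, 'end': 11, 'text': 'Marie Curie', 'label': 'person'}]
```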

inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, gen_constraints=None, num_gen_sequences=1, packing_config=None, input_spans=None, return_class_probs=False, **gen_kwargs)[source]ΒΆ

Predict entities with optional label generation.

Parameters:
  • texts (str | List[str]) – Input texts (string or list of strings).

  • labels (List[str]) – Entity type labels.

  • flat_ner (bool) – Whether to use flat NER.

  • threshold (float) – Confidence threshold.

  • multi_label (bool) – Allow multiple labels per span.

  • batch_size (int) – Batch size for processing.

  • gen_constraints (List[str] | None) – Labels to constrain generation.

  • num_gen_sequences (int) – Number of label sequences to generate per span.

  • packing_config (InferencePackingConfig | None) – Inference packing configuration.

  • input_spans (List[List[Dict]] | None) – Input entity spans to limit predictions to. Each span is a dict with β€˜start’ and β€˜end’ character positions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **gen_kwargs – Additional generation parameters.

Returns:

List of entity predictions with optional generated labels.

Return type:

List[List[Dict[str, Any]]]
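
The input_spans argument restricts classification to pre-chosen character spans, one list of {'start', 'end'} dicts per text. A sketch of building such spans, here from a simple regex over capitalized word runs (the pattern itself is purely illustrative):

```python
import re

# Build input_spans for two texts by matching capitalized word runs;
# each span dict uses the 'start'/'end' character keys this API documents.
texts = ["Ada Lovelace met Charles Babbage", "Turing was born in London"]
pattern = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")

input_spans = [
    [{"start": m.start(), "end": m.end()} for m in pattern.finditer(t)]
    for t in texts
]
print(input_spans[0])
# [{'start': 0, 'end': 12}, {'start': 17, 'end': 32}]
```

The resulting list would be passed as inference(texts, labels, input_spans=input_spans), so the model only scores those candidate spans.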

predict_entities(text, labels, flat_ner=True, threshold=0.5, multi_label=False, gen_constraints=None, num_gen_sequences=1, return_class_probs=False, **gen_kwargs)[source]ΒΆ

Predict entities for a single text input with optional label generation.

Parameters:
  • text (str) – The input text to predict entities for.

  • labels (List[str]) – The labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.

  • gen_constraints (List[str] | None) – Labels to constrain generation.

  • num_gen_sequences (int) – Number of label sequences to generate per span.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **gen_kwargs – Additional generation parameters.

Returns:

List of entity predictions as dictionaries.

Return type:

List[Dict[str, Any]]
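
A typical post-processing step over predict_entities output. The prediction list below is hand-written sample data in the dict shape this page documents; the commented-out call shows where it would come from (model id taken from the examples elsewhere on this page):

```python
# The call itself would be:
#   model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
#   entities = model.predict_entities(text, ["person", "award"], threshold=0.5)
# Below, hand-written sample output stands in for `entities`.
sample = [
    {"text": "Marie Curie", "label": "person", "score": 0.97},
    {"text": "Nobel Prize", "label": "award", "score": 0.55},
    {"text": "radium", "label": "award", "score": 0.21},
]

# Keep only confident predictions and group them by label.
by_label = {}
for ent in sample:
    if ent["score"] >= 0.5:
        by_label.setdefault(ent["label"], []).append(ent["text"])

print(by_label)  # {'person': ['Marie Curie'], 'award': ['Nobel Prize']}
```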

export_to_onnx(save_dir, onnx_filename='model.onnx', quantized_filename='model_quantized.onnx', quantize=False, opset=19)[source]ΒΆ

ONNX export not supported for encoder-decoder models.

Raises:

NotImplementedError – Always raised, as this model type cannot be exported to ONNX.

class gliner.model.UniEncoderTokenDecoderGLiNER(*args, **kwargs)[source]ΒΆ

Bases: UniEncoderSpanDecoderGLiNER

GLiNER model with token-based encoding and label decoding capabilities.

Combines token-level BIO tagging with a decoder that generates entity type labels autoregressively.

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of UniEncoderTokenDecoderConfig

model_classΒΆ

alias of UniEncoderTokenDecoderModel

ort_model_class: type = NoneΒΆ

data_processor_classΒΆ

alias of UniEncoderTokenDecoderProcessor

data_collator_classΒΆ

alias of UniEncoderTokenDecoderDataCollator

decoder_classΒΆ

alias of TokenGenerativeDecoder

class gliner.model.UniEncoderSpanRelexGLiNER(*args, **kwargs)[source]ΒΆ

Bases: BaseEncoderGLiNER

GLiNER model for both entity recognition and relation extraction.

Performs joint entity and relation prediction, allowing the model to simultaneously detect entities and the relationships between them in a single forward pass.

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of UniEncoderSpanRelexConfig

model_classΒΆ

alias of UniEncoderSpanRelexModel

ort_model_classΒΆ

alias of UniEncoderSpanRelexORTModel

data_processor_classΒΆ

alias of RelationExtractionSpanProcessor

data_collator_classΒΆ

alias of RelationExtractionSpanDataCollator

decoder_classΒΆ

alias of SpanRelexDecoder

set_class_indices()[source]ΒΆ

Set the class token indices for entities and relations in the configuration.

prepare_batch(texts, labels, input_spans=None, relations=None, **kwargs)[source]ΒΆ

Prepare raw inputs for inference including relation types.

Parameters:
  • texts (str | List[str]) – Single text string or list of texts.

  • labels (str | List[str] | List[List[str]]) – Entity labels - string, list of strings, or per-text label lists.

  • input_spans (List[List[Dict]] | None) – Optional pre-defined spans to classify (character positions).

  • relations (str | List[str] | List[List[str]] | None) – Relation type labels - string, list of strings, or per-text label lists.

  • **kwargs – Additional keyword arguments passed to the parent prepare_batch.

Returns:

Dictionary containing prepared inputs plus relation_types.

Return type:

Dict[str, Any]

collate_batch(input_x, entity_types, collator=None, relation_types=None)[source]ΒΆ

Collate prepared inputs into a tensor batch with relation types.

Parameters:
  • input_x (List[Dict[str, Any]]) – List of input dicts from prepare_batch.

  • entity_types (List[str] | List[List[str]]) – Entity type labels.

  • collator (Any | None) – Optional pre-created collator instance.

  • relation_types (List[str] | List[List[str]] | None) – Relation type labels (list or per-text lists).

Returns:

Collated batch dictionary with tensors ready for the model.

Return type:

Dict[str, Any]

create_collator()[source]ΒΆ

Create a data collator instance for relation extraction.

Returns:

Configured data collator instance.

Return type:

Any

run_batch(batch, threshold=0.5, adjacency_threshold=None, packing_config=None, move_to_device=True, **external_inputs)[source]ΒΆ

Run model forward pass on a collated batch.

Parameters:
  • batch (Dict[str, Any]) – Collated batch from collate_batch.

  • threshold (float) – Confidence threshold for predictions.

  • adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction.

  • packing_config (InferencePackingConfig | None) – Optional inference packing configuration.

  • move_to_device (bool) – Whether to move tensors to model device.

  • **external_inputs – Additional inputs to pass to the model.

Returns:

Model output containing logits and relation information.

Return type:

Any

decode_batch(model_output, batch, threshold=0.5, relation_threshold=None, flat_ner=True, multi_label=False, return_class_probs=False, input_spans=None)[source]ΒΆ

Decode model output into entity and relation predictions.

Parameters:
  • model_output (Any) – Output from run_batch.

  • batch (Dict[str, Any]) – The collated batch.

  • threshold (float) – Confidence threshold for entity predictions.

  • relation_threshold (float | None) – Confidence threshold for relation predictions.

  • flat_ner (bool) – Whether to use flat NER.

  • multi_label (bool) – Whether to allow multiple labels per span.

  • return_class_probs (bool) – Whether to include class probabilities.

  • input_spans (List[List[Tuple[int, int]]] | None) – Optional word-level input spans to classify.

Returns:

Tuple of (entity_outputs, relation_outputs) where each is a list per text.

Return type:

Tuple[List[List[Any]], List[List[Any]]]

map_entities_to_text(decoded, valid_texts, valid_to_orig_idx, start_token_map, end_token_map, num_original)[source]ΒΆ

Map decoded entities back to character positions in original texts.

Parameters:
  • decoded (List[List[Any]]) – Decoded entity spans from decode_batch.

  • valid_texts (List[str]) – List of valid (non-empty) texts.

  • valid_to_orig_idx (List[int]) – Mapping from valid indices to original indices.

  • start_token_map (List[List[int]]) – Per-text token-to-char-start mapping.

  • end_token_map (List[List[int]]) – Per-text token-to-char-end mapping.

  • num_original (int) – Total number of original texts.

Returns:

List of entity dicts aligned with original input texts.

Return type:

List[List[Dict[str, Any]]]

map_relations_to_text(relation_outputs, entity_outputs, valid_texts, valid_to_orig_idx, start_token_map, end_token_map, num_original)[source]ΒΆ

Map relation predictions back to character positions.

Parameters:
  • relation_outputs (List[List[Any]]) – Decoded relations per text.

  • entity_outputs (List[List[Any]]) – Decoded entities per text (for getting span info).

  • valid_texts (List[str]) – List of valid (non-empty) texts.

  • valid_to_orig_idx (List[int]) – Mapping from valid indices to original indices.

  • start_token_map (List[List[int]]) – Per-text token-to-char-start mapping.

  • end_token_map (List[List[int]]) – Per-text token-to-char-end mapping.

  • num_original (int) – Total number of original texts.

Returns:

List of relation dicts aligned with original input texts.

Return type:

List[List[Dict[str, Any]]]

inference(texts, labels, relations=[], flat_ner=True, threshold=0.5, adjacency_threshold=None, relation_threshold=None, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_relations=True, return_class_probs=False)[source]ΒΆ

Predict entities and relations.

Parameters:
  • texts (str | List[str]) – Input texts (str or List[str]).

  • labels (str | List[str] | List[List[str]]) – Entity type labels - string, list of strings, or per-text label lists.

  • relations (str | List[str] | List[List[str]]) – Relation type labels - string, list of strings, or per-text label lists.

  • flat_ner (bool) – Whether to use flat NER (no nested entities).

  • threshold (float) – Confidence threshold for entities.

  • adjacency_threshold (float | None) – Confidence threshold for adjacency matrix reconstruction (defaults to threshold).

  • relation_threshold (float | None) – Confidence threshold for relations (defaults to threshold).

  • multi_label (bool) – Allow multiple labels per span.

  • batch_size (int) – Batch size for processing.

  • packing_config (InferencePackingConfig | None) – Inference packing configuration.

  • input_spans (List[List[Dict]] | None) – Input entity spans to limit predictions to. Each span is a dict with β€˜start’ and β€˜end’ character positions.

  • return_relations (bool) – Whether to return relation predictions.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

Returns:

Tuple of (entities, relations) if return_relations=True, else just entities.

Return type:

List[List[Dict[str, Any]]] | Tuple[List[List[Dict[str, Any]]], List[List[Dict[str, Any]]]]

predict_entities(text, labels, relations=[], flat_ner=True, threshold=0.5, adjacency_threshold=None, multi_label=False, return_class_probs=False, **kwargs)[source]ΒΆ

Predict entities for a single text input.

Parameters:
  • text (str) – The input text to predict entities for.

  • labels (List[str]) – The entity labels to predict.

  • relations (List[str]) – The relation labels (used for context; only entities are returned).

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for predictions. Defaults to 0.5.

  • adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction. Defaults to threshold.

  • multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.

  • return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.

  • **kwargs – Additional arguments passed to inference.

Returns:

List of entity predictions as dictionaries.

Return type:

List[Dict[str, Any]]

predict_relations(text, labels, relations, flat_ner=True, threshold=0.5, adjacency_threshold=None, relation_threshold=None, multi_label=False, **kwargs)[source]ΒΆ

Predict entities and relations for a single text input.

Parameters:
  • text (str) – The input text to predict entities and relations for.

  • labels (List[str]) – The entity labels to predict.

  • relations (List[str]) – The relation labels to predict.

  • flat_ner (bool) – Whether to use flat NER. Defaults to True.

  • threshold (float) – Confidence threshold for entities. Defaults to 0.5.

  • adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction. Defaults to threshold.

  • relation_threshold (float | None) – Confidence threshold for relations. Defaults to threshold.

  • multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.

  • **kwargs – Additional arguments passed to inference.

Returns:

Tuple of (entities, relations) for the single text.

Return type:

Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]
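
Consuming the (entities, relations) pair returned for a single text. The sample dicts below are hand-written, and their keys ('head', 'tail', 'label', 'score') are assumptions for illustration, not a guaranteed schema:

```python
# Hand-written sample in the documented (entities, relations) shape;
# the exact dict keys are assumptions for illustration.
entities = [
    {"text": "Marie Curie", "label": "person", "score": 0.97},
    {"text": "Sorbonne", "label": "organization", "score": 0.91},
]
relations = [
    {"head": "Marie Curie", "tail": "Sorbonne",
     "label": "works_at", "score": 0.88},
]

# Render each sufficiently confident relation as a readable triple.
triples = [(r["head"], r["label"], r["tail"]) for r in relations
           if r["score"] >= 0.5]
print(triples)  # [('Marie Curie', 'works_at', 'Sorbonne')]
```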

evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, adjacency_threshold=None, relation_threshold=None, batch_size=12, entity_types=None)[source]ΒΆ

Evaluate the model on both NER and relation extraction tasks.

Parameters:
  • test_data (List[Dict[str, Any]]) – The test data containing text, entity, and relation annotations.

  • flat_ner (bool) – Whether to use flat NER. Defaults to False.

  • multi_label (bool) – Whether to use multi-label classification. Defaults to False.

  • threshold (float) – The threshold for entity predictions. Defaults to 0.5.

  • adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction. Defaults to threshold.

  • relation_threshold (float | None) – The threshold for relation predictions. Defaults to threshold.

  • batch_size (int) – The batch size for evaluation. Defaults to 12.

  • entity_types (List[str] | None) – Optional list of entity types to evaluate. If None, extracts from test data. Defaults to None.

Returns:

  • ner_output: Formatted string with NER P, R, F1

  • ner_f1: NER F1 score

  • rel_output: Formatted string with relation extraction P, R, F1

  • rel_f1: Relation extraction F1 score

Return type:

Tuple of ((ner_output, ner_f1), (rel_output, rel_f1)) containing the four values listed above.
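
The nested return value unpacks as two (formatted_string, f1) pairs. A sketch with placeholder scores standing in for a real evaluation run:

```python
# Unpacking the nested return value of evaluate(); the strings and scores
# here are hand-written placeholders, not output from a real run.
(ner_output, ner_f1), (rel_output, rel_f1) = (
    ("P: 81.2%  R: 79.5%  F1: 80.3%", 0.803),
    ("P: 66.0%  R: 61.1%  F1: 63.5%", 0.635),
)

assert 0.0 <= ner_f1 <= 1.0 and 0.0 <= rel_f1 <= 1.0
print(f"NER F1: {ner_f1:.3f}, relation F1: {rel_f1:.3f}")
# NER F1: 0.803, relation F1: 0.635
```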

class gliner.model.UniEncoderTokenRelexGLiNER(*args, **kwargs)[source]ΒΆ

Bases: UniEncoderSpanRelexGLiNER

GLiNER model for both entity recognition and relation extraction.

Performs joint entity and relation prediction, allowing the model to simultaneously detect entities and the relationships between them in a single forward pass.

Initialize a BaseGLiNER model.

Parameters:
  • config – Model configuration object.

  • model – Pre-initialized model instance. If None, creates a new model.

  • tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.

  • data_processor – Pre-initialized data processor. If None, creates a new processor.

  • backbone_from_pretrained – Whether to load the backbone from pretrained weights.

  • cache_dir – Directory for caching downloaded models.

  • **kwargs – Additional keyword arguments passed to model creation.

config_classΒΆ

alias of UniEncoderTokenRelexConfig

model_classΒΆ

alias of UniEncoderTokenRelexModel

ort_model_classΒΆ

alias of UniEncoderTokenRelexORTModel

data_processor_classΒΆ

alias of RelationExtractionTokenProcessor

data_collator_classΒΆ

alias of RelationExtractionTokenDataCollator

decoder_classΒΆ

alias of TokenRelexDecoder

class gliner.model.GLiNER(*args, **kwargs)[source]ΒΆ

Bases: Module, PyTorchModelHubMixin

Meta GLiNER class that automatically instantiates the appropriate GLiNER variant.

This class provides a unified interface for all GLiNER models, automatically dispatching to the specialized model type indicated by the configuration. It supports various NER architectures, including uni-encoder, bi-encoder, decoder-based, and relation extraction models.

The class automatically detects the model type based on:
  • span_mode: Token-level vs span-level

  • labels_encoder: Uni-encoder vs bi-encoder

  • labels_decoder: Standard vs decoder-based

  • relations_layer: NER-only vs joint entity-relation extraction
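
The detection can be pictured as a chain of config checks mirroring the four bullets above. The sketch below uses hypothetical field values and an assumed priority order; the real GLiNER.model_map keys and GLiNERConfig fields may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cfg:
    """Stand-in for GLiNERConfig; field names/values are assumptions."""
    span_mode: str = "span_level"          # "token_level" selects token variants
    labels_encoder: Optional[str] = None   # set -> bi-encoder
    labels_decoder: Optional[str] = None   # set -> generative label decoder
    relations_layer: bool = False          # True -> joint entity-relation model

def detect_variant(cfg: Cfg) -> str:
    """Mirror the four config checks listed above (priority order assumed)."""
    token = cfg.span_mode == "token_level"
    if cfg.relations_layer:
        return "UniEncoderTokenRelexGLiNER" if token else "UniEncoderSpanRelexGLiNER"
    if cfg.labels_decoder:
        return "UniEncoderTokenDecoderGLiNER" if token else "UniEncoderSpanDecoderGLiNER"
    if cfg.labels_encoder:
        return "BiEncoderTokenGLiNER" if token else "BiEncoderSpanGLiNER"
    return "UniEncoderTokenGLiNER" if token else "UniEncoderSpanGLiNER"

print(detect_variant(Cfg()))                                  # UniEncoderSpanGLiNER
print(detect_variant(Cfg(labels_encoder="bert-base-cased")))  # BiEncoderSpanGLiNER
```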

modelΒΆ

The loaded GLiNER model instance (automatically typed).

configΒΆ

Model configuration.

data_processorΒΆ

Data processor for the model.

decoderΒΆ

Decoder for predictions.

Examples

Load a pretrained uni-encoder span model:

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")

Load a bi-encoder model:

>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")

Load from a local configuration:

>>> config = GLiNERConfig.from_pretrained("config.json")
>>> model = GLiNER.from_config(config)

Initialize from scratch:

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER(config)

Initialize a GLiNER model with automatic type detection.

This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.

Parameters:
  • config (str | Path | GLiNERConfig) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • **kwargs – Additional arguments passed to the specific GLiNER variant.

Examples

>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")

__init__(config, **kwargs)[source]ΒΆ

Initialize a GLiNER model with automatic type detection.

This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.

Parameters:
  • config (str | Path | GLiNERConfig) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • **kwargs – Additional arguments passed to the specific GLiNER variant.

Examples

>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")

classmethod from_pretrained(model_id, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=None, dtype=None, low_cpu_mem_usage=False, variant=None, load_onnx_model=False, onnx_model_file='model.onnx', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]ΒΆ

Load a pretrained GLiNER model with automatic type detection.

This method loads the configuration, determines the appropriate GLiNER variant, and delegates to that variant’s from_pretrained method.

Parameters:
  • model_id (str) – Model identifier or local path.

  • revision (str | None) – Model revision.

  • cache_dir (str | Path | None) – Cache directory.

  • force_download (bool) – Force redownload.

  • proxies (dict | None) – Proxy configuration.

  • resume_download (bool) – Resume interrupted downloads.

  • local_files_only (bool) – Only use local files.

  • token (str | bool | None) – HF token for private repos.

  • map_location (str) – Device to map model to.

  • strict (bool) – Enforce strict state_dict loading.

  • load_tokenizer (bool | None) – Whether to load tokenizer.

  • resize_token_embeddings (bool | None) – Whether to resize embeddings.

  • compile_torch_model (bool | None) – Whether to compile with torch.compile.

  • quantize (str | None) – Only "int8" is accepted (int8 dynamic quantization: torchao on GPU, FBGEMM on CPU). For precision-only changes (fp16/bf16), use dtype=. None to disable.

  • dtype (str | dtype | None) – Target floating-point dtype for the loaded weights (e.g. torch.bfloat16, "bf16", "fp16"). When set, weights are cast during the state-dict read so the fp32 copy is never fully materialized; prefer this over quantize for plain precision changes.

  • low_cpu_mem_usage (bool) – If True, build the model under torch.device("meta") and use load_state_dict(assign=True), skipping the random-init compute and the fp32 shell allocation. See the base-class docstring for the full contract.

  • variant (str | None) – "fp16" / "bf16" to prefer model.{variant}.safetensors over the default fp32 file. Best-effort with graceful fallback: if the publisher uploaded the variant, only that file is fetched; if not, warns and falls back to fp32 + cast on read. See the base-class from_pretrained docstring for the full contract. None (default) preserves prior behavior.

  • load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.

  • onnx_model_file (str | None) – Path to ONNX model file.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Appropriate GLiNER model instance.

Examples

>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
>>> model = GLiNER.from_pretrained("path/to/local/model", quantize="int8")
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", dtype="bf16")
>>> # If the repo publishes model.bf16.safetensors, download only that:
>>> model = GLiNER.from_pretrained("org/gliner_bf16-v1", variant="bf16")

classmethod from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=None, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]ΒΆ

Create a GLiNER model from configuration.

Parameters:
  • config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).

  • cache_dir (str | Path | None) – Cache directory for downloads.

  • load_tokenizer (bool) – Whether to load tokenizer.

  • resize_token_embeddings (bool) – Whether to resize token embeddings.

  • backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.

  • compile_torch_model (bool) – Whether to compile with torch.compile.

  • quantize (str | None) – Only "int8" is accepted (int8 dynamic quantization: torchao on GPU, FBGEMM on CPU). For precision-only changes (fp16/bf16), use dtype=. None to disable.

  • map_location (str) – Device to map model to.

  • max_length (int | None) – Override max_length in config.

  • max_width (int | None) – Override max_width in config.

  • post_fusion_schema (str | None) – Override post_fusion_schema in config.

  • _attn_implementation (str | None) – Override attention implementation.

  • **model_kwargs – Additional model initialization arguments.

Returns:

Initialized GLiNER model instance.

Examples

>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.from_config(config)
>>> model = GLiNER.from_config("path/to/gliner_config.json")

property model_map: dict[str, dict[str, Any]]ΒΆ

Map configuration patterns to their corresponding GLiNER classes.

Returns:

Dictionary mapping model types to their classes and descriptions.

get_model_type()[source]ΒΆ

Get the type of the current model instance.

Returns:

String identifier of the model type

Return type:

str