gliner.model module¶
- class gliner.model.BaseGLiNER(*args, **kwargs)[source]¶
Bases:
ABC, Module, PyTorchModelHubMixin
Initialize a BaseGLiNER model.
- Parameters:
config (BaseGLiNERConfig) – Model configuration object.
model (BaseModel | None) – Pre-initialized model instance. If None, creates a new model.
tokenizer (BaseModel | None) – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor (BaseProcessor | None) – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained (bool | None) – Whether to load the backbone from pretrained weights.
cache_dir (str | Path | None) – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class: type = None¶
- model_class: type = None¶
- ort_model_class: type = None¶
- data_processor_class: type = None¶
- data_collator_class: type = None¶
- decoder_class: type = None¶
- __init__(config, model=None, tokenizer=None, data_processor=None, backbone_from_pretrained=False, cache_dir=None, **kwargs)[source]¶
Initialize a BaseGLiNER model.
- Parameters:
config (BaseGLiNERConfig) – Model configuration object.
model (BaseModel | None) – Pre-initialized model instance. If None, creates a new model.
tokenizer (BaseModel | None) – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor (BaseProcessor | None) – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained (bool | None) – Whether to load the backbone from pretrained weights.
cache_dir (str | Path | None) – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- forward(*args, **kwargs)[source]¶
Forward pass through the model.
- Parameters:
*args – Positional arguments passed to the model.
**kwargs – Keyword arguments passed to the model.
- Returns:
Model output from the forward pass.
- property device¶
Get the device where the model is located.
- Returns:
Torch device object (CPU or CUDA).
- configure_inference_packing(config)[source]¶
Configure default packing behavior for inference calls.
Passing None disables packing by default. Individual inference methods accept a packing_config argument to override this setting on a per-call basis.
- Parameters:
config (InferencePackingConfig | None) – Inference packing configuration or None to disable packing.
- compile()[source]¶
Compile the model using torch.compile for optimization.
Uses dynamic=True to generate shape-generic kernels, which avoids recompilation on variable-length NER inputs. Also enables capture_scalar_outputs to trace through data-dependent shape operations (e.g., computing the maximum number of entity types per batch).
Best combined with quantize() for maximum throughput (~1.9x over fp32).
When FlashDeBERTa is active, its custom Triton kernels are incompatible with torch.compile tracing. The encoder forward is automatically wrapped with torch.compiler.disable so the rest of the model (span representation, scoring, etc.) still benefits from compilation.
- quantize(dtype='fp16')[source]¶
Apply quantization to the model.
- Parameters:
dtype (str) – Quantization type. Options:
"fp16" (default): float16 half-precision. On GPU, uses Tensor Core acceleration for ~1.4x speedup. On CPU, applies dynamic quantization (reduces memory, no speed benefit).
"bf16": bfloat16 half-precision. Better numerical stability than fp16 with slightly less speedup (~1.2x).
"int8": int8 quantization (GPU and CPU). On CPU, uses PyTorch’s built-in dynamic quantization with FBGEMM int8 kernels (~1.6x speedup). On GPU, uses torchao int8 weight-only quantization (~50% memory reduction, no speed gain; requires the torchao package). Stock DeBERTa-based models lose accuracy with int8; use this with models that have been fine-tuned with quantization-aware training (QAT).
- Raises:
RuntimeError – If the model is an ONNX model (use ONNX quantization instead).
ValueError – If dtype is not a recognized quantization type.
ImportError – If torchao is not installed and int8 on GPU is requested.
Examples
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1", map_location="cuda")
>>> model.quantize()  # fp16 half-precision on GPU, ~1.4x faster
>>> model.quantize("bf16")  # bfloat16 on GPU, ~1.2x faster
>>> model.quantize("int8")  # int8 quantization (torchao on GPU, FBGEMM on CPU)
- prepare_state_dict(state_dict)[source]¶
Prepare state dict for saving, handling torch.compile artifacts.
- Parameters:
state_dict – Original state dictionary from the model.
- Returns:
Cleaned state dictionary with torch.compile prefixes removed.
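As a minimal sketch of what such cleaning can look like: modules wrapped by torch.compile expose their parameters under an _orig_mod. key prefix, and stripping that prefix restores the original key names. The helper name strip_compile_prefix and the sample keys are hypothetical, and the library's own implementation may differ.

```python
from collections import OrderedDict

COMPILE_PREFIX = "_orig_mod."  # prefix torch.compile adds to wrapped-module keys


def strip_compile_prefix(state_dict):
    """Return a copy of state_dict with every '_orig_mod.' segment removed from its keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        cleaned[key.replace(COMPILE_PREFIX, "")] = value
    return cleaned


# Hypothetical keys; the values would normally be tensors.
raw = OrderedDict([
    ("_orig_mod.encoder.weight", 1),
    ("model._orig_mod.head.bias", 2),
    ("plain.weight", 3),
])
cleaned = strip_compile_prefix(raw)
print(list(cleaned))
```

Keys without the prefix pass through unchanged, so the same helper is safe to apply to a model that was never compiled.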
- save_pretrained(save_directory, *, config=None, repo_id=None, push_to_hub=False, safe_serialization=False, **push_to_hub_kwargs)[source]¶
Save model weights and configuration to local directory.
- Parameters:
save_directory (str | Path) – Path to directory for saving.
config (BaseGLiNERConfig | None) – Model configuration. Uses self.config if None.
repo_id (str | None) – Repository ID for hub upload.
push_to_hub (bool) – Whether to push to HuggingFace Hub.
safe_serialization (bool) – Whether to use safetensors format.
**push_to_hub_kwargs – Additional arguments for push_to_hub.
- Returns:
Repository URL if pushed to hub, None otherwise.
- Return type:
str | None
- classmethod load_from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]¶
Initialize a model from configuration without loading pretrained weights.
This method creates a new model instance from scratch using the provided configuration. The backbone encoder can optionally be loaded from pretrained weights, but the GLiNER-specific layers are always randomly initialized.
- Parameters:
config (str | Path | GLiNERConfig | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).
cache_dir (str | Path | None) – Cache directory for downloads.
load_tokenizer (bool) – Whether to load tokenizer.
resize_token_embeddings (bool) – Whether to resize token embeddings.
backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.
compile_torch_model (bool) – Whether to compile with torch.compile.
quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.
map_location (str) – Device to map model to.
max_length (int | None) – Override max_length in config.
max_width (int | None) – Override max_width in config.
post_fusion_schema (str | None) – Override post_fusion_schema in config.
_attn_implementation (str | None) – Override attention implementation.
**model_kwargs – Additional model initialization arguments.
- Returns:
Initialized model instance with randomly initialized weights (except backbone if specified).
Examples
>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.load_from_config(config)
>>> model = GLiNER.load_from_config("path/to/gliner_config.json")
>>> # Load with pretrained backbone but random GLiNER layers
>>> model = GLiNER.load_from_config(config, backbone_from_pretrained=True)
- classmethod from_pretrained(model_id, model_dir=None, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=False, load_onnx_model=False, onnx_model_file='model.onnx', session_options=None, max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]¶
Load pretrained model from HuggingFace Hub or local directory.
- Parameters:
model_id (str) – Model identifier or local path.
model_dir (str | None) – Override model directory path.
revision (str | None) – Model revision.
cache_dir (str | Path | None) – Cache directory.
force_download (bool) – Force redownload.
proxies (dict | None) – Proxy configuration.
resume_download (bool) – Resume interrupted downloads.
local_files_only (bool) – Only use local files.
token (str | bool | None) – HF token for private repos.
map_location (str) – Device to map model to.
strict (bool) – Enforce strict state_dict loading.
load_tokenizer (bool | None) – Whether to load tokenizer.
resize_token_embeddings (bool | None) – Whether to resize embeddings.
compile_torch_model (bool | None) – Whether to compile with torch.compile.
quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.
load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.
onnx_model_file (str | None) – Path to ONNX model file.
session_options – ONNX runtime session options.
max_length (int | None) – Override max_length in config.
max_width (int | None) – Override max_width in config.
post_fusion_schema (str | None) – Override post_fusion_schema in config.
_attn_implementation (str | None) – Override attention implementation.
**model_kwargs – Additional model initialization arguments.
- Returns:
Loaded model instance.
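The accepted values of the quantize argument can be read as a small normalisation step. The sketch below only mirrors the documented mapping; normalize_quantize is a hypothetical helper, not a function of the library.

```python
def normalize_quantize(quantize):
    """Map the documented `quantize` values to a dtype name, or None when disabled."""
    if quantize is False:
        return None            # quantization disabled
    if quantize is True:
        return "fp16"          # True is documented as equivalent to "fp16"
    if quantize in ("fp16", "bf16", "int8"):
        return quantize
    raise ValueError(f"Unrecognized quantization type: {quantize!r}")


for value in (False, True, "bf16", "int8"):
    print(repr(value), "->", normalize_quantize(value))
```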
- export_to_onnx(save_dir, onnx_filename='model.onnx', quantized_filename='model_quantized.onnx', quantize=False, opset=19, **export_kwargs)[source]¶
Unified ONNX export method using specifications from child classes.
- Parameters:
save_dir (str | Path) – Directory to save ONNX files.
onnx_filename (str) – Name of the ONNX model file.
quantized_filename (str) – Name of the quantized model file.
quantize (bool) – Whether to create a quantized version.
opset (int) – ONNX opset version.
**export_kwargs – Additional export arguments (model-specific).
- Returns:
Dictionary with paths to exported models:
onnx_path: Path to the standard ONNX model.
quantized_path: Path to the quantized model (if quantize=True).
- freeze_component(component_name)[source]¶
Freeze a specific component of the model.
- Parameters:
component_name (str) – Name of component to freeze (e.g., ‘text_encoder’, ‘labels_encoder’, ‘decoder’)
- unfreeze_component(component_name)[source]¶
Unfreeze a specific component of the model.
- Parameters:
component_name (str) – Name of component to unfreeze
- classmethod create_training_args(output_dir, learning_rate=5e-05, weight_decay=0.01, others_lr=None, others_weight_decay=None, focal_loss_alpha=-1, focal_loss_gamma=0.0, rel_focal_loss_alpha=None, rel_focal_loss_gamma=None, focal_loss_prob_margin=0.0, loss_reduction='sum', negatives=1.0, masking='none', lr_scheduler_type='linear', warmup_ratio=0.1, per_device_train_batch_size=8, per_device_eval_batch_size=8, max_grad_norm=1.0, max_steps=10000, save_steps=1000, save_total_limit=10, logging_steps=10, use_cpu=False, bf16=False, dataloader_num_workers=1, report_to='none', **kwargs)[source]¶
Create training arguments with sensible defaults.
- Parameters:
output_dir (str | Path) – Directory to save model checkpoints.
learning_rate (float) – Learning rate for main parameters.
weight_decay (float) – Weight decay for main parameters.
others_lr (float | None) – Learning rate for other parameters.
others_weight_decay (float | None) – Weight decay for other parameters.
focal_loss_alpha (float) – Alpha for focal loss.
focal_loss_gamma (float) – Gamma for focal loss.
focal_loss_prob_margin (float) – Probability margin for focal loss.
loss_reduction (str) – Loss reduction method.
negatives (float) – Negative sampling ratio.
masking (str) – Masking strategy.
lr_scheduler_type (str) – Learning rate scheduler type.
warmup_ratio (float) – Warmup ratio.
per_device_train_batch_size (int) – Training batch size.
per_device_eval_batch_size (int) – Evaluation batch size.
max_grad_norm (float) – Maximum gradient norm.
max_steps (int) – Maximum training steps.
save_steps (int) – Save checkpoint every N steps.
save_total_limit (int) – Maximum number of checkpoints to keep.
logging_steps (int) – Log every N steps.
use_cpu (bool) – Whether to use CPU.
bf16 (bool) – Whether to use bfloat16.
dataloader_num_workers (int) – Number of dataloader workers.
report_to (str) – Where to report metrics.
**kwargs – Additional training arguments.
- Returns:
TrainingArguments instance.
- Return type:
TrainingArguments
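To make the defaults concrete, the sketch below computes the learning rate over training for learning_rate=5e-05, max_steps=10000 and warmup_ratio=0.1. It assumes that lr_scheduler_type='linear' means linear warmup followed by linear decay to zero, as in the usual Hugging Face Transformers scheduler; linear_schedule_lr is a hypothetical helper written for illustration.

```python
def linear_schedule_lr(step, base_lr=5e-5, max_steps=10000, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = int(max_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from base_lr down to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


for step in (0, 500, 1000, 5500, 10000):
    print(step, linear_schedule_lr(step))
```

Under these defaults the peak learning rate is reached after the first tenth of training.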
- train_model(train_dataset, eval_dataset, training_args=None, freeze_components=None, compile_model=False, output_dir=None, **training_kwargs)[source]¶
Train the model.
- Parameters:
train_dataset – Training dataset.
eval_dataset – Evaluation dataset.
training_args (TrainingArguments | None) – Training arguments (created with defaults if None).
freeze_components (list[str] | None) – List of component names to freeze (e.g., [‘text_encoder’, ‘decoder’]).
compile_model (bool) – Whether to compile model with torch.compile.
output_dir (str | Path | None) – Output directory (required if training_args is None).
**training_kwargs – Additional kwargs for creating training args.
- Returns:
Trained Trainer instance.
- Return type:
Trainer
- class gliner.model.BaseEncoderGLiNER(*args, **kwargs)[source]¶
Bases:
BaseGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- set_class_indices()[source]¶
Set the class token index in the configuration based on tokenizer vocabulary.
- resize_embeddings(set_class_token_index=True)[source]¶
Resize token embeddings to match tokenizer vocabulary size.
- Parameters:
set_class_token_index – Whether to update the class token index.
- prepare_inputs(texts)[source]¶
Prepare inputs for the model by tokenizing and creating index mappings.
- Parameters:
texts (List[str]) – The input texts to process.
- Returns:
Tuple containing:
all_tokens: List of tokenized texts
all_start_token_idx_to_text_idx: Start position mappings
all_end_token_idx_to_text_idx: End position mappings
- prepare_base_input(all_tokens)[source]¶
Prepare base input format for data collation.
- Parameters:
all_tokens (List[List[str]]) – List of tokenized texts.
- Returns:
List of input dictionaries ready for collation.
- Return type:
List[Dict[str, Any]]
- inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False, **external_inputs)[source]¶
Predict entities for a batch of texts.
- Parameters:
texts (str | List[str]) – A list of input texts to predict entities for or a single text string.
labels (List[str]) – A list of labels to predict.
flat_ner (bool) – Whether to use flat NER. Defaults to True.
threshold (float) – Confidence threshold for predictions. Defaults to 0.5.
multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.
batch_size (int) – Batch size for processing. Defaults to 8.
packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None the instance-level configuration set via configure_inference_packing is used.
input_spans (List[List[Dict]]) – Input entity spans that should be classified by the model.
return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
**external_inputs – Additional inputs to pass to the model.
- Returns:
List of lists with predicted entities, where each entity is a dictionary containing:
start: Start character position
end: End character position
text: Entity text
label: Entity type
score: Confidence score
class_probs: (optional) Dictionary mapping class names to probabilities (top 5)
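A short sketch of consuming this output format follows. The predictions are hypothetical sample data shaped like the documented return value (they were not produced by a model), end is assumed to be exclusive as in Python slicing, and group_by_label is an illustrative helper.

```python
from collections import defaultdict

text = "Marie Curie was born in Warsaw."

# Hypothetical output for a single input text, one inner list per text.
predictions = [
    [
        {"start": 0, "end": 11, "text": "Marie Curie", "label": "person", "score": 0.97},
        {"start": 24, "end": 30, "text": "Warsaw", "label": "location", "score": 0.55},
    ]
]


def group_by_label(entities, min_score=0.0):
    """Collect entity texts per label, keeping only entities at or above min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)


print(group_by_label(predictions[0]))
print(group_by_label(predictions[0], min_score=0.6))
```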
- predict_entities(text, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)[source]¶
Predict entities for a single text input.
- Parameters:
text (str) – The input text to predict entities for.
labels (List[str]) – The labels to predict.
flat_ner (bool) – Whether to use flat NER. Defaults to True.
threshold (float) – Confidence threshold for predictions. Defaults to 0.5.
multi_label (bool) – Whether to allow multiple labels per entity. Defaults to False.
return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
**kwargs – Additional arguments passed to inference.
- Returns:
List of entity predictions as dictionaries.
- Return type:
List[Dict[str, Any]]
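With flat_ner=True the returned spans do not nest or overlap. One common way to obtain such a result is greedy selection by score. The sketch below illustrates that idea on hypothetical predictions; it is an assumption about the general technique, not a description of the library's decoder.

```python
def greedy_flat(entities):
    """Keep the highest-scoring spans that do not overlap any already-kept span."""
    kept = []
    for ent in sorted(entities, key=lambda e: e["score"], reverse=True):
        # Two half-open spans [start, end) overlap when each starts before the other ends.
        overlaps = any(ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept)
        if not overlaps:
            kept.append(ent)
    return sorted(kept, key=lambda e: e["start"])


# Hypothetical nested candidates over the text "New York City and Paris".
candidates = [
    {"start": 0, "end": 8, "text": "New York", "label": "city", "score": 0.80},
    {"start": 0, "end": 13, "text": "New York City", "label": "city", "score": 0.92},
    {"start": 18, "end": 23, "text": "Paris", "label": "city", "score": 0.88},
]
flat = greedy_flat(candidates)
print([e["text"] for e in flat])
```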
- batch_predict_entities(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, **kwargs)[source]¶
Predict entities for multiple texts.
DEPRECATED: Use inference instead.
This method will be removed in a future release. It now forwards to GLiNER.inference(…) to perform inference.
- Parameters:
texts (List[str]) – Input texts.
labels (List[str]) – Labels to predict.
flat_ner (bool) – Use flat NER. Defaults to True.
threshold (float) – Confidence threshold. Defaults to 0.5.
multi_label (bool) – Allow multiple labels per token/entity. Defaults to False.
**kwargs – Extra arguments forwarded to inference (e.g., batch_size).
- Returns:
List of entity predictions for each text.
- Return type:
List[List[Dict[str, Any]]]
- evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, batch_size=12)[source]¶
Evaluate the model on a given test dataset.
- Parameters:
test_data (List[Dict[str, Any]]) – The test data containing text and entity annotations.
flat_ner (bool) – Whether to use flat NER. Defaults to False.
multi_label (bool) – Whether to use multi-label classification. Defaults to False.
threshold (float) – The threshold for predictions. Defaults to 0.5.
batch_size (int) – The batch size for evaluation. Defaults to 12.
- Returns:
Tuple containing the evaluation output and the F1 score.
- Return type:
Tuple[Any, float]
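For orientation, the sketch below shows how a micro-averaged F1 score over entity spans can be computed. It assumes exact-match scoring on (start, end, label) triples, which may differ from the evaluator the library uses; micro_prf and the sample spans are hypothetical.

```python
def micro_prf(gold, pred):
    """gold, pred: lists (one item per text) of sets of (start, end, label) tuples."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # exact matches
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{(0, 11, "person"), (24, 30, "location")}, {(0, 5, "organization")}]
pred = [{(0, 11, "person")}, {(0, 5, "organization"), (10, 15, "person")}]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```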
- class gliner.model.BaseBiEncoderGLiNER(*args, **kwargs)[source]¶
Bases:
BaseEncoderGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- resize_embeddings(**kwargs)[source]¶
Resize token embeddings to match tokenizer vocabulary size.
- Parameters:
set_class_token_index – Whether to update the class token index.
- encode_labels(labels, batch_size=8)[source]¶
Compute embeddings for labels using the label encoder.
- Parameters:
labels (List[str]) – A list of labels to encode.
batch_size (int) – Batch size for processing labels.
- Returns:
Tensor containing label embeddings with shape (num_labels, hidden_size).
- Raises:
NotImplementedError – If the model doesn’t have a label encoder.
- Return type:
FloatTensor
- batch_predict_with_embeds(texts, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_class_probs=False)[source]¶
Predict entities for a batch of texts using pre-computed label embeddings.
- Parameters:
texts (List[str]) – A list of input texts to predict entities for.
labels_embeddings (Tensor) – Pre-computed embeddings for the labels.
labels (List[str]) – List of label strings corresponding to the embeddings.
flat_ner (bool) – Whether to use flat NER. Defaults to True.
threshold (float) – Confidence threshold for predictions. Defaults to 0.5.
multi_label (bool) – Whether to allow multiple labels per token. Defaults to False.
batch_size (int) – Batch size for processing. Defaults to 8.
packing_config (InferencePackingConfig | None) – Configuration describing how to pack encoder inputs. When None the instance-level configuration set via configure_inference_packing is used.
input_spans (List[List[Dict]]) – Input entity spans to limit predictions to. Each span is a dict with ‘start’ and ‘end’ character positions.
return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
- Returns:
List of lists with predicted entities.
- Return type:
List[List[Dict[str, Any]]]
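The input_spans argument expects one list of span dictionaries per text. As a minimal sketch, candidate spans can be built from regular-expression matches; spans_from_pattern is a hypothetical helper, and end is assumed to be exclusive, as in Python slicing.

```python
import re


def spans_from_pattern(texts, pattern):
    """Return, for each text, a list of {'start', 'end'} character spans per regex match."""
    compiled = re.compile(pattern)
    return [
        [{"start": m.start(), "end": m.end()} for m in compiled.finditer(t)]
        for t in texts
    ]


texts = ["Alice met Bob.", "nothing capitalised here"]
# Treat every capitalised word as a candidate span.
input_spans = spans_from_pattern(texts, r"\b[A-Z][a-z]+\b")
print(input_spans)
```

The resulting list has the same length as texts, with an empty inner list for texts that have no candidates.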
- predict_with_embeds(text, labels_embeddings, labels, flat_ner=True, threshold=0.5, multi_label=False, return_class_probs=False, **kwargs)[source]¶
Predict entities for a single text input using pre-computed label embeddings.
- Parameters:
text – The input text to predict entities for.
labels_embeddings – Pre-computed embeddings for the labels.
labels – List of label strings corresponding to the embeddings.
flat_ner – Whether to use flat NER. Defaults to True.
threshold – Confidence threshold for predictions. Defaults to 0.5.
multi_label – Whether to allow multiple labels per entity. Defaults to False.
return_class_probs – Whether to include class probabilities in output. Defaults to False.
**kwargs – Additional arguments passed to batch_predict_with_embeds.
- Returns:
List of entity predictions.
- class gliner.model.UniEncoderSpanGLiNER(*args, **kwargs)[source]¶
Bases:
BaseEncoderGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderSpanConfig
- model_class¶
alias of
UniEncoderSpanModel
- ort_model_class¶
alias of
UniEncoderSpanORTModel
- data_processor_class¶
alias of
UniEncoderSpanProcessor
- data_collator_class¶
alias of
UniEncoderSpanDataCollator
- decoder_class¶
alias of
SpanDecoder
- class gliner.model.UniEncoderTokenGLiNER(*args, **kwargs)[source]¶
Bases:
BaseEncoderGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderTokenConfig
- model_class¶
alias of
UniEncoderTokenModel
- ort_model_class¶
alias of
UniEncoderTokenORTModel
- data_processor_class¶
alias of
UniEncoderTokenProcessor
- data_collator_class¶
alias of
UniEncoderTokenDataCollator
- decoder_class¶
alias of
TokenDecoder
- class gliner.model.BiEncoderSpanGLiNER(*args, **kwargs)[source]¶
Bases:
BaseBiEncoderGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
BiEncoderSpanConfig
- model_class¶
alias of
BiEncoderSpanModel
- ort_model_class¶
alias of
BiEncoderSpanORTModel
- data_processor_class¶
alias of
BiEncoderSpanProcessor
- data_collator_class¶
alias of
BiEncoderSpanDataCollator
- decoder_class¶
alias of
SpanDecoder
- class gliner.model.BiEncoderTokenGLiNER(*args, **kwargs)[source]¶
Bases:
BaseBiEncoderGLiNER
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
BiEncoderTokenConfig
- model_class¶
alias of
BiEncoderTokenModel
- ort_model_class¶
alias of
BiEncoderTokenORTModel
- data_processor_class¶
alias of
BiEncoderTokenProcessor
- data_collator_class¶
alias of
BiEncoderTokenDataCollator
- decoder_class¶
alias of
TokenDecoder
- class gliner.model.UniEncoderSpanDecoderGLiNER(*args, **kwargs)[source]¶
Bases:
BaseEncoderGLiNER
GLiNER model with span-based encoding and label decoding capabilities.
Supports generating textual labels for entities.
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderSpanDecoderConfig
- model_class¶
alias of
UniEncoderSpanDecoderModel
- ort_model_class: type = None¶
- data_processor_class¶
alias of
UniEncoderSpanDecoderProcessor
- data_collator_class¶
alias of
UniEncoderSpanDecoderDataCollator
- decoder_class¶
alias of
SpanGenerativeDecoder
- set_labels_trie(labels)[source]¶
Initialize the labels trie for constrained generation.
- Parameters:
labels (List[str]) – Labels that will be used for constrained generation.
- Returns:
Trie structure for constrained beam search.
- Raises:
NotImplementedError – If the model doesn’t have a decoder.
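The idea behind a labels trie can be sketched in a few lines: each label is stored as a token sequence, and during generation only tokens that continue some stored label are allowed. The example below uses whitespace splitting as a stand-in tokenizer and an "<eos>" marker, both assumptions made for illustration; the library's trie and tokenization may differ.

```python
def build_trie(sequences):
    """Build a nested-dict trie from an iterable of token sequences."""
    trie = {}
    for seq in sequences:
        node = trie
        for tok in seq:
            node = node.setdefault(tok, {})
    return trie


def allowed_next(trie, prefix):
    """Return the sorted tokens that may follow `prefix`, or [] if the prefix is unknown."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            return []
    return sorted(node)


labels = ["person", "political party", "political leader"]
trie = build_trie(label.split() + ["<eos>"] for label in labels)
print(allowed_next(trie, []))
print(allowed_next(trie, ["political"]))
```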
- generate_labels(model_output, **gen_kwargs)[source]¶
Generate textual class labels for each entity span.
- Parameters:
model_output – Model output containing decoder_embedding and decoder_embedding_mask.
**gen_kwargs – Generation parameters (max_new_tokens, temperature, etc.).
- Returns:
List of generated label strings.
- inference(texts, labels, flat_ner=True, threshold=0.5, multi_label=False, batch_size=8, gen_constraints=None, num_gen_sequences=1, packing_config=None, input_spans=None, return_class_probs=False, **gen_kwargs)[source]¶
Predict entities with optional label generation.
- Parameters:
texts (str | List[str]) – Input texts (string or list of strings).
labels (List[str]) – Entity type labels.
flat_ner (bool) – Whether to use flat NER.
threshold (float) – Confidence threshold.
multi_label (bool) – Allow multiple labels per span.
batch_size (int) – Batch size for processing.
gen_constraints (List[str] | None) – Labels to constrain generation.
num_gen_sequences (int) – Number of label sequences to generate per span.
packing_config (InferencePackingConfig | None) – Inference packing configuration.
input_spans (List[List[Dict]]) – Input entity spans to limit predictions to. Each span is a dict with ‘start’ and ‘end’ character positions.
return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
**gen_kwargs – Additional generation parameters.
- Returns:
List of entity predictions with optional generated labels.
- Return type:
List[List[Dict[str, Any]]]
- class gliner.model.UniEncoderTokenDecoderGLiNER(*args, **kwargs)[source]¶
Bases:
UniEncoderSpanDecoderGLiNER
GLiNER model with token-based encoding and label decoding capabilities.
Combines token-level BIO tagging with a decoder that generates entity type labels autoregressively.
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderTokenDecoderConfig
- model_class¶
alias of
UniEncoderTokenDecoderModel
- ort_model_class: type = None¶
- data_processor_class¶
alias of
UniEncoderTokenDecoderProcessor
- data_collator_class¶
alias of
UniEncoderTokenDecoderDataCollator
- decoder_class¶
alias of
TokenGenerativeDecoder
- class gliner.model.UniEncoderSpanRelexGLiNER(*args, **kwargs)[source]¶
Bases:
BaseEncoderGLiNER
GLiNER model for both entity recognition and relation extraction.
Performs joint entity and relation prediction, allowing the model to simultaneously detect entities and the relationships between them in a single forward pass.
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderSpanRelexConfig
- model_class¶
alias of
UniEncoderSpanRelexModel
- ort_model_class¶
alias of
UniEncoderSpanRelexORTModel
- data_processor_class¶
alias of
RelationExtractionSpanProcessor
- data_collator_class¶
alias of
RelationExtractionSpanDataCollator
- decoder_class¶
alias of
SpanRelexDecoder
- set_class_indices()[source]¶
Set the class token indices for entities and relations in the configuration.
- inference(texts, labels, relations=[], flat_ner=True, threshold=0.5, adjacency_threshold=None, relation_threshold=None, multi_label=False, batch_size=8, packing_config=None, input_spans=None, return_relations=True, return_class_probs=False)[source]¶
Predict entities and relations.
- Parameters:
texts (str | List[str]) – Input texts (str or List[str]).
labels (List[str]) – Entity type labels (List[str]).
relations (List[str]) – Relation type labels (List[str]).
flat_ner (bool) – Whether to use flat NER (no nested entities).
threshold (float) – Confidence threshold for entities.
adjacency_threshold (float | None) – Confidence threshold for adjacency matrix reconstruction (defaults to threshold).
relation_threshold (float | None) – Confidence threshold for relations (defaults to threshold).
multi_label (bool) – Allow multiple labels per span.
batch_size (int) – Batch size for processing.
packing_config (InferencePackingConfig | None) – Inference packing configuration.
input_spans (List[List[Dict]]) – Input entity spans to limit predictions to. Each span is a dict with ‘start’ and ‘end’ character positions.
return_relations (bool) – Whether to return relation predictions.
return_class_probs (bool) – Whether to include class probabilities in output. Defaults to False.
- Returns:
Tuple of (entities, relations) if return_relations=True, else just entities.
- Return type:
List[List[Dict[str, Any]]] | Tuple[List[List[Dict[str, Any]]], List[List[Dict[str, Any]]]]
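Two details of this signature are easy to miss: the adjacency and relation thresholds fall back to threshold when left as None, and the return value changes shape with return_relations. The sketch below illustrates both with hypothetical helpers (resolve_thresholds, split_result) that are not part of the library.

```python
def resolve_thresholds(threshold=0.5, adjacency_threshold=None, relation_threshold=None):
    """Apply the documented fallback: unset thresholds default to `threshold`."""
    adjacency = threshold if adjacency_threshold is None else adjacency_threshold
    relation = threshold if relation_threshold is None else relation_threshold
    return threshold, adjacency, relation


def split_result(result):
    """Normalise the documented return value to an (entities, relations) pair."""
    if isinstance(result, tuple):
        entities, relations = result
        return entities, relations
    return result, None  # return_relations=False: only entities were returned


print(resolve_thresholds(0.4))
print(resolve_thresholds(0.5, relation_threshold=0.8))
```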
- evaluate(test_data, flat_ner=False, multi_label=False, threshold=0.5, adjacency_threshold=None, relation_threshold=None, batch_size=12)[source]¶
Evaluate the model on both NER and relation extraction tasks.
- Parameters:
test_data (List[Dict[str, Any]]) – The test data containing text, entity, and relation annotations.
flat_ner (bool) – Whether to use flat NER. Defaults to False.
multi_label (bool) – Whether to use multi-label classification. Defaults to False.
threshold (float) – The threshold for entity predictions. Defaults to 0.5.
adjacency_threshold (float | None) – Threshold for adjacency matrix reconstruction. Defaults to threshold.
relation_threshold (float | None) – The threshold for relation predictions. Defaults to threshold.
batch_size (int) – The batch size for evaluation. Defaults to 12.
- Returns:
Tuple of ((ner_output, ner_f1), (rel_output, rel_f1)) containing:
ner_output: Formatted string with NER P, R, F1
ner_f1: NER F1 score
rel_output: Formatted string with relation extraction P, R, F1
rel_f1: Relation extraction F1 score
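The nested return value of evaluate can be unpacked in one statement. This is a small sketch with a hypothetical helper, `summarize_eval`; it assumes only the tuple layout described above and that the F1 scores are numeric.

```python
from typing import Dict, Tuple

# Layout described in the Returns section: ((ner_output, ner_f1), (rel_output, rel_f1))
EvalResult = Tuple[Tuple[str, float], Tuple[str, float]]


def summarize_eval(result: EvalResult) -> Dict[str, float]:
    """Print both formatted reports and return the two F1 scores by name."""
    (ner_output, ner_f1), (rel_output, rel_f1) = result
    print(ner_output)
    print(rel_output)
    return {"ner_f1": ner_f1, "rel_f1": rel_f1}
```

With a loaded model, this could be used as `scores = summarize_eval(model.evaluate(test_data))`.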
- class gliner.model.UniEncoderTokenRelexGLiNER(*args, **kwargs)[source]¶
Bases:
UniEncoderSpanRelexGLiNER
GLiNER model for both entity recognition and relation extraction.
Performs joint entity and relation prediction, allowing the model to simultaneously detect entities and the relationships between them in a single forward pass.
Initialize a BaseGLiNER model.
- Parameters:
config – Model configuration object.
model – Pre-initialized model instance. If None, creates a new model.
tokenizer – Pre-initialized tokenizer. If None, creates a new tokenizer.
data_processor – Pre-initialized data processor. If None, creates a new processor.
backbone_from_pretrained – Whether to load the backbone from pretrained weights.
cache_dir – Directory for caching downloaded models.
**kwargs – Additional keyword arguments passed to model creation.
- config_class¶
alias of
UniEncoderTokenRelexConfig
- model_class¶
alias of
UniEncoderTokenRelexModel
- ort_model_class¶
alias of
UniEncoderTokenRelexORTModel
- data_processor_class¶
alias of
RelationExtractionTokenProcessor
- data_collator_class¶
alias of
RelationExtractionTokenDataCollator
- decoder_class¶
alias of
TokenRelexDecoder
- class gliner.model.GLiNER(*args, **kwargs)[source]¶
Bases:
Module, PyTorchModelHubMixin
Meta GLiNER class that automatically instantiates the appropriate GLiNER variant.
This class provides a unified interface for all GLiNER models, automatically switching to specialized model types based on the model configuration. It supports various NER architectures including uni-encoder, bi-encoder, decoder-based, and relation extraction models.
- The class automatically detects the model type based on:
span_mode: Token-level vs span-level
labels_encoder: Uni-encoder vs bi-encoder
labels_decoder: Standard vs decoder-based
relations_layer: NER-only vs joint entity-relation extraction
- model¶
The loaded GLiNER model instance (automatically typed).
- config¶
Model configuration.
- data_processor¶
Data processor for the model.
- decoder¶
Decoder for predictions.
Examples
Load a pretrained uni-encoder span model:
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
Load a bi-encoder model:
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
Load from local configuration:
>>> config = GLiNERConfig.from_pretrained("config.json")
>>> model = GLiNER.from_config(config)
Initialize from scratch:
>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER(config)
Initialize a GLiNER model with automatic type detection.
This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.
- Parameters:
config (str | Path | GLiNERConfig) – Model configuration (GLiNERConfig object, path to config file, or dict).
**kwargs – Additional arguments passed to the specific GLiNER variant.
Examples
>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")
- __init__(config, **kwargs)[source]¶
Initialize a GLiNER model with automatic type detection.
This constructor determines the appropriate GLiNER variant based on the configuration and replaces itself with an instance of that variant.
- Parameters:
config (str | Path | GLiNERConfig) – Model configuration (GLiNERConfig object, path to config file, or dict).
**kwargs – Additional arguments passed to the specific GLiNER variant.
Examples
>>> config = GLiNERConfig(model_name="bert-base-cased")
>>> model = GLiNER(config)
>>> model = GLiNER("path/to/gliner_config.json")
- classmethod from_pretrained(model_id, revision=None, cache_dir=None, force_download=False, proxies=None, resume_download=False, local_files_only=False, token=None, map_location='cpu', strict=False, load_tokenizer=None, resize_token_embeddings=True, compile_torch_model=False, quantize=False, load_onnx_model=False, onnx_model_file='model.onnx', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]¶
Load a pretrained GLiNER model with automatic type detection.
This method loads the configuration, determines the appropriate GLiNER variant, and delegates to that variant’s from_pretrained method.
- Parameters:
model_id (str) – Model identifier or local path.
revision (str | None) – Model revision.
cache_dir (str | Path | None) – Cache directory.
force_download (bool) – Force redownload.
proxies (dict | None) – Proxy configuration.
resume_download (bool) – Resume interrupted downloads.
local_files_only (bool) – Only use local files.
token (str | bool | None) – HF token for private repos.
map_location (str) – Device to map model to.
strict (bool) – Enforce strict state_dict loading.
load_tokenizer (bool | None) – Whether to load tokenizer.
resize_token_embeddings (bool | None) – Whether to resize embeddings.
compile_torch_model (bool | None) – Whether to compile with torch.compile.
quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.
load_onnx_model (bool | None) – Whether to load ONNX model instead of PyTorch.
onnx_model_file (str | None) – Path to ONNX model file.
max_length (int | None) – Override max_length in config.
max_width (int | None) – Override max_width in config.
post_fusion_schema (str | None) – Override post_fusion_schema in config.
_attn_implementation (str | None) – Override attention implementation.
**model_kwargs – Additional model initialization arguments.
- Returns:
Appropriate GLiNER model instance.
Examples
>>> model = GLiNER.from_pretrained("urchade/gliner_small-v2.1")
>>> model = GLiNER.from_pretrained("knowledgator/gliner-bi-small-v1.0")
>>> model = GLiNER.from_pretrained("path/to/local/model", quantize=True)
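The accepted values of quantize can be summarized as a small lookup. The sketch below is illustrative only: `normalize_quantize` is a hypothetical helper that mirrors the documented values, not the library's internal logic, and raising on unknown strings is an assumption.

```python
from typing import Optional, Union


def normalize_quantize(quantize: Union[bool, str, None]) -> Optional[str]:
    """Map a documented `quantize` value to a dtype name, or None when disabled."""
    if quantize is False or quantize is None:
        return None  # quantization disabled
    if quantize is True:
        return "fp16"  # True is documented as equivalent to "fp16"
    if quantize in ("fp16", "bf16", "int8"):
        return quantize  # "int8" is documented as requiring torchao
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"unsupported quantize value: {quantize!r}")


print(normalize_quantize(True))    # fp16
print(normalize_quantize("bf16"))  # bf16
print(normalize_quantize(False))   # None
```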
- classmethod from_config(config, cache_dir=None, load_tokenizer=True, resize_token_embeddings=True, backbone_from_pretrained=True, compile_torch_model=False, quantize=False, map_location='cpu', max_length=None, max_width=None, post_fusion_schema=None, _attn_implementation=None, **model_kwargs)[source]¶
Create a GLiNER model from configuration.
- Parameters:
config (GLiNERConfig | str | Path | dict) – Model configuration (GLiNERConfig object, path to config file, or dict).
cache_dir (str | Path | None) – Cache directory for downloads.
load_tokenizer (bool) – Whether to load tokenizer.
resize_token_embeddings (bool) – Whether to resize token embeddings.
backbone_from_pretrained (bool) – Whether to load the backbone encoder from pretrained weights.
compile_torch_model (bool) – Whether to compile with torch.compile.
quantize (bool | str) – Quantization dtype. True or "fp16" for float16, "bf16" for bfloat16, "int8" for int8 dynamic quantization (requires torchao). False to disable.
map_location (str) – Device to map model to.
max_length (int | None) – Override max_length in config.
max_width (int | None) – Override max_width in config.
post_fusion_schema (str | None) – Override post_fusion_schema in config.
_attn_implementation (str | None) – Override attention implementation.
**model_kwargs – Additional model initialization arguments.
- Returns:
Initialized GLiNER model instance.
Examples
>>> config = GLiNERConfig(model_name="microsoft/deberta-v3-small")
>>> model = GLiNER.from_config(config)
>>> model = GLiNER.from_config("path/to/gliner_config.json")
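The max_length, max_width, and post_fusion_schema arguments are typed as optional overrides. One plausible reading is that a value of None leaves the configured setting untouched. The sketch below shows that pattern on a plain dict; `apply_overrides` is a hypothetical helper, the numeric values are illustrative, and the None-means-keep behaviour is an assumption rather than something the reference states.

```python
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of `config` with every non-None override applied."""
    updated = dict(config)  # copy so the caller's dict is not mutated
    for key, value in overrides.items():
        if value is not None:  # assumption: None means "keep the config value"
            updated[key] = value
    return updated


base = {"max_length": 384, "max_width": 12}  # illustrative values only
print(apply_overrides(base, max_length=512, max_width=None))
# {'max_length': 512, 'max_width': 12}
```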
- property model_map: dict[str, dict[str, Any]]¶
Map configuration patterns to their corresponding GLiNER classes.
- Returns:
Dictionary mapping model types to their classes and descriptions.
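To make the detection rules from the class description concrete, the sketch below classifies a configuration along the four documented axes (span_mode, labels_encoder, labels_decoder, relations_layer). It is a rough illustration, not the library's dispatch code: `SketchConfig` and `describe_variant` are hypothetical, and the specific field values used here ("markerV0", "token_level", None for an absent component) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a GLiNER config; field values are assumptions."""

    span_mode: str = "markerV0"
    labels_encoder: Optional[str] = None
    labels_decoder: Optional[str] = None
    relations_layer: Optional[str] = None


def describe_variant(config: SketchConfig) -> Tuple[str, str, str, str]:
    """Classify a config along the four axes named in the class description."""
    granularity = "token-level" if config.span_mode == "token_level" else "span-level"
    encoder = "bi-encoder" if config.labels_encoder else "uni-encoder"
    decoding = "decoder-based" if config.labels_decoder else "standard"
    task = "joint entity-relation" if config.relations_layer else "NER-only"
    return granularity, encoder, decoding, task


print(describe_variant(SketchConfig()))
# ('span-level', 'uni-encoder', 'standard', 'NER-only')
```

A real dispatch table would map combinations like these to concrete classes such as UniEncoderSpanRelexGLiNER or UniEncoderTokenRelexGLiNER, which is the kind of information model_map is documented to return.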