gliner.evaluation.evaluator module

class gliner.evaluation.evaluator.BaseEvaluator(all_true, all_outs)[source]

Bases: ABC

Abstract base class for evaluation of NER and relation extraction tasks.

Provides common functionality for computing precision, recall, and F1 scores from ground truth and predicted annotations. Subclasses must implement transform_data() to convert task-specific data formats.

all_true

List of ground truth annotations for all samples.

all_outs

List of predicted annotations for all samples.

__init__(all_true, all_outs)[source]

Initialize the evaluator with ground truth and predictions.

Parameters:
  • all_true – List of ground truth annotations for all samples. Format depends on the specific evaluator subclass.

  • all_outs – List of predicted annotations for all samples. Format depends on the specific evaluator subclass.

static compute_prf(y_true, y_pred, average='micro')[source]

Compute precision, recall, and F1 score.

Calculates evaluation metrics by comparing true and predicted annotations. Supports both micro-averaging (aggregate all predictions) and macro-averaging (average per-class metrics).

Parameters:
  • y_true – List of ground truth annotations in flattened format. Each annotation is [label, span] where span is tuple of positions.

  • y_pred – List of predicted annotations in flattened format. Each annotation is [label, span] where span is tuple of positions.

  • average – Averaging strategy; defaults to "micro". "micro" aggregates TP, FP, and FN across all classes; other values compute per-class metrics (which requires additional logic).

Returns:

Dictionary containing:

  • 'precision': Precision score (float between 0 and 1)

  • 'recall': Recall score (float between 0 and 1)

  • 'f_score': F1 score (float between 0 and 1)

Return type:

dict

Note

The function handles division by zero with warnings through the _prf_divide utility function.
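For illustration, here is a minimal sketch of micro-averaged scoring over the flattened [label, span] format described above. This is not the library's code (and it omits the _prf_divide warning machinery); it only mirrors the documented inputs and the returned dictionary keys:

```python
from collections import Counter

def micro_prf(y_true, y_pred):
    """Micro-averaged precision/recall/F1 over flattened [label, span] lists."""
    # Count annotations as multisets so repeated identical spans are handled.
    true_counts = Counter((label, span) for label, span in y_true)
    pred_counts = Counter((label, span) for label, span in y_pred)
    # True positives: exact (label, span) matches.
    tp = sum(min(true_counts[k], pred_counts[k]) for k in pred_counts)
    precision = tp / sum(pred_counts.values()) if pred_counts else 0.0
    recall = tp / sum(true_counts.values()) if true_counts else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f_score": f_score}

# One correct span, one boundary error: P = R = F1 = 0.5.
metrics = micro_prf(
    [["PER", (0, 1)], ["ORG", (3, 5)]],
    [["PER", (0, 1)], ["ORG", (4, 5)]],
)
```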

abstract transform_data()[source]

Transform task-specific data into evaluation format.

Abstract method that must be implemented by subclasses to convert their specific annotation formats into the standard format expected by compute_prf().

Returns:

Tuple of (transformed_true, transformed_pred) where each is a list of annotations in the format: [label, span_tuple]

Raises:

NotImplementedError – If called on the base class directly.

evaluate()[source]

Evaluate predictions against ground truth.

Transforms data using transform_data() and computes precision, recall, and F1 score using micro-averaging.

Returns:

Tuple of (output_str, f1) where:

  • output_str: Formatted string with P, R, and F1 percentages

  • f1: F1 score as a float

Return type:

tuple

Note

This method disables gradient computation with @torch.no_grad() for efficiency during evaluation.
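A rough sketch of this transform-then-score flow on already-flattened annotations. The summary-string format shown here is an assumption for illustration, not the library's exact output, and the scorer assumes no duplicate annotations:

```python
def evaluate_flat(y_true, y_pred):
    """Hypothetical stand-in for evaluate(): exact-match micro scoring over
    flattened [label, span] annotations (assumes no duplicates)."""
    # True positives: predictions that appear verbatim in the ground truth.
    tp = sum(1 for ann in y_pred if ann in y_true)
    p = tp / len(y_pred) if y_pred else 0.0
    r = tp / len(y_true) if y_true else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    # Report percentages in an evaluate()-style summary line.
    output_str = f"P: {p:.2%}\tR: {r:.2%}\tF1: {f1:.2%}"
    return output_str, f1

output_str, f1 = evaluate_flat(
    [["PER", (0, 1)], ["ORG", (3, 5)]],
    [["PER", (0, 1)]],
)
```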

class gliner.evaluation.evaluator.BaseNEREvaluator(all_true, all_outs)[source]

Bases: BaseEvaluator

Evaluator for Named Entity Recognition tasks.

Evaluates NER predictions by comparing predicted entity spans and types against ground truth annotations. An entity is considered correct only if both the span boundaries and entity type match exactly.

Initialize the evaluator with ground truth and predictions.

Parameters:
  • all_true – List of ground truth annotations for all samples. Format depends on the specific evaluator subclass.

  • all_outs – List of predicted annotations for all samples. Format depends on the specific evaluator subclass.

get_ground_truth(ents)[source]

Extract ground truth entities in evaluation format.

Parameters:

ents – List of ground truth entity tuples in format (start, end, label) where start and end are word-level indices.

Returns:

List of entities in format [[label, (start, end)], …] suitable for evaluation.

get_predictions(ents)[source]

Extract predicted entities in evaluation format.

Parameters:

ents – List of predicted entity tuples in format (start, end, label) where start and end are word-level indices.

Returns:

List of entities in format [[label, (start, end)], …] suitable for evaluation.

transform_data()[source]

Transform NER data into evaluation format.

Converts both ground truth and predicted entities from their original format into the standardized format required by compute_prf().

Returns:

Tuple of (all_true_ent, all_outs_ent) where:

  • all_true_ent: List of ground truth entity lists, one per sample

  • all_outs_ent: List of predicted entity lists, one per sample

Each entity is in format [label, (start, end)].

Return type:

tuple
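The per-entity conversion that get_ground_truth() and get_predictions() describe can be sketched as follows. This is an illustration of the documented formats, not the library implementation:

```python
def to_eval_format(ents):
    """Convert (start, end, label) entity tuples into the [label, (start, end)]
    pairs expected by compute_prf()."""
    return [[label, (start, end)] for start, end, label in ents]

# Word-level entity tuples, as described in the parameter docs.
to_eval_format([(0, 1, "PER"), (3, 5, "ORG")])
# [['PER', (0, 1)], ['ORG', (3, 5)]]
```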

class gliner.evaluation.evaluator.BaseRelexEvaluator(all_true, all_outs)[source]

Bases: BaseEvaluator

Evaluator for Relation Extraction tasks.

Evaluates relation extraction predictions by comparing predicted relations (head entity, tail entity, relation type) against ground truth. A relation is considered correct only if both entity spans and the relation type match exactly.

Note

The input format expects entity indices rather than entity spans directly. Entity spans are looked up from the entity list using these indices.

Initialize the evaluator with ground truth and predictions.

Parameters:
  • all_true – List of ground truth annotations for all samples. Format depends on the specific evaluator subclass.

  • all_outs – List of predicted annotations for all samples. Format depends on the specific evaluator subclass.

get_ground_truth(ents, rels)[source]

Extract ground truth relations in evaluation format.

Parameters:
  • ents – List of entity tuples in format (start, end, label).

  • rels – List of relation tuples in format (head_idx, tail_idx, rel_label) where head_idx and tail_idx are indices into the ents list.

Returns:

List of relations in format [[rel_label, (h_start, h_end, t_start, t_end)], …] where h_start, h_end are head entity boundaries and t_start, t_end are tail entity boundaries.

get_predictions(ents, rels)[source]

Extract predicted relations in evaluation format.

Parameters:
  • ents – List of entity tuples in format (start, end, label).

  • rels – List of predicted relation tuples in format (head_idx, rel_label, tail_idx) where head_idx and tail_idx are indices into the ents list.

Returns:

List of relations in format [[rel_label, (h_start, h_end, t_start, t_end)], …] where h_start, h_end are head entity boundaries and t_start, t_end are tail entity boundaries.

Note

The order of elements in predicted relations is (head_idx, rel_label, tail_idx), which differs from ground truth format (head_idx, tail_idx, rel_label).
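The index lookup and the ordering difference noted above can be sketched as follows (illustrative only; function names are hypothetical):

```python
def true_rels_to_eval(ents, rels):
    # Ground truth relations come as (head_idx, tail_idx, rel_label);
    # entity spans are looked up from the entity list by index.
    out = []
    for head_idx, tail_idx, rel_label in rels:
        h_start, h_end, _ = ents[head_idx]
        t_start, t_end, _ = ents[tail_idx]
        out.append([rel_label, (h_start, h_end, t_start, t_end)])
    return out

def pred_rels_to_eval(ents, rels):
    # Predictions use (head_idx, rel_label, tail_idx); reorder, then reuse.
    return true_rels_to_eval(ents, [(h, t, r) for h, r, t in rels])

ents = [(0, 1, "PER"), (4, 6, "ORG")]
true_rels_to_eval(ents, [(0, 1, "works_for")])
# [['works_for', (0, 1, 4, 6)]]
```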

transform_data()[source]

Transform relation extraction data into evaluation format.

Converts both ground truth and predicted relations from their original format into the standardized format required by compute_prf().

Returns:

Tuple of (all_true_rel, all_outs_rel) where:

  • all_true_rel: List of ground truth relation lists, one per sample

  • all_outs_rel: List of predicted relation lists, one per sample

Each relation is in format [rel_label, (h_start, h_end, t_start, t_end)].

Return type:

tuple

Note

The input format (self.all_true and self.all_outs) is expected to contain tuples of (entities, relations) for each sample.