gliner.evaluation.evaluate_ner module¶
- gliner.evaluation.evaluate_ner.open_content(path)[source]¶
Load train, dev, test, and label files from a dataset directory.
Searches for JSON files in the specified directory and loads them based on filename patterns (train, dev, test, labels).
- Parameters:
path – Path to the directory containing dataset JSON files.
- Returns:
train: List of training examples loaded from train.json, or None if not found
dev: List of development examples loaded from dev.json, or None if not found
test: List of test examples loaded from test.json, or None if not found
labels: List of entity type labels loaded from labels.json, or None if not found
- Return type:
A tuple of (train, dev, test, labels)
Note
Files are identified by checking if their filename contains ‘train’, ‘dev’, ‘test’, or ‘labels’. All files are expected to be in JSON format with UTF-8 encoding.
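The filename-matching behavior described above can be sketched as a minimal re-implementation (the name open_content_sketch is illustrative, not part of the gliner API):

```python
import json
from pathlib import Path

def open_content_sketch(path):
    """Load (train, dev, test, labels) from JSON files in `path`,
    matching each file by whether its name contains the split name."""
    train = dev = test = labels = None
    for p in Path(path).glob("*.json"):
        with open(p, "r", encoding="utf-8") as f:
            content = json.load(f)
        if "train" in p.name:
            train = content
        elif "dev" in p.name:
            dev = content
        elif "test" in p.name:
            test = content
        elif "labels" in p.name:
            labels = content
    # Any split whose file is absent stays None
    return train, dev, test, labels
```

Splits whose files are missing simply come back as None, matching the documented return contract.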
- gliner.evaluation.evaluate_ner.process(data)[source]¶
Convert character-level entity annotations to word-level annotations.
Takes a data sample with character-level entity positions and converts them to word-level positions by tokenizing the sentence on whitespace.
- Parameters:
data –
Dictionary containing:
- ‘sentence’: String of the full sentence
- ‘entities’: List of entity dictionaries, each with:
  - ‘pos’: Tuple of (start_char, end_char) character positions
  - ‘type’: String entity type label
- Returns:
‘tokenized_text’: List of words from the sentence
‘ner’: List of tuples (start_word, end_word, entity_type) where start_word and end_word are word-level indices and entity_type is the lowercased entity type
- Return type:
Dictionary with ‘tokenized_text’ and ‘ner’ keys
Note
This function assumes whitespace-separated words and that character positions align exactly with word boundaries (including spaces).
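Under the stated alignment assumption (whitespace tokens, with end offsets taken as exclusive and landing exactly on word ends — the exclusivity is an assumption here), the conversion can be sketched like this; process_sketch is an illustrative re-implementation, not the library function:

```python
def process_sketch(data):
    """Convert character-level entity spans to word-level spans,
    assuming whitespace tokenization aligned with the char offsets."""
    words = data["sentence"].split()
    # Map each word's start/end character offset to its word index.
    starts, ends = {}, {}
    offset = 0
    for i, w in enumerate(words):
        starts[offset] = i
        ends[offset + len(w)] = i
        offset += len(w) + 1  # +1 for the separating space
    ner = []
    for ent in data["entities"]:
        s_char, e_char = ent["pos"]
        # Entity types are lowercased, per the documented behavior.
        ner.append((starts[s_char], ends[e_char], ent["type"].lower()))
    return {"tokenized_text": words, "ner": ner}
```

A KeyError here would signal a span that does not align with a word boundary, which the real function's documented assumption rules out.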
- gliner.evaluation.evaluate_ner.create_dataset(path)[source]¶
Create train, dev, and test datasets from a directory of JSON files.
Loads all dataset splits and processes them to convert character-level annotations to word-level annotations. Also normalizes entity type labels to lowercase.
- Parameters:
path – Path to the directory containing dataset JSON files.
- Returns:
train_dataset: List of processed training samples
dev_dataset: List of processed development samples
test_dataset: List of processed test samples
labels: List of entity type labels (lowercased)
- Return type:
A tuple of (train_dataset, dev_dataset, test_dataset, labels)
Note
Each sample in the datasets is a dictionary with ‘tokenized_text’ and ‘ner’ keys as returned by the process() function.
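The split-processing plus label normalization can be condensed into a small helper; create_dataset_sketch and its process_fn parameter are hypothetical names used only for illustration:

```python
def create_dataset_sketch(splits, labels, process_fn):
    """Apply a char-to-word conversion (process_fn) to each available
    split and lowercase the entity type labels, mirroring the
    documented behavior of create_dataset."""
    processed = tuple(
        [process_fn(sample) for sample in split] if split is not None else None
        for split in splits
    )
    # Labels are normalized to lowercase alongside the samples.
    return (*processed, [label.lower() for label in labels])
```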
- gliner.evaluation.evaluate_ner.get_for_one_path(path, model)[source]¶
Evaluate a model on a single dataset.
Loads the test set from the specified path and evaluates the model’s performance. Automatically determines whether to use flat NER evaluation based on the dataset name.
- Parameters:
path – Path to the dataset directory.
model – NER model instance with an evaluate() method.
- Returns:
data_name: String name of the dataset (extracted from path)
results: Detailed evaluation results dictionary from model.evaluate()
f1: F1 score (float) for the dataset
- Return type:
A tuple of (data_name, results, f1)
Note
Datasets with ‘ACE’, ‘GENIA’, or ‘Corpus’ in their name are evaluated with flat_ner=False, all others use flat_ner=True. Evaluation uses a threshold of 0.5 and batch size of 12.
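The flat-versus-nested decision from the note amounts to a substring check; the helper name below is hypothetical:

```python
def use_flat_ner_sketch(data_name):
    """Nested-entity benchmarks (names containing 'ACE', 'GENIA', or
    'Corpus') are evaluated with flat_ner=False; all other datasets
    use flat_ner=True."""
    return not any(tag in data_name for tag in ("ACE", "GENIA", "Corpus"))
```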
- gliner.evaluation.evaluate_ner.get_for_all_path(model, steps, log_dir, data_paths)[source]¶
Evaluate a model across multiple datasets and log results.
Evaluates the model on all datasets in the specified directory, separating results into standard benchmarks and zero-shot benchmarks. Writes detailed results to log files and computes average scores.
- Parameters:
model – NER model instance with an evaluate() method and PyTorch parameters.
steps – Integer representing the current training step (for logging).
log_dir – Directory path where result files will be saved.
data_paths – Path to directory containing multiple dataset subdirectories.
Note
Creates two log files in log_dir:
- ‘results.txt’: Detailed results for each dataset
- ‘tables.txt’: Formatted tables with averages for benchmarks
Zero-shot benchmark datasets (not included in the main average):
- mit-movie, mit-restaurant
- CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science
Datasets with ‘sample_’ in their path are skipped.
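Partitioning per-dataset scores into the main and zero-shot benchmark groups, while skipping ‘sample_’ paths, could look like this sketch (names and structure are illustrative, not the module's internals):

```python
ZERO_SHOT_BENCH = {
    "mit-movie", "mit-restaurant",
    "CrossNER_AI", "CrossNER_literature", "CrossNER_music",
    "CrossNER_politics", "CrossNER_science",
}

def partition_results_sketch(scores):
    """Split {dataset_name: f1} into (main, zero_shot) dicts,
    dropping any entry whose name contains 'sample_'."""
    main, zero_shot = {}, {}
    for name, f1 in scores.items():
        if "sample_" in name:
            continue  # sample datasets are skipped entirely
        (zero_shot if name in ZERO_SHOT_BENCH else main)[name] = f1
    return main, zero_shot
```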
- gliner.evaluation.evaluate_ner.sample_train_data(data_paths, sample_size=10000)[source]¶
Sample training data from multiple datasets for combined training.
Creates a combined training set by sampling a fixed number of examples from each dataset (excluding zero-shot benchmark datasets). Shuffles each dataset before sampling to ensure diversity.
- Parameters:
data_paths – Path to directory containing multiple dataset subdirectories.
sample_size – Maximum number of samples to take from each dataset. Defaults to 10000.
- Returns:
‘tokenized_text’: List of words
‘ner’: List of entity tuples (start, end, type)
‘label’: List of all entity type labels for this dataset
- Return type:
List of training samples, where each sample is a dictionary with the keys listed above
Note
Excludes zero-shot benchmark datasets: CrossNER_AI, CrossNER_literature, CrossNER_music, CrossNER_politics, CrossNER_science, and ACE 2004.
Each dataset is shuffled before sampling to ensure random selection. If a dataset has fewer than sample_size examples, all examples are used.
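The per-dataset shuffle-then-truncate step can be sketched as follows (sample_split_sketch is a hypothetical helper; the real function applies this across all non-excluded dataset directories):

```python
import random

def sample_split_sketch(dataset, sample_size=10000, seed=None):
    """Shuffle a copy of one dataset and keep up to sample_size
    examples; if the dataset is smaller, all examples survive."""
    rng = random.Random(seed)
    data = list(dataset)  # copy so the caller's list is untouched
    rng.shuffle(data)
    return data[:sample_size]
```

Shuffling before truncation is what makes the selection a random sample rather than just the file's first sample_size entries.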