gliner.data_processing.collator module
- class gliner.data_processing.collator.BaseDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True)
Bases: ABC
Abstract base class for all data collators.
Provides common functionality for collating batches and preparing model inputs. Subclasses should implement processor-specific logic and field handling.
Initialize the base data collator.
- Parameters:
config – Configuration object containing model/training parameters.
data_processor (BaseProcessor | None) – Processor instance for handling data transformations. If None, the subclass should provide a default processor.
return_tokens (bool) – Whether to include tokenized text in output.
return_id_to_classes (bool) – Whether to include class ID to name mappings.
return_entities (bool) – Whether to include entity annotations.
prepare_labels (bool) – Whether to prepare labels for training.
- __init__(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True)
Initialize the base data collator.
- Parameters:
config – Configuration object containing model/training parameters.
data_processor (BaseProcessor | None) – Processor instance for handling data transformations. If None, the subclass should provide a default processor.
return_tokens (bool) – Whether to include tokenized text in output.
return_id_to_classes (bool) – Whether to include class ID to name mappings.
return_entities (bool) – Whether to include entity annotations.
prepare_labels (bool) – Whether to prepare labels for training.
- collate_batch(input_x, **kwargs)
Collate raw input examples into a batch.
- Parameters:
input_x (List[Dict[str, Any]]) – List of raw input examples.
**kwargs – Additional arguments passed to the processor's collate_raw_batch.
- Returns:
Dict containing collated raw batch data.
- Return type:
Dict[str, Any]
- collate_function(raw_batch, **kwargs)
Transform raw batch into model input format.
- Parameters:
raw_batch (Dict[str, Any]) – Raw collated batch from collate_batch.
**kwargs – Additional arguments passed to the processor's collate_fn.
- Returns:
Dict containing model-ready inputs.
- Return type:
Dict[str, Any]
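The two methods above describe a two-stage flow: collate_batch delegates to the processor's collate_raw_batch, and collate_function delegates to the processor's collate_fn. The sketch below illustrates that delegation with toy stand-ins. ToyProcessor, ToyCollator, the "tokens" field, the padding logic, and the `__call__` chaining are all assumptions made for illustration; they are not the library's implementation.

```python
from typing import Any, Dict, List


class ToyProcessor:
    """Hypothetical stand-in for a processor; not a GLiNER class."""

    def collate_raw_batch(self, examples: List[Dict[str, Any]], **kwargs) -> Dict[str, Any]:
        # Stage 1: turn a list of per-example dicts into a dict of lists.
        keys = examples[0].keys() if examples else []
        return {key: [example[key] for example in examples] for key in keys}

    def collate_fn(self, raw_batch: Dict[str, Any], **kwargs) -> Dict[str, Any]:
        # Stage 2: derive model-ready fields (here: lengths and right-padded tokens).
        lengths = [len(tokens) for tokens in raw_batch["tokens"]]
        max_len = max(lengths, default=0)
        padded = [tokens + ["<pad>"] * (max_len - len(tokens)) for tokens in raw_batch["tokens"]]
        return {"tokens": padded, "lengths": lengths}


class ToyCollator:
    """Mirrors the documented delegation under the assumptions stated above."""

    def __init__(self, data_processor: ToyProcessor):
        self.data_processor = data_processor

    def collate_batch(self, input_x: List[Dict[str, Any]], **kwargs) -> Dict[str, Any]:
        return self.data_processor.collate_raw_batch(input_x, **kwargs)

    def collate_function(self, raw_batch: Dict[str, Any], **kwargs) -> Dict[str, Any]:
        return self.data_processor.collate_fn(raw_batch, **kwargs)

    def __call__(self, input_x: List[Dict[str, Any]], **kwargs) -> Dict[str, Any]:
        # Assumed chaining: raw collation first, then model-input preparation.
        return self.collate_function(self.collate_batch(input_x, **kwargs), **kwargs)


examples = [{"tokens": ["Ada", "Lovelace"]}, {"tokens": ["Paris"]}]
batch = ToyCollator(ToyProcessor())(examples)
print(batch["lengths"])  # [2, 1]
```

Keeping the raw and model-ready stages separate lets a caller inspect or modify the raw batch before any tensors or padded fields are built.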
- class gliner.data_processing.collator.BaseSpanCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True)
Bases: BaseDataCollator
Base collator for span-based processors.
Provides common logic for handling span indices, span masks, and span labels. Used by all span-level NER/RE models.
Initialize the base data collator.
- Parameters:
config – Configuration object containing model/training parameters.
data_processor (BaseProcessor | None) – Processor instance for handling data transformations. If None, the subclass should provide a default processor.
return_tokens (bool) – Whether to include tokenized text in output.
return_id_to_classes (bool) – Whether to include class ID to name mappings.
return_entities (bool) – Whether to include entity annotations.
prepare_labels (bool) – Whether to prepare labels for training.
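To make "span indices" and "span masks" concrete, here is a minimal sketch of one common way to build them: enumerate every (start, end) pair up to a maximum width on a fixed grid, and mask out spans that run past the real sentence length. The function name, the inclusive end index, and the grid layout are assumptions for illustration; the source does not specify how the library lays these out.

```python
from typing import List, Tuple


def enumerate_spans(
    num_tokens: int, max_width: int, padded_len: int
) -> Tuple[List[Tuple[int, int]], List[bool]]:
    """Return (span_idx, span_mask) for every start position and width.

    Ends are inclusive. A span is valid only if it ends inside the
    real (unpadded) sentence.
    """
    span_idx: List[Tuple[int, int]] = []
    span_mask: List[bool] = []
    for start in range(padded_len):
        for width in range(max_width):
            end = start + width  # inclusive end index
            span_idx.append((start, end))
            span_mask.append(end < num_tokens)
    return span_idx, span_mask


# A 3-token sentence padded to length 4, with spans of at most 2 tokens.
span_idx, span_mask = enumerate_spans(num_tokens=3, max_width=2, padded_len=4)
valid_spans = [span for span, keep in zip(span_idx, span_mask) if keep]
print(valid_spans)  # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
```

A fixed grid of `padded_len * max_width` candidates gives every example in a batch the same shape, which is why a mask is needed alongside the indices.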
- class gliner.data_processing.collator.BaseTokenCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True)
Bases: BaseDataCollator
Base collator for token-based processors.
Provides common logic for handling token-level annotations and entity IDs. Used by all token-level NER models.
Initialize the base data collator.
- Parameters:
config – Configuration object containing model/training parameters.
data_processor (BaseProcessor | None) – Processor instance for handling data transformations. If None, the subclass should provide a default processor.
return_tokens (bool) – Whether to include tokenized text in output.
return_id_to_classes (bool) – Whether to include class ID to name mappings.
return_entities (bool) – Whether to include entity annotations.
prepare_labels (bool) – Whether to prepare labels for training.
- class gliner.data_processing.collator.SpanDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: BaseSpanCollator
Unified data collator for all span-based processors.
Handles span-based NER with various architectures:
- UniEncoder: Single encoder with span classification
- BiEncoder: Separate encoders for text and entity types
- EncoderDecoder: Generative entity typing with decoder
Automatically adapts behavior based on processor type.
Required Processors: UniEncoderSpanProcessor, BiEncoderSpanProcessor, or UniEncoderSpanDecoderProcessor
Initialize unified span collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderSpanProcessor | BiEncoderSpanProcessor | UniEncoderSpanDecoderProcessor | None) – Span processor instance (Uni/Bi/EncoderDecoder).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- __init__(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Initialize unified span collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderSpanProcessor | BiEncoderSpanProcessor | UniEncoderSpanDecoderProcessor | None) – Span processor instance (Uni/Bi/EncoderDecoder).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- class gliner.data_processing.collator.TokenDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: BaseTokenCollator
Unified data collator for all token-based processors.
Handles token-level NER with various architectures:
- UniEncoder: Single encoder with BIO/BIOES tagging
- BiEncoder: Separate encoders for text and entity types
Automatically adapts behavior based on processor type.
Required Processors: UniEncoderTokenProcessor or BiEncoderTokenProcessor
Initialize unified token collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderTokenProcessor | BiEncoderTokenProcessor | None) – Token processor instance (Uni/Bi).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- __init__(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Initialize unified token collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderTokenProcessor | BiEncoderTokenProcessor | None) – Token processor instance (Uni/Bi).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
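The BIO tagging scheme mentioned for the token-level UniEncoder architecture can be illustrated with a small, library-independent sketch. It converts entity spans into one tag per token. The function name, the (start, end, label) tuple format with an inclusive end, and the choice to let later spans overwrite earlier ones are assumptions for illustration only, not the library's label format.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) spans with inclusive ends into BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in entities:
        if not (0 <= start <= end < num_tokens):
            continue  # skip spans that fall outside the sentence
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags


tokens = ["Ada", "Lovelace", "visited", "Paris"]
entities = [(0, 1, "person"), (3, 3, "location")]
bio_tags = spans_to_bio(len(tokens), entities)
print(bio_tags)  # ['B-person', 'I-person', 'O', 'B-location']
```

BIOES extends the same idea with an explicit end tag (E) and a single-token tag (S); it is omitted here to keep the sketch short.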
- class gliner.data_processing.collator.RelationExtractionSpanDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, return_rel_id_to_classes=False, return_relations=False, prepare_labels=True)
Bases: BaseSpanCollator
Data collator for RelationExtractionSpanProcessor.
Handles joint entity and relation extraction at span level. Produces both entity labels and relation adjacency matrices.
This collator is kept separate due to its unique handling of:
- Relation adjacency matrices
- Dual classification (entities + relations)
- Relation-specific configuration
Required Processor: RelationExtractionSpanProcessor
Initialize RelationExtraction span collator.
- Parameters:
config – Configuration object.
data_processor (RelationExtractionSpanProcessor | None) – RelationExtractionSpanProcessor instance.
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return entity class mappings.
return_entities (bool) – Whether to return entity annotations.
return_rel_id_to_classes (bool) – Whether to return relation class mappings.
return_relations (bool) – Whether to return relation annotations.
prepare_labels (bool) – Whether to prepare training labels.
- __init__(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, return_rel_id_to_classes=False, return_relations=False, prepare_labels=True)
Initialize RelationExtraction span collator.
- Parameters:
config – Configuration object.
data_processor (RelationExtractionSpanProcessor | None) – RelationExtractionSpanProcessor instance.
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return entity class mappings.
return_entities (bool) – Whether to return entity annotations.
return_rel_id_to_classes (bool) – Whether to return relation class mappings.
return_relations (bool) – Whether to return relation annotations.
prepare_labels (bool) – Whether to prepare training labels.
- collate_batch(input_x, entity_types=None, relation_types=None, ner_negatives=None, rel_negatives=None, **kwargs)
Collate raw batch data for relation extraction.
- Parameters:
input_x (List[Dict[str, Any]]) – List of input examples.
entity_types (List[str] | List[List[str]] | None) – Optional entity type specifications.
relation_types (List[str] | List[List[str]] | None) – Optional relation type specifications.
ner_negatives (List[str] | None) – Optional negative entity types for sampling.
rel_negatives (List[str] | None) – Optional negative relation types for sampling.
**kwargs – Additional arguments.
- Returns:
Collated raw batch with entity and relation information.
- Return type:
Dict[str, Any]
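As a rough illustration of what a relation adjacency matrix encodes, the sketch below builds one from (head, tail, relation) triples over entity indices. The function name, the triple format, and the convention that 0 means "no relation" while relation class IDs start at 1 are assumptions for illustration; the source does not describe the layout the library actually produces.

```python
from typing import Dict, List, Tuple


def build_relation_adjacency(
    num_entities: int,
    relations: List[Tuple[int, int, str]],
    rel_to_id: Dict[str, int],
) -> List[List[int]]:
    """Return a num_entities x num_entities matrix of relation class IDs.

    Entry [head][tail] holds the relation ID, or 0 when the pair is unrelated.
    """
    adjacency = [[0] * num_entities for _ in range(num_entities)]
    for head, tail, rel_type in relations:
        if rel_type not in rel_to_id:
            continue  # ignore relation types outside the current label set
        adjacency[head][tail] = rel_to_id[rel_type]
    return adjacency


# Three entities; relations are directed from head to tail.
rel_to_id = {"born_in": 1, "wrote_about": 2}
relations = [(0, 1, "born_in"), (2, 0, "wrote_about")]
adjacency = build_relation_adjacency(3, relations, rel_to_id)
print(adjacency)  # [[0, 1, 0], [0, 0, 0], [2, 0, 0]]
```

The matrix is asymmetric because relations are directed: entry [0][1] is set, while [1][0] stays 0.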
- class gliner.data_processing.collator.UniEncoderSpanDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: SpanDataCollator
Backward compatibility alias for SpanDataCollator with UniEncoderSpanProcessor.
Use SpanDataCollator directly for new code.
Initialize unified span collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderSpanProcessor | BiEncoderSpanProcessor | UniEncoderSpanDecoderProcessor | None) – Span processor instance (Uni/Bi/EncoderDecoder).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- class gliner.data_processing.collator.BiEncoderSpanDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: SpanDataCollator
Backward compatibility alias for SpanDataCollator with BiEncoderSpanProcessor.
Use SpanDataCollator directly for new code.
Initialize unified span collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderSpanProcessor | BiEncoderSpanProcessor | UniEncoderSpanDecoderProcessor | None) – Span processor instance (Uni/Bi/EncoderDecoder).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- class gliner.data_processing.collator.UniEncoderSpanDecoderDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: SpanDataCollator
Backward compatibility alias for SpanDataCollator with UniEncoderSpanDecoderProcessor.
Use SpanDataCollator directly for new code.
Initialize unified span collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderSpanProcessor | BiEncoderSpanProcessor | UniEncoderSpanDecoderProcessor | None) – Span processor instance (Uni/Bi/EncoderDecoder).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- class gliner.data_processing.collator.UniEncoderTokenDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: TokenDataCollator
Backward compatibility alias for TokenDataCollator with UniEncoderTokenProcessor.
Use TokenDataCollator directly for new code.
Initialize unified token collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderTokenProcessor | BiEncoderTokenProcessor | None) – Token processor instance (Uni/Bi).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
- class gliner.data_processing.collator.BiEncoderTokenDataCollator(config, data_processor=None, return_tokens=False, return_id_to_classes=False, return_entities=False, prepare_labels=True, prepare_entities=True)
Bases: TokenDataCollator
Backward compatibility alias for TokenDataCollator with BiEncoderTokenProcessor.
Use TokenDataCollator directly for new code.
Initialize unified token collator.
- Parameters:
config – Configuration object.
data_processor (UniEncoderTokenProcessor | BiEncoderTokenProcessor | None) – Token processor instance (Uni/Bi).
return_tokens (bool) – Whether to return tokenized text.
return_id_to_classes (bool) – Whether to return class mappings.
return_entities (bool) – Whether to return entity annotations.
prepare_labels (bool) – Whether to prepare training labels.
prepare_entities (bool) – Whether to encode entity types (BiEncoder only).
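The backward compatibility aliases above are documented as subclasses of the unified collators. A minimal sketch of that pattern, using hypothetical class names rather than the library's own, shows why old names keep working: the subclass adds nothing, so construction and isinstance checks behave exactly like the parent.

```python
class SpanCollatorSketch:
    """Hypothetical stand-in for a unified collator; not a GLiNER class."""

    def __init__(self, config, data_processor=None):
        self.config = config
        self.data_processor = data_processor


class UniEncoderSpanCollatorSketch(SpanCollatorSketch):
    """Old name kept so existing imports and isinstance checks keep working."""


legacy = UniEncoderSpanCollatorSketch(config={"max_width": 12})
print(isinstance(legacy, SpanCollatorSketch))  # True
```

Because the alias inherits `__init__` unchanged, code written against the old name can switch to the unified class without changing any arguments.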