Urchade Zaratiana

PhD Student at LIPN

I am a PhD student at the Laboratoire d'Informatique de Paris Nord (LIPN), under the supervision of Nadi Tomeh and Thierry Charnois. My research focuses on structured prediction for Natural Language Processing.

I am interested in the science of deep learning in general, including topics such as domain generalization, zero/few-shot learning, instruction tuning, learning under noisy data/labels, scaling laws, and intriguing phenomena such as 'double descent', 'grokking', and the 'simplicity bias'.

Research

Conference Publications

NAACL 2024 (Main)

Named Entity Recognition (NER) is essential in various Natural Language Processing (NLP) applications. Traditional NER models are effective but limited to a set of predefined entity types. In contrast, Large Language Models (LLMs) can extract arbitrary entities through natural language instructions, offering greater flexibility. However, their size and cost, particularly for those accessed via APIs like ChatGPT, make them impractical in resource-limited scenarios. In this paper, we introduce a compact NER model trained to identify any type of entity. Leveraging a bidirectional transformer encoder, our model, GLiNER, facilitates parallel entity extraction, an advantage over the slow sequential token generation of LLMs. Through comprehensive testing, GLiNER demonstrates strong performance, outperforming both ChatGPT and fine-tuned LLMs in zero-shot evaluations on various NER benchmarks.
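
To make the parallel-extraction idea concrete, here is a minimal sketch of scoring every candidate span against a set of entity-type embeddings in a single pass. It is an illustration only, not the released GLiNER code: the endpoint-concatenation span representation, the projection layer, and all tensor sizes are assumptions.

```python
import torch

def score_spans(token_reprs, type_reprs, max_width=8):
    """token_reprs: (seq_len, d) contextual token embeddings; type_reprs: (num_types, d) label embeddings."""
    seq_len, d = token_reprs.shape
    spans, reprs = [], []
    for start in range(seq_len):
        for end in range(start, min(start + max_width, seq_len)):
            spans.append((start, end))
            # assumed span representation: concatenation of the endpoint token vectors
            reprs.append(torch.cat([token_reprs[start], token_reprs[end]]))
    span_reprs = torch.stack(reprs)                      # (num_spans, 2d)
    proj = torch.nn.Linear(2 * d, d)                     # projection to the label space (created here only for illustration)
    scores = proj(span_reprs) @ type_reprs.T             # (num_spans, num_types)
    return spans, scores.sigmoid()                       # every span/type pair scored in parallel

# usage with random tensors standing in for encoder outputs
tokens = torch.randn(12, 64)            # 12 tokens, hidden size 64
types = torch.randn(3, 64)              # e.g. embeddings of ["person", "location", "date"]
spans, probs = score_spans(tokens, types)
print(probs.shape)                      # (num_spans, 3)
```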

AAAI 2024

In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models that generate text as output, our approach generates a linearized graph where nodes represent text spans and edges represent relation triples. To this end, our method employs a transformer encoder-decoder architecture with a pointing mechanism over a dynamic vocabulary of spans and relation types. In particular, our model can capture the structural characteristics and boundaries of entities and relations through span representations, while simultaneously grounding the generated output in the original text thanks to the pointing mechanism. Evaluation on benchmark datasets validates the effectiveness of our approach, demonstrating state-of-the-art results in entity and relation extraction tasks.
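
A rough sketch of a single pointing step over the dynamic vocabulary described above, assuming the vocabulary is simply the concatenation of span embeddings and relation-type embeddings and that selection is done by dot-product similarity with the decoder state; the actual architecture in the paper may differ.

```python
import torch

def pointing_step(decoder_state, span_embs, rel_type_embs):
    """decoder_state: (d,); span_embs: (num_spans, d); rel_type_embs: (num_rel_types, d)."""
    vocab = torch.cat([span_embs, rel_type_embs], dim=0)    # dynamic vocabulary for this input
    logits = vocab @ decoder_state                           # similarity of the state to every entry
    probs = torch.softmax(logits, dim=0)
    next_id = int(torch.argmax(probs))
    is_span = next_id < span_embs.size(0)                    # whether a span or a relation type was selected
    return next_id, is_span, probs

state = torch.randn(64)            # current decoder hidden state
spans = torch.randn(10, 64)        # embeddings of candidate spans from the encoder
rels = torch.randn(4, 64)          # embeddings of the relation types
idx, is_span, p = pointing_step(state, spans, rels)
print(idx, is_span)
```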

EMNLP 2023 (Findings)

Semi-Markov CRF has been proposed as an alternative to the traditional Linear Chain CRF for text segmentation tasks such as Named Entity Recognition (NER). Unlike CRF, which treats text segmentation as token-level prediction, Semi-CRF considers segments as the basic unit, making it more expressive. However, Semi-CRF suffers from two major drawbacks: (1) quadratic complexity over sequence length, as it operates on every span of the input sequence, and (2) inferior performance compared to CRF for sequence labeling tasks like NER. In this paper, we introduce Filtered Semi-Markov CRF, a variant of Semi-CRF that addresses these issues by incorporating a filtering step to eliminate irrelevant segments, reducing complexity and search space. Our approach is evaluated on several NER benchmarks, where it outperforms both CRF and Semi-CRF while being significantly faster. The implementation of our method will be made publicly available.
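
The filtering step can be pictured as a simple pruning of the O(n^2) candidate segments before any dynamic programming runs; the thresholding rule below is an assumed stand-in for the paper's actual filtering criterion.

```python
def filter_segments(span_scores, threshold=0.0):
    """span_scores: dict mapping (start, end) -> local score (e.g. a classifier logit)."""
    kept = [(s, e) for (s, e), score in span_scores.items() if score > threshold]
    return sorted(kept)

# toy scores for a 5-token sentence: most of the O(n^2) segments are pruned,
# so the downstream dynamic program runs over far fewer units
scores = {(0, 1): 2.3, (0, 2): -1.0, (2, 2): 0.8, (3, 4): 1.5, (1, 3): -0.4}
print(filter_segments(scores))   # [(0, 1), (2, 2), (3, 4)]
```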

Work in progress

Preprint (2024)

Information extraction (IE) is an important task in Natural Language Processing (NLP), involving the extraction of named entities and their relationships from unstructured text. In this paper, we propose a novel approach to this task by formulating it as graph structure learning (GSL). This formulation enhances the model's ability to dynamically refine and optimize the graph structure during the extraction process, and allows for better interaction and structure-informed decisions for entity and relation prediction, in contrast to previous models that make separate or untied predictions for these tasks. When compared against state-of-the-art baselines on joint entity and relation extraction benchmarks, our model, GraphER, achieves competitive results.

Preprint (2024)

Joint entity and relation extraction plays a pivotal role in various applications, notably in the construction of knowledge graphs. Despite recent progress, existing approaches often fall short in two key aspects: richness of representation and coherence in output structure. These models often rely on handcrafted heuristics for computing entity and relation representations, potentially leading to loss of crucial information. Furthermore, they disregard task- and/or dataset-specific constraints, resulting in output structures that lack coherence. In our work, we introduce EnriCo, which mitigates these shortcomings. Firstly, to foster rich and expressive representations, our model leverages attention mechanisms that allow both entities and relations to dynamically determine the pertinent information required for accurate extraction. Secondly, we introduce a series of decoding algorithms designed to infer the highest-scoring solutions while adhering to task- and dataset-specific constraints, thus promoting structured and coherent outputs. Our model demonstrates competitive performance compared to baselines when evaluated on joint IE datasets.

Under submission

Extractive Question Answering (ExQA) is an NLP task that aims to accurately identify the location of the answer within a given document, given a question. It is typically achieved by independently predicting the start and end positions of the answer. While simple, this approach outperforms more structured alternatives, such as jointly predicting the start and end positions, when working with large-scale data. However, when data is limited, the joint objective, which has a stronger inductive bias, proves to be more effective, but at the cost of increased computational complexity. We propose a new ExQA approach that combines the efficiency of independent prediction with the advantages of the joint objective by allowing for soft interactions between start and end positions. Our experiments show that our approach outperforms existing baselines in both high- and low-data settings.
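
One possible way to let the start and end predictions interact "softly" while keeping the cost of independent prediction is sketched below: the end scorer is conditioned on a start-probability-weighted summary of the token representations. This is an assumed formulation for illustration, not necessarily the paper's exact one.

```python
import torch

def soft_start_end(token_reprs, w_start, w_end, w_mix):
    """token_reprs: (seq_len, d); w_start, w_end, w_mix: (d,) learned vectors."""
    start_logits = token_reprs @ w_start                 # independent start scores
    start_probs = torch.softmax(start_logits, dim=0)
    start_summary = start_probs @ token_reprs            # expected representation of the start token
    end_query = w_end + w_mix * start_summary            # end query modulated by start information
    end_logits = token_reprs @ end_query
    return start_logits, end_logits

H = torch.randn(20, 64)                                  # contextual token representations
s, e = soft_start_end(H, torch.randn(64), torch.randn(64), torch.randn(64))
print(int(s.argmax()), int(e.argmax()))
```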

Workshop Publications

SemEval 2024
Maha Ben-Fares, Urchade Zaratiana, Simon D. Hernandez, Pierre Holat

In this paper, we present the description of our proposed system for Subtask A - multilingual track at SemEval-2024 Task 8, which aims to classify whether a text has been generated by an AI or a human. Our approach treats binary text classification as token-level prediction, with the final classification being the average of the token-level predictions. Through the use of the rich representations of pre-trained transformers, our model is trained to selectively aggregate information from across different layers to score individual tokens, given that each layer may contain distinct information. Notably, our model demonstrates competitive performance on the test dataset, achieving an accuracy score of 95.8%. Furthermore, it secures 2nd position in the multilingual track of Subtask A, a mere 0.1% behind the leading system.
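
The scoring scheme described above can be sketched as follows: a learned softmax mixture over encoder layers produces per-token representations, a small head scores each token, and the sequence decision is the average token score. The mixture weights and scoring head here are illustrative placeholders, not the submitted system's code.

```python
import torch

def classify_sequence(layer_states, layer_logits, token_scorer):
    """layer_states: (num_layers, seq_len, d); layer_logits: (num_layers,) learnable mixing weights."""
    weights = torch.softmax(layer_logits, dim=0)               # per-layer mixture weights
    mixed = torch.einsum("l,lsd->sd", weights, layer_states)   # (seq_len, d) per-token representations
    token_scores = token_scorer(mixed).squeeze(-1)             # one logit per token
    return token_scores.mean()                                 # sequence logit = average token score

states = torch.randn(13, 32, 64)                 # 12 layers + embeddings, 32 tokens, hidden size 64
scorer = torch.nn.Linear(64, 1)
logit = classify_sequence(states, torch.zeros(13), scorer)
print(torch.sigmoid(logit))                      # probability that the text is AI-generated
```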

Tiny Papers @ ICLR 2024

Recent studies have shown that Autoregressive Transformer Language Models (LMs) can generate text sequences without relying on positional encodings (PEs). This capability is attributed to the causal masks in these models, which prevent tokens from accessing information from future tokens, allowing implicit learning of token positions. On the other hand, Bidirectional LMs, such as BERT, tend to underperform on masked language modeling tasks when PEs are omitted. This performance dip arises because transformer layers are inherently permutation equivariant; without PEs or masks, they cannot differentiate token positions, making bidirectional processing difficult. In this analytical study, we examine a variant of bidirectional Transformer LM that operates without PEs but incorporates causal masks in its initial layers. Our findings reveal that this configuration yields performance metrics on masked language modeling tasks that are on par with traditional transformers that use PEs. However, when tested on the GLUE language understanding benchmark, the model without PEs exhibits diminished performance. These results highlight the importance of positional encodings in bidirectional LMs and indicate that pretraining loss might not always correlate with performance on downstream tasks.
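
The studied configuration can be illustrated by the mask construction below: the first k layers receive a causal (lower-triangular) attention mask while the remaining layers attend bidirectionally, and no positional encodings are added anywhere. The helper and its arguments are hypothetical.

```python
import torch

def build_masks(seq_len, num_layers, num_causal_layers):
    causal = torch.ones(seq_len, seq_len).tril().bool()    # token i attends only to tokens <= i
    full = torch.ones(seq_len, seq_len).bool()              # unrestricted bidirectional attention
    return [causal if i < num_causal_layers else full for i in range(num_layers)]

masks = build_masks(seq_len=6, num_layers=12, num_causal_layers=2)
print(masks[0].int())   # lower-triangular: the early layers can recover position implicitly
print(masks[5].int())   # all ones: the later layers are fully bidirectional
```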

Arabic Natural Language Processing Conference 2023 (co-located with EMNLP 2023)

In this paper, we study the transferability of Named Entity Recognition (NER) models between Arabic dialects. This question is important because the available manually-annotated resources are not distributed equally across dialects: Modern Standard Arabic (MSA) is much richer than other dialects, for which little to no datasets exist. How well does a NER model trained on MSA perform on other dialects? To answer this question, we construct four datasets. The first is an MSA dataset extracted from the ACE 2005 corpus. The others are datasets for Egyptian, Moroccan and Syrian Arabic, which we manually annotate following the ACE guidelines. We train a span-based NER model on top of a pretrained language model (PLM) encoder on the MSA data and study its performance on the other datasets in zero-shot settings. We study the performance of multiple PLM encoders from the literature and show that they achieve acceptable performance with no annotation effort. Our annotations and models are publicly available at https://github.com/niamaelkhbir/Arabic-Cross-Dialectal-NER.

Workshop on Unimodal and Multimodal Induction of Linguistic Structures @ EMNLP 2022

Named Entity Recognition (NER) is an important task in Natural Language Processing with applications in many domains. While the dominant paradigm of NER is sequence labelling, span-based approaches have become very popular in recent times but are less well understood. In this work, we study different aspects of span-based NER, namely the span representation, learning strategy, and decoding algorithms to avoid span overlap. We also propose an exact algorithm that efficiently finds the set of non-overlapping spans that maximizes a global score, given a list of candidate spans. We performed our study on three benchmark NER datasets from different domains. We make our code publicly available at https://github.com/urchade/span-structured-prediction.
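
The exact decoding problem described above (picking the highest-scoring set of non-overlapping spans from a list of scored candidates) can be solved with a standard weighted-interval-scheduling dynamic program, sketched below; this is an illustration of the idea, not the repository's implementation.

```python
def best_nonoverlapping(spans):
    """spans: list of (start, end, score) with inclusive ends; returns (best total score, chosen spans)."""
    spans = sorted(spans, key=lambda x: x[1])                 # sort candidates by end position
    best = [(0.0, [])]                                         # best[i]: optimum using the first i candidates
    for i, (s, e, score) in enumerate(spans, start=1):
        # index of the best prefix whose last considered span ends before this one starts
        j = max((k for k in range(i - 1, 0, -1) if spans[k - 1][1] < s), default=0)
        take_score = best[j][0] + score
        skip_score, skip_set = best[i - 1]
        if take_score > skip_score:
            best.append((take_score, best[j][1] + [(s, e)]))
        else:
            best.append((skip_score, skip_set))
    return best[-1]

candidates = [(0, 1, 2.0), (1, 2, 1.5), (3, 4, 1.0), (0, 4, 2.8)]
print(best_nonoverlapping(candidates))   # (3.0, [(0, 1), (3, 4)])
```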

Workshop on Transfer Learning for Natural Language Processing @ NeurIPS 2022

Extractive QA is an important NLP task with numerous real-world applications. The most common method for extractive QA is to encode the input sequence with a pretrained Transformer such as BERT, and then compute the probability of the start and end positions of span answers using two learned query vectors. This method has been shown to be effective and hard to outperform. However, the query vectors are static, meaning they are the same regardless of the input, which can be a limiting factor for model performance. To address this problem, we propose DyReF (Dynamic Representation for Free), a model that dynamically learns query vectors for free, i.e. without adding any parameters, by concatenating the query vectors with the embeddings of the input tokens of the Transformer layers. In this way, the query vectors can aggregate information from the source sentence and adapt to the question, while the representations of the input tokens are also dependent on the queries, allowing for better task specialization. We demonstrate empirically that our simple approach outperforms strong baselines on a variety of extractive question answering benchmark datasets. The code is publicly available at https://github.com/urchade/DyReF.
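
The "queries travel with the tokens" idea can be sketched in a few lines: the two query vectors are prepended to the input sequence, pass through the same Transformer layers, and their contextualized versions score the start and end positions. The module names and sizes below are placeholders, not the released DyReF code.

```python
import torch

d = 64
encoder_layer = torch.nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=2)
start_query = torch.nn.Parameter(torch.randn(1, 1, d))       # the usual learned query vectors...
end_query = torch.nn.Parameter(torch.randn(1, 1, d))

tokens = torch.randn(1, 20, d)                                # question + passage embeddings
x = torch.cat([start_query, end_query, tokens], dim=1)        # ...prepended so they travel through the layers
h = encoder(x)
q_start, q_end, h_tokens = h[:, 0], h[:, 1], h[:, 2:]
start_logits = (h_tokens * q_start.unsqueeze(1)).sum(-1)      # (1, 20): contextualized start query scores tokens
end_logits = (h_tokens * q_end.unsqueeze(1)).sum(-1)
print(start_logits.shape, end_logits.shape)
```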

Workshop on Unimodal and Multimodal Induction of Linguistic Structures @ EMNLP 2022

Named Entity Recognition (NER) is an important task in Natural Language Processing with applications in many domains. In this paper, we describe a novel approach to named entity recognition, in which we output a set of spans (i.e., a segmentation) by maximizing a global score. During training, we optimize our model by maximizing the probability of the gold segmentation. During inference, we use dynamic programming to select the best segmentation in linear time. We show that our approach outperforms CRF and semi-CRF models for Named Entity Recognition. We will make our code publicly available.
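
The decoding step can be sketched as a Viterbi-style dynamic program over segment boundaries; assuming a bounded maximum segment width (which is what makes linear-time decoding possible), dp[i] holds the best score of any segmentation of the first i tokens. The scoring function below is a toy stand-in for the model's segment scores.

```python
def best_segmentation(n, segment_score, max_width=4):
    """segment_score(start, end) -> score of the segment covering tokens[start:end] (end exclusive)."""
    dp = [0.0] + [float("-inf")] * n          # dp[i]: best score over segmentations of the first i tokens
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        for w in range(1, min(max_width, i) + 1):
            cand = dp[i - w] + segment_score(i - w, i)
            if cand > dp[i]:
                dp[i], back[i] = cand, i - w
    segs, i = [], n                           # recover the segments from the back-pointers
    while i > 0:
        segs.append((back[i], i))
        i = back[i]
    return dp[n], segs[::-1]

score = lambda s, e: 1.0 if (s, e) in {(0, 2), (2, 3)} else -0.5
print(best_segmentation(3, score))   # (2.0, [(0, 2), (2, 3)])
```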

2nd Workshop on Efficient Natural Language and Speech Processing @ NeurIPS 2022

Extractive question answering (ExQA) is an essential task for Natural Language Processing. The dominant approach to ExQA is one that represents the input sequence tokens (question and passage) with a pre-trained transformer, then uses two learned query vectors to compute distributions over the start and end answer span positions. These query vectors lack the context of the inputs, which can be a bottleneck for the model performance. To address this problem, we propose DyREx, a generalization of the vanilla approach where we dynamically compute query vectors given the input, using an attention mechanism through transformer layers. Empirical observations demonstrate that our approach consistently improves the performance over the standard one. The code and accompanying files for running the experiments are available at https://github.com/urchade/DyReX.

ACL SRW 2022

There are two main paradigms for Named Entity Recognition (NER): sequence labelling and span classification. Sequence labelling aims to assign a label to each word in an input text using, for example, BIO (Begin, Inside and Outside) tagging, while span classification involves enumerating all possible spans in a text and classifying them into their labels. In contrast to sequence labelling, unconstrained span-based methods tend to assign entity labels to overlapping spans, which is generally undesirable, especially for NER tasks without nested entities. Accordingly, we propose GNNer, a framework that uses Graph Neural Networks to enrich the span representation and reduce the number of overlapping spans during prediction. Our approach reduces the number of overlapping spans compared to a strong baseline while maintaining competitive metric performance. Code is available at https://github.com/urchade/GNNer.
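
A rough sketch of message passing over a span-overlap graph is given below, using a simple GCN-style mean-aggregation update as a stand-in for whatever GNN the paper actually uses: each candidate span exchanges information with the spans it overlaps, giving the classifier a way to avoid labelling conflicting spans.

```python
import torch

def overlap_adjacency(spans):
    """Build an adjacency matrix connecting candidate spans that share at least one token."""
    n = len(spans)
    A = torch.eye(n)                                     # self-loops
    for i, (s1, e1) in enumerate(spans):
        for j, (s2, e2) in enumerate(spans):
            if i != j and s1 <= e2 and s2 <= e1:         # inclusive-boundary overlap test
                A[i, j] = 1.0
    return A

def gcn_layer(A, H, W):
    deg = A.sum(dim=1, keepdim=True)
    return torch.relu((A / deg) @ H @ W)                 # mean-aggregate neighbours, then project

spans = [(0, 2), (1, 3), (5, 6)]                         # the first two spans overlap
H = torch.randn(3, 32)                                   # initial span representations
W = torch.randn(32, 32)
print(gcn_layer(overlap_adjacency(spans), H, W).shape)   # enriched span representations, (3, 32)
```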

Contact