Portrait of Urchade Zaratiana

I am a PhD student at the Laboratoire d'Informatique de Paris Nord (LIPN) 🏫, under the supervision of Nadi Tomeh and Thierry Charnois. My research focuses on structured prediction for Natural Language Processing 🧠📊.

I am also a Member of Technical Staff at a US-based startup 🇺🇸, where I am conducting research on Small Language Models 🤖. I am currently based in Île de la Réunion 🇫🇷🇷🇪.

I am passionate about the science of deep learning, with particular interest in topics such as domain generalization 🌐, zero/few-shot/instruction learning 🛠️📚, learning under noisy data/labels 📉🔍, scaling laws 📈, and simplicity bias 🎛️.


During my PhD, I have worked on the following research problems:

  • Structured decoding (2021-)
    • Named Entity Recognition as Structured Span Prediction (Zaratiana et al., EMNLP 2022 UM-IoS) (Link); a simplified span-selection sketch follows this list
    • EnriCO: Constrained decoding for information extraction using logical rules (Zaratiana et al., ArXiv 2024) (Link)
  • Graph (structure) learning (2021-)
    • GraphER: End-to-end graph structure learning for joint entity and relation extraction (Zaratiana et al., ArXiv 2024) (Link)
    • GNNer: using graphs (and Graph Neural Networks) to implicitly constrain the output of a neural network (Zaratiana et al., ACL 2022 SRW) (Link)
  • Structured loss functions (2022-)
    • Filtered Semi-Markov CRF (Zaratiana et al., EMNLP 2023) (Link)
    • Global Span Selection (Zaratiana et al., EMNLP 2022 UM-IoS) (Link)
  • Constrained decoding of language models (2022-)
    • Autoregressive text-to-graph model for information extraction: preliminary version published at an ICML 2023 workshop (Zaratiana et al., ICML 2023 SPIGM) (Link); final version at AAAI 2024 (Zaratiana et al., AAAI 2024) (Link)
  • Zero-shot Learning for Information Extraction (2023-)
    • GLiNER (Zaratiana et al., NAACL 2024): A model designed for Zero-shot Named Entity Recognition (GitHub).
    • GraphER (Zaratiana et al., ArXiv 2024): An end-to-end model for zero-shot joint entity and relation extraction (GitHub).
  • Arabic Information Extraction (2023-)
    • Cross-Dialectal Named Entity Recognition in Arabic (El Khbir et al., ArabNLP 2023) (Link)
    • A Span-Based Approach for Flat and Nested Arabic Named Entity Recognition: top-ranked system at the Wojood NER shared task (El Khbir et al., ArabNLP 2023) (Link)
  • Language model pretraining (2024-)
    • Training a BERT-like model without positional encoding (Zaratiana et al., ICLR 2024 Tiny Paper) (Link)
  • Detection of machine-generated text (2024-)
    • Our system ranked 2nd at the SemEval "Multidomain, Multimodal and Multilingual Machine-Generated Text Detection" shared task using a 300M-parameter model; the 1st-place system used a 70B model (Ben Fares et al., SemEval 2024) (Link)
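
Several of the items above, structured span prediction and global span selection in particular, share the same recipe: enumerate candidate spans, score them, and decode a consistent, non-overlapping subset. The snippet below is a deliberately simplified sketch of that decoding step using greedy selection over hypothetical scores; the papers themselves use learned scoring functions and structured (not greedy) decoding.

```python
# Deliberately simplified sketch of span-based NER decoding:
# score candidate spans, then keep a non-overlapping subset.
# Spans and scores here are hypothetical illustrations.
from typing import List, Tuple

Span = Tuple[int, int, str, float]  # (start, end) token offsets, label, score

def select_non_overlapping(candidates: List[Span]) -> List[Span]:
    """Greedily keep the highest-scoring spans that do not overlap."""
    selected: List[Span] = []
    for start, end, label, score in sorted(candidates, key=lambda s: s[3], reverse=True):
        # Keep the span only if it does not overlap any already-selected span.
        if all(end <= s[0] or start >= s[1] for s in selected):
            selected.append((start, end, label, score))
    return sorted(selected, key=lambda s: s[0])

# Toy candidates over the tokens ["Urchade", "Zaratiana", "studies", "at", "LIPN"]
candidates = [
    (0, 2, "person", 0.91),        # "Urchade Zaratiana"
    (1, 3, "person", 0.40),        # overlaps the span above, lower score
    (4, 5, "organization", 0.85),  # "LIPN"
]
print(select_non_overlapping(candidates))
# -> [(0, 2, 'person', 0.91), (4, 5, 'organization', 0.85)]
```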

Selected OSS Projects

GLiNER (1.1k ⭐)

A lightweight model for Named Entity Recognition (NER) using a BERT-like transformer.
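
A minimal usage sketch, assuming the gliner package is installed; the checkpoint name, example text, and labels below are illustrative.

```python
# Minimal GLiNER usage sketch; checkpoint name and labels are illustrative.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Urchade Zaratiana is a PhD student at LIPN, based in Île de la Réunion."
labels = ["person", "organization", "location"]

# Zero-shot prediction: entity types are provided at inference time as plain strings.
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```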

GraphER (45 ⭐)

End-to-end zero-shot entity and relation extraction.

ATG (36 ⭐)

An autoregressive text-to-graph framework for joint entity and relation extraction.

struct_ie (2 ⭐)

Structured Information Extraction with Large Language Models.