Portrait of Urchade Zaratiana

I am a PhD student at the Laboratoire d'Informatique de Paris Nord (LIPN) 🏫, under the supervision of Nadi Tomeh and Thierry Charnois. My research focuses on structured prediction for Natural Language Processing 🧠📊.

I am also a Member of Technical Staff at a US-based startup 🇺🇸, where I am conducting research on Small Language Models 🤖. I am currently based in Île de la Réunion 🇫🇷🇷🇪.

I am passionate about the science of deep learning, with particular interest in topics such as domain generalization 🌐, zero/few-shot/instruction learning 🛠️📚, learning under noisy data/labels 📉🔍, scaling laws 📈, and simplicity bias 🎛️.


During my PhD, I have worked on the following research problems:

  • Structured decoding (2021-)
    • Named Entity Recognition as Structured Span Prediction (Zaratiana et al., EMNLP 2022 UM-IoS) (Link); a simplified span-selection sketch follows this list
    • EnriCO: Constrained decoding for information extraction using logical rules (Zaratiana et al., ArXiv 2024) (Link)
  • Graph (structure) learning (2021-)
    • GraphER: End-to-end graph structure learning for joint entity and relation extraction (Zaratiana et al., ArXiv 2024) (Link)
    • GNNer: using graphs (and Graph Neural Networks) to implicitly constrain the output of a neural network (Zaratiana et al., ACL 2022 SRW) (Link)
  • Structured loss functions (2022-)
    • Filtered Semi-Markov CRF (Zaratiana et al., EMNLP 2023) (Link)
    • Global Span Selection (Zaratiana et al., EMNLP 2022 UM-IoS) (Link)
  • Constrained decoding of language models (2022-)
    • Autoregressive text-to-graph model for information extraction: preliminary version published at an ICML 2023 workshop (Zaratiana et al., ICML 2023 SPIGM) (Link); final version at AAAI 2024 (Zaratiana et al., AAAI 2024) (Link)
  • Zero-shot Learning for Information Extraction (2023-)
    • GLiNER (Zaratiana et al., NAACL 2024): A model designed for Zero-shot Named Entity Recognition (GitHub).
    • GraphER (Zaratiana et al., ArXiv 2024): An end-to-end model for zero-shot joint entity and relation extraction (GitHub).
  • Arabic Information Extraction (2023-)
    • Cross-Dialectal Named Entity Recognition in Arabic (El Khbir et al., ArabNLP 2023) (Link)
    • A Span-Based Approach for Flat and Nested Arabic Named Entity Recognition: top-ranked system at the Wojood NER shared task (El Khbir et al., ArabNLP 2023) (Link)
  • Language model pretraining (2024-)
    • Training a BERT-like model without positional encoding (Zaratiana et al., ICLR 2024 Tiny Paper) (Link)
  • Detection of machine-generated text (2024-)
    • Our system ranked 2nd at the SemEval "Multidomain, Multimodal and Multilingual Machine-Generated Text Detection" shared task using a 300M-parameter model; the 1st-place system used a 70B model (Ben Fares et al., SemEval 2024) (Link)
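
Several of the items above, structured span prediction and global span selection in particular, share the same recipe: enumerate candidate spans, score them, and decode a consistent, non-overlapping subset. The snippet below is a deliberately simplified sketch of that decoding step using greedy selection over hypothetical scores; the papers themselves use learned scoring functions and structured (not greedy) decoding.

```python
# Deliberately simplified sketch of span-based NER decoding:
# score candidate spans, then keep a non-overlapping subset.
# Spans and scores here are hypothetical illustrations.
from typing import List, Tuple

Span = Tuple[int, int, str, float]  # (start, end) token offsets, label, score

def select_non_overlapping(candidates: List[Span]) -> List[Span]:
    """Greedily keep the highest-scoring spans that do not overlap."""
    selected: List[Span] = []
    for start, end, label, score in sorted(candidates, key=lambda s: s[3], reverse=True):
        # Keep the span only if it does not overlap any already-selected span.
        if all(end <= s[0] or start >= s[1] for s in selected):
            selected.append((start, end, label, score))
    return sorted(selected, key=lambda s: s[0])

# Toy candidates over the tokens ["Urchade", "Zaratiana", "studies", "at", "LIPN"]
candidates = [
    (0, 2, "person", 0.91),        # "Urchade Zaratiana"
    (1, 3, "person", 0.40),        # overlaps the span above, lower score
    (4, 5, "organization", 0.85),  # "LIPN"
]
print(select_non_overlapping(candidates))
# -> [(0, 2, 'person', 0.91), (4, 5, 'organization', 0.85)]
```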

Selected OSS Projects

GLiNER (1.1k ⭐)

A lightweight model for Named Entity Recognition (NER) using a BERT-like transformer.
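
A minimal usage sketch, assuming the gliner package is installed; the checkpoint name, example text, and labels below are illustrative.

```python
# Minimal GLiNER usage sketch; checkpoint name and labels are illustrative.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Urchade Zaratiana is a PhD student at LIPN, based in Île de la Réunion."
labels = ["person", "organization", "location"]

# Zero-shot prediction: entity types are provided at inference time as plain strings.
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```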

GraphER (45 ⭐)

End-to-end zero-shot entity and relation extraction.

ATG (36 ⭐)

An autoregressive text-to-graph framework for joint entity and relation extraction.

struct_ie (2 ⭐)

Structured Information Extraction with Large Language Models.