Hugo Talibart

Hi! I'm Hugo, a passionate computer scientist.

In more than 8 years in research labs (INRIA Rennes, MNHN Paris, Université Libre de Bruxelles), I have designed and implemented algorithms. So far, computational biology has been my main playground: the problems are fascinating, the data is complex, and there are always many different ways to look at them.

I've used computer science and mathematical tools (Integer Linear Programming (ILP), graphical statistical models, formal grammars, deep generative models) to address problems in protein sequence alignment, mutational landscape prediction, molecular generation, and bias-controlled benchmarking of machine learning predictors.

On the side, I also build things for fun: wikiddle.com, a daily puzzle game website based on Wikipedia.

Projects

2017 – 2021 INRIA Rennes PhD researcher

I started my career at INRIA (Institut National de Recherche en Informatique et en Automatique) as a PhD researcher, under the supervision of François Coste. Together we introduced PPalign, a method for pairwise Markov Random Field alignment based on Integer Linear Programming, with application to protein sequences. Previous methods modelled protein sequence families using Hidden Markov Models (HMMs), which look at each position independently ("how frequent is this amino acid at this position?"). We proposed to use Potts models, which also capture dependencies between positions: not just individual frequencies but also how residues interact with each other. Aligning such models is significantly harder: the pairwise dependencies make the problem NP-hard. Our contribution was to formulate it as an ILP problem and solve it exactly in practical running times, outperforming HMM baselines in alignment quality.

Early in my thesis, I also had the opportunity to work with Witold Dyrka from Politechnika Wrocławska (Wrocław, Poland) on training probabilistic context-free grammars on protein sequence families using structural contact constraints.

2021 – 2022 MNHN Paris Postdoctoral researcher

I then joined the National Museum of Natural History (MNHN Paris, Sorbonne Université) for a year as a postdoctoral researcher with Mathilde Carpentier. We experimented with alternative statistical approaches to build canonical, comparable models, and extended the PPalign solver to account for insertions being more likely at some positions than others.

2023 – Present ULB Brussels Postdoctoral researcher

I then moved to Brussels to work with Dimitri Gilis at ULB (Université Libre de Bruxelles).

My first project there was a Transformer variational autoencoder trained to embed molecules into a structured, continuous latent space where distances reflect similarity, with application to molecular representation and optimization. The model was shipped as a user-friendly pip package.

I also collaborated with Matsvei Tsishyn on StructureDCA, a sparse Potts model inference method with structural constraints. The method, based on a C++ solver with a Python interface, achieves state-of-the-art performance in mutational landscape prediction.

Currently, I'm working on a framework for benchmark construction. Given a dataset, it selects and partitions samples to control biases, such as entities with strongly skewed label distributions or similar entities leaking between train and test. This prevents machine learning models from exploiting shortcuts and allows for a more reliable evaluation. Enforcing all these constraints simultaneously is a combinatorial problem, which we solved using Integer Linear Programming; the framework will be released as a pip package soon. It is already being applied to olfactory receptor activation prediction as part of a Master's thesis I am supervising.

Software

OptiDUP (coming soon)
A Python package for bias-controlled benchmark construction: undersampling and partitioning under user-defined constraints, powered by ILP.
Python ILP OR-Tools
A Python package for molecular encoding and generation. Embeds molecules into a structured continuous latent space where distances reflect chemical similarity, enabling smooth exploration and generation of novel molecules. Large-scale Transformer VAE, released with pre-trained weights, ready to use or fine-tune.
Python PyTorch Transformer Variational Autoencoder generative AI
StructureDCA
A Python package for protein mutation effect prediction. Sparse structure-informed graphical model inference with a C++ solver backend.
Python C++
PPalign
Exact ILP-based alignment of Potts models, applied to protein sequence alignment.
Python C++ ILP

Publications

Matsvei Tsishyn, Hugo Talibart, Marianne Rooman, Fabrizio Pucci
bioRxiv · 2026
Witold Dyrka, Mateusz Pyzik, François Coste, Hugo Talibart
PeerJ · 2019

Side projects

wikiddle.com
A daily puzzle game based on Wikipedia: guess a mystery article based on the links it contains. Full-stack, self-hosted.
JavaScript Python FastAPI SQL

Education

PhD in Computer Science 2017 – 2021
INRIA · Université de Rennes 1, Rennes, France
Master in Theoretical Computer Science 2016 – 2017
Université de Rouen, Rouen, France
Engineering Degree in Mathematics 2012 – 2017
Institut National des Sciences Appliquées (INSA), Rouen, France

Skills

Programming languages
Python, C/C++, JavaScript, HTML/CSS, Bash
ML & Data
PyTorch, NumPy, SciPy, Pandas, scikit-learn
Tools
Git, Conda, Docker / Singularity
Environment
Linux, cluster computing (Slurm)
Spoken languages
French (native), English (fluent)