Hugo Talibart

Hi! I'm Hugo, a passionate computer scientist.

Over more than 8 years in research labs (INRIA Rennes, MNHN Paris, Université Libre de Bruxelles), I have designed and implemented algorithms. So far, computational biology has been my main playground: the problems are fascinating, the data is complex, and there are always many different ways to look at it.

I've used computer science and mathematical tools (Integer Linear Programming (ILP), graphical statistical models, formal grammars, deep generative models) to address problems in protein sequence alignment, mutational landscape prediction, molecular generation, and bias-controlled benchmarking of machine learning predictors.

On the side, I also build things for fun: wikiddle.com, a daily puzzle game website based on Wikipedia.

LinkedIn email CV

Projects

2017 – 2021 INRIA Rennes PhD researcher

I started my career at INRIA (Institut National de Recherche en Informatique et en Automatique) as a PhD researcher, under the supervision of François Coste. Together we introduced PPalign, a method for pairwise Markov Random Field alignment based on Integer Linear Programming, with application to protein sequences. Previous methods modelled protein sequence families using Hidden Markov Models (HMMs), looking at each position independently ("how frequent is this amino acid at this position?"). We proposed to use Potts models, which also capture dependencies between positions: not just individual frequencies but also how residues interact with each other. Aligning such models is significantly harder: the pairwise dependencies make the problem NP-hard. Our contribution was to formulate it as an ILP problem and solve it in tractable time, outperforming HMM baselines in alignment quality.
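To make the difference concrete, here is a minimal toy sketch (not PPalign code) of how a position-independent score and a Potts-style score differ. The function names, the two-letter alphabet, the "higher is better" sign convention, and all numeric values are assumptions chosen purely for illustration.

```python
from itertools import combinations


def independent_score(seq, fields):
    """Sum of per-position terms only: each position is scored on its own."""
    return sum(fields[i][a] for i, a in enumerate(seq))


def potts_score(seq, fields, couplings):
    """Per-position terms plus a pairwise term for every coupled pair (i, j)."""
    score = independent_score(seq, fields)
    for i, j in combinations(range(len(seq)), 2):
        # Pairs or letter combinations without an entry contribute nothing.
        score += couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return score


# Hypothetical 3-position model over the toy alphabet {A, C}.
fields = [
    {"A": 1.0, "C": 0.0},
    {"A": 0.5, "C": 0.5},
    {"A": 0.0, "C": 1.0},
]
# Positions 0 and 2 "prefer" to differ: a dependency the fields cannot express.
couplings = {(0, 2): {("A", "C"): 2.0, ("C", "A"): 2.0}}

for seq in ("AAC", "CAA"):
    print(seq, independent_score(seq, fields), potts_score(seq, fields, couplings))
```

In this toy model, both sequences receive the same coupling bonus even though their per-position scores differ, which is the kind of pairwise signal a position-independent model ignores.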

Early in my thesis, I also had the opportunity to work with Witold Dyrka from Politechnika Wrocławska (Wrocław, Poland) on training probabilistic context-free grammars on protein sequence families using structural contact constraints.

2021 – 2022 MNHN Paris Postdoctoral researcher

I then joined the National Museum of Natural History (MNHN Paris, Sorbonne Université) for a year as a postdoctoral researcher with Mathilde Carpentier. We experimented with alternative statistical approaches to build canonical, comparable models, and extended the PPalign solver to take into account the fact that insertions are more likely at some positions than others.

2023 – Present ULB Brussels Postdoctoral researcher

I then moved to Brussels to work with Dimitri Gilis at ULB (Université Libre de Bruxelles).

My first project there was a Transformer variational autoencoder trained to embed molecules into a structured, continuous latent space where distances reflect similarity, with application to molecular representation and optimization. The model was shipped as a user-friendly pip package.

I also collaborated with Matsvei Tsishyn on StructureDCA, a sparse Potts model inference method with structural constraints. The method, based on a C++ solver with a Python interface, achieves state-of-the-art performance in mutational landscape prediction.

Currently, I'm working on a framework for benchmark construction: given a dataset, it selects and partitions samples to control biases, such as entities with strongly skewed label distributions or similar entities leaking between train and test. Machine learning models, especially deep learning models, can achieve impressive performance because they are very good at finding patterns, including hidden biases in the data. The idea is to build benchmarks with that in mind: preventing models from exploiting shortcuts and assessing whether they would actually perform well in realistic conditions. Enforcing all these constraints simultaneously is a combinatorial problem, which we solved using Integer Linear Programming; the framework will be released as a pip package soon. The framework is already being applied to olfactory receptor activation prediction as part of a Master's thesis I am supervising.
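As a toy illustration of the kind of constraints involved (this is not the framework's API), the sketch below assigns whole groups of similar entities to either train or test, so that nothing leaks across the split, while keeping the test size and the test label rate close to their targets. Exhaustive search stands in for the ILP solver here, which is only viable for a handful of groups. The function name, the input format, and the cost function are all assumptions made for this example.

```python
from itertools import product


def split_groups(groups, test_fraction=0.25):
    """Toy group-aware split.

    groups: dict mapping a group name to (n_samples, n_positive).
    Each group goes entirely to train or entirely to test.
    Returns (train_names, test_names).
    """
    names = sorted(groups)
    total = sum(n for n, _ in groups.values())
    total_pos = sum(p for _, p in groups.values())
    best, best_cost = None, float("inf")

    # Brute force over all 2^n assignments (an ILP solver would replace this).
    for mask in product((0, 1), repeat=len(names)):
        test = [g for g, m in zip(names, mask) if m]
        n_test = sum(groups[g][0] for g in test)
        if n_test == 0 or n_test == total:
            continue  # both sides of the split must be non-empty
        pos_test = sum(groups[g][1] for g in test)
        size_gap = abs(n_test / total - test_fraction)
        label_gap = abs(pos_test / n_test - total_pos / total)
        cost = size_gap + label_gap
        if cost < best_cost:
            best, best_cost = test, cost

    if best is None:
        raise ValueError("need at least two non-empty groups to split")
    train = [g for g in names if g not in best]
    return train, best


# Hypothetical clusters of similar entities: (number of samples, number of positives).
groups = {"a": (40, 20), "b": (20, 10), "c": (20, 18), "d": (20, 2)}
train, test = split_groups(groups, test_fraction=0.2)
print("train:", train, "test:", test)
```

In this example, groups "c" and "d" have strongly skewed labels, so placing either one alone in the test set would distort the test label rate. The search instead picks the balanced group "b".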

Software

OptiDUP (coming soon)

A Python package for bias-controlled benchmark construction: undersampling and partitioning under user-defined constraints, powered by ILP.

Python ILP OR-Tools

chembed

A Python package for molecular encoding and generation. Embeds molecules into a structured continuous latent space where distances reflect chemical similarity, enabling smooth exploration and generation of novel molecules. Large-scale Transformer VAE, released with pre-trained weights, ready to use or fine-tune.

Python PyTorch Transformer Variational Autoencoder generative AI

pip install chembed

github