PhD project

Modelling proteins with long-distance dependencies

I am currently carrying out a PhD at the University of Rennes 1, in the Dyliss team, under the supervision of François Coste and Jacques Nicolas.

My thesis focuses on protein homology search and protein family modelling. Today, the standard way of modelling a set of protein sequences is to use profile Hidden Markov Models (HMMs). These statistical models represent protein families with position-specific information and insertion and deletion states. Widely used packages such as HH-suite provide tools to perform sequence-HMM and HMM-HMM alignment, and the similarity score they yield is used to determine whether two protein sequences are homologous. While powerful, profile HMMs are limited by their positional nature: they cannot capture dependencies between distant positions. Yet it is well known that two residues that are distant in the sequence can co-evolve, for instance because they are in contact in the 3D structure. In our work, we propose to use a different type of model: the Potts model. This model was introduced for protein sequences by Direct Coupling Analysis, originally to predict contacts and protein-protein interactions. Its parameters can describe both positional conservation and couplings between positions, which makes it a great candidate to model sets of homologous proteins.
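To make the two kinds of parameters concrete, here is a minimal sketch of how a Potts model scores a sequence. It assumes the common convention E(x) = -Σ_i h_i(x_i) - Σ_{i<j} J_ij(x_i, x_j), where a lower energy means a better fit. The function name, the data layout and the toy parameter values are illustrative assumptions, not ComPotts code.

```python
def potts_energy(seq, fields, couplings):
    """Energy of `seq` under a toy Potts model (lower = better fit).

    fields:    one dict per position, mapping a residue to its field h_i(a)
               (positional conservation); missing residues count as 0.
    couplings: dict mapping a position pair (i, j) to a dict that maps a
               residue pair (a, b) to J_ij(a, b); missing pairs count as 0.
    """
    energy = 0.0
    # Single-site terms: reward residues that are conserved at each position.
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    # Pairwise terms: reward residue pairs that co-occur at coupled positions.
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical 3-position model: position 1 prefers G, and positions 0 and 2
# are coupled so that oppositely charged residues (D/K) are favoured together.
fields = [{"A": 1.0}, {"G": 0.5}, {}]
couplings = {(0, 2): {("D", "K"): 2.0, ("K", "D"): 2.0}}

for seq in ("DGK", "DGD", "AGK"):
    print(seq, potts_energy(seq, fields, couplings))
```

In this toy model, "DGK" scores better than "DGD" only because of the coupling between positions 0 and 2, which is exactly the kind of long-distance dependency that cannot be expressed with purely positional parameters.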

During my PhD, I designed ComPotts, a method to align two Potts models and give a similarity score for this alignment. Preliminary experiments to assess the quality of our alignments [1] on a small dataset of 59 reference pairwise alignments yielded encouraging results: a better recall and a slightly better precision than our main competitors (HH-align, MRFalign and BLAST). We are currently working on larger-scale validation experiments (to be published [2]), and the next step will be to apply ComPotts to homology search.
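As an illustration of how precision and recall can be computed for a pairwise alignment, here is a minimal sketch. It assumes an alignment is represented as a set of aligned position pairs and that a predicted pair is correct when it also appears in the reference alignment; this is a common convention and not necessarily the exact protocol used in [1]. The function name and the toy data are hypothetical.

```python
def alignment_precision_recall(predicted, reference):
    """Compare two alignments given as iterables of (pos_in_a, pos_in_b) pairs.

    precision = correct predicted pairs / all predicted pairs
    recall    = correct predicted pairs / all reference pairs
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy example: 2 of the 3 predicted pairs appear in the 4-pair reference.
reference = [(0, 0), (1, 1), (2, 2), (3, 4)]
predicted = [(0, 0), (1, 1), (2, 3)]
precision, recall = alignment_precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```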

I presented ComPotts at the JOBIM 2020 conference during the Protein and Structure parallel session on July 3, 2020. My slides are available here.

Early in my thesis, I also had the opportunity to work with Witold Dyrka from Politechnika Wrocławska (Wrocław, Poland) on probabilistic context-free grammars with contact map constraints on proteins [3]. In this work, we established a framework to train probabilistic context-free grammars representing families of protein sequences using information on contact constraints.

Publications

[1] Hugo Talibart and François Coste. ComPotts: Optimal alignment of coevolutionary models for protein sequences. In JOBIM 2020 - Journées Ouvertes Biologie, Informatique et Mathématiques, 2020.
[2] Hugo Talibart and François Coste. ComPotts: Optimal alignment of coevolutionary models for protein sequences. Unpublished, 2020.
[3] Witold Dyrka, Mateusz Pyzik, François Coste, and Hugo Talibart. Estimating probabilistic context-free grammars for proteins using contact map constraints. PeerJ, 7:e6559, 2019.

Science outreach

As part of the 2018 Sciences en Cour[t]s festival, I explained my PhD project in a short video using slips of paper. (Video in French)

Resume

Download full resume here

Experience

2017-now Dyliss Team, IRISA, Université de Rennes 1, France
PhD thesis: Modelling proteins with long-distance dependencies. Supervised by François Coste and Jacques Nicolas.

2017 LITIS Lab, Université de Rouen, France
6-month internship: Supervised learning of emergent structures in agent-based simulations. Supervised by Pierrick Tranouez.

2016 King's College London, United Kingdom
3-month internship: RNA meta-stable secondary structures clustering. Supervised by Kathleen Steinhöfel.

Education

2016-2017 Université de Rouen, France
Master's Degree in Theoretical Computer Science: Automata theory, String Algorithms in Bioinformatics

2012-2017 Institut National des Sciences Appliquées de Rouen (INSA Rouen), France
Engineering Degree in Mathematics: Statistics, Artificial Intelligence, Operations Research

Contact

Dyliss team, Irisa / Inria Rennes-Bretagne Atlantique,
Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel: +33 (0) 2 99 84 22 32
E-mail: