Beyond Retinal: Machine Learning Models for Photochemical Control in Rhodopsins
Hector RCD Awardee Prof. Dr. Carolin Müller
Hector Fellow Prof. Dr. Klaus-Robert Müller
Hector Fellow Prof. Dr. Peter Hegemann
The project develops a machine‑learning framework that can accurately predict excited‑state properties of rhodopsins. To this end, a QM/MM dataset of retinal derivatives in protein‑like environments is generated and used to train an extended SO3LR model. Through iterative synthesis and spectroscopic characterization of deliberately mutated rhodopsin variants, the predictions are continuously validated and the models refined. The final models enable fast, reliable estimates of absorption and emission spectra as well as photochemical reaction pathways, thereby constituting a data‑driven platform for the rational design of new light‑responsive proteins. The project involves Prof. Dr. Carolin Müller, Prof. Dr. Klaus-Robert Müller, and Prof. Dr. Peter Hegemann.
Rhodopsins are light‑sensitive proteins that contain a covalently bound retinal chromophore as the photo‑active unit. Although all rhodopsins share this common core, they exhibit a wide spectrum of photochemical reactions – from simple E/Z isomerizations to multi‑step pathways – and consequently different functions. This diversity arises from the subtle interplay between the intrinsic reactivity of the chromophore and the modulating effect of the surrounding protein matrix on the dark state, the excited state, and the photoproduct state. A deep understanding of these interactions is a prerequisite for deciphering the molecular principles of biological light perception and for deliberately controlling photochemical reactivity. Experimental methods such as time‑resolved UV/Vis and Raman spectroscopy provide valuable data, but the ultrafast dynamical processes complicate their interpretation and often lead to speculative structure‑property relationships. Quantum‑chemical simulations of the excited state offer mechanistic insight, yet they are practically inaccessible for the large chromophore‑protein complexes of rhodopsins.
The proposed research addresses this limitation by developing a machine‑learning (ML) framework that describes excited states in covalently bound systems, using rhodopsins as a model. First, a high‑quality QM/MM dataset of retinal derivatives in protein‑like environments will be generated, containing both ground‑ and excited‑state properties (geometries, TD‑DFT energies, oscillator strengths, non‑adiabatic couplings). Based on this dataset, the existing SO3LR model will be extended.
Work package 1 focuses on adapting SO3LR for rapid and accurate prediction of absorption and emission spectra. About 100 rhodopsin structures will be curated from the Protein Data Bank and supplemented with high‑level QM/MM calculations (ground‑state geometry optimizations, TD‑DFT vertical excitations, excited‑state minima).
Work package 2 introduces a fragment‑biased graph‑neural‑network encoding that highlights the retinal fragment, thereby better capturing the local electronic and geometric changes that govern excited‑state properties.
Work package 3 employs the refined models to predict photochemical reaction pathways. Static reference data will be generated by interpolating geometries between relevant minima, locating conical intersections, and computing CASPT2‑level potential energy surfaces. The trained network will provide energies, forces and approximate non‑adiabatic couplings for the S₀ and S₁ states, which will be fed into surface‑hopping dynamics (e.g., SHARC) to obtain reaction rates, branching ratios and product yields.
Work package 4 closes the iterative learning loop. Model‑suggested variants (e.g., with red‑shifted absorption or high fluorescence quantum yield) will be engineered, expressed in Pichia pastoris or HEK cells, purified by affinity chromatography and characterized by steady‑state and time‑resolved Raman, UV/Vis and FTIR spectroscopy, with femtosecond Raman measurements performed in collaboration with external partners. The experimental results will be fed back into model optimization.
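The geometry interpolation mentioned for work package 3 can be illustrated with a minimal sketch. It is not the project's actual workflow: the function name `interpolate_geometries` and the toy two‑atom coordinates are hypothetical, and the interpolation is done in Cartesian coordinates for simplicity, whereas production studies typically interpolate in internal coordinates to avoid distorted bond lengths along the path.

```python
import numpy as np


def interpolate_geometries(start, end, n_points):
    """Linearly interpolate between two geometries (hypothetical helper).

    start, end: arrays of shape (n_atoms, 3) with identical atom ordering.
    Returns an array of shape (n_points, n_atoms, 3), end points included.
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    if start.shape != end.shape:
        raise ValueError("geometries must share atom count and ordering")
    if n_points < 2:
        raise ValueError("need at least the two end-point geometries")
    # Interpolation parameter running from 0 (start) to 1 (end).
    fractions = np.linspace(0.0, 1.0, n_points)
    return np.stack([(1.0 - t) * start + t * end for t in fractions])


# Toy example with two atoms; real input would be optimized minima.
reactant = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
product = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
path = interpolate_geometries(reactant, product, 5)
```

Each geometry along `path` would then be passed to the reference electronic‑structure method to obtain energies for training or validation.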
The project combines the expertise of Prof. Dr. Klaus-Robert Müller (machine learning for chemistry and physics), Prof. Dr. Carolin Müller (high‑quality QM/MM data and extension of ML models to excited states) and Prof. Dr. Peter Hegemann (synthesis, expression and spectroscopic characterization of rhodopsin derivatives). In the long term, an open toolbox will be established for the community, laying the foundation for the next generation of optogenetic tools.
Figure 2: Illustration of the overarching project objective: Developing excited-state machine learning (ML) models to go beyond retinal model systems (left) to predict photoinduced phenomena of retinal within its native protein environment (colored boxes). This will be addressed by combining computational chemistry, machine learning, and spectroscopy to establish a foundational ML framework.
Supervised by

Carolin Müller
Chemistry, Informatics
Hector RCD Awardee since 2024
Klaus-Robert Müller
Informatics, Mathematics & Physics
Hector Fellow since 2023

Peter Hegemann
Biology, Chemistry & Medicine
Hector Fellow since 2015


