Representation Learning and Causality: Theory, Practice, and Implications for Mechanistic Interpretability
Florent Draye, supervised by Hector Fellow Bernhard Schölkopf
This research project aims to contribute to the development of methods that extract informative and interpretable features from high-dimensional datasets, with a focus on uncovering high-level, causally related factors that capture the meaningful semantics of the data. These methods can, in turn, yield deeper insights into the representations learned by advanced generative models, particularly foundation models and LLMs, with the goal of improving their efficiency and safety.
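To make the idea of extracting interpretable features from high-dimensional representations concrete, the sketch below trains a small sparse autoencoder on synthetic "activation" vectors, a dictionary-learning approach commonly used in mechanistic interpretability. This is a minimal illustration only, not the project's method: the data, dimensions, and hyperparameters are all assumed for the example.

```python
# Minimal sketch (illustrative, not the project's actual method): a sparse
# autoencoder that decomposes high-dimensional activation vectors into a larger
# dictionary of sparsely active, candidate-interpretable features.
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_features, n_samples = 64, 256, 4096
# Stand-in for real model activations (e.g., residual-stream vectors from an LLM).
activations = torch.randn(n_samples, d_model)

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        codes = torch.relu(self.encoder(x))  # non-negative feature activations
        recon = self.decoder(codes)
        return recon, codes

sae = SparseAutoencoder(d_model, d_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # sparsity pressure: larger values mean fewer active features per input

for step in range(200):
    recon, codes = sae(activations)
    # Reconstruction loss keeps features faithful; the L1 penalty keeps them sparse,
    # which is what makes individual dictionary directions easier to interpret.
    loss = ((recon - activations) ** 2).mean() + l1_coeff * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, each column of the (hypothetical) decoder can be inspected as a candidate feature direction; whether such directions align with high-level, causally related factors is precisely the kind of question this line of research addresses.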