Tolga Bolukbasi

Research Scientist
Google DeepMind


[Scholar] [GitHub] [Twitter] [LinkedIn] [S]

I am a research scientist on the Language Team at Google DeepMind. I currently work on data quality for the pretraining and fine-tuning stages of large language models (Gemini).

Research-wise, I am passionate about training data attribution at scale, i.e., measuring how much each training example influences each model output, and using this insight to improve model quality, enable data curation with feedback from the model, and discover causal links between the training data and model behavior. Over the years, I have also worked on interpretability and model understanding for language and vision models, including feature- and example-level attribution methods, counterfactual analysis, and concepts in embedding spaces.

I also enjoy engineering and have built large-scale AI/ML infrastructure for model and dataset debugging and understanding, e.g., Google Cloud XAI and model internals-based retrieval for large transformers on billions of examples.

I graduated from Boston University with a PhD in Electrical and Computer Engineering. My thesis is on machine learning with natural trade-offs. Examples include the cost-quality trade-off at inference time for computer vision models and the bias-representation trade-off for language models.

Selected Publications

For a full list of my publications, see my Google Scholar.
Last updated: December 2023