Tolga Bolukbasi

Research Scientist
Google DeepMind

email:

[Scholar] [Github] [Twitter] [Linkedin] [S]

I am a research scientist in the Language Team at Google DeepMind. I currently work on data quality for the pretraining and fine-tuning stages of large language models (Gemini).

Research-wise, I am passionate about training data attribution at scale, i.e., measuring how much each training example influences each model output, and using this insight to improve model quality, enable data curation with feedback from the model, and discover causal links between training data and model behavior. Over the years, I have also worked on interpretability and model understanding for language and vision models, including feature- and example-level attribution methods, counterfactual analysis, and concepts in embedding spaces.
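To make "how much each training example influences each output" concrete, here is a minimal sketch of one well-known gradient-based attribution score in the style of TracIn: the influence of a training example on a test example is approximated by the learning rate times the dot product of their loss gradients, summed over saved checkpoints. The toy one-feature linear model, the squared loss, the checkpoint values, and the function names are illustrative assumptions, not a description of any system mentioned on this page.

```python
# Minimal TracIn-style training data attribution sketch (illustrative only).
# Hypothetical model: y_hat = w * x + b, loss = 0.5 * (y_hat - y) ** 2.


def grad(params, example):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    w, b = params
    x, y = example
    residual = w * x + b - y
    return (residual * x, residual)


def tracin_score(checkpoints, lr, train_example, test_example):
    """Sum over checkpoints of lr * <grad(train), grad(test)>.

    A positive score means a gradient step on the training example would,
    to first order, lower the test loss (a "proponent"); a negative score
    means it would raise the test loss (an "opponent").
    """
    total = 0.0
    for params in checkpoints:
        g_train = grad(params, train_example)
        g_test = grad(params, test_example)
        total += lr * sum(a * b for a, b in zip(g_train, g_test))
    return total


if __name__ == "__main__":
    train = [(1.0, 2.0), (2.0, 4.1), (3.0, -5.0)]  # last point is an outlier
    test = (2.5, 5.0)
    checkpoints = [(0.0, 0.0), (1.0, 0.5), (1.8, 0.1)]  # hypothetical saved (w, b)
    for ex in train:
        print(ex, tracin_score(checkpoints, 0.1, ex, test))
```

In this toy setup the two points consistent with y ≈ 2x get positive scores, while the outlier gets a negative score, matching the intuition that it pulls the model away from the test example.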

I also enjoy engineering and have built large-scale AI/ML infrastructure for debugging and understanding models and datasets, e.g., Google Cloud XAI and model-internals-based retrieval for large transformers over billions of examples.

I graduated from Boston University with a PhD in Electrical and Computer Engineering. My thesis is on machine learning with natural trade-offs, for example the cost-quality trade-off at inference time for computer vision models and the bias-representation trade-off for language models.

Updates

Organizing ATTRIB, the first workshop on Attributing Model Behavior at Scale, at NeurIPS 2023. What makes ML models tick? How do we attribute model behavior to the training data, algorithm, architecture, or scale used in training?

Selected Publications

For a full list of my publications, see my Google Scholar profile.

Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

Kelvin Guu*, Albert Webson*, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi*

arXiv, 2023.

TLDR: Most existing influence functions are additive, and it is hard to verify whether they are faithful to the model's reasoning. We propose a method that can learn non-additive influence (e.g., diminishing returns and redundancy between examples) and a new seq2seq framework for evaluating the faithfulness of influence values.
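To illustrate why additive influence scores cannot express diminishing returns, here is a minimal sketch of a per-step linear loss simulator in the spirit of simulating training runs. It assumes each step trains on exactly one example `i`, which scales the current test loss by a hypothetical multiplicative factor `A[i]` and shifts it by an additive term `B[i]`. In a real system such parameters would be fit to logged training runs; the values and names below are made up for illustration and are not the paper's implementation.

```python
# Illustrative linear training-run simulator (assumption: one example per step).
# Each step on example i updates one test example's loss as:
#     L_t = A[i] * L_{t-1} + B[i]


def simulate_losses(initial_loss, curriculum, A, B):
    """Return the simulated test-loss trajectory for an ordered curriculum."""
    losses = []
    loss = initial_loss
    for i in curriculum:
        loss = A[i] * loss + B[i]  # effect depends on the current loss
        losses.append(loss)
    return losses


if __name__ == "__main__":
    A = {"x": 0.5, "y": 0.9}  # hypothetical multiplicative effects
    B = {"x": 0.0, "y": 0.0}  # hypothetical additive effects
    # Training on "x" twice: the first step cuts the loss by 2.0, the second by 1.0.
    print(simulate_losses(4.0, ["x", "x"], A, B))
```

Because each step scales the current loss, the second occurrence of the same example helps less than the first, whereas an additive model assigns every occurrence one fixed contribution.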
Towards Tracing Factual Knowledge in Language Models Back to the Training Data

Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu

EMNLP, 2022.

TLDR: We introduce a new evaluation task for attributing factual knowledge and benchmark how well influence functions trace facts learned by LLMs back to their source training examples.
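A fact-tracing benchmark of this kind can be scored by ranking training examples by their attribution score for a query fact and checking where the examples that truly express the fact land in that ranking. The sketch below computes mean reciprocal rank (MRR), one standard ranking metric; the data layout (a dict of scores plus a set of ground-truth "proponent" ids per query) and the function names are assumptions for illustration, not the benchmark's actual interface.

```python
# Illustrative ranking evaluation for training data attribution (assumed layout).


def reciprocal_rank(scores, proponents):
    """1 / rank of the first ground-truth proponent, or 0.0 if none is ranked.

    scores: dict mapping training-example id -> attribution score (higher = more influential)
    proponents: set of training-example ids that actually express the query fact
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    for rank, example_id in enumerate(ranked, start=1):
        if example_id in proponents:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (scores, proponents) pairs, one per query fact."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(s, p) for s, p in queries) / len(queries)


if __name__ == "__main__":
    queries = [
        ({"a": 0.9, "b": 0.2, "c": 0.5}, {"c"}),  # true source ranked 2nd
        ({"a": 0.1, "b": 0.8}, {"b"}),            # true source ranked 1st
    ]
    print(mean_reciprocal_rank(queries))  # prints 0.75
```

A higher MRR means the attribution method tends to place a true source of the fact near the top of its ranking.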
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, Adam T. Kalai

NeurIPS, 2016.

TLDR: We discover concept directions in neural networks and measure their effect on predictions, which allows model outputs to be decomposed into human-aligned concepts. We demonstrate this with one such concept, gender, and show how it affects the model's representations.
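As a rough illustration of a concept direction in an embedding space, the sketch below estimates a direction as the normalized average of difference vectors from word pairs that differ mainly in the concept (e.g., "she" minus "he"), measures a word's concept component by projection, and removes that component by subtracting the projection. The 3-d vectors and function names are made up for illustration, and averaging differences is a simplification; the paper's actual procedure for estimating the direction and choosing which words to neutralize is more involved.

```python
import math

# Illustrative concept-direction sketch on tiny hypothetical embeddings.


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]


def concept_direction(pairs):
    """Unit vector along the average difference of (positive, negative) pairs."""
    diffs = [sub(p, n) for p, n in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    return normalize(mean)


def concept_score(v, direction):
    """Signed component of v along the unit concept direction."""
    return dot(v, direction)


def remove_concept(v, direction):
    """Subtract the component of v along the unit concept direction."""
    c = dot(v, direction)
    return [a - c * d for a, d in zip(v, direction)]


if __name__ == "__main__":
    emb = {
        "she": [1.0, 0.2, 0.0], "he": [-1.0, 0.2, 0.0],
        "woman": [0.9, 0.0, 0.3], "man": [-0.9, 0.0, 0.3],
        "programmer": [-0.3, 0.8, 0.5],
    }
    d = concept_direction([(emb["she"], emb["he"]), (emb["woman"], emb["man"])])
    print(concept_score(emb["programmer"], d))  # negative: leans toward "he"
    neutral = remove_concept(emb["programmer"], d)
    print(concept_score(neutral, d))  # 0.0 after removing the component
```

Projection gives a signed measure of how much of the concept a representation carries, and subtracting it leaves the remaining dimensions of the vector unchanged.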

Last updated: December 2023