MAXIMILIAN MOZES

X  /  BLUESKY  /  EMAIL  /  GOOGLE SCHOLAR  /  LINKEDIN

ABOUT

I'm a Senior Research Scientist at Cohere. I completed my PhD in Computer Science at University College London in 2024, under the supervision of Lewis Griffin and Bennett Kleinberg. My research focuses on the intersection of adversarial machine learning and natural language processing.


I have previously interned at Google Research, working with the PAIR Team on measuring dialog safety using large language models. Prior to that, I was a Research Scientist Intern at Spotify Research, where I focused on NLP-based content moderation in podcasts.


I obtained a Bachelor's degree in Computer Science (minor in Mathematics) from the Technical University of Munich (TUM) in March 2019. During my undergraduate studies, I worked as a visiting research scholar at the Language and Information Technologies Group of the University of Michigan's Artificial Intelligence Lab and as a research intern in the Department of Psychology at the University of Amsterdam.

PUBLICATIONS

LLMs can Implicitly Learn from Mistakes In-Context

Alazraki, L., Mozes, M., Campus, J. A., Tan, Y. C., Rei, M. and Bartolo, M.

arXiv, 2025

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Ruis, L., Mozes, M., Bae, J., Kamalakara, S. R., Talupuru, D., Locatelli, A., Kirk, R., Rocktäschel, T., Grefenstette, E. and Bartolo, M.

ICLR, 2025

Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge

Arora, A., He, X., Mozes, M., Swain, S., Dras, M. and Xu, Q.

Findings of ACL, 2024

Challenges and Applications of Large Language Models

Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R. and McHardy, R.

arXiv, 2023

Towards Agile Text Classifiers for Everyone

Mozes, M., Hoffmann, J., Tomanek, K., Kouate, M., Thain, N., Yuan, A., Bolukbasi, T. and Dixon, L.

Findings of EMNLP, 2023

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning

Mozes, M., Bolukbasi, T., Yuan, A., Liu, F., Thain, N. and Dixon, L.

arXiv, 2023

Large Language Models Respond to Influence like Humans

Griffin, L.D., Kleinberg, B., Mozes, M., Mai, K., Vau, M., Caldwell, M. and Mavor-Parker, A.

First Workshop on Social Influence in Conversations (SICon), ACL, 2023

Textwash -- Automated Open-Source Text Anonymization

Kleinberg, B., Davies, T. and Mozes, M.

arXiv, 2022

Identifying Human Strategies for Generating Word-Level Adversarial Examples

Mozes, M., Kleinberg, B. and Griffin, L.D.

Findings of EMNLP, 2022

Scene Graph Generation for Better Image Captioning?

Mozes, M., Schmitt, M., Golkov, V., Schütze, H. and Cremers, D.

arXiv, 2021

A Repeated-Measures Study on Emotional Responses After a Year in the Pandemic

Mozes, M., van der Vegt, I. and Kleinberg, B.

Scientific Reports, 11(1), 1-11, 2021

Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Mozes, M., Bartolo, M., Stenetorp, P., Kleinberg, B. and Griffin, L.D.

EMNLP, 2021

Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

Mozes, M., Stenetorp, P., Kleinberg, B. and Griffin, L.D.

EACL, 2021

No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization

Mozes, M. and Kleinberg, B.

arXiv, 2021

The Grievance Dictionary: Understanding Threatening Language Use

van der Vegt, I., Mozes, M., Kleinberg, B. and Gill, P.

Behavior Research Methods, 2021

Online Influence, Offline Violence: Linguistic Responses to the "Unite the Right" Rally

van der Vegt, I., Mozes, M., Gill, P. and Kleinberg, B.

Journal of Computational Social Science, 2020

Measuring Emotions in the COVID-19 Real World Worry Dataset

Kleinberg, B., van der Vegt, I. and Mozes, M.

NLP COVID-19 Workshop, ACL 2020

Uphill From Here: Sentiment Patterns in Videos from Left- and Right-Wing YouTube News Channels

Soldner, F., Ho, J., Makhortykh, M., van der Vegt, I., Mozes, M. and Kleinberg, B.

Third Workshop on NLP and CSS, NAACL-HLT, 2019

Identifying the Sentiment Styles of YouTube's Vloggers

Kleinberg, B., Mozes, M. and van der Vegt, I.

EMNLP, 2018

Using Named Entities for Computer-Automated Verbal Deception Detection

Kleinberg, B., Mozes, M., Arntz, A. and Verschuere, B.

Journal of Forensic Sciences, 63(3), 714-723, 2018

Web-based Text Anonymization with Node.js: Introducing NETANOS

Kleinberg, B. and Mozes, M.

Journal of Open Source Software, 2(14), 293, 2017

NETANOS - Named Entity-based Text Anonymization for Open Science

Kleinberg, B., Mozes, M., van der Toolen, Y. and Verschuere, B.

OSF preprint, 2017

INVITED TALKS

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Machine Behaviour, University of Tilburg, November 2024

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Responsible AI Seminar Series, Nokia Bell Labs, January 2024

Adversarial Examples in Machine Learning

Crime Science, University of Amsterdam, November 2021

Examining Word-Level Adversarial Examples for Text Classification

AI Seminar Series, UCL Centre for Artificial Intelligence, September 2021

Recording available here.

Adversarial Examples in Machine Learning

Data Science for Crime Scientists and Applied Data Science, University College London, March 2021

Detecting Deception with AI: Promises and Pitfalls?

Current Topics: Psychology of AI, University of Amsterdam, November 2020

On the Robustness of Intelligent Systems

Data Science for Crime Scientists and Applied Data Science, University College London, March 2020

On the Robustness of Intelligent Systems

Foundations of Crime Science, University College London, December 2019

Assessing Potential Vulnerabilities of Emerging Artificial Intelligence Technologies

Crime Science, University of Amsterdam, May 2019

MEDIA COVERAGE

Podcast interview with Data Skeptic

Podcast discussion focussing on the paper "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities", September 2023.

Available here.

News article "Google's Jigsaw was trying to fight toxic speech with AI. Then the AI started talking"

Fast Company, July 2023.

Available here.

YouTube series "2019 SAGE Concept Grant Winner: Text Wash"

SAGE Ocean, 2019.

Available here.

WORKSHOPS

9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

Chen Zhao, Marius Mosbach, Pepa Atanasova, Seraphina Goldfarb-Tarrant, Peter Hase, Arian Hosseini, Maha Elbayad, Sandro Pezzelle, Maximilian Mozes

62nd Annual Meeting of ACL, August 2024, Bangkok, Thailand

8th Workshop on Representation Learning for NLP (RepL4NLP-2023)

Burcu Can, Maximilian Mozes, Samuel Cahyawijaya, Naomi Saphra, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Chen Zhao

61st Annual Meeting of ACL, July 2023, Toronto, Canada

7th Workshop on Representation Learning for NLP (RepL4NLP-2022)

Spandana Gella, He He, Bodhisattwa Prasad Majumder, Burcu Can, Eleonora Giunchiglia, Samuel Cahyawijaya, Siwon Min, Maximilian Mozes, Xiang Lorraine Li

60th Annual Meeting of ACL, May 2022, Dublin, Ireland

A Gentle Introduction to Word Embeddings for the Computational Social Sciences

Maximilian Mozes and Bennett Kleinberg

2019 European Symposium on Societal Challenges in Computational Social Science: Polarization and Radicalization, September 2019, Zurich, Switzerland

Linguistic Temporal Trajectory Analysis - a Dynamic Approach to Text Data

Bennett Kleinberg, Maximilian Mozes, and Isabelle van der Vegt

2018 European Symposium on Societal Challenges in Computational Social Science: Bias and Discrimination, December 2018, Cologne, Germany

TEACHING