Freda Shi

Greetings! I am a final-year Ph.D. student at the Toyota Technological Institute at Chicago. I am grateful to be advised by Professors Karen Livescu and Kevin Gimpel, and to have been supported by a Google Ph.D. Fellowship since Autumn 2021. I am currently a visiting student at the MIT Department of Brain and Cognitive Sciences, hosted by Professor Roger Levy. I completed my Bachelor's degree in Intelligence Science and Technology (Computer Science Track) in 2018 at Peking University, with a minor in Sociology.


  • 05/2023: I have accepted a position as Assistant Professor (starting Summer 2024) in the David R. Cheriton School of Computer Science at the University of Waterloo and as a Faculty Member at the Vector Institute.
    Prospective Graduate Students: The FAQ page and my advising statement may answer some of your questions, and I would appreciate it if you read them before reaching out.
    Prospective Undergrad RAs and Visiting Students: Please complete a practice task to demonstrate your interest and skills, and submit your application here. Due to limited bandwidth, I am sorry that I am not able to reply to emails about internship applications if you have not completed a practice task.
  • 02/2024: Talk at the Vector NLP workshop. Check out the slides.
  • 10/2023: Talk at the University of Michigan, Ann Arbor. Check out the slides.
  • 09/2023: Talk at Peking University. The content of this talk largely overlaps with my academic job talk in spring 2023 and my (forthcoming) thesis. Check out the slides.

Research Interests

My research interests are in computational linguistics and natural language processing, and I am particularly interested in learning language through grounding, computational multilingualism, and related topics. Representative work includes the grounded syntax and semantics learners, the contextualized bilingual lexicon inducer, and the substructure-based zero-shot cross-lingual dependency parser. Recently, I have also worked on analyzing pre-trained large language models in terms of cross-lingual performance, distractibility, and semantic parsing (in a broad sense). For more details, check out my research topics and academic c.v.

Publications

Topics: Syntax / Semantics / Multilingualism / Others (*: Equal Contribution)

Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing
Freda Shi, Kevin Gimpel, Karen Livescu

Working Paper 2024 Paper / Code

Audio-Visual Neural Syntax Acquisition
Cheng-I Jeff Lai*, Freda Shi*, Puyuan Peng*, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

ASRU 2023 Paper / Code / arXiv

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Kaustubh D. Dhole et al.

NEJLT 2023 Paper / Code / arXiv

Large Language Models Can Be Easily Distracted by Irrelevant Context
Freda Shi*, Xinyun Chen*, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Schärli, Denny Zhou

ICML 2023 Paper / arXiv / Data

Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi*, Mirac Suzgun*, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

ICLR 2023 Paper / arXiv / Data

InCoder: A Generative Model for Code Infilling and Synthesis
Daniel Fried*, Armen Aghajanyan*, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

ICLR 2023 Paper / Code / arXiv

Natural Language to Code Translation with Execution
Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang

EMNLP 2022 Paper / Code / arXiv

Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing
Freda Shi, Kevin Gimpel, Karen Livescu

ACL 2022 Paper / Code / arXiv

Deep Clustering of Text Representations for Supervision-Free Probing of Syntax
Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan

AAAI 2022 Paper / arXiv

Grammar-Based Grounded Lexicon Learning
Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

NeurIPS 2021 Paper / arXiv / Project Page

Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Haoyue Shi, Luke Zettlemoyer, Sida I. Wang

ACL-IJCNLP 2021 (Best Paper Nominee) Paper / Code / arXiv

Substructure Substitution: Structured Data Augmentation for NLP
Haoyue Shi, Karen Livescu, Kevin Gimpel

Findings of ACL-IJCNLP 2021 Paper / Code / arXiv

On the Role of Supervision in Unsupervised Constituency Parsing
Haoyue Shi, Karen Livescu, Kevin Gimpel

EMNLP 2020 Paper / arXiv

A Cross-Task Analysis of Text Span Representations
Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

RepL4NLP 2020 Paper / Code / arXiv

Visually Grounded Neural Syntax Acquisition
Haoyue Shi*, Jiayuan Mao*, Kevin Gimpel, Karen Livescu

ACL 2019 (Best Paper Nominee) Paper / Code / arXiv

On Tree-Based Neural Sentence Modeling
Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li

EMNLP 2018 Paper / Code / arXiv

On Multi-Sense Word Embeddings via Matrix Factorization and Matrix Transformation
Haoyue Shi

B.S. Thesis (in Simplified Chinese), Peking University School of EECS, May 2018 Paper
Best Undergraduate Dissertation Award, PKU School of EECS

Learning Visually-Grounded Semantics from Contrastive Adversarial Samples
Haoyue Shi*, Jiayuan Mao*, Tete Xiao*, Yuning Jiang, Jian Sun

COLING 2018 Paper / Code / arXiv

Constructing High Quality Sense-Specific Corpus and Word Embedding via Unsupervised Elimination of Pseudo Multi-Sense
Haoyue Shi, Xihao Wang, Yuqi Sun, Junfeng Hu

LREC 2018 Paper / Code

Joint Saliency Estimation and Matching using Image Regions for Geo-Localization of Online Video
Haoyue Shi, Jia Chen, Alexander G. Hauptmann

ICMR 2017 Paper

Real Multi-Sense or Pseudo Multi-Sense: An Approach to Improve Word Representation
Haoyue Shi, Caihua Li, Junfeng Hu

COLING Workshop CL4LC 2016 Paper / arXiv