Publications & Presentations

Work in Progress

Nicholas Edwards,* Yukyung Lee,* Yujun Audrey Mao, Yulu Qin, Sebastian Schuster,† and Najoung Kim.† RExBench: Can coding agents autonomously implement AI research extensions? arXiv. (*,†Equal contribution)

Kanishka Misra and Najoung Kim. Generating novel experimental hypotheses from language models: A case study on cross-dative generalization. arXiv.

Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty. Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity. arXiv.

Eunsu Kim, Junyeong Park, Juhyun Oh, Kiwoong Park, Seyoung Song, A. Seza Doğruöz, Najoung Kim,* and Alice Oh.* Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogue. arXiv. (*Equal contribution)

2025

Yulu Qin,* Dheeraj Varghese,* Adam Dahlgren Lindström, Lucia Donatelli, Kanishka Misra,† and Najoung Kim.† Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It. NeurIPS. (*,†Equal contribution)

Yukyung Lee, JoongHoon Kim, Jaehee Kim, Hyowon Cho, Jaewook Kang, Pilsung Kang, and Najoung Kim. CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists. EMNLP.

Zilu (Peter) Tang, Qiao Zhao, Gabriel Franco, Geneva Yang, Angelos Poulis, Derry Wijaya, Aaron Mueller, Sebastian Schuster, and Najoung Kim. Mechanistic Understanding of Entity Tracking in Natural Language involving Multiple Operations. New England Mechanistic Interpretability Workshop (NEMI).

Hayley Ross, Kathryn Davidson, and Najoung Kim. Is analogy enough to draw novel adjective-noun inferences? SCiL.

Aditya Yedetore and Najoung Kim. Implicit mechanisms for symbol manipulation in RNNs. NENLP.

Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim,* and He He.* Transformers Struggle to Learn to Search Without In-context Exploration. ICLR. (*Equal contribution)

Hayley Ross, Najoung Kim, and Kathryn Davidson. Fake reefs are sometimes reefs and sometimes not, but are always compositional. ELM 3.

2024

Aditya Yedetore and Najoung Kim. Semantic Training Signals Promote Hierarchical Syntactic Generalization in Neural Networks. EMNLP.

Najoung Kim,* Sebastian Schuster,* and Shubham Toshniwal.* Code Pretraining Improves Entity Tracking Abilities of Language Models. arXiv. (*Equal contribution)

Arkadiy Saakyan, Josh Lee, Michal Todorovic, Deepak Ramachandran, Quan Yuan, Isabelle Guyon, and Najoung Kim. Evaluating Critic Models for Human-AI Co-Creation: A Case Study with AI Critiques of Presentation Slides.

Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, and He He. Personas as a Way to Model Truthfulness in Language Models. EMNLP.

Hayley Ross, Kathryn Davidson, and Najoung Kim. Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don’t mimic the full human distribution. GenBench @ EMNLP. 👑Best Paper Award

Najoung Kim and Paul Smolensky. Structural Generalization of Modification in Adult Learners of an Artificial Language. CogSci.

Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, and Krishnamurthy Dj Dvijotham. Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation. AIES.

Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, and Yoon Kim. Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. NAACL.

Ashwin Daswani, Rohan Sawant, and Najoung Kim. Syn-(QA)^2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets. arXiv.

2023

Kanishka Misra and Najoung Kim. Abstraction via exemplars? A representational case study on lexical category inference in BERT. The 47th Boston University Conference on Language Development (BUCLD).

Bingzhi Li, Lucia Donatelli, Alexander Koller, Tal Linzen, Yuekun Yao, and Najoung Kim. SLOG: A Structural Generalization Benchmark for Semantic Parsing. The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Jason Wei,* Najoung Kim,* Yi Tay, and Quoc V. Le. Inverse scaling can become U-shaped. The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). (*Equal contribution)

Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim,* and He He.* Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples. The Conference on Neural Information Processing Systems (NeurIPS). (*Equal contribution)

Mehran Kazemi, Quan Yuan, Deepti Bhatia, Najoung Kim, Xin Xu, Vaiva Imbrasaite, and Deepak Ramachandran. BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information. The Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track.

Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, and Ethan Perez. Inverse Scaling: When Bigger Isn’t Better. Transactions on Machine Learning Research (TMLR). Featured certification

Wentao Wang, Wai Keen Vong, Najoung Kim, and Brenden M. Lake. Finding Structure in One Child’s Linguistic Experience. Cognitive Science.

Najoung Kim,* Phu Mon Htut,* Samuel R. Bowman, and Jackson Petty. (QA)^2: Question Answering with Questionable Assumptions. Annual Conference of the Association for Computational Linguistics (ACL). (*Equal contribution)

Najoung Kim* and Sebastian Schuster.* Entity Tracking in Language Models. Annual Conference of the Association for Computational Linguistics (ACL). (*Equal contribution) 👑Area Chair Award

Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, and Deepak Ramachandran. LAMBADA: Backward Chaining for Automated Reasoning in Natural Language. Annual Conference of the Association for Computational Linguistics (ACL).

Najoung Kim, Jatin Khilnani, Alex Warstadt, and Abed Qaddoumi. Reconstruction Probing. Findings of the Annual Conference of the Association for Computational Linguistics (ACL).

2022

Najoung Kim, Tal Linzen, and Paul Smolensky. Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models. arXiv.

2021

Najoung Kim. Compositional Linguistic Generalization in Artificial Neural Networks. PhD Dissertation, Johns Hopkins University.

Najoung Kim, Ellie Pavlick, Burcu Karagol Ayan, and Deepak Ramachandran. Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering. Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

Najoung Kim and Paul Smolensky. Testing for Grammatical Category Abstraction in Neural Language Models. Proceedings of The Society for Computation in Linguistics (SCiL).

2020

Najoung Kim and Tal Linzen. COGS: A Compositional Generalization Challenge Based on Semantic Interpretation. In the Proceedings of The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). [talk]

Najoung Kim, Song Feng, Chulaka Gunasekara, and Luis A. Lastras. Implicit Discourse Relation Classification: We Need to Talk About Evaluation. Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

Sadhwi Srinivas, Najoung Kim, and Kyle Rawlins. Maximize presupposition and the Korean demonstrative ku. Presented at The 94th Annual Meeting of the Linguistic Society of America (LSA). [poster]

2019

Najoung Kim and Tal Linzen. Compositionality as Directional Consistency in Sequential Neural Networks. Workshop on Context and Compositionality in Biological and Artificial Neural Systems, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).

Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, Tom McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Sam Bowman, and Ellie Pavlick. Probing What Different NLP Tasks Teach Machines About Function Word Comprehension. Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM). 👑Best Paper Award

Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, and Samuel R. Bowman. How to Get Past Sesame Street: Sentence-Level Pretraining Beyond Language Modeling. Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

Najoung Kim, Jung-Ho Kim, Maria K. Wolters, Sarah E. MacPherson, and Jong C. Park. Automatic Scoring of Semantic Fluency. Frontiers in Psychology.

Najoung Kim, Kyle Rawlins, Benjamin Van Durme, and Paul Smolensky. Predicting the Argumenthood of English Prepositional Phrases. Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-2019).

Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Sam Bowman, Dipanjan Das, and Ellie Pavlick. What do you learn from context? Probing for sentence structure in contextualized word representations. International Conference on Learning Representations (ICLR).

2018

Najoung Kim, Kyle Rawlins, and Paul Smolensky. A gradient blend analysis of English PP verbal dependents. Conference on Interdisciplinary Approaches to Linguistic Theory (CiALT) 2, Berlin, Oct 2018.

Najoung Kim, Kyle Rawlins, and Paul Smolensky. A gradient blend analysis of English PP verbal dependents. Acceptability judgments in current linguistic theory, Universitat Autònoma de Barcelona, Oct 2018.

Najoung Kim, Benjamin Van Durme, and Paul Smolensky. Linguistically informed tasks for evaluating structure encoded by sentence representations. Facebook WeCNLP Summit, Menlo Park, CA, Sep 2018.

Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, and Berlin Chen. Looking for ELMo’s friends: Sentence-Level Pretraining Beyond Language Modeling. arXiv.

2016

Maria K. Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, and Jong C. Park. Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition. Interspeech.

Najoung Kim, Jung-Ho Kim, Maria K. Wolters, Sarah E. MacPherson, and Jong C. Park. Approximating the Semantic Structures behind Category Fluency Sequences. MACSIM 6, CUNY, New York, Oct 2016.

Jung-Ho Kim, Najoung Kim, Hancheol Park, and Jong C. Park. Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation. Journal of Computing Science and Engineering vol. 10.3.

Najoung Kim and Jong C. Park. A Morphological Approach to the Longitudinal Detection of Dementia. Proceedings of HCI Korea 2016, The HCI Society of Korea.