LLM Writing Distortion

This repository accompanies the paper:
How LLMs Distort Our Written Language
by Marwa Abdulhai, Isadora White, Yanming Wan, Ibrahim Qureshi, Joel Z. Leibo, Max Kleiman-Weiner, Natasha Jaques.


Overview

We study these effects across three evaluation settings:


Repository Structure

llm_writing_distortion/
├── iclr_analysis/          # Diversity and homogenization analysis on ICLR co-writing data
├── argrewrite_analysis/    # Analysis of LLM-assisted revisions on the ArgRewrite corpus
├── ArgRewrite/             # ArgRewrite dataset files
├── NRC-Emotion-Lexicon/    # NRC Emotion Lexicon for sentiment and emotion scoring
├── human_evaluation/       # Human evaluation data and analysis notebooks
├── requirements.txt
└── README.md

Quick Start

Installation

We recommend setting up a clean conda environment:

git clone https://github.com/abdulhaim/llm_writing_distortion
cd llm_writing_distortion
conda create --name writing_distortion python=3.10
conda activate writing_distortion
pip install -r requirements.txt

Running the Analysis

Each subdirectory contains Jupyter notebooks that can be run independently:


Datasets

ArgRewrite-v2

The ArgRewrite-v2 corpus (Chen et al., 2022) contains 86 argumentative essays written by university students in 2021, each paired with expert feedback and a human-revised second draft. Because this dataset predates the release of ChatGPT, it enables a clean counterfactual comparison: what would a human have written versus what an LLM produces given the same essay and the same expert feedback.
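As a rough illustration of this counterfactual setup, the sketch below compares a human revision and an LLM revision against the same first draft. The Jaccard word-overlap metric and the function names are assumptions chosen for brevity. They are not the measures used in the paper or in this repository's notebooks.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase word types in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word types of two texts (1.0 = same vocabulary)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def compare_revisions(draft: str, human_rev: str, llm_rev: str) -> dict[str, float]:
    """Score how much of the draft's vocabulary each revision preserves.

    Hypothetical helper: both revisions are assumed to respond to the
    same draft and the same feedback, as in the setup described above.
    """
    return {
        "human_vs_draft": jaccard(draft, human_rev),
        "llm_vs_draft": jaccard(draft, llm_rev),
    }
```

A lower score for one revision than the other would suggest that it departs further from the author's original wording, although vocabulary overlap alone says nothing about meaning or quality.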

NRC Emotion Lexicon

The NRC Word-Emotion Association Lexicon (Mohammad and Turney, 2013) maps English words to eight basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two sentiments (positive, negative). We use it to quantify shifts in affective tone induced by LLM revisions relative to human edits.
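Lexicon-based scoring of this kind can be sketched as follows. The three-word lexicon is a toy stand-in with illustrative entries, not the contents of the NRC files, and the function names are hypothetical. The notebooks in this repository may load and normalize the lexicon differently.

```python
import re
from collections import Counter

# Toy stand-in for the lexicon: word -> set of associated categories.
# These entries are illustrative only, not copied from the NRC files.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "afraid": {"fear", "negative"},
    "trust": {"trust", "positive"},
}


def emotion_profile(text, lexicon):
    """Return the fraction of tokens associated with each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category in lexicon.get(word, ()):
            counts[category] += 1
    n = len(words) or 1  # avoid division by zero on empty text
    return {category: c / n for category, c in counts.items()}


def profile_shift(before, after, lexicon):
    """Per-category change in token fraction from `before` to `after`."""
    p = emotion_profile(before, lexicon)
    q = emotion_profile(after, lexicon)
    return {cat: q.get(cat, 0.0) - p.get(cat, 0.0) for cat in set(p) | set(q)}
```

Calling `profile_shift(original, revision, lexicon)` once with a human revision and once with an LLM revision of the same text gives two shift profiles that can be compared category by category.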

LIWC-22

The Linguistic Inquiry and Word Count (LIWC-22) tool (Boyd et al., 2022; Tausczik and Pennebaker, 2010) categorizes words across 90+ dimensions, including summary variables (analytic thinking, clout, authenticity), grammatical categories (pronouns, prepositions), and psychological processes (cognitive mechanisms, social processes). We use it to measure shifts in analytical thinking style and authenticity between human-written and LLM-edited text.

ICLR 2026 Peer Reviews

This dataset contains peer reviews from ICLR 2026, with LLM-generation labels from the Pangram AI classifier. We analyze 18k reviews drawn from papers that received exactly one fully human-written and one fully LLM-generated review, ensuring unbiased sampling across conditions.
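The paper-level filter described above could look roughly like the sketch below. The record keys (`paper_id`, `label`, `text`) and the label values `"human"` and `"llm"` are hypothetical, since the actual schema of the review data is not specified here. Reviews with any other label are ignored when counting, which is also an assumption.

```python
from collections import defaultdict


def paired_reviews(reviews):
    """Keep reviews from papers with exactly one 'human' and one 'llm' review.

    `reviews` is an iterable of dicts with the hypothetical keys
    'paper_id', 'label', and 'text'.
    """
    # Group reviews by paper, then by label.
    by_paper = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        by_paper[review["paper_id"]][review["label"]].append(review)

    kept = []
    for by_label in by_paper.values():
        if len(by_label["human"]) == 1 and len(by_label["llm"]) == 1:
            kept.extend(by_label["human"] + by_label["llm"])
    return kept
```

Because every retained paper contributes one review to each condition, the human and LLM groups cover the same set of papers.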


Dependencies

pandas
numpy
matplotlib
scikit-learn
sentence_transformers

Citation

If you use this code or build on this analysis, please cite the associated work.