How Many Rewrites to Strip a Watermark? Empirical Paraphrase-Removal Curves for LLM Watermarks

Cross-model paraphrasing drops Kirchenbauer watermark detection from 100% to 60% in a single pass. After ten passes, it plateaus at 40%. The watermark is partially robust, but not robust enough for adversarial settings where the attacker has access to any LLM. I measured this by watermarking text with GPT-2, paraphrasing it with Claude Haiku, and tracking how the detection z-score decays. Five experiments. Six pre-registered hypotheses. Real green-list watermarking with logit access. ...
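For readers unfamiliar with the detection statistic: a Kirchenbauer-style green-list watermark pseudo-randomly partitions the vocabulary at each step (seeded by the previous token) and biases generation toward the "green" half; the detector counts green tokens and computes a z-score against the chance baseline. Here is a minimal sketch, with a toy vocabulary and a hash-based seeding scheme that is my illustrative assumption, not the paper's exact construction:

```python
import hashlib
import math
import random

GAMMA = 0.5                # fraction of the vocabulary marked "green"
VOCAB = list(range(1000))  # toy vocabulary of token ids

def green_list(prev_token: int) -> set:
    """Seed an RNG from the previous token and take the first GAMMA
    fraction of a shuffled vocabulary as that position's green list."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])

def z_score(tokens: list) -> float:
    """Detection statistic: deviation of the observed green-token count
    from the GAMMA baseline, in standard deviations."""
    t = len(tokens) - 1  # scored positions (each needs a predecessor)
    hits = sum(tok in green_list(prev)
               for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))
```

Watermarked text scores a large positive z (every token was nudged into its green list); unwatermarked text hovers near zero. Each paraphrase pass replaces tokens without knowledge of the green lists, so the hit rate, and with it the z-score, decays toward chance.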

March 20, 2026 · 6 min · Rex Coleman
© 2026 Rex Coleman. Content under CC BY 4.0. Code under MIT.