What information hides in model outputs? How does it get there? How do we tell attack from defense? Provenance meets oversight.
Models may hide information in their reasoning traces and outputs in ways that evade monitors and oversight pipelines. Hidden information also includes what models inadvertently disclose: memorized training data recoverable from outputs and proprietary model internals extractable through query access.
Watermarks embed a recoverable signal for provenance, while steganographic collusion and monitor evasion embed signals to escape oversight. These are formally the same object viewed from opposite sides — a defense for one is an attack on the other — yet the communities working on them rarely speak to one another.
This workshop brings the provenance side (watermarking) and the oversight side (steganography detection, monitoring, control, and interpretability) together, alongside the cryptography, privacy, and infrastructure communities that share the problem.
| Time | Session |
|---|---|
| 08:00–08:15 | Opening remarks |
| 08:15–09:00 | Keynote 1 — Tim G. J. Rudner. Trustworthy Agents and Collusion? |
| 09:00–09:15 | Coffee break |
| 09:15–10:00 | Keynote 2 — Usman Anwar. Chain-of-thought monitoring and its information-theoretic limits |
| 10:00–10:30 | Speed networking across communities |
| 10:30–11:30 | Oral lightning presentations (2 parallel tracks, 4×15 min each) |
| 11:30–12:15 | Panel: progress on taxonomy |
| 12:15–13:45 | Lunch and poster session 1 |
| 13:45–14:30 | Keynote 3 — Mia Hopman. Covert behavior in deployed agents |
| 14:30–14:45 | Coffee break |
| 14:45–15:30 | Keynote 4 — Hua Shen. Aligning humans to AI: how people evaluate and oversee what models surface |
| 15:30–16:30 | Poster session 2 and coffee |
| 16:30–17:00 | Closing remarks |
CS PhD, University of Buenos Aires. Two-time MATS scholar (mentored by Adrià Garriga-Alonso, then Neel Nanda and Arthur Conmy). Lead author on Chain-of-Thought Reasoning In The Wild Is Not Always Faithful (ICML 2026). Co-founded AI Safety Argentina (AISAR).
Previously at the Center for Human-Compatible AI. Vice Events Chair at the Northwestern University AI Safety and Governance Group. Research interests: mechanistic interpretability, AI control under limited oversight, and safe deployment of LLMs across low-resource languages.
Organizes NYC AI safety and security events with up to a hundred attendees. Advisory board of Collider, a NYC AI safety co-working space. Contributes to Poseidon Research's work on steganography in LLMs.
BS Electrical and Computer Engineering, Princeton (2026). Previously at Microsoft Azure Networking. Founded Princeton's OrangeHat Collective cybersecurity club. Research interests: profiling public LLM deployments from a network and systems perspective, systems for ML, programmable networks, and SmartNIC offload.
Former ML and engineering leadership at Citadel, Avant, and Spring Labs. At Citadel held four head-of-function roles across global equities engineering, core data engineering, portfolio management and risk, and research and modeling. Research in AI interpretability, control, and steganography.
Submissions open upon acceptance of the workshop proposal. Details on submission length, format, dual-submission policy, and key dates will be posted here.