Evidencing LLM Misuse · KDD 2026 Hands-on Tutorial

01 — Overview

When models reproduce protected text, how do you prove it?

Large language models increasingly reproduce copyrighted passages and paraphrase protected sources — raising urgent legal, ethical, and scientific questions about how to evidence such misuse. This tutorial reframes copyright infringement and plagiarism as an evidence-discovery process rather than a binary classification task, and equips attendees with practical, reproducible forensic tools to audit models even under black-box access.

The tutorial spans three hours in two complementary parts. Part I, Copyright Infringement, introduces the legal framing and walks through Copyright Detective, an interactive forensic system that unifies content-recall testing, paraphrase-level similarity analysis, persuasive-jailbreak probing, and unlearning verification. Part II, Plagiarism Detection, builds on “Do Language Models Plagiarize?” to examine verbatim, paraphrase, and idea-level plagiarism in model outputs, and how to measure each. Attendees leave able to run these audits themselves.

What you’ll learn

🔎

Content-recall testing

Surface verbatim memorization through next-passage prediction and direct probing of source texts.

🧬

Paraphrase-level similarity

Catch leakage that survives rewording using ROUGE, semantic-similarity, Jaccard, and MinHash signals.

🔓

Persuasive-jailbreak probing

Stress-test refusal mechanisms with Ethos, Alliance-Building, and Reciprocity strategies.

🧠

Unlearning verification

Probe whether “forgotten” content is truly erased via representational drift analysis.

📑

Plagiarism taxonomy

Distinguish verbatim, paraphrase, and idea-level plagiarism in generated text.

⚖️

Evidence-first auditing

Frame infringement as discoverable evidence and produce defensible, reproducible audit reports.
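To make the recall and paraphrase-level signals above more concrete, here is a small, dependency-free sketch. It is not the Copyright Detective implementation. The function names (`tokenize`, `shingles`, `jaccard`, `minhash_signature`, `similarity_signals`), the whitespace tokenizer, the 3-word shingle size, and the use of 128 salted BLAKE2b hashes in place of random permutations are all illustrative assumptions. The sketch computes three values for a source passage and a model output. The first is the longest verbatim token run they share, a crude content-recall signal. The second is the exact Jaccard similarity over word 3-gram shingles. The third is a MinHash estimate of that Jaccard value. ROUGE and semantic similarity are left out so the sketch stays self-contained.

```python
import hashlib
from difflib import SequenceMatcher

# Illustrative punctuation set stripped from token edges (an assumption).
_PUNCT = ".,;:!?\"'()[]"


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip edge punctuation."""
    stripped = (t.strip(_PUNCT) for t in text.lower().split())
    return [t for t in stripped if t]


def shingles(tokens: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Set of word k-grams; k=3 is an illustrative choice, not a tuned value."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(sh: set[tuple[str, ...]], num_perm: int = 128) -> list[int]:
    """For each salted hash function, keep the minimum hash over all shingles."""
    if not sh:
        raise ValueError("cannot build a MinHash signature from an empty shingle set")
    encoded = [" ".join(s).encode("utf-8") for s in sh]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt stands in for a distinct permutation
        signature.append(min(
            int.from_bytes(hashlib.blake2b(e, digest_size=8, salt=salt).digest(), "little")
            for e in encoded
        ))
    return signature


def minhash_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of agreeing minima estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def similarity_signals(source: str, output: str, k: int = 3) -> dict:
    """Compute hypothetical recall and paraphrase-level signals for one text pair."""
    src, out = tokenize(source), tokenize(output)
    # Longest contiguous run of identical tokens: a crude verbatim-recall signal.
    run = SequenceMatcher(None, src, out, autojunk=False).find_longest_match(
        0, len(src), 0, len(out)
    ).size
    sa, sb = shingles(src, k), shingles(out, k)
    signals = {"longest_verbatim_run": run, "jaccard": jaccard(sa, sb)}
    if sa and sb:  # MinHash needs at least one shingle on each side
        signals["minhash_jaccard"] = minhash_jaccard(
            minhash_signature(sa), minhash_signature(sb)
        )
    return signals


if __name__ == "__main__":
    source = "It was the best of times, it was the worst of times, it was the age of wisdom"
    output = "It was the best of times, it was the worst of times, said the model"
    print(similarity_signals(source, output))
```

On the sample pair, the two texts share a 12-token verbatim run and have a word-3-gram Jaccard similarity of 0.6. The MinHash value approximates that 0.6, and its error shrinks as `num_perm` grows. MinHash pays off at scale because fixed-size signatures can be compared or bucketed, for example with locality-sensitive hashing, without keeping every shingle set in memory.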

02 — Schedule

A three-hour, two-part program

Tentative agenda · all times in Korea Standard Time (UTC+9) · subject to change.

1:30 PM · Welcome & motivation
Why LLM copyright and plagiarism are evidence problems, not classification problems. Legal landscape and scope of the tutorial.

1:45 PM · Part I: Copyright infringement & Copyright Detective
Content-recall testing, paraphrase-level similarity, persuasive-jailbreak probing, and unlearning verification, demonstrated live in the interactive forensic system.

2:55 PM · Coffee break

3:10 PM · Part II: Plagiarism detection in LLMs
Drawing on “Do Language Models Plagiarize?”, this part measures verbatim, paraphrase, and idea-level plagiarism, with hands-on detection workflows.

4:15 PM · Open problems, Q&A & wrap-up
Responsible deployment, limitations of current methods, and where the research goes next.

4:30 PM · End of tutorial

03 — Venue & date

Jeju Island, South Korea

Conference: ACM SIGKDD / KDD 2026
Conference dates: August 9–13, 2026
Tutorial session: Wednesday, August 12, 2026 · 1:30 PM – 4:30 PM
Venue: International Convention Center Jeju (ICC Jeju)
Seogwipo-si, Jeju-do, South Korea
Format: Hands-on tutorial · 3 hours · in person
Time zone: Korea Standard Time (UTC+9)


04 — Related materials

Slides, code, demos & the proposal

📄

Tutorial proposal PDF

The full accepted proposal for the KDD 2026 tutorial.

Part I

LLM Copyright

📊 Slides: Tutorial slides (PDF)
💻 Code: Copyright Detective · GitHub
🚀 Demo: Interactive web demo
📝 Paper: arXiv:2602.05252

Part II

LLM Plagiarism

📊 Slides: Coming soon
💻 Code: Coming soon
🚀 Demo: Coming soon

🎥 Demonstration video

05 — Speakers

Your tutors

Denghui Zhang

Assistant Professor

Stevens Institute of Technology

Corresponding tutor

✉︎ dzhang42@stevens.edu 🌐 Homepage


Guangwei Zhang

Algorithm Engineer

Pine AI

Lead author, Copyright Detective

✉︎ kwongwai@19pine.ai 🌐 Homepage

Dongwon Lee

Professor

The Pennsylvania State University

Plagiarism & trustworthy AI

✉︎ dongwon@psu.edu 🌐 Homepage

06 — References

Zhang, G., Zhu, J., Qian, C., Gong, N., Mihalcea, R., Xu, Z., He, J., Ma, J., Huang, Y., Xiao, C., Li, B., Abbasi, A., Lee, D., Ji, H., & Zhang, D. (2026). Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks. arXiv preprint arXiv:2602.05252 [cs.CL]. https://arxiv.org/abs/2602.05252

Lee, J., Le, T., Chen, J., & Lee, D. (2023). Do Language Models Plagiarize? In Proceedings of the ACM Web Conference 2023 (WWW ’23), pp. 3637–3647. https://doi.org/10.1145/3543507.3583199

Cite the system

@misc{zhang2026copyrightdetective,
  title  = {Copyright Detective: A Forensic System to Evidence LLMs
            Flickering Copyright Leakage Risks},
  author = {Guangwei Zhang and Jianing Zhu and Cheng Qian and Neil Gong
            and Rada Mihalcea and Zhaozhuo Xu and Jingrui He and Jiaqi Ma
            and Chaowei Xiao and Bo Li and Ahmed Abbasi and Dongwon Lee
            and Heng Ji and Denghui Zhang},
  year   = {2026},
  eprint = {2602.05252},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url    = {https://arxiv.org/abs/2602.05252}
}
