ACM SIGKDD 2026 · Lecture-style Hands-on Tutorial

Evidencing LLM Misuse

A hands-on forensic tutorial on copyright infringement and plagiarism detection in large language models.

01 — Overview

When models reproduce protected text, how do you prove it?

Large language models increasingly reproduce copyrighted passages and paraphrase protected sources — raising urgent legal, ethical, and scientific questions about how to evidence such misuse. This tutorial reframes copyright infringement and plagiarism as an evidence-discovery process rather than a binary classification task, and equips attendees with practical, reproducible forensic tools to audit models even under black-box access.

Across three hours we cover two complementary halves. Part I — Copyright Infringement introduces the legal framing and walks through Copyright Detective, an interactive forensic system unifying content-recall testing, paraphrase-level similarity analysis, persuasive-jailbreak probing, and unlearning verification. Part II — Plagiarism Detection builds on “Do Language Models Plagiarize?” to examine verbatim, paraphrase, and idea-level plagiarism in model outputs — and how to measure each. Attendees leave able to run these audits themselves.

What you’ll learn

🔎

Content-recall testing

Surface verbatim memorization through next-passage prediction and direct probing of source texts.

🧬

Paraphrase-level similarity

Catch leakage that survives rewording using ROUGE, semantic, Jaccard, and MinHash signals.

🔓

Persuasive-jailbreak probing

Stress-test refusal mechanisms with Ethos, Alliance-Building, and Reciprocity strategies.

🧠

Unlearning verification

Probe whether “forgotten” content is truly erased via representational drift analysis.

📑

Plagiarism taxonomy

Distinguish verbatim, paraphrase, and idea-level plagiarism in generated text.

⚖️

Evidence-first auditing

Frame infringement as discoverable evidence and produce defensible, reproducible audit reports.
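
The content-recall and paraphrase-level signals listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the implementation used in Copyright Detective: the function names, the whitespace tokenizer, the 3-token shingle size, and the 64-hash signature length are all hypothetical choices made for this example. ROUGE and embedding-based semantic similarity are left out because they need third-party packages.

```python
import hashlib
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def longest_shared_run(reference, generated):
    """Length in tokens of the longest contiguous run found in both texts.

    A long run is a simple indicator of verbatim recall.
    """
    a, b = tokens(reference), tokens(generated)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def shingles(text, k=3):
    """Set of k-token shingles; texts shorter than k yield one short shingle."""
    toks = tokens(text)
    if not toks:
        return set()
    if len(toks) < k:
        return {tuple(toks)}
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(set_a, set_b):
    """Exact Jaccard similarity: |A & B| / |A | B| (0.0 if both are empty)."""
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def minhash_signature(shingle_set, num_hashes=64):
    """MinHash signature: the minimum 64-bit hash under each salted hash."""
    if not shingle_set:
        return []
    encoded = [" ".join(s).encode("utf-8") for s in shingle_set]
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(e, digest_size=8, salt=salt).digest(), "big")
            for e in encoded))
    return signature


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard."""
    if not sig_a or not sig_b:
        return 0.0
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    reference = "it was the best of times it was the worst of times"
    generated = "truly it was the best of times and it was the worst of days"
    ref_sh, gen_sh = shingles(reference), shingles(generated)
    print("longest shared run:", longest_shared_run(reference, generated))
    print("exact Jaccard:", round(jaccard(ref_sh, gen_sh), 3))
    print("MinHash estimate:", minhash_similarity(
        minhash_signature(ref_sh), minhash_signature(gen_sh)))
```

The exact Jaccard score needs both full shingle sets. The MinHash estimate only needs the two fixed-length signatures, which is why it is the cheaper option when one model output must be compared against many source passages.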

02 — Schedule

A three-hour, two-part program

Tentative agenda · all times are PM, Korea Standard Time (UTC+9) · subject to change.

  1. 1:30

    Welcome & motivation

    Why LLM copyright and plagiarism are evidence problems, not classification problems. Legal landscape and scope of the tutorial.

  2. 1:45
    Part I

    Copyright infringement & Copyright Detective

    Content-recall testing, paraphrase-level similarity, persuasive-jailbreak probing, and unlearning verification — demonstrated live in the interactive forensic system.

  3. 2:55

    Coffee break

  4. 3:10
    Part II

    Plagiarism detection in LLMs

    From “Do Language Models Plagiarize?” — measuring verbatim, paraphrase, and idea-level plagiarism, with hands-on detection workflows.

  5. 4:15

    Open problems, Q&A & wrap-up

    Responsible deployment, limitations of current methods, and where the research goes next.

  6. 4:30

    End of tutorial

03 — Venue & date

Jeju Island, South Korea

Conference: ACM SIGKDD / KDD 2026
Conference dates: August 9–13, 2026
Tutorial session: Wednesday, August 12, 2026 · 1:30 PM – 4:30 PM
Venue: International Convention Center Jeju (ICC Jeju), Seogwipo-si, Jeju-do, South Korea
Format: Hands-on tutorial · 3 hours · in person
Time zone: Korea Standard Time (UTC+9)
04 — Related materials

Slides, code, demos & the proposal

05 — Speakers

Your tutors

06 — References

Read more

  1. Zhang, G., Zhu, J., Qian, C., Gong, N., Mihalcea, R., Xu, Z., He, J., Ma, J., Huang, Y., Xiao, C., Li, B., Abbasi, A., Lee, D., Ji, H., & Zhang, D. (2026). Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks. arXiv preprint arXiv:2602.05252 [cs.CL].

    https://arxiv.org/abs/2602.05252
  2. Lee, J., Le, T., Chen, J., & Lee, D. (2023). Do Language Models Plagiarize? In Proceedings of the ACM Web Conference 2023 (WWW ’23), pp. 3637–3647.

    https://doi.org/10.1145/3543507.3583199

Cite the system

@misc{zhang2026copyrightdetective,
  title  = {Copyright Detective: A Forensic System to Evidence LLMs
            Flickering Copyright Leakage Risks},
  author = {Guangwei Zhang and Jianing Zhu and Cheng Qian and Neil Gong
            and Rada Mihalcea and Zhaozhuo Xu and Jingrui He and Jiaqi Ma
            and Chaowei Xiao and Bo Li and Ahmed Abbasi and Dongwon Lee
            and Heng Ji and Denghui Zhang},
  year   = {2026},
  eprint = {2602.05252},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url    = {https://arxiv.org/abs/2602.05252}
}