Research on text watermark algorithm for generative large language models
Wang, Pengjie (2025)
Bachelor's thesis
School of Engineering Science, Information Technology
All rights reserved.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi-fe2025050738159
Abstract
In recent years, the widespread use of large language models has raised issues of traceability and copyright, which calls for methods that distinguish whether a piece of text was written by a human or generated by a large language model. Text watermarking techniques for large language models serve this purpose. However, these approaches usually require a trade-off between detectability, text quality, and robustness against attacks. Existing methods use fixed watermark strength parameters, which cannot be flexibly adjusted to different contexts. In addition, embedding specific information in the text to realize multi-bit text watermarking remains a difficult task.
To address the problem that watermarking cannot be flexibly adjusted to the context, this thesis develops a controlled watermarking technique. Experiments show that the proposed method achieves a better trade-off between detectability, text quality, and robustness against attacks, and that it outperforms the original method.
For scenarios where multi-bit information must be embedded in the text, the thesis further explores multi-bit watermarking on the basis of the controlled watermarking technique and develops a controlled intermittent embedding technique that can reliably insert additional information. Experimental results show that the method achieves reliable accuracy.