Elevating AI, Ensuring Trust

We champion open-source research. We prioritize simplicity. We pursue practical solutions.

Our Research


TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

arXiv, 2024

Large Language Models (LLMs) come with usage rules to protect interests and prevent misuse. This study introduces Black-box Identity Verification (BBIV), aiming to identify if a service uses a specific LLM via chat for compliance. The method, Targeted Random Adversarial Prompt (TRAP), uses adversarial suffixes to get a pre-defined answer from the specific LLM, while other models give random answers. TRAP offers a novel approach for ensuring compliance with LLM usage policies.


ProPILE: Probing Privacy Leakage in Large Language Models

NeurIPS 2023 (spotlight)

Large language models (LLMs) absorb web data, which may include sensitive personal information. Our tool, ProPILE, acts as a detective, helping individuals assess potential personal data exposure within LLMs. It allows users to tailor prompts to monitor their data, fostering awareness and control over personal information in the age of LLMs.