Apart

Responsible Disclosure Policy V0.3

Read about how Apart handles the dissemination of dual-use research

The goal of our responsible disclosure policy is to mitigate the risk of Apart disseminating info hazards: information that could lead to dangerous use of AI or substantially weaken defenses against AI risks. Apart recognizes the high value of open science and reproducible research and is committed to making our scientific output widely accessible within the limits of this responsible disclosure policy.

Key Questions and Answers

What do we mean by "info hazards"?

By "info hazards," we mean information whose disclosure could lead to dangerous use of AI or substantially weaken defenses against AI risks.

What are examples of info hazards?

The following is a non-exhaustive list of examples:

  • Presenting a jailbreak that works on current state-of-the-art models with a higher probability of success than already known jailbreaks.
  • Open-sourcing a technique or dataset that allows model developers to fake good performance on critical safety evaluations (without making models intrinsically safer) in the absence of obvious solutions for evaluation designers.
  • Publishing a guide with novel methods for using current AI models for mass manipulation.

Which type of information disclosure falls under the responsible disclosure policy?

Our responsible disclosure policy covers information disclosure by members of the Apart team, as well as information disclosure by Apart Lab fellows or Apart Sprint participants on the research undertaken during their fellowship or sprint.

It covers information in written as well as oral form insofar as it contains original research or information that is not yet widely known to the audience.

This includes but is not limited to:

  • Research papers (both peer-reviewed and preprint)
  • Blog posts and forum posts
  • Conference presentations
  • Podcast appearances

Which risk levels do we distinguish, and what is the appropriate course of action?

  • Highest: Do not disclose the existence of the project if mere knowledge of its existence poses a significant risk.
  • High: Disclose the existence of the project but do not publish findings or code.
  • Intermediate: Redact critical implementation details while disclosing the main findings. Do not publish critical pieces of code; pure data analysis code may still be published.
  • Low: All details of the project may be published.

When, how, and by whom will projects be evaluated for potential info hazards?

  • Apart Sprint write-ups will be reviewed for info hazards by the jury of the corresponding sprint at the end of the research sprint, as part of the judging process.
  • Apart Lab fellowship projects will be reviewed for info hazards at the end of the research sprint (see above) and additionally by Apart Lab mentors before publication of any write-up (including preprints and blog posts).
  • All other information within the scope of the responsible disclosure policy must be reviewed for info hazards by an Apart team member before disclosure.

How will decisions under the responsible disclosure policy be communicated to all stakeholders?

  • If an Apart Sprint project or an Apart Lab fellowship project is assessed as anything other than low risk, all project team members will be informed in writing by an Apart mentor about the assessment and the consequences for disclosure.
  • If any other information within the scope of the responsible disclosure policy is assessed as anything other than low risk, the authors or team members of the corresponding project, as well as all members of the leadership team, will be informed in writing about the assessment and the consequences for disclosure.

Who can review and update the responsible disclosure policy?

The responsible disclosure policy can be updated by a majority decision of the Apart leadership team to adapt to emerging risks or improve our handling of info hazards.