How this works
We count what people say happened. We don't grade their LLM, and we're not pretending to.
Voting
- One vote per hashed client per UTC day.
- Hashes are salted server-side and cannot be reversed to identify you.
- Votes reset at 00:00 UTC.
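A minimal sketch of how a one-vote-per-day rule like this could work. The salt value, the choice of HMAC-SHA256, the in-memory ledger, and every name below are assumptions for illustration, not a description of the actual server code.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical server-side secret. The real salt and how it is stored are not described here.
SERVER_SALT = b"example-salt-not-the-real-one"


def client_hash(client_id: str, salt: bytes = SERVER_SALT) -> str:
    """Keyed hash of a client identifier (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(salt, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


def utc_day(ts: datetime) -> str:
    """UTC calendar day of a timezone-aware timestamp, as an ISO date string."""
    return ts.astimezone(timezone.utc).date().isoformat()


class VoteLedger:
    """Remembers (hashed client, UTC day) pairs so a second vote that day is refused."""

    def __init__(self):
        self._seen = set()

    def try_vote(self, client_id: str, ts: datetime) -> bool:
        key = (client_hash(client_id), utc_day(ts))
        if key in self._seen:
            return False  # already voted on this UTC day
        self._seen.add(key)
        return True
```

Because the key includes the UTC day, the allowance resets at 00:00 UTC without any scheduled job: a vote at 23:59 UTC and another at 00:00 UTC land under different keys.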
Confessions
- Required: the model, what happened, the failure type.
- Optional: impact area, what evidence you have, severity, and whether the model doubled down.
- We don't host files. "evidence" is a label, not an upload.
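The required and optional fields above could be modeled roughly like this. The field names, types, and the `missing_required` helper are hypothetical; only the split between required and optional comes from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Confession:
    # Required fields
    model: str
    what_happened: str
    failure_type: str
    # Optional fields
    impact_area: Optional[str] = None
    evidence: Optional[str] = None  # a text label only, never a file or upload
    severity: Optional[str] = None
    doubled_down: Optional[bool] = None


def missing_required(c: Confession) -> List[str]:
    """Return the names of required fields that are empty or whitespace-only."""
    required = {
        "model": c.model,
        "what_happened": c.what_happened,
        "failure_type": c.failure_type,
    }
    return [name for name, value in required.items() if not value.strip()]
```

A submission with an empty list from `missing_required` would be accepted; the optional fields can stay `None`.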
Most reported
- The default view: each model's raw share of submitted reports.
- Not adjusted for how many people use that model overall.
- A model with more users will likely account for more reports here, all else equal.
- This view powers the daily results, monthly rankings, and the heatmap.
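As a sketch, raw share is just each model's count divided by the total number of reports. The function name and the input shape (one model name per report) are assumptions.

```python
from collections import Counter


def raw_share(report_models):
    """Map each model to its fraction of all submitted reports.

    report_models: iterable with one model name per report (assumed shape).
    """
    counts = Counter(report_models)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reports, nothing to show
    return {model: n / total for model, n in counts.items()}
```

For example, three reports about one model and one about another give shares of 0.75 and 0.25. Nothing in this calculation knows how many people use each model, which is why a popular model tends to rank higher in this view.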
Relative to usage
- A second view, only shown when there is enough first-party data to support it.
- Shows complaints per 100 participating users who said they used that model in the window.
- The denominator comes from an optional weekly survey: "Which AI did you use most this week?"
- This is self-reported. It reflects the people who use this site, not the general population.
- Each model is only shown when at least 10 respondents reported using it in the window. Below that threshold, the result is suppressed.
- This is not a market-wide claim. Do not treat it as one.
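The rate and the suppression rule above can be sketched as follows. The function and parameter names are hypothetical; the per-100 scaling and the threshold of 10 respondents come from the list above.

```python
MIN_RESPONDENTS = 10  # per-model threshold stated above


def complaints_per_100(complaints, respondents, min_respondents=MIN_RESPONDENTS):
    """Complaints per 100 survey respondents who reported using each model.

    complaints:  dict of model -> complaint count in the window (assumed shape)
    respondents: dict of model -> survey respondents who used it in the window
    """
    rates = {}
    for model, users in respondents.items():
        if users < min_respondents:
            continue  # suppressed: too few respondents to show a rate
        rates[model] = 100 * complaints.get(model, 0) / users
    return rates
```

With 5 complaints and 50 respondents, a model shows 10 complaints per 100 users. A model with 9 respondents is not shown at all, however many complaints it has, and a model with complaints but no survey respondents has no denominator and is also left out.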
What's public
- The feed, the heatmap, monthly ranks, 30-day trends.
- Flagged records are excluded from all of those, pending review.
- Rankings are based on vote volume alone unless you switch to the Relative to usage tab.
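A sketch of how the public ranking could exclude flagged records and order models by volume. The record shape (dicts with `model` and `flagged` keys) and the alphabetical tie-break are assumptions.

```python
from collections import Counter


def public_ranking(records):
    """Rank models by number of unflagged records, highest first.

    records: iterable of dicts with a 'model' key and an optional
    boolean 'flagged' key (hypothetical shape).
    """
    counts = Counter(r["model"] for r in records if not r.get("flagged", False))
    # Sort by count descending; break ties by model name (an assumed rule).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A model whose only records are flagged does not appear in the result until review clears them.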
Time
- Everything is UTC. The monthly view resets on the 1st at 00:00 UTC.
- Usage survey responses are bucketed by UTC week (Monday start).
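The two bucketing rules can be sketched with the standard library. The function names are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import date, datetime, timedelta, timezone


def utc_week_start(ts: datetime) -> date:
    """Monday (UTC) of the week containing a timezone-aware timestamp."""
    day = ts.astimezone(timezone.utc).date()
    return day - timedelta(days=day.weekday())  # Monday is weekday 0


def utc_month_key(ts: datetime) -> str:
    """Month bucket in UTC, formatted as 'YYYY-MM'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m")
```

Converting to UTC first matters near boundaries: 22:00 on 30 April in a UTC-5 timezone is already 03:00 on 1 May in UTC, so it counts toward May.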