The Value of Failure Taxonomy
Detecting that an AI system is failing is the easier problem. A failure taxonomy maps observable loss signals to the system layer responsible and a …
When GPT-4 launched in March 2023, the topic mix of real user prompts shifted in the exact direction the model quality gaps would predict: coding and …
California Denti-Cal records show Anaheim's payment intensity rose after the March 2015 ownership transition while every peer office fell or held …
The model quality framework originally relied on scalar ratings and active days. By replacing those inputs with ARC trajectory scores and 7-day …
Using synthetic data and a causal framework, I modeled why flat-rate Claude subscriptions likely broke down for heavy third-party OAuth users and …
Global AI usage is becoming more concentrated, not less. The top twenty countries account for 48% of all per-capita AI usage, and the physical cost of …
Bloom flags that a model shows self-preferential bias. ARC tells you whether the bias set in at turn one or drifted in six rounds later. Those are …
A classification framework is itself an instrument, and instruments fail in specific ways. Before putting ARC into production, the right move is to …
Most agent evaluations only check whether the final answer was right. That is like judging effort in a mountain race by finish time alone. ARC is a …
Every AI company claims better models mean more engagement. I built a framework to actually test that claim, and the answer isn't what most people …