
Independent verification and calibration for AI systems

We evaluate whether model confidence survives contact with reality using pre-registered, reproducible methodologies. 

PUBLIC CASE STUDY SUMMARY

DistilBERT SST-2 Calibration Evaluation

Executive Summary

Avenridge Institute conducted a pre-registered calibration evaluation of the publicly available model distilbert-base-uncased-finetuned-sst-2-english, using the SST-2 validation dataset.


The objective was not to evaluate raw classification accuracy alone, but to evaluate whether the model's probability outputs were statistically calibrated relative to observed outcomes.


The evaluation protocol was locked before execution using a documented pre-registration methodology.

Key Result

The model achieved:

  • Accuracy: 91.06%
  • Brier Skill Score: +0.6664
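For readers unfamiliar with the metric, a Brier Skill Score compares the model's Brier score against that of a reference forecast, with values above zero indicating improvement over the reference. This summary does not state which reference was used; the minimal sketch below assumes a constant base-rate forecast, and the function names and toy data are illustrative only, not taken from the evaluation.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted P(positive) and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs, labels):
    """BSS = 1 - BS / BS_ref.

    Assumption: BS_ref comes from always predicting the observed base rate.
    The labels must contain both classes, otherwise BS_ref is zero.
    """
    base_rate = sum(labels) / len(labels)
    bs_ref = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / bs_ref


# Toy data (hypothetical, not from the SST-2 evaluation).
toy_probs = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]
toy_bss = brier_skill_score(toy_probs, toy_labels)
print(f"Brier Skill Score: {toy_bss:+.4f}")
```

A positive score of this kind shows that the probabilities carry more information than the base rate, but it does not by itself show that they are calibrated.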

However, the model failed the pre-registered calibration criterion.

The strongest observed pattern was bimodal overconfidence:

  • Predictions with a predicted probability near 0% were wrong substantially more often than the model's probabilities implied.
  • Predictions with a predicted probability near 100% were also materially overconfident.
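A pattern like this is usually read off a reliability table, which groups predictions into probability bins and compares the mean predicted probability with the observed positive rate in each bin. The sketch below is one common way to build such a table, assuming the probabilities are P(positive class) and using ten equal-width bins. The expected calibration error shown is a generic summary statistic, not necessarily the pre-registered criterion, and the toy data are hypothetical.

```python
def reliability_table(probs, labels, n_bins=10):
    """Return (bin_index, count, mean_predicted, observed_rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        observed = sum(y for _, y in items) / len(items)
        rows.append((idx, len(items), mean_p, observed))
    return rows


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    n = len(labels)
    return sum(
        count / n * abs(mean_p - observed)
        for _, count, mean_p, observed in reliability_table(probs, labels, n_bins)
    )


# Toy data (hypothetical): both extremes are more confident than the outcomes justify.
toy_probs = [0.02] * 10 + [0.98] * 10
toy_labels = [0] * 9 + [1] + [1] * 9 + [0]

for idx, count, mean_p, observed in reliability_table(toy_probs, toy_labels):
    print(f"bin {idx}: n={count} predicted={mean_p:.2f} observed={observed:.2f}")
print(f"ECE: {expected_calibration_error(toy_probs, toy_labels):.3f}")
```

In the toy data, the lowest bin predicts about 2% positives but observes 10%, and the highest bin predicts about 98% but observes 90%, which is the shape described as bimodal overconfidence.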

The evaluation demonstrated that strong benchmark accuracy does not necessarily imply reliable probabilistic calibration.

Contact Us

Inquiries regarding verification methodology, calibration audits, and institutional partnerships are welcome.

avenridgeinstitute.com

Copyright © 2026 avenridgeinstitute.com - All Rights Reserved.
