ABOUT AI STUPID LEVEL
Independent watchdog platform for AI model performance monitoring
We monitor AI model performance to protect developers and businesses from undisclosed capability reductions. Built from frustration. Driven by transparency. Community-owned.
[→] OUR MISSION
In early 2024, developers noticed something troubling: AI models they relied on seemed to be performing worse over time. OpenAI's GPT-4 appeared "dumber" than at launch. Claude started refusing more requests. But no one was systematically tracking these changes.
AI Stupid Level was born from frustration. We built this platform because:
→AI vendors don't disclose model changes
Silent updates, capability reductions, and performance shifts happen without warning
→Existing benchmarks are incomplete
Single measurements, no confidence intervals, no drift detection
→Developers deserve transparency
You need reliable data to choose AI providers and build production systems
→The industry needs accountability
Independent monitoring keeps vendors honest
[→] OUR TEAM
The Architect
Lead Researcher and Platform Engineer
→10+ years in AI/ML infrastructure and performance optimization
→Former Senior Engineer at enterprise AI platforms
→Expert in statistical analysis and algorithm design
→Open source contributor to ML tooling ecosystem
Our methodology has been reviewed and validated by statisticians, ML researchers, and industry practitioners. We welcome contributions from the community — check our GitHub:
Web • API
[→] FUNDING AND INDEPENDENCE
✓ 100% Independent Funding
Supported through community donations, sponsorships, and grant funding. No revenue from AI vendors.
✓ No Vendor Relationships
Zero financial relationships with OpenAI, Anthropic, Google, xAI, or any AI model provider.
✓ No Affiliate Links
We don't earn commissions from API signups or referrals. All rankings are merit-based.
✓ Own Infrastructure
All benchmarks run on our servers using our API keys. No vendor influence whatsoever.
✓ Transparent Methodology
Complete source code, benchmark tasks, and scoring algorithms are publicly auditable.
HOW WE FUND OPERATIONS
Enterprise Data Licensing
Premium datasets for security teams, compliance officers, and ML researchers
Community Support
Donations from developers who value independent AI monitoring
Sponsorships
Non-vendor companies supporting open source AI infrastructure
Research Grants
Grants for AI evaluation and transparency projects
[→] METHODOLOGY VALIDATION
✓ Open Source Since 2024
500+ GitHub stars, community code reviews, full transparency in implementation
✓ Peer Reviewed
Statistical methodology reviewed by academic researchers in ML evaluation
✓ Community Validated
Referenced in technical blogs, Reddit discussions, and developer communities
✓ User Verifiable
"Test Your Keys" feature allows independent verification of all benchmarks
[→] ENTERPRISE DATA LICENSING
Beyond our free public platform, we offer premium enterprise datasets that provide deeper insights into AI model behavior, safety vulnerabilities, and performance patterns.
Safety and Security Dataset
Comprehensive adversarial testing results including jailbreak attempts, prompt injection vulnerabilities, and safety bypass patterns.
→10,000+ adversarial test results/month
→Vulnerability profiles by model and attack type
→Compliance-ready security reports
Bias and Fairness Dataset
Statistical analysis of performance variations across demographic groups, gender bias indicators, and EU AI Act compliance metrics.
→5,000+ demographic variant tests/month
→Gender, ethnicity, and age bias analysis
→EU AI Act compliance documentation
Robustness and Reliability Dataset
Prompt sensitivity analysis, consistency metrics, hallucination patterns, and behavioral stability measurements.
→15,000+ prompt variation tests/month
→Hallucination detection and classification
→Failure mode taxonomy and examples
Version and Regression Dataset
Model version tracking, performance regression root cause analysis, API update correlation, and historical genealogy.
→Complete version change timeline
→Regression diagnostics and root causes
→Automated incident detection and alerts
INTERESTED IN ENTERPRISE DATA ACCESS?
Continuously updated datasets including historical data going back to platform launch. Custom data packages, API access, and dedicated support available.
VIEW PRICING AND CONTACT SALES →
[→] OPEN SOURCE AND TRANSPARENCY
Public API
All benchmark data accessible via API. Download historical scores, confidence intervals, and trends.
GET /api/dashboard
Detailed Documentation
Complete technical documentation of our 7-axis scoring, CUSUM drift detection, and statistical methods.
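As an illustration of the CUSUM drift detection mentioned above, here is a minimal one-sided CUSUM sketch in Python. This is not the platform's actual implementation; the `baseline`, `slack`, and `threshold` parameters are hypothetical and would be tuned per model and metric.

```python
def cusum_downward(scores, baseline, slack=0.5, threshold=4.0):
    """Flag the index where a sustained downward drift is detected, or None.

    Accumulates evidence that scores run below (baseline - slack); small
    random dips decay back to zero, while a persistent drop crosses the
    threshold after a few observations.
    """
    s = 0.0
    for i, x in enumerate(scores):
        # Add the shortfall below the tolerated band; never go negative.
        s = max(0.0, s + (baseline - slack - x))
        if s > threshold:
            return i
    return None
```

For example, a series hovering around its baseline never trips the alarm, while a model that silently drops a few points triggers detection within a handful of benchmark runs.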
Read Methodology →
Test Your Keys
Run benchmarks with your own API keys to verify we're not making up numbers.
Test Now →
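As a sketch of consuming the public API above, the following Python snippet fetches the dashboard feed and ranks models by score. Only the `GET /api/dashboard` path comes from the documentation; the base URL and the `models`/`name`/`score` response keys are assumptions for illustration.

```python
import json
from urllib.request import urlopen

# Hypothetical base URL -- substitute the platform's actual host.
BASE_URL = "https://example.com"

def fetch_dashboard(base_url=BASE_URL):
    """Fetch the public dashboard feed as parsed JSON."""
    with urlopen(f"{base_url}/api/dashboard") as resp:
        return json.load(resp)

def best_models(payload, n=3):
    """Return the top-n model names by score (assumed response schema)."""
    ranked = sorted(payload["models"], key=lambda m: m["score"], reverse=True)
    return [m["name"] for m in ranked[:n]]
```

A caller would typically run `best_models(fetch_dashboard())` on a schedule and alert when the ranking shifts.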
[→] OUR VALUES
Scientific Rigor
We use proper statistical methods, confidence intervals, and peer-reviewed algorithms. No hand-waving, no marketing fluff — just math.
Radical Transparency
Everything is open source. Every decision documented. Every benchmark reproducible. Trust through verification, not through claims.
Independence
No vendor funding. No affiliate revenue. No conflicts of interest. Our only loyalty is to developers who need accurate data.
Community First
Built by developers, for developers. We listen to feedback, accept contributions, and evolve based on community needs.
[→] CONTACT AND SOCIAL
READY TO EXPLORE?
Start with our live rankings, learn the methodology, or verify our benchmarks yourself.
AI Stupid Level • Independent benchmarking since 2024 • View Rankings