ABOUT AI STUPID LEVEL
Independent watchdog platform for AI model performance monitoring
We're an independent watchdog platform monitoring AI model performance to protect developers and businesses from undisclosed capability reductions. Built from frustration. Driven by transparency. Community-owned.
[→] OUR MISSION
In early 2024, developers noticed something troubling: AI models they relied on seemed to be performing worse over time. OpenAI's GPT-4 appeared "dumber" than at launch. Claude started refusing more requests. But no one was systematically tracking these changes.
AI Stupid Level was born from frustration. We built this platform because:
AI vendors don't disclose model changes
Silent updates, capability reductions, and performance shifts happen without warning
Existing benchmarks are incomplete
Single measurements, no confidence intervals, no drift detection
Developers deserve transparency
You need reliable data to choose AI providers and build production systems
The industry needs accountability
Independent monitoring keeps vendors honest

[→] OUR TEAM
The Architect
Lead Researcher and Platform Engineer
10+ years in AI/ML infrastructure and performance optimization
Former Senior Engineer at enterprise AI platforms
Expert in statistical analysis and algorithm design
Open source contributor to ML tooling ecosystem
Our methodology has been reviewed and validated by statisticians, ML researchers, and industry practitioners. We welcome contributions from the community — check our GitHub repositories: Web (frontend) and API (backend)

[→] FUNDING AND INDEPENDENCE
100% Independent Funding
Supported through community donations, sponsorships, and grant funding. No revenue from AI vendors.
No Vendor Relationships
Zero financial relationships with OpenAI, Anthropic, Google, xAI, or any AI model provider.
No Affiliate Links
We don't earn commissions from API signups or referrals. All rankings are merit-based.
Own Infrastructure
All benchmarks run on our servers using our API keys. No vendor influence whatsoever.
Transparent Methodology
Complete source code, benchmark tasks, and scoring algorithms are publicly auditable.
HOW WE FUND OPERATIONS
Enterprise Data Licensing
Premium datasets for security teams, compliance officers, and ML researchers
Community Support
Donations from developers who value independent AI monitoring
Sponsorships
Non-vendor companies supporting open source AI infrastructure
Research Grants
Grants for AI evaluation and transparency projects

[→] METHODOLOGY VALIDATION
Open Source Since 2024
500+ GitHub stars, community code reviews, full transparency in implementation
Peer Reviewed
Statistical methodology reviewed by academic researchers in ML evaluation
Community Validated
Referenced in technical blogs, Reddit discussions, and developer communities
User Verifiable
"Test Your Keys" feature allows independent verification of all benchmarks

[→] ENTERPRISE DATA LICENSING
Beyond our free public platform, we offer premium enterprise datasets that provide deeper insights into AI model behavior, safety vulnerabilities, and performance patterns.
Safety and Security Dataset
Comprehensive adversarial testing results including jailbreak attempts, prompt injection vulnerabilities, and safety bypass patterns.
10,000+ adversarial test results/month
Vulnerability profiles by model and attack type
Compliance-ready security reports
Bias and Fairness Dataset
Statistical analysis of performance variations across demographic groups, gender bias indicators, and EU AI Act compliance metrics.
5,000+ demographic variant tests/month
Gender, ethnicity, and age bias analysis
EU AI Act compliance documentation
Robustness and Reliability Dataset
Prompt sensitivity analysis, consistency metrics, hallucination patterns, and behavioral stability measurements.
15,000+ prompt variation tests/month
Hallucination detection and classification
Failure mode taxonomy and examples
Version and Regression Dataset
Model version tracking, performance regression root cause analysis, API update correlation, and historical genealogy.
Complete version change timeline
Regression diagnostics and root causes
Automated incident detection and alerts
INTERESTED IN ENTERPRISE DATA ACCESS?
Continuously updated datasets including historical data going back to platform launch. Custom data packages, API access, and dedicated support available.
VIEW PRICING AND CONTACT SALES →

[→] OPEN SOURCE AND TRANSPARENCY
Full Source Code
Every line of code is public on GitHub. Audit our methodology, suggest improvements, or run locally.
Frontend (Web) →
Backend (API) →
Public API
All benchmark data accessible via API. Download historical scores, confidence intervals, and trends.
GET /api/dashboard
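As a sketch of how a response from the dashboard endpoint might be consumed, here is a minimal parser for a dashboard-style JSON payload. The `models`, `name`, and `score` field names are illustrative assumptions, not the documented schema:

```python
import json

def parse_scores(payload: str) -> dict:
    """Extract a model-name -> score mapping from a dashboard-style
    JSON payload. Field names here are assumptions for illustration."""
    data = json.loads(payload)
    return {m["name"]: m["score"] for m in data.get("models", [])}

# Example with a mock payload (no network call needed):
mock = '{"models": [{"name": "gpt-4", "score": 71.3}]}'
print(parse_scores(mock))  # {'gpt-4': 71.3}
```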
Detailed Documentation
Complete technical documentation of our 7-axis scoring, CUSUM drift detection, and statistical methods.
Read Methodology →
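To illustrate what CUSUM drift detection does, here is the textbook one-sided CUSUM for catching a downward shift in mean score. This is a generic sketch, not the platform's exact implementation; the reference mean, slack `k`, and threshold `h` below are assumed values:

```python
def cusum_downward(scores, mean, k=0.5, h=4.0):
    """One-sided CUSUM for detecting a downward shift in mean score.

    Accumulates how far each observation falls below (mean - k);
    returns the index where the statistic first exceeds h, else None.
    """
    s = 0.0
    for i, x in enumerate(scores):
        s = max(0.0, s + (mean - k) - x)  # clamp at zero: no "credit" for high scores
        if s > h:
            return i
    return None

# A model scoring around 80 that silently drops to around 74:
history = [80, 81, 79, 80, 74, 73, 75, 74]
print(cusum_downward(history, mean=80.0))  # → 4 (drift flagged at the first low score)
```

The clamp at zero is what makes CUSUM suited to silent-degradation monitoring: a run of good scores cannot mask a later sustained drop, unlike a plain moving average.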
Test Your Keys
Run benchmarks with your own API keys to verify we're not making up numbers.
Test Now →

[→] OUR VALUES
Scientific Rigor
We use proper statistical methods, confidence intervals, and peer-reviewed algorithms. No hand-waving, no marketing fluff — just math.
Radical Transparency
Everything is open source. Every decision documented. Every benchmark reproducible. Trust through verification, not through claims.
Independence
No vendor funding. No affiliate revenue. No conflicts of interest. Our only loyalty is to developers who need accurate data.
Community First
Built by developers, for developers. We listen to feedback, accept contributions, and evolve based on community needs.

[→] CONTACT AND SOCIAL
For General Inquiries
Twitter/X: @GOATGameDev →
GitHub: ionutvi →
Reddit: r/aistupidlevel →
For Technical Questions
Read our FAQ →
Review methodology docs →
GitHub Discussions →
READY TO EXPLORE?
Start with our live rankings, learn the methodology, or verify our benchmarks yourself.
VIEW LIVE RANKINGS
Current AI model performance scores
LEARN METHODOLOGY
Understand how we benchmark
TEST YOUR KEYS
Verify benchmarks with your API keys
AI Stupid Level • Independent benchmarking since 2024 • View Rankings