CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark

SAN FRANCISCO, Jan. 23, 2025 /PRNewswire/ -- The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark designed to test the limits of AI knowledge and whether the models are capable of chain-of-thought reasoning. The results showed a significant improvement over the reasoning capabilities of earlier models, but current models still answered fewer than 10 percent of the expert questions correctly.

The new benchmark, called “Humanity’s Last Exam,” evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math, humanities, and the natural sciences. Throughout the fall, CAIS and Scale AI crowdsourced questions from experts to assemble the hardest and broadest problems to stump the AI models. The exam was developed to address the challenge of “benchmark saturation”: models that regularly achieve near-perfect scores on existing tests, but may not be able to answer questions outside of those tests. Saturation reduces the utility of a benchmark as a precise measurement of future model progress.

“We wanted problems that would test the capabilities of the models at the frontier of human knowledge and reasoning,” said Dan Hendrycks, CAIS co-founder and executive director. “We can’t predict how quickly the models will advance. When I released the MATH benchmark, a challenging competition mathematics dataset, in 2021, the best model scored less than 10%; few predicted that scores higher than 90% would be achieved just three years later. Right now, Humanity’s Last Exam shows that there are still some expert closed-ended questions that models are not able to answer. We will see how long that lasts.”

Testing Methodology

Altogether, CAIS and Scale researchers collected more than 70,000 trial questions. Of those, 13,000 were selected for human expert review, and 3,000 made it into the final exam’s public release. The questions were aimed at world-class expert level and were put to several multi-modal frontier LLMs, including OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and OpenAI o1.
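The selection funnel above can be expressed as stage-to-stage retention rates. The sketch below is illustrative only: it treats the rounded figures in this release ("more than 70,000", 13,000, and 3,000) as exact counts, and the function name `retention_rates` is hypothetical.

```python
def retention_rates(stage_counts):
    """Return the fraction of items kept at each step of a filtering funnel."""
    return [kept / start for start, kept in zip(stage_counts, stage_counts[1:])]


# Rounded figures from the release, treated here as exact counts (an assumption).
stage_counts = [70_000, 13_000, 3_000]

rates = retention_rates(stage_counts)
overall = stage_counts[-1] / stage_counts[0]

for label, rate in zip(["trial -> expert review", "expert review -> final"], rates):
    print(f"{label}: {rate:.1%}")
print(f"overall: {overall:.1%}")
```

Under these assumptions, roughly one in five trial questions reached expert review, and roughly 4 percent reached the public question set.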

“We know the AI revolution is being shaped by human ingenuity, and we’re proud to be at the forefront. To help humans measure AI progress, we engineered what might be the ultimate test, meticulously distilled and designed to challenge the world’s most advanced models at the frontiers of intelligence, requiring precise, multi-step logical reasoning and unambiguous answers at a level that pushes even the most sophisticated AI systems to their limits,” said Summer Yue, Director of Research at Scale AI.

Humanity’s Last Exam was a global collaborative effort involving nearly 1,000 contributors from more than 500 institutions across 50 countries, with most contributors being active researchers or professors. The questions spanned multiple formats, including text-only and multi-modal challenges that integrated images and diagrams.

The questions were designed to deeply test the capability of the models across diverse domains. For example, a question submitted in Ecology asked:
“Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.”

Additional sample questions can be found at lastexam.ai.

In the final round of testing, Yue said they saw some of the models begin to answer a fraction of the questions correctly (less than 10%); however, she said variations frequently happen in model testing and could be the result of randomness. CAIS and Scale AI said they will open up the dataset to the research community, to dig deeper into the variations and to evaluate new AI systems while continuing to explore the limitations of existing models. A small subset of questions will be held back to preserve integrity for future evaluations.
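To give a sense of how much a score under 10 percent could move by chance alone, the sketch below computes a normal-approximation 95% interval for an accuracy measured on 3,000 questions. It is an illustration, not the method used by CAIS or Scale AI: the accuracy value is hypothetical, the questions are assumed to be independent, and the interval captures only sampling variation, not run-to-run randomness in model outputs.

```python
import math


def accuracy_interval(accuracy, n_questions, z=1.96):
    """Normal-approximation confidence interval for a measured accuracy."""
    std_err = math.sqrt(accuracy * (1 - accuracy) / n_questions)
    return accuracy - z * std_err, accuracy + z * std_err


# Hypothetical accuracy of 9% on the 3,000-question public set.
low, high = accuracy_interval(0.09, 3_000)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

At this sample size the interval spans roughly one percentage point on either side, so differences between models smaller than that are hard to distinguish from noise.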

Top Questions

CAIS and Scale AI offered financial awards for the best contributions to Humanity’s Last Exam, with $5,000 USD awarded for each of the top 50 questions and $500 USD for the next 500 best submissions, along with the opportunity for coauthorship of the final paper.
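The award tiers above imply a total prize pool, which can be checked with simple arithmetic. The sketch below assumes every award in both tiers was paid in full; the variable names are illustrative.

```python
# (number of awards, amount per award in USD), as stated in the release.
prize_tiers = [(50, 5_000), (500, 500)]

total_awards = sum(count for count, _ in prize_tiers)
total_usd = sum(count * amount for count, amount in prize_tiers)

print(f"{total_awards} awards totalling ${total_usd:,} USD")
```

Each tier comes to $250,000 USD, for a combined pool of $500,000 USD across 550 awards.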

“By identifying the gaps in AI’s reasoning capabilities, Humanity’s Last Exam not only benchmarks current systems but also provides a roadmap for future research and development,” said Yue.

About The Center for AI Safety
The Center for AI Safety (CAIS) is a research organization whose mission is to reduce societal-scale and national security risks from AI. CAIS research focuses on mitigating high-consequence risks in areas like monitoring, alignment, and systemic safety. CAIS works to expand the field of AI safety and security by providing compute resources and technical infrastructure to top researchers and engaging with the global research community. Through its CAIS Action Fund, CAIS advocates for safe and secure AI. CAIS was founded in 2022 and is headquartered in San Francisco.

About Scale AI
Scale AI is the Humanity-first AI Company. Backed by our Data Foundry, we generate high quality data and provide technology solutions that allow our enterprise and public sector customers to build, deploy, and evaluate the smartest AI tools and applications. By making data abundant, rigorous, and high-quality, we are accelerating the progress of AI. Scale AI was founded in 2016 and is headquartered in San Francisco.

Contact:
Richard Crook, richardc@safe.ai
Fiorella Riccobono, fiorella.riccobono@scale.com

View original content to download multimedia:https://www.prnewswire.com/news-releases/cais-and-scale-ai-unveil-results-of-humanitys-last-exam-a-groundbreaking-new-benchmark-302358108.html

SOURCE Center for AI Safety

Hexagon Composites ASA: Eirik Løhre appointed permanent CFO

OSLO, Norway, May 5, 2026 /PRNewswire/ — Reference is made to the stock exchange announcement dated 11 November 2025, where Eirik Løhre was appointed interim CFO in Hexagon Composites.

The Company is pleased to announce that Eirik Løhre has been appointed permanently to the role of CFO in Hexagon Composites, effective today.

Eirik Løhre has been with the Company since 2021. Before becoming interim CFO, he served as EVP Corporate Development on the Executive Team.

“Eirik has demonstrated strong financial leadership and execution, and he has been instrumental in strengthening our financial performance. I look forward to continuing our work together to develop and position Hexagon in this next phase of growth,” said Philipp Schramm, CEO, Hexagon Composites. 

For more information:
Berit-Cathrin Høyvik, Senior Director, Communications, Hexagon Composites
Tel: +47 988 92 161, berit-cathrin.hoyvik@hexagongroup.com

About Hexagon Composites ASA
Hexagon delivers safe and innovative solutions for a cleaner energy future. Our solutions enable storage, transportation and conversion to clean energy in a wide range of mobility and industrial applications. Learn more at www.hexagongroup.com and follow @HexagonASA on LinkedIn.

This information was brought to you by Cision http://news.cision.com

https://news.cision.com/hexagon-composites-asa/r/hexagon-composites-asa–eirik-lohre-appointed-permanent-cfo,c4344308

 

View original content:https://www.prnewswire.co.uk/news-releases/hexagon-composites-asa-eirik-lohre-appointed-permanent-cfo-302762250.html

LONGPORT Whale Enters Malaysian Market with Next Generation Trading Infrastructure for Local Brokerages

LONGPORT Whale, with proven track record across 100+ institutional clients in Asia, makes its Malaysia debut at Bursa Malaysia Stockbroking Trade Fair 2026

KUALA LUMPUR, Malaysia, May 5, 2026 /PRNewswire/ -- LONGPORT Whale, a provider of AI-ready securities trading infrastructure, is entering the Malaysian market at the Bursa Malaysia Stockbroking Trade Fair 2026. The move comes as Malaysia’s Capital Market Masterplan 2026–2030 (CMP4) continues to press local brokerages to modernise core systems while balancing investor experience, regulatory compliance, and operational resilience.

Malaysian brokerages are increasingly confronted by a challenge that goes beyond front-end upgrades. Legacy architectures struggle to keep pace with digital-native investor expectations, rising cybersecurity standards, and the demand for multi-market expansion simultaneously. For many such brokerages, the question is no longer whether to modernize, but how to do so without adding complexity or disrupting the business continuity that clients depend on.

Zhong Hua, CEO, LONGPORT Whale, said, “Core trading infrastructure must support continuous evolution — in investor experience, compliance, and AI readiness — without adding unnecessary complexity. The brokerages that lead the next decade won’t be the ones with the best system today; they’ll be the ones whose systems are designed to keep getting better. LONGPORT Whale aims to bring its Asia-proven experience to help Malaysian brokers strike that balance.”

Built on a cloud-native microservices architecture and trusted by more than 100 institutional clients in Asia, Whale’s platform is engineered by industry professionals and refined through years of first-hand operational experience. For the Malaysian market, it addresses four priorities: a best-in-class trading experience validated across competitive, highly regulated markets in Asia; system resilience and performance built for institutional scale, with high system performance and output, real time risk management, and low system latency; global market connectivity spanning Malaysia, Singapore, Hong Kong SAR, US, and Japan without requiring system rebuilds; and an API-first, data-unified architecture that gives brokerages a practical foundation for AI adoption.

Hong Kong SAR and Singapore, where Whale serves online brokers, traditional brokerage firms, banks, and wealth management institutions in a stringent regulatory environment, serve as the primary reference markets for its Malaysia expansion. The company said it aims to work with local industry participants as both an infrastructure partner and a contributor to the broader conversation on responsible modernization under CMP4.

About LONGPORT Whale

LONGPORT Whale provides integrated securities trading infrastructure to brokers, banks, fund houses, wealth managers, and family offices across Asia. Its cloud-native platform supports multi-market, multi-asset trading across front-, middle-, and back-office workflows, with a deployment model designed for regulatory alignment and long-term scalability. Website: www.longportwhale.com

Media Contact
LONGPORT Whale PR Team
Email: media@longportwhale.com

View original content:https://www.prnewswire.com/apac/news-releases/longport-whale-enters-malaysian-market-with-next-generation-trading-infrastructure-for-local-brokerages-302761411.html

SOURCE LONGPORT Whale

Thunes and Vodacom Tanzania Unite to Power Cross-Border M-Pesa Payments Across China and Uganda

Collaboration revolutionises trade & financial convenience for Tanzanian merchants and consumers

SINGAPORE, May 5, 2026 /PRNewswire/ — Thunes, the Smart Superhighway to move money around the world, has joined forces with Vodacom Tanzania, the country’s leading telco company, to transform cross-border trade and digital financial inclusion with Vodacom’s new M-Pesa Global Payment solution. Thanks to the collaboration, Vodacom customers in Tanzania can now seamlessly pay merchants in Uganda and China directly from their mobile phones.

This milestone solution responds to growing demand from Tanzanian traders who engage in commerce with Ugandan and Chinese markets but often face challenges with costly, slow, and insecure payment methods. With this innovation, leveraging the Thunes Direct Global Network, Vodacom aims to bridge those gaps, offering secure, real-time digital payments across borders and reinforcing its leadership in mobile money innovation in Africa.

The solution supports trade with two key markets for Tanzania. For eight consecutive years, China has been Tanzania’s largest trading partner, with bilateral trade hitting $8.8 billion in 2024. In the same year, bilateral trade between Tanzania and Uganda reached approximately $2.23 billion, an increase of 64% on the previous year.

Epimack Mbeteni, M-Pesa Director at Vodacom Tanzania said: “This is more than just a payment feature, it is a catalyst for economic empowerment and a gateway for small and medium businesses and entrepreneurs in Tanzania to compete and thrive in regional and global markets. Through Thunes’ expansive and trusted Network we are enabling seamless, secure, and affordable cross-border payments that empower people, fuel trade and place M-Pesa at the center of Africa’s digital commerce future.”

Through Thunes’ Direct Global Network, customers can now send payments to merchants in Uganda using MTN MoMo and to Chinese merchants through the Alipay network, all through the M-Pesa USSD menu or the M-Pesa Super App. The process is secure, user-friendly, and eliminates the burden of traditional banking barriers for everyday traders and businesses.

Dawei Wang, SVP Network at Thunes, added: “Vodacom Tanzania joining the Thunes Direct Global Network to digitise cross-border payments is a game changer for local businesses. By combining Vodacom’s technology with Thunes’ trusted and proprietary Network, Tanzanian customers can pay partners in China and Uganda in real time. This innovation accelerates interoperability along with international trade and business growth and supports our vision of connecting the next billion end users to the global economy.”

This initiative stands as a strategic enabler for consumers and micro, small, and medium enterprises (MSMEs) who need reliable and quick financial tools such as mobile money. A 2025 GeoPoll survey on Tanzania Financial Services and Usage found that 94% of the survey’s respondents use mobile money.

The Thunes and Vodacom Tanzania alliance is set to transform the lives of millions of consumers by dismantling cross-border barriers. By hyper-connecting Tanzania to global powerhouses like China and streamlining intra-African trade, the collaboration is helping to build an inclusive economy and grow Tanzania’s role as a force in the global market.

About Vodacom Tanzania

For more information, visit: https://www.vodacom.co.tz/

About Thunes

For more information, visit: https://www.thunes.com/

Logo – https://mma.prnewswire.com/media/2831061/Thunes_Logo.jpg

View original content:https://www.prnewswire.co.uk/news-releases/thunes-and-vodacom-tanzania-unite-to-power-cross-border-m-pesa-payments-across-china-and-uganda-302760085.html
