Technology

CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark

SAN FRANCISCO, Jan. 23, 2025 /PRNewswire/ -- The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark designed to test the limits of AI knowledge and whether models are capable of chain-of-thought reasoning. The results showed a significant improvement over the reasoning capabilities of earlier models, but current models still answered fewer than 10 percent of the expert questions correctly.

The new benchmark, called “Humanity’s Last Exam,” evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math, humanities, and the natural sciences. Throughout the fall, CAIS and Scale AI crowdsourced questions from experts to assemble the hardest and broadest problems to stump the AI models. The exam was developed to address the challenge of “benchmark saturation”: models that regularly achieve near-perfect scores on existing tests, but may not be able to answer questions outside of those tests. Saturation reduces the utility of a benchmark as a precise measurement of future model progress.

“We wanted problems that would test the capabilities of the models at the frontier of human knowledge and reasoning,” said Dan Hendrycks, CAIS co-founder and executive director. “We can’t predict how quickly the models will advance. When I released the MATH benchmark, a challenging competition mathematics dataset, in 2021, the best model scored less than 10%; few predicted that scores higher than 90% would be achieved just three years later. Right now, Humanity’s Last Exam shows that there are still some expert closed-ended questions that models are not able to answer. We will see how long that lasts.”

Testing Methodology

Altogether, CAIS and Scale researchers collected more than 70,000 trial questions. Of these, 13,000 were selected for human expert review, and 3,000 made it into the final exam’s public release. The questions were aimed at world-class expert levels and were put to several multi-modal frontier LLMs, including OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and OpenAI o1.
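To put the selection funnel in proportion, the short sketch below computes what share of questions survived each stage. It is illustrative only: the release says "more than 70,000," so treating that as exactly 70,000 is an assumption, and the stage labels and the function name `funnel_shares` are made up for this example.

```python
# Approximate counts quoted in the release.
# Assumption: "more than 70,000" is treated as exactly 70,000.
STAGES = [
    ("trial questions collected", 70_000),
    ("selected for expert review", 13_000),
    ("public release", 3_000),
]


def funnel_shares(stages):
    """Return (label, share_of_previous_stage, share_of_first_stage) per stage."""
    first_count = stages[0][1]
    previous_count = first_count
    rows = []
    for label, count in stages:
        rows.append((label, count / previous_count, count / first_count))
        previous_count = count
    return rows


for label, of_previous, of_first in funnel_shares(STAGES):
    print(f"{label}: {of_previous:.1%} of previous stage, {of_first:.1%} of all trial questions")
```

Under these assumptions, about 19% of trial questions reached expert review, about 23% of those reached the public release, and roughly 4% of all trial questions made the final set. Because 70,000 is a lower bound, the true overall share is at most that.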

“We know the AI revolution is being shaped by human ingenuity, and we’re proud to be at the forefront. To help humans measure AI progress, we engineered what might be the ultimate test, meticulously distilled and designed to challenge the world’s most advanced models at the frontiers of intelligence, requiring precise, multi-step logical reasoning and unambiguous answers at a level that pushes even the most sophisticated AI systems to their limits,” said Summer Yue, Director of Research at Scale AI.

Humanity’s Last Exam was a global collaborative effort involving nearly 1,000 contributors from more than 500 institutions across 50 countries, with most contributors being active researchers or professors. The questions spanned multiple formats, including text-only and multi-modal challenges that integrated images and diagrams.

The questions were designed to deeply test the capability of the models across diverse domains. For example, a question submitted in Ecology asked:
“Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.”

Additional sample questions can be found at lastexam.ai.

In the final round of testing, Yue said they saw some of the models begin to answer a fraction of the questions correctly (less than 10%); however, she said variations frequently happen in model testing and could be the result of randomness. CAIS and Scale AI said they will open up the dataset to the research community, to dig deeper into the variations and to evaluate new AI systems while continuing to explore the limitations of existing models. A small subset of questions will be held back to preserve integrity for future evaluations.
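One way to see why small score differences may be noise is to put a sampling-error interval around an accuracy figure. The sketch below uses a normal-approximation binomial interval. It is not the method used by CAIS or Scale AI, and the inputs (300 correct out of 3,000, i.e. the 10% ceiling quoted above) are hypothetical rather than any model's actual result. It also assumes questions are independent and covers only sampling error over questions, not run-to-run variation in model outputs.

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Normal-approximation interval for an accuracy of correct/total.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    standard_error = math.sqrt(p * (1 - p) / total)
    return max(0.0, p - z * standard_error), min(1.0, p + z * standard_error)


# Hypothetical example: 300 of 3,000 questions answered correctly.
low, high = accuracy_interval(300, 3000)
print(f"accuracy between {low:.3f} and {high:.3f}")
```

With these hypothetical inputs, the interval spans roughly 8.9% to 11.1%, so two models whose scores differ by a percentage point or so would be hard to separate on this basis alone.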

Top Questions

CAIS and Scale AI offered financial awards for the best contributions to Humanity’s Last Exam, with $5,000 USD awarded for each of the top 50 questions and $500 USD for the next 500 best submissions, along with the opportunity for coauthorship of the final paper.
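The release does not state a total prize pool, but the figures above allow a quick calculation. This sketch assumes every prize slot in both tiers was awarded in full; the constant and function names are invented for illustration.

```python
# Prize tiers as described in the release.
TOP_TIER_COUNT = 50
TOP_TIER_PRIZE_USD = 5_000
SECOND_TIER_COUNT = 500
SECOND_TIER_PRIZE_USD = 500


def total_prize_pool_usd():
    """Total payout, assuming all slots in both tiers are awarded in full."""
    top_tier = TOP_TIER_COUNT * TOP_TIER_PRIZE_USD
    second_tier = SECOND_TIER_COUNT * SECOND_TIER_PRIZE_USD
    return top_tier + second_tier


print(f"${total_prize_pool_usd():,} USD")
```

Under that assumption, each tier accounts for $250,000, for a combined $500,000.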

“By identifying the gaps in AI’s reasoning capabilities, Humanity’s Last Exam not only benchmarks current systems but also provides a roadmap for future research and development,” said Yue.

About The Center for AI Safety
The Center for AI Safety (CAIS) is a research organization whose mission is to reduce societal-scale and national security risks from AI. CAIS research focuses on mitigating high-consequence risks in areas like monitoring, alignment, and systemic safety. CAIS works to expand the field of AI safety and security by providing compute resources and technical infrastructure to top researchers and engaging with the global research community. Through its CAIS Action Fund, CAIS advocates for safe and secure AI. CAIS was founded in 2022 and is headquartered in San Francisco.

About Scale AI
Scale AI is the Humanity-first AI Company. Backed by our Data Foundry, we generate high quality data and provide technology solutions that allow our enterprise and public sector customers to build, deploy, and evaluate the smartest AI tools and applications. By making data abundant, rigorous, and high-quality, we are accelerating the progress of AI. Scale AI was founded in 2016 and is headquartered in San Francisco.

Contact:
Richard Crook, richardc@safe.ai
Fiorella Riccobono, fiorella.riccobono@scale.com

View original content to download multimedia: https://www.prnewswire.com/news-releases/cais-and-scale-ai-unveil-results-of-humanitys-last-exam-a-groundbreaking-new-benchmark-302358108.html

SOURCE Center for AI Safety

ADX welcomes Morgan Stanley as the first international investment bank Remote Trading Member, expanding global access to Abu Dhabi’s capital markets

ABU DHABI, UAE, May 5, 2026 /PRNewswire/ — The Abu Dhabi Securities Exchange (ADX) Group today announced that Morgan Stanley, a leading investment bank and financial services company, has joined the ADX as its first international investment bank Remote Trading Member — enabling Morgan Stanley’s clients to access the ADX directly.

This milestone strengthens ADX’s global connectivity and supports growing international institutional demand for exposure to UAE markets. It also reinforces its position as one of the world’s fastest-growing exchanges by market capitalization, while highlighting the market’s continued progress in depth, liquidity, and inclusion in major global indices.

Remote membership enables Morgan Stanley to provide its clients with direct market access to the ADX, with trading conducted via the firm’s global trading platform. The ADX continues to play a pivotal role in advancing Abu Dhabi’s long-term economic ambitions, as a mechanism for a diversified, innovation-led, knowledge-based economy.

Morgan Stanley’s direct trading access to ADX reflects the strength of Abu Dhabi’s investment proposition and the continued institutionalization of UAE capital markets. Morgan Stanley’s membership will enhance execution quality, optimize order routing, and provide greater control across the end-to-end trade lifecycle, delivering an advanced trading experience for global investors.

The structure follows a proven international access model used by Morgan Stanley and is designed to meet growing client demand for efficient, transparent, and seamless access to ADX-listed opportunities.

Abdulla Salem Alnuaimi, Group Chief Executive Officer of Abu Dhabi Securities Exchange (ADX) Group, said: “This marks a significant step in advancing our ambition to be a leading financial marketplace that drives opportunity and sustainable economic growth. This momentum is reflected in the strong foreign investor participation, with trading value exceeding 85 billion dirhams in the first quarter of 2026, up 22% year on year. This performance underscores the growing depth and global relevance of our market, while reinforcing our commitment to expanding international access, strengthening cross-border connectivity, and building a world-class market infrastructure that attracts global capital, supports a diverse range of issuers and contributes to Abu Dhabi’s long-term economic prosperity.”

Patrick Delivanis, Regional Co-Head of MENA at Morgan Stanley, said: “Becoming a Remote Trading Member of ADX reflects our focus on providing clients with efficient, seamless access to Abu Dhabi’s capital markets through our market-leading trading platform. We see continued momentum in the institutionalization and international participation of UAE markets, and we’re pleased to support that evolution by enabling international investors to access opportunities in MENA with direct connectivity to local markets, alongside greater transparency and control across the trading lifecycle.”

Morgan Stanley’s participation aligns with ADX’s strategy to strengthen international connectivity, with remote memberships selectively offered to global firms to attract high-quality cross-border liquidity. The announcement builds on the ADX’s expansion momentum: in 2025, foreign investment rose by nearly 14% and institutional trading increased by 10% year on year. Subject to final operational readiness, Morgan Stanley expects to begin trading as a remote member in the coming weeks.

About Abu Dhabi Securities Exchange (ADX)

The Abu Dhabi Securities Exchange (ADX) was established on 15 November 2000 pursuant to Local Law No. (3) of 2000, which granted the exchange legal rights with independent financial and administrative status, as well as the supervisory and executive powers necessary to carry out its functions. On 17 March 2020, the ADX was converted from a public entity into a Public Joint Stock Company (PJSC) in accordance with Law No. (8) of 2020.

The ADX Group, a market infrastructure group comprising the exchange (ADX) and its post-trade ecosystem, including its wholly owned subsidiaries AD Depository and AD Clear, was established. Through its integrated and globally aligned business structure, the ADX Group supports efficient, transparent, and resilient capital markets across trading, clearing, settlement, and custody.

The Group provides an efficient and regulated marketplace for the trading of securities, including equities issued by public joint-stock companies, bonds issued by governments and corporations, exchange-traded funds (ETFs), and other financial instruments approved by the UAE Capital Market Authority.

The ADX is the second-largest exchange in the Arab region by market capitalization. Its strategy of delivering stable financial performance through diversified revenue streams is aligned with the UAE’s national development agenda, “Towards the Next 50”, which aims to build a sustainable, diversified, and high-value-added economy.

For more information, please contact:
Abdulrahman Saleh ALKhateeb
Manager of Corporate Communication
Abu Dhabi Securities Exchange (ADX)
Mobile: +971 (50) 668 9733
Email: ALKhateebA@adx.ae

View original content: https://www.prnewswire.com/apac/news-releases/adx-welcomes-morgan-stanley-as-the-first-international-investment-bank-remote-trading-member-expanding-global-access-to-abu-dhabis-capital-markets-302762404.html

SOURCE Abu Dhabi Securities Exchange (ADX)

Geotab integrates Polestar vehicles into its OEM telematics network

Fleet operators across North America, Europe, and APAC can now access Polestar vehicle data directly in MyGeotab — no aftermarket hardware required.

LONDON, UK, May 5, 2026 /PRNewswire/ — Geotab, a global leader in connected vehicle and asset management solutions, today announced the integration of Polestar vehicles into its OEM telematics network, giving commercial fleet operators seamless access to Polestar data within MyGeotab from day one — with no aftermarket hardware installation required. The integration is available globally across North America, Europe, and Asia Pacific, supporting all Polestar models.

Developed in collaboration with Geotab, among other telematics service providers, Polestar Fleet Telematics integrates directly into MyGeotab. The Geotab integration enables fleet managers to manage Polestar vehicles alongside all other makes and models on a single unified platform — without fitting additional devices.

Connected vehicle data where it matters most

Through Polestar Fleet Telematics, fleet operators gain near-real-time access to a comprehensive dataset — covering EV battery and charging status, location, tyre information, vehicle security, maintenance alerts, and climate data — flowing directly from Polestar’s connected vehicle architecture into MyGeotab, with no physical installation required.

This breadth of data enables fleet managers to move from reactive to proactive operations — scheduling maintenance before failures occur, optimising charge planning across depots, and maintaining duty-of-care oversight across the entire fleet.

Supporting Europe’s Mixed-Fleet Reality

OEM-embedded telematics removes the need for aftermarket device installation across mixed-manufacturer fleets, reducing logistical overhead and supporting compliance with works council and GDPR requirements — a critical consideration for European fleet operators.

“Polestar Fleet Telematics combines sustainability with intelligence, integrating seamlessly with Geotab to deliver these capabilities directly into the platforms fleet operators trust. Continuous data visibility enables more efficient and informed fleet operations, from day-to-day management to long-term planning. By leveraging Polestar vehicles’ embedded connectivity, fleet managers can make smarter, data-driven decisions without adding hardware or complexity to their operations,” said Emma Knapp, Manager of Global Key Accounts at Polestar.

Polestar joins an OEM telematics network that already spans over 80% of leading global vehicle manufacturers by fleet market share, including BMW Group, Ford, Stellantis, Volkswagen Group, and Volvo Cars. For fleet operators already using MyGeotab, Polestar vehicles can be connected and deliver data without any additional hardware or installation.

“OEM-embedded telematics represents a change in how fleet data reaches the platform, and Polestar’s connected vehicle architecture makes this integration particularly well-suited for markets that are seriously considering transitioning to electric vehicles,” said Christoph Ludewig, Vice President OEM Global at Geotab. “Fleet operators managing mixed EV and internal combustion engine fleets no longer need separate tools or hardware for each vehicle type. Polestar data flows directly into MyGeotab alongside every other vehicle in the fleet, giving operators the consolidated visibility they need to drive efficiency, support duty of care, and manage their EV transition with confidence.”

Global Availability

The integration is available now across North America, Europe, and Asia Pacific, supporting all Polestar models. Fleet managers can activate the service via the Geotab Marketplace or by contacting their Geotab representative.

About Polestar

Polestar (Nasdaq: PSNY) is the Swedish electric performance car brand with a focus on uncompromised design and innovation, and the ambition to accelerate the change towards a sustainable future. Headquartered in Gothenburg, Sweden, its cars are available in 28 markets globally across North America, Europe and Asia Pacific.

Polestar has four models in its line-up: Polestar 2, Polestar 3, Polestar 4, and Polestar 5. Planned models include the Polestar 7 compact SUV (to be introduced in 2028) and the Polestar 6 roadster. With its vehicles currently manufactured on two continents, North America and Asia, Polestar plans to diversify its manufacturing footprint further, with production of Polestar 7 planned in Europe.

Polestar has an unwavering commitment to sustainability and has set an ambitious roadmap to reach its climate targets: halve greenhouse gas emissions by 2030 per-vehicle-sold and become climate-neutral across its value chain by 2040. Polestar’s comprehensive sustainability strategy covers the four areas of Climate, Transparency, Circularity, and Inclusion.

About Geotab

Geotab is a global leader in connected vehicle and asset management solutions, with headquarters in Oakville, Ontario and Atlanta, Georgia. Our mission is to make the world safer, more efficient, and sustainable. We leverage advanced data analytics and AI to transform fleet performance and operations, reducing cost and driving efficiency. Backed by top data scientists and engineers, we serve approximately 100,000 global customers, processing 100 billion data points daily from more than 5 million vehicle subscriptions. Geotab is trusted by Fortune 500 organisations, mid-sized fleets, and the largest public sector fleets in the world, including the US Federal government. Committed to data security and privacy, we hold FIPS 140-3 and FedRAMP authorisations. Our open platform, ecosystem of outstanding partners, and Geotab Marketplace deliver hundreds of fleet-ready third-party solutions. This year, we’re celebrating 25 years of innovation. Learn more at www.geotab.com/uk and follow us on LinkedIn or visit our blog.

GEOTAB and GEOTAB MARKETPLACE are registered trademarks of Geotab Inc. in Canada, the United States and/or other countries.

Media Contact: Geotab Contact, Romina Dashghachian, Strategic Communications Lead, EMEA, pr@geotab.com

Photo – https://mma.prnewswire.com/media/2972188/Geotab_Inc__Geotab_integrates_Polestar_vehicles_into_its_OEM_tel.jpg 
Logo – https://mma.prnewswire.com/media/2972187/Geotab_Inc__Geotab_integrates_Polestar_vehicles_into_its_OEM_tel.jpg

View original content: https://www.prnewswire.co.uk/news-releases/geotab-integrates-polestar-vehicles-into-its-oem-telematics-network-302761910.html

IDX Opens Geneva Office and Strengthens Global Data & Insights Capability

New Swiss presence and specialist team integration support growing global demand for evidence-based, defensible communications strategies

LONDON, May 5, 2026 /PRNewswire/ — IDX today announced the opening of its new Geneva office and the integration of a specialist Data & Insights team, strengthening the company’s international footprint and expanding its ability to help clients worldwide build communications strategies grounded in evidence, market intelligence and audience insight.

The expansion gives IDX an on-the-ground presence in Switzerland while adding further depth to its Data & Insights capability. The Geneva-based team will work closely with IDX specialists across performance marketing and corporate communications, helping clients develop a clearer view of the markets they operate in and the forces shaping their growth.

The move aligns with Destination 250 – Customers First, IDX’s global strategy to grow its team by 250, focused on deepening client value, strengthening delivery and investing in the capabilities that matter most to clients.

The investment strengthens the Data pillar of IDX’s Connected Content™ model, which combines Creative, Data, Technology and Media to create what IDX calls The Multiplier Effect, helping clients multiply what matters through more connected, measurable and effective work.

“IDX is experiencing phenomenal growth, and our new Geneva office gives us boots on the ground to better serve clients across Europe and globally across performance marketing, investor relations and corporate communications,” said Crispin Beale, Worldwide CEO, IDX. “Data has been at the heart of this business for decades, and this centre of excellence reflects our continued investment in that capability. It’s an incredibly exciting time for IDX, and I look forward to the next phase of our growth as we continue to expand globally.”

“This is an exciting step in IDX’s growth story and a clear response to what clients are asking for: more evidence-based thinking, stronger market context and clearer rationale behind their communications strategies,” said Chris Corrigan, Chief Customer Growth Officer, IDX. “Our new presence in Geneva, combined with deeper Data & Insights expertise, strengthens the way we support clients globally, giving them earlier access to the insight and market context they need to make better-informed decisions and turn evidence into action.”

The Geneva office will strengthen relationships with existing clients in the region, support re-engagement with former partners and create new opportunities for IDX with organisations operating across European and global markets. It reflects IDX’s continued investment in the capabilities that matter most to clients as communications, marketing and corporate reputation work become increasingly data-led and commercially accountable.

“IDX’s integrated offer across insights, performance marketing and corporate communications, powered by the combination of human intelligence, advanced technology and AI, represents exactly where the industry is heading,” said Lonneke de Roo, Head of Data & Insights, IDX. “I am delighted to join the business and help clients navigate increasingly complex markets with clearer evidence, sharper insight and more connected strategies.”

ABOUT IDX  

IDX is a global strategic communications and marketing agency, headquartered in London with offices around the world, including New York, London, Phoenix, Helsinki, Gothenburg, Geneva, and Vadodara. Working with more than 1,600 clients across sectors, IDX combines deep industry knowledge with a data-first mindset to help ambitious brands thrive in complex, fast-moving markets. The firm specialises in performance marketing, investor relations, and stakeholder engagement, delivering integrated campaigns that drive meaningful business outcomes. Visit www.idx.inc to learn more.

Logo – https://mma.prnewswire.com/media/2668561/IDX_black_Logo.jpg

View original content: https://www.prnewswire.co.uk/news-releases/idx-opens-geneva-office-and-strengthens-global-data–insights-capability-302762181.html
