Technology

CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark

SAN FRANCISCO, Jan. 23, 2025 /PRNewswire/ — The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark designed to test the limits of AI knowledge and whether models are capable of chain-of-thought reasoning. The results demonstrated a significant improvement over the reasoning capabilities of earlier models, but current models still answered fewer than 10 percent of the expert questions correctly.

The new benchmark, called “Humanity’s Last Exam,” evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math, humanities, and the natural sciences. Throughout the fall, CAIS and Scale AI crowdsourced questions from experts to assemble the hardest and broadest problems to stump the AI models. The exam was developed to address the challenge of “benchmark saturation”: models regularly achieve near-perfect scores on existing tests but may not be able to answer questions outside of those tests. Saturation reduces the utility of a benchmark as a precise measurement of future model progress.

“We wanted problems that would test the capabilities of the models at the frontier of human knowledge and reasoning,” said Dan Hendrycks, CAIS co-founder and executive director. “We can’t predict how quickly the models will advance. When I released the MATH benchmark—a challenging competition mathematics dataset—in 2021, the best model scored less than 10%; few predicted that scores higher than 90% would be achieved just three years later. Right now, Humanity’s Last Exam shows that there are still some expert closed-ended questions that models are not able to answer. We will see how long that lasts.”

Testing Methodology

Altogether, CAIS and Scale researchers collected more than 70,000 trial questions. Of these, 13,000 were selected for human expert review and then narrowed to a final set of 3,000 questions for the exam’s public release. The questions were aimed at world-class expert levels and were put to several multi-modal frontier LLMs, including OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and OpenAI o1.

“We know the AI revolution is being shaped by human ingenuity, and we’re proud to be at the forefront. To help humans measure AI progress, we engineered what might be the ultimate test, meticulously distilled and designed to challenge the world’s most advanced models at the frontiers of intelligence—requiring precise, multi-step logical reasoning and unambiguous answers at a level that pushes even the most sophisticated AI systems to their limits,” said Summer Yue, Director of Research at Scale AI.

Humanity’s Last Exam was a global collaborative effort involving nearly 1,000 contributors from more than 500 institutions across 50 countries, with most contributors being active researchers or professors. The questions spanned multiple formats, including text-only and multi-modal challenges that integrated images and diagrams.

The questions were designed to deeply test the capability of the models across diverse domains. For example, a question submitted in Ecology asked:
“Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.”

Additional sample questions can be found at lastexam.ai.

In the final round of testing, Yue said they saw some of the models begin to answer a fraction of the questions correctly (less than 10%); however, she said variations frequently happen in model testing and could be the result of randomness. CAIS and Scale AI said they will open up the dataset to the research community, to dig deeper into the variations and to evaluate new AI systems while continuing to explore the limitations of existing models. A small subset of questions will be held back to preserve integrity for future evaluations.

Top Questions

CAIS and Scale AI offered financial awards for the best contributions to Humanity’s Last Exam, with $5,000 USD awarded for each of the top 50 questions and $500 USD for the next 500 best submissions, along with the opportunity for coauthorship of the final paper.

“By identifying the gaps in AI’s reasoning capabilities, Humanity’s Last Exam not only benchmarks current systems but also provides a roadmap for future research and development,” said Yue.

About The Center for AI Safety
The Center for AI Safety (CAIS) is a research organization whose mission is to reduce societal-scale and national security risks from AI. CAIS research focuses on mitigating high-consequence risks in areas like monitoring, alignment, and systemic safety. CAIS works to expand the field of AI safety and security by providing compute resources and technical infrastructure to top researchers and engaging with the global research community. Through its CAIS Action Fund, CAIS advocates for safe and secure AI. CAIS was founded in 2022 and is headquartered in San Francisco.

About Scale AI
Scale AI is the Humanity-first AI Company. Backed by our Data Foundry, we generate high quality data and provide technology solutions that allow our enterprise and public sector customers to build, deploy, and evaluate the smartest AI tools and applications. By making data abundant, rigorous, and high-quality, we are accelerating the progress of AI. Scale AI was founded in 2016 and is headquartered in San Francisco.

Contact:
Richard Crook, richardc@safe.ai
Fiorella Riccobono, fiorella.riccobono@scale.com

View original content to download multimedia: https://www.prnewswire.com/news-releases/cais-and-scale-ai-unveil-results-of-humanitys-last-exam-a-groundbreaking-new-benchmark-302358108.html

SOURCE Center for AI Safety

Technology

Booking.com’s Latest Travel and Sustainability Research Reveals Australian Generational Paradox

SYDNEY, May 4, 2026 /PRNewswire/ — Today Booking.com released its 11th annual research report into consumer attitudes and understanding of the social and environmental impact of travel. With insights from 32,500 travellers across 35 markets globally, including 1,000 from Australia, this year’s research highlights a generational paradox.

While 83% of Australian travellers say that more sustainable travel is important to them, older generations demonstrate greater commitment through concrete actions, despite younger generations often expressing stronger sustainability intentions.

Older generations demonstrate greater commitment through concrete sustainable actions, with almost two-thirds of Boomers across Asia Pacific (63%) saying they will shop more at local, independent stores on their trips, three-quarters (75%) planning to reduce general waste, and 63% intending to reduce energy consumption, higher than Gen X, Millennials and Gen Z.

However, younger travellers are leading in cultural and conservation management; more than a quarter of Gen Z (27%) and Millennials (26%) participated in a tour or activity that contributed to the health or conservation of the local ecosystem or wildlife.

Extreme weather is actively reshaping travel choices. Nearly three-quarters of Australian travellers said they consider extreme weather risk when choosing both destination (72%) and timing (75%), and 27% reported having cancelled or changed trip plans in the past twelve months due to extreme weather or natural disaster.

Australian travellers are shifting when and where they travel, seeking reliable climates and comfort. 43% say they plan to avoid overcrowded tourist destinations, 42% plan to travel outside of peak season, and 26% will seek out destinations with cooler temperatures.

In 2025, travellers booked more than 100 million room nights on Booking.com at accommodation partners with a third-party sustainability certification, including at 363 third-party certified Australian properties.

To find Booking.com’s full 2026 research, visit here.

View original content: https://www.prnewswire.com/apac/news-releases/bookingcoms-latest-travel-and-sustainability-research-reveals-australian-generational-paradox-302760310.html

SOURCE Booking.com

Technology

Florida Physician Specialists Data Breach: Edelson Lechtzin LLP Launches Investigation into Exposure of Personal Information

National class action firm offering free case evaluations to individuals impacted by the Florida Physician Specialists cybersecurity incident

JACKSONVILLE, Fla., May 3, 2026 /PRNewswire/ — Edelson Lechtzin LLP, a national class action law firm, is investigating data privacy claims arising from the Florida Physician Specialists data breach. Florida Physician Specialists learned of the cybersecurity incident between November 27 and 29, 2025.

What Happened

Florida Physician Specialists discovered that its network was hacked between November 27 and 29, 2025. An investigation launched in late November 2025 confirmed that an unauthorized third party accessed its network. The review of the exposed data was completed on April 6, 2026.

Information Exposed

Affected personal data includes full names and one or more of the following: Social Security numbers, driver’s license numbers or state identification numbers, other government identification numbers, financial account information, credit or debit card information, medical information, and/or health insurance policy information.

Who May Be Impacted

Individuals who received a data breach notification from Florida Physician Specialists may face an increased risk of identity theft and fraud.

Your Legal Options

Edelson Lechtzin LLP is investigating a potential class action to pursue legal remedies on behalf of individuals whose sensitive personal data may have been compromised in the Florida Physician Specialists breach. The firm will evaluate your rights and potential claims at no cost.

Recommended Protective Steps

Review account statements and credit reports regularly and remain vigilant for suspicious activity. Confirm whether your information was involved in the Florida Physician Specialists incident and preserve any letters or emails you received about the breach. Consider placing fraud alerts and credit monitoring.

Contact Us for a Free Case Evaluation

Speak confidentially with a data privacy attorney today: Marc Edelson, Esq., Edelson Lechtzin LLP, 411 S. State Street, Suite N-300, Newtown, PA 18940; Phone: 844-696-7492 ext. 2; Email: medelson@edelson-law.com; Web: www.edelson-law.com. Or click HERE to request a free consultation.

About Florida Physician Specialists

Based in Jacksonville, Florida, Florida Physician Specialists is a multi-specialty private physician practice serving patients in Northeast Florida.

About Edelson Lechtzin LLP

Edelson Lechtzin LLP is a national class action law firm with offices in Pennsylvania and California. In addition to data breach litigation, the firm handles class and collective actions involving securities and investment fraud, federal antitrust violations, ERISA employee benefit plans, wage theft, and consumer fraud.

Media and Partnership Inquiries: Use the contact information above to connect with our team regarding interviews, co-counsel opportunities, and referral partnerships.

Legal Notice: This press release may be considered Attorney Advertising in some jurisdictions.

View original content to download multimedia: https://www.prnewswire.com/news-releases/florida-physician-specialists-data-breach-edelson-lechtzin-llp-launches-investigation-into-exposure-of-personal-information-302760742.html

SOURCE Edelson Lechtzin LLP

Technology

Sandhills Medical Foundation, Inc., d/b/a Sandhills Medical Data Breach: Edelson Lechtzin LLP Launches Investigation into Exposure of Personal Information

National class action firm offering free case evaluations to individuals impacted by the Sandhills Medical cybersecurity incident

MCBEE, S.C., May 3, 2026 /PRNewswire/ — Edelson Lechtzin LLP, a national class action law firm, is investigating data privacy claims arising from the Sandhills Medical data breach. Sandhills Medical learned of the cybersecurity incident on May 8, 2025.

What Happened

On May 8, 2025, Sandhills Medical discovered it had been the victim of a ransomware attack. Sandhills Medical began an investigation with the help of cybersecurity experts and a forensic firm. That investigation determined an unauthorized third party accessed Sandhills Medical’s server directly and obtained personal information for select patients.

Information Exposed

Affected personal data includes names, personal health information, and birth dates. This data breach has affected an estimated 169,017 people.

Who May Be Impacted

Individuals who received a data breach notification from Sandhills Medical may face an increased risk of identity theft and fraud.

Your Legal Options

Edelson Lechtzin LLP is investigating a potential class action to pursue legal remedies on behalf of individuals whose sensitive personal data may have been compromised in the Sandhills Medical breach. The firm will evaluate your rights and potential claims at no cost.

Recommended Protective Steps

Review account statements and credit reports regularly and remain vigilant for suspicious activity. Confirm whether your information was involved in the Sandhills Medical incident and preserve any letters or emails you received about the breach. Consider placing fraud alerts and credit monitoring.

Contact Us for a Free Case Evaluation

Speak confidentially with a data privacy attorney today: Marc Edelson, Esq., Edelson Lechtzin LLP, 411 S. State Street, Suite N-300, Newtown, PA 18940; Phone: 844-696-7492 ext. 2; Email: medelson@edelson-law.com; Web: www.edelson-law.com. Or click HERE to request a free consultation.

About Sandhills Medical

Based in McBee, South Carolina, Sandhills Medical operates as a Federally Qualified Community Health Center (FQHC) that provides community-based primary health care services.

About Edelson Lechtzin LLP

Edelson Lechtzin LLP is a national class action law firm with offices in Pennsylvania and California. In addition to data breach litigation, the firm handles class and collective actions involving securities and investment fraud, federal antitrust violations, ERISA employee benefit plans, wage theft, and consumer fraud.

Media and Partnership Inquiries: Use the contact information above to connect with our team regarding interviews, co-counsel opportunities, and referral partnerships.

Legal Notice: This press release may be considered Attorney Advertising in some jurisdictions.

View original content to download multimedia: https://www.prnewswire.com/news-releases/sandhills-medical-foundation-inc-dba-sandhills-medical-data-breach-edelson-lechtzin-llp-launches-investigation-into-exposure-of-personal-information-302760743.html

SOURCE Edelson Lechtzin LLP
