Technology

CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark

SAN FRANCISCO, Jan. 23, 2025 /PRNewswire/ — The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark that was designed to test the limits of AI knowledge and whether the models are capable of chain-of-thought reasoning. The results demonstrated a significant improvement from the reasoning capabilities of earlier models, but current models still were only able to answer fewer than 10 percent of the expert questions correctly.

The new benchmark, called “Humanity’s Last Exam,” evaluated whether AI systems have achieved world-class expert-level reasoning and knowledge capabilities across a wide range of fields, including math, humanities, and the natural sciences. Throughout the fall, CAIS and Scale AI crowdsourced questions from experts to assemble the hardest and broadest problems to stump the AI models. The exam was developed to address the challenge of “benchmark saturation”: models that regularly achieve near-perfect scores on existing tests, but may not be able to answer questions outside of those tests. Saturation reduces the utility of a benchmark as a precise measurement of future model progress.

“We wanted problems that would test the capabilities of the models at the frontier of human knowledge and reasoning,” said Dan Hendrycks, CAIS co-founder and executive director. “We can’t predict how quickly the models will advance. When I released the MATH benchmark (a challenging competition mathematics dataset) in 2021, the best model scored less than 10%; few predicted that scores higher than 90% would be achieved just three years later. Right now, Humanity’s Last Exam shows that there are still some expert closed-ended questions that models are not able to answer. We will see how long that lasts.”

Testing Methodology

Altogether, CAIS and Scale researchers collected more than 70,000 trial questions. Of these, 13,000 were selected for human expert review and then narrowed to a final set of 3,000 questions for the exam’s public release. The questions were aimed at world-class expert levels and were put to several multi-modal, frontier LLMs including OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and OpenAI o1.
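
To put the selection funnel in proportion, the counts quoted above can be turned into retention rates. The following is a minimal sketch and not part of the benchmark's tooling: the names `STAGES` and `retention_rates` are hypothetical, and "more than 70,000" is treated as exactly 70,000, so the resulting shares are approximate upper bounds.

```python
# Approximate counts quoted in the release. "More than 70,000" is
# treated as 70,000 here, which is an assumption for illustration.
STAGES = [
    ("trial questions collected", 70_000),
    ("selected for expert review", 13_000),
    ("final public exam", 3_000),
]


def retention_rates(stages):
    """Return (label, count, share_of_previous, share_of_first) per stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, of_prev, of_first in retention_rates(STAGES):
        print(f"{label:<28} {count:>7,}  "
              f"{of_prev:6.1%} of previous  {of_first:6.1%} of initial")
```

Under these assumptions, a little over 4 percent of the submitted trial questions reached the public exam.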

“We know the AI revolution is being shaped by human ingenuity, and we’re proud to be at the forefront. To help humans measure AI progress, we engineered what might be the ultimate test, meticulously distilled and designed to challenge the world’s most advanced models at the frontiers of intelligence, requiring precise, multi-step logical reasoning and unambiguous answers at a level that pushes even the most sophisticated AI systems to their limits,” said Summer Yue, Director of Research at Scale AI.

Humanity’s Last Exam was a global collaborative effort involving nearly 1,000 contributors from more than 500 institutions across 50 countries, with most contributors being active researchers or professors. The questions spanned multiple formats, including text-only and multi-modal challenges that integrated images and diagrams.

The questions were designed to deeply test the capability of the models across diverse domains. For example, a question submitted in Ecology asked:
“Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.”

Additional sample questions can be found at lastexam.ai.

In the final round of testing, Yue said they saw some of the models begin to answer a fraction of the questions correctly (less than 10%); however, she said variations frequently happen in model testing and could be the result of randomness. CAIS and Scale AI said they will open up the dataset to the research community, to dig deeper into the variations and to evaluate new AI systems while continuing to explore the limitations of existing models. A small subset of questions will be held back to preserve integrity for future evaluations.

Top Questions

CAIS and Scale AI offered financial awards for the best contributions to Humanity’s Last Exam, with $5,000 USD awarded for each of the top 50 questions and $500 USD for the next 500 best submissions, along with the opportunity for coauthorship of the final paper.
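
As a quick check on the scale of the awards, the two tiers above can be summed. This is a sketch that assumes every award in both tiers was paid out in full, which the release does not state explicitly; `AWARD_TIERS` and `total_awards` are hypothetical names used only for illustration.

```python
# Award tiers as stated in the release: (number of questions, USD each).
# Assumes all awards in each tier were actually paid out.
AWARD_TIERS = [(50, 5_000), (500, 500)]


def total_awards(tiers):
    """Sum the payout across all tiers, in USD."""
    return sum(count * amount for count, amount in tiers)


if __name__ == "__main__":
    print(f"${total_awards(AWARD_TIERS):,} USD")  # prints "$500,000 USD"
```

Each tier contributes the same amount ($250,000), so the two tiers together imply a total of $500,000 USD.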

“By identifying the gaps in AI’s reasoning capabilities, Humanity’s Last Exam not only benchmarks current systems but also provides a roadmap for future research and development,” said Yue.

About The Center for AI Safety
The Center for AI Safety (CAIS) is a research organization whose mission is to reduce societal-scale and national security risks from AI. CAIS research focuses on mitigating high-consequence risks in areas like monitoring, alignment, and systemic safety. CAIS works to expand the field of AI safety and security by providing compute resources and technical infrastructure to top researchers and engaging with the global research community. Through its CAIS Action Fund, CAIS advocates for safe and secure AI. CAIS was founded in 2022 and is headquartered in San Francisco.

About Scale AI
Scale AI is the Humanity-first AI Company. Backed by our Data Foundry, we generate high quality data and provide technology solutions that allow our enterprise and public sector customers to build, deploy, and evaluate the smartest AI tools and applications. By making data abundant, rigorous, and high-quality, we are accelerating the progress of AI. Scale AI was founded in 2016 and is headquartered in San Francisco.

Contact:
Richard Crook, richardc@safe.ai
Fiorella Riccobono, fiorella.riccobono@scale.com

View original content to download multimedia: https://www.prnewswire.com/news-releases/cais-and-scale-ai-unveil-results-of-humanitys-last-exam-a-groundbreaking-new-benchmark-302358108.html

SOURCE Center for AI Safety

Technology

139th Canton Fair Phase 3 Advances Toward a Better Life with New and Strengthened Product Zones

GUANGZHOU, China, May 3, 2026 /PRNewswire/ — The 139th China Import and Export Fair (Canton Fair) has rolled out nine newly established product zones. Phase 3 features an expanded and upgraded Intelligent Healthcare zone and the inaugural presentation of a Functional & Technical Fabrics zone.

The upgraded Intelligent Healthcare zone brings together 50 companies presenting a full spectrum of intelligent medical solutions, spanning AI-powered diagnostics, surgical robotics, and next‑generation eldercare technologies. Exhibits highlight how medical devices are becoming smaller, more precise, and increasingly non‑invasive. Capsule endoscopy systems demonstrate how gastrointestinal screening can be completed without discomfort, while AI‑enabled traditional Chinese medicine analyzers compress the inspection and inquiry process into minutes. Wearable glucose monitors make chronic disease management easier and more convenient.

Robotic technologies play a prominent role as well. Endoscopic and orthopedic surgical robots showcase enhanced precision through integrated human‑machine coordination, while bionic prosthetic hands use non‑invasive myoelectric sensing to independently control each finger. Intelligent rehabilitation systems, including lower‑limb exoskeletons and hand‑training devices, provide consistent support for patients recovering mobility. Companion‑style eldercare robots, equipped with monitoring and telemedicine functions, signal the rise of integrated home‑based health services.

The debuting Functional & Technical Fabrics zone highlights how the traditional textile industry is moving toward higher-end and smarter products. Exhibitors present materials that combine multi‑layered performance with intelligent responsiveness. Textiles featuring temperature‑regulating fibers, phase‑change materials, and light‑ or heat‑sensitive color‑shifting effects illustrate how fabrics are evolving into adaptive platforms capable of responding to environmental conditions.

Sustainability emerges as a defining theme. Bio‑based fibers, degradable films, recycled polyester, and organic cotton reflect a shift from isolated eco‑products toward full‑chain green manufacturing. High‑performance outdoor and protective applications further shape the narrative. Materials engineered for waterproof breathability, UV resistance, flame retardancy, and long‑term durability address rising demand across sportswear, professional protection, and medical environments. Smart textiles with embedded health‑monitoring modules demonstrate how apparel is beginning to function as a continuous wellness interface.

Both technology‑driven healthcare and advanced textiles are converging around a shared pursuit of a better life. As these advancements continue to evolve, they reflect a manufacturing landscape increasingly shaped by innovation, resilience, and a commitment to improving everyday living.

For pre-registration, please click: https://buyer.cantonfair.org.cn/register/buyer/email?source_type=16

 

View original content to download multimedia: https://www.prnewswire.com/news-releases/139th-canton-fair-phase-3-advances-toward-a-better-life-with-new-and-strengthened-product-zones-302760704.html

SOURCE Canton Fair

Technology

CupidFeel Insights Show How Shared Interests Affect Initial Connection Outcomes

New behavioral insights from CupidFeel offer a carefully considered look at how shared interests influence whether an initial connection on a dating platform is sustained or abandoned in those first critical exchanges.

GIBRALTAR, May 3, 2026 /PRNewswire-PRWeb/ — The findings by CupidFeel are not dramatic, but they are telling. People who referenced a shared interest — whether a genre of music, a type of cuisine, a sport, a creative practice, or even a shared discomfort with small talk — within the first few exchanges of a new conversation were found to be measurably more likely to continue that conversation beyond the initial contact window. The effect was not uniform across all interest categories; certain types of shared interest appeared to carry more relational weight than others.

CupidFeel also observed that the timing of when shared interests entered a conversation mattered. Connections where common ground was discovered organically, through the natural flow of exchange rather than prompted by a profile field or a direct question, showed stronger indicators of sustained interest. The discovery, in other words, carried more meaning when it felt like something found rather than something declared.

Among the most quietly striking findings in the CupidFeel data was the role of specificity. Broad shared categories — “we both like travel,” “we both enjoy cooking” — were associated with polite, often brief exchanges that rarely extended past pleasantries. But when specificity entered the picture — when one person mentioned a particular documentary that had stayed with them, or a city they had visited and could not stop thinking about — the conversational energy shifted. Something opened up.

In a CupidFeel review of trends in profile engagements, those whose profiles reflected specific, idiosyncratic interests — rather than broadly appealing ones — also showed higher rates of receiving first messages, a finding that runs gently counter to the instinct many people have to present themselves in the most universally appealing terms possible.

What seemed to matter most was not the quantity of overlap but whether the overlap that existed was felt — whether it produced a sense of being seen in some particular, non-generic way. A CupidFeel review of early conversation patterns suggests that a single deeply resonant shared interest may be more generative for early connection than a long list of surface-level commonalities that, taken together, feel more like a demographic profile than a person.

About CupidFeel

CupidFeel is an online dating platform built around the belief that meaningful connections begin with emotional honesty and the willingness to let a conversation go somewhere real. It came into being for people who are less interested in the mechanics of dating and more drawn to the possibility of something that feels grounded — exchanges that move at their own pace, guided by genuine curiosity rather than performance.

A CupidFeel review of its own design principles returns consistently to the same question: what does it take for a first message to feel like it might be worth the journey? The platform makes room for the kind of interaction that doesn’t always have a clear destination but feels, from the first exchange, like something real. CupidFeel is a place where the unexpected is not something to be managed, but something to be welcomed.

Media Contact

Timothy Albers, CupidFeel, 1 14845691657, smm@cupidfeel.com, https://cupidfeel.com/

View original content: https://www.prweb.com/releases/cupidfeel-insights-show-how-shared-interests-affect-initial-connection-outcomes-302759951.html

SOURCE CupidFeel

Technology

Las Vegas Review-Journal Launches VegasBusiness, a New Multiplatform Business Brand Serving Southern Nevada Leaders

LAS VEGAS, May 3, 2026 /PRNewswire/ – The Las Vegas Review-Journal today announced the launch of VegasBusiness, a new multiplatform brand designed to deliver authoritative reporting, analysis, and insight to Southern Nevada’s business leaders and decision-makers.

VegasBusiness debuts with a biweekly print section distributed in the Sunday Las Vegas Review-Journal, a new digital hub at VegasBusiness.com, a weekday business email newsletter, and a bimonthly podcast. Together, the platforms create the most comprehensive local business news and intelligence offering in the Las Vegas market.

VegasBusiness, powered by the state’s largest and most experienced business reporting team, focuses on the people, companies, and forces shaping Southern Nevada’s economy, including gaming and hospitality, real estate development, technology, finance, logistics, and small business.

“Las Vegas is one of the fastest-evolving business markets in the country, and its leaders need timely, trustworthy intelligence to compete and grow,” Keith Moyer, publisher and editor of the Las Vegas Review-Journal, said. “VegasBusiness expands our business journalism across platforms and delivers deeper insight, greater frequency, and broader reach for Southern Nevada’s decision-makers.”

Expanded Business Coverage Across Platforms

VegasBusiness features business coverage designed for how executives consume news today—through print, digital, email, audio, video, and social channels.

Editorial franchises include C-Suite Insider, featuring executive interviews; Building Las Vegas, with in-depth reporting on major projects and deals; Innovation, spotlighting people and ideas driving change; Price Points, examining data and consumer trends; Small Business, highlighting enterprising local companies; and a curated Calendar of business events and conferences.

“We created this section to provide Las Vegas business leaders with practical, locally grounded intelligence so they can make faster, more informed decisions on growth, talent, and regulation,” Erin Edgemon, business editor of the Review-Journal, said. “VegasBusiness is designed to help business leaders understand what’s happening in the economy, why it matters, and how it impacts their organizations and communities.”

A Premium Environment for Advertisers

In addition to expanded editorial coverage, VegasBusiness provides a new advertising and sponsorship platform for brands seeking to reach business decision-makers in Southern Nevada. The brand offers integrated opportunities across print, digital, newsletters, podcasts, video, social media, and events, supported by advanced audience targeting and measurement.

“VegasBusiness gives advertisers access to the largest concentration of local business decision-makers in the market,” Michael LaBonia, senior vice president of sales for the Las Vegas Review-Journal, said. “It’s a high-impact, data-driven platform that connects brands with influence, intent, and measurable results.”

Availability

VegasBusiness Magazine: Biweekly, Sundays in the Las Vegas Review-Journal
Digital: 24/7 at VegasBusiness.com
Newscast: Weekday segment mornings on 7@7 a.m.
Newsletter: Featured in the weekday Business Brief

Additional information is available at VegasBusiness.com.

Since 1909, the Las Vegas Review-Journal has been Nevada’s comprehensive media leader. Over the years, it has transformed from an award-winning newspaper that delivers local and community news into one that spans the globe with far-reaching topics and coverage. The Review-Journal has expanded into multimedia and digital platforms through its website, Reviewjournal.com, which includes a network of niche publications, breaking news, e-newsletters, customized content, custom printing, a production studio, and more to meet the specific needs of readers and advertisers alike.

Media Contact

Wanda English Blair, Vice President of Marketing
(702) 383-0223
wblair@reviewjournal.com

View original content to download multimedia: https://www.prnewswire.com/news-releases/las-vegas-review-journal-launches-vegasbusiness-a-new-multiplatform-business-brand-serving-southern-nevada-leaders-302760627.html

SOURCE Las Vegas Review-Journal
