Leni Tops Four Major AI Benchmarks, Outperforming Systems from OpenAI, Anthropic, Google, and Perplexity
"Most teams obsess over models, but the key engineering needed for effective AI adoption, which delivers highly accurate and reliable results for teams, relies on architecture or harness," said Leni CEO and Co-Founder
DRACO, developed by Perplexity AI and Harvard, measures whether AI can produce in-depth research that a senior analyst would sign off on. Leni scored 71.6 percent, ahead of the deep research products from Perplexity, Google, and OpenAI.
SpreadsheetBench Verified, which grades AI on hundreds of real spreadsheet tasks, ranked Leni in the top two globally, with 365 of 400 tasks completed correctly.
On BullshitBench (Version 2), which tests whether AI pushes back on nonsensical questions instead of inventing an answer, Leni caught 98 percent of fabricated premises, ahead of all 142 public AI models on the leaderboard.
GAIA, developed by Meta and Hugging Face, measures whether AI can complete multi-step real-world tasks without an early mistake that throws off the final answer. Leni scored 77.0 percent on the validation set, ahead of Genspark, Manus, and OpenAI Deep Research.
In commercial real estate, where the margin for error is zero, these benchmarks measure whether a system can accurately produce the analysis that determines whether a deal closes.
The results matter because the gap between AI promise and AI reliability is costing companies real money, according to Dastidar. A staggering 99 percent of companies reported financial losses tied to AI-related risks, with an average loss of
"If I had to describe Leni's impact, it's simple: faster and easier," said Scott Jones, Vice President of IT at Ram Realty Advisors. "On the asset management side in particular, teams are no longer stuck doing manual work. The data flows directly from the source, and they can trust it. Leni shifts the focus away from aggregating information and building reports to what actually matters: finding deals, executing them better, and running assets more effectively."
Leni's agentic AI platform is designed for investment, asset management, and operations teams across commercial real estate, pulling data from PDFs, spreadsheets, and core systems to execute complex workflows end to end. At the platform's core is its Universal Data Model (UDM), the industry's first standardized data framework for multifamily real estate, developed over three years by a team that includes alums from MIT, Greystar, EY, and
"Trust is the most important part of any AI system that a business actually uses," said Leni's Head of Industry Strategy,
He added, "What these benchmarks measure is exactly that gap: whether a system can be trusted to produce finished work, not just plausible-sounding output. That is the bar we hold ourselves to with every customer."
About Leni
Leni is a secure, accuracy-driven AI platform purpose-built for serious investment work across the commercial real estate, lending, and investment sectors. Since its public launch in 2023, the company has raised
View original content to download multimedia: https://www.prnewswire.com/news-releases/leni-tops-four-major-ai-benchmarks-outperforming-systems-from-openai-anthropic-google-and-perplexity-302769724.html
SOURCE Leni