Inference Engine Examples

Volcano Engine bets better models, not lower prices, will decide the MaaS race

Over the past three years, Volcano Engine president Tan Dai has repeated the same cycle when setting revenue targets for his ...

Can India develop a self-sustainable AI ecosystem?

India's overdependence on foreign artificial intelligence models risks a large forex outflow, especially as many of them are ...

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.

Interesting Engineering

Jalapeño, OpenAI’s first custom chip, targets gigawatt-scale data center deployment

OpenAI's new Jalapeño inference chip promises faster performance, improved efficiency, and lower computing costs.

Defense One

First B-52s to get new engines this year

Two B-52 bombers will head back to their manufacturer for new engines this year, kicking off a long-awaited upgrade meant to help keep flying the Stratofortress until nearly their 100th birthday. On ...

Nasdaq

DigitalOcean Launches Inference Engine with New Capabilities for Production AI, Including Inference Router for Efficient Scaling of Agentic Workloads

Built alongside early design partners, the Inference Engine gives AI developers unified control over performance, cost, and scale — with customers reporting up to 67% lower inference costs. Inference ...

Morningstar

AI-Native Startups Are Leaving Hyperscalers for DigitalOcean's Agentic Inference Cloud

AI-native startups report 50% faster training cycles and 40% decrease in latency when running production AI on DigitalOcean. DigitalOcean (NYSE: DOCN), the Agentic Inference Cloud built for production ...

SDxCentral

Nvidia, hyperscaler-backed open standard for AI inference torch passed to Linux Foundation

An open standard for AI inference backed by Google Cloud, IBM, Red Hat, Nvidia and more was given to the Linux Foundation for stewardship in further proof training has been superseded by inference in ...

Wall Street Journal

Amazon Announces Inference Chips Deal With Cerebras

Amazon Web Services plans to deploy processors designed by Cerebras inside its data centers, the latest vote of confidence in the startup, which specializes in chips that power artificial-intelligence ...

VentureBeat

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.

GitHub

llminfer: A GPU-efficient LLM inference engine

This is a python package focused on systems performance: quantized weights, KV cache reuse, dynamic batching, token streaming, and rigorous benchmarking across backends. llminfer is for engineers who ...

The Next Platform

Taalas Etches AI Models Onto Transistors To Rocket Boost Inference

Adding big blocks of SRAM to collections of AI tensor engines, or better still, a waferscale collection of such engines, turbocharges AI inference, as has been shown time and again by AI upstarts ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results