- Majestic Labs is helmed by the execs who built Google and Meta's silicon units
- They're now trying to build a new memory pipeline big enough for AI – from the ground up
- Initial system delivery is expected in 2027
The team that built Google and Meta’s silicon divisions is now behind a startup tackling the biggest – and least talked about – problem facing AI: its memory wall.
Founded in late 2023 by former hyperscaler executives Ofer Shacham, Sha Rabii and Masumi Reynders, Majestic Labs has just come out of stealth with $100 million in funding. The company’s goal?
“We are starting from the memory, making it a first-class citizen,” Shacham told Fierce, describing the plan to build a new class of advanced server architecture. In other words, Majestic Labs is building new signaling, silicon, hardware designs and software compiler toolchains from the ground up. All with the goal of building a memory pipeline big enough to feed AI’s growing appetite.
We’ve talked about the memory wall before. But as a quick recap, a lack of memory capacity and bandwidth is hindering AI performance. That’s because in addition to pure processing power (ahem, GPUs), AI needs to access data. It does that through memory, but as things stand today, there’s not nearly enough of it. This, of course, isn’t something the industry likes to talk much about.
There are a few ways to tackle the memory problem. You could, in theory, just continue scaling across more and more GPU servers until you have enough memory for the task in question. But that would be very expensive and leave you with a lot of idle GPU power. You could use high-bandwidth memory chips, but these are currently in short supply and cost a pretty penny.
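To make the scale-out option concrete, here's a toy back-of-the-envelope calculation. All the numbers (per-GPU memory, server size, workload sizes) are assumptions for illustration only, not figures from Majestic or any vendor – the point is simply that buying servers to reach a memory target can leave most of the compute idle.

```python
# Toy illustration (all numbers assumed): scaling out GPU servers purely
# to hit a memory-capacity target strands most of the compute.

GPU_MEM_GB = 80          # HBM per GPU (assumed, roughly an H100-class part)
GPUS_PER_SERVER = 8      # typical GPU server configuration (assumed)
MODEL_MEM_GB = 10_000    # hypothetical working set for a large AI job

# Ceiling division: servers required just to hold the working set in HBM
servers_needed = -(-MODEL_MEM_GB // (GPU_MEM_GB * GPUS_PER_SERVER))
total_gpus = servers_needed * GPUS_PER_SERVER

# Suppose the job only needs 16 GPUs' worth of raw compute (assumed)
COMPUTE_GPUS_NEEDED = 16
idle_fraction = 1 - COMPUTE_GPUS_NEEDED / total_gpus

print(servers_needed, total_gpus, round(idle_fraction, 2))
```

With these assumed numbers, 16 servers (128 GPUs) are needed for capacity while only 16 GPUs do useful work, so nearly 90% of the GPU compute sits idle – the "very expensive" outcome the article describes.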
CPUs also tend to have more memory than GPUs. So, you could also try virtualizing the memory in your CPUs to create a sort of memory pool that is accessible and dynamically allocated to your GPUs. Both Nvidia and a company called Kove:SDM have solutions along these lines.
But Shacham argued the problem with this approach is that it slows your average memory access, because making the leap beyond memory local to the GPU takes time. Distance equals time and all that. Put another way: “You have really fast access to a tiny sliver of your memory [on the GPU] and really slow access to the majority of your memory [in the CPU pool].”
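Shacham's point can be sketched as a simple weighted-average latency model. The latencies and the local-memory fraction below are assumptions for illustration (local HBM on the order of ~100 ns, a remote CPU-attached pool an order of magnitude slower), not measured figures from any of the companies mentioned.

```python
# Toy two-tier memory model (assumed latencies): when only a sliver of
# the working set fits in fast local memory, the average access time is
# dominated by the slow pooled tier.

def avg_access_ns(frac_local, local_ns=100.0, pool_ns=1_000.0):
    """Weighted-average access latency across the two memory tiers.

    frac_local: fraction of accesses served from fast GPU-local memory.
    local_ns / pool_ns: assumed per-access latencies for each tier.
    """
    return frac_local * local_ns + (1 - frac_local) * pool_ns

# With 10% of accesses hitting local memory, the average lands near the
# slow tier, not the fast one:
print(avg_access_ns(0.10))
```

Under these assumptions, 10% local hits yield an average of about 910 ns – barely better than the 1,000 ns pool alone, which is exactly the "really slow access to the majority of your memory" effect Shacham describes.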
For the record, Kove:SDM claims to have solved this issue, stating in a presentation at this year’s Red Hat Summit that it can serve pooled memory from 150 meters or more away with the same or better latency than local memory. The result is higher server utilization, which means greater efficiency, fewer servers needed and, in turn, less power consumption.
Majestic is seeking the same ends – that is, greater efficiency, reduced power consumption and better compute ROI – through its own means.
Shacham said the startup is already “deep in development” on its new designs. On the software front, he said it has already shown it can compile any AI model on Hugging Face onto its architecture without modifications. It’s also pressing ahead with new silicon designs for both processing and memory aggregation, and drafting server designs to house it all.
The next 18 months will be key for the company, he added. “We are intending to have our lead customers get their first systems in 2027, which is about 18 months from now, and more general availability after that,” he concluded. “So what’s going to happen in those months is just working day and night to make that happen.”