The End of Hand-Crafted Systems
My research pursues a fundamental shift: self-designing data and AI systems. By discovering the alphabets and grammars that govern system architectures, we enable machines—not humans—to write the sentences.
A Design Space Beyond Human Reach
The AI revolution is transforming every field and industry, driving unprecedented demand for data-centric computation. As new data types, hardware platforms, and workloads appear faster than ever, the backbone systems that power this revolution must evolve just as quickly.
Yet a single system architecture faces a design space larger than 10¹⁰⁰ alternatives. We still cling to a handful of "good" templates, each requiring years of manual design and implementation tuning. It is time to abandon this artisanal practice.
Alphabets, Grammars, Calculators
The breakthrough: model the design space of systems as an alphabet of low-level design primitives, and whole architectures as sentences in a grammar over that alphabet. Systems calculators can then synthesize fresh blueprints on demand. A toy sketch of this framing follows the three components below.
The Alphabet
Decompose systems into their fundamental design atoms—the smallest decisions that shape how data is laid out, accessed, and processed.
The Grammar
Define rules for how primitives combine into coherent architectures, enabling systematic exploration of the entire design space.
The Calculator
Build engines that navigate this space intelligently, finding optimal designs tailored to specific workloads, hardware, and constraints.
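To make the framing concrete, here is a deliberately tiny Python sketch. Every name in it (the three-knob alphabet, the single grammar rule) is illustrative rather than any real system's API; the point is simply that candidate architectures fall out of enumerating grammar-respecting combinations of primitives.

```python
from itertools import product

# The "alphabet": atomic design decisions about how data is laid out.
ALPHABET = {
    "ordering":  ["sorted", "unsorted"],
    "index":     ["none", "fence_pointers", "hash"],
    "node_size": [64, 4096, 1 << 20],   # bytes
}

def grammar_ok(design):
    """The 'grammar': rules that reject incoherent primitive combinations."""
    # Illustrative rule: fence pointers only make sense over sorted data.
    if design["index"] == "fence_pointers" and design["ordering"] != "sorted":
        return False
    return True

# Every "sentence" the grammar admits is a candidate architecture.
designs = [
    dict(zip(ALPHABET, combo))
    for combo in product(*ALPHABET.values())
    if grammar_ok(dict(zip(ALPHABET, combo)))
]
print(f"{len(designs)} valid designs out of 18 raw combinations")
```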
From Theory to Systems That Work
The Data Calculator is the first engine for interactive data structure design. By capturing the first principles of data layout (how nodes organize data and relate to each other), it explores trillions of previously unknown data structure variants to find optimal layouts without implementation or even hardware access.
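A hedged sketch of what "optimal layouts without implementation" can look like: compose rough per-primitive cost models into an analytical estimate for each candidate design, then rank the candidates. The formulas and constants below are invented for illustration; they are not the Data Calculator's learned cost models.

```python
import math

def lookup_cost(design, n_keys, cache_line=64):
    """Rough estimate of a random point lookup's cost in cache misses."""
    keys_per_node = design["node_size"] // 16          # assume 16-byte entries
    n_nodes = max(1, n_keys // keys_per_node)
    if design["index"] == "hash":
        return 2.0                                      # bucket + entry
    if design["ordering"] == "sorted":
        # Binary search over nodes, then within the node.
        return math.log2(n_nodes) + math.log2(keys_per_node)
    # Unsorted and unindexed: scan half the data on average.
    return n_nodes * (design["node_size"] / cache_line) / 2

candidates = [
    {"ordering": "sorted",   "index": "fence_pointers", "node_size": 4096},
    {"ordering": "unsorted", "index": "hash",           "node_size": 4096},
    {"ordering": "unsorted", "index": "none",           "node_size": 1 << 20},
]
best = min(candidates, key=lambda d: lookup_cost(d, n_keys=10_000_000))
print("cheapest point-lookup design:", best)
```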
Cosine produces self-designing key-value storage engines: it generates novel NoSQL stores that run up to three orders of magnitude faster than today's best deployments. Its design space spans LSM-trees, B-trees, and hash tables, plus trillions of hybrids that exist nowhere in the literature or in industry.
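To get a feel for such a continuum, the toy cost model below (a heavily simplified leveled-LSM formula, not Cosine's actual model) shows how a single size-ratio knob morphs a design from a write-optimized, LSM-like layout toward a read-optimized, sorted-array-like one.

```python
import math

def io_costs(size_ratio, n_entries, buf_entries):
    """Approximate amortized I/Os per operation for a leveled LSM-style design."""
    levels = max(1, math.ceil(math.log(n_entries / buf_entries, size_ratio)))
    update_io = size_ratio * levels / buf_entries   # amortized merge cost
    lookup_io = levels                              # one probe per level
    return update_io, lookup_io

# Sweeping one knob trades update cost against lookup cost.
for t in (2, 4, 10, 100, 1000):
    upd, look = io_costs(t, n_entries=1e9, buf_entries=1e6)
    print(f"size ratio {t:>4}: update ≈ {upd:.6f} I/O, lookup ≈ {look} I/O")
```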
The self-designing paradigm also extends to vision systems, where we co-design entirely new storage formats alongside neural network architectures. By optimizing both together, we achieve order-of-magnitude speedups in end-to-end vision pipelines.
We further apply self-designing principles to the distributed training of large AI models. These systems invent novel distributed-training algorithms that extract every flop and byte from modern accelerators, automatically adapting to hardware topology and model architecture. One of them, TorchTitan, ships with PyTorch.
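As an illustration of what adapting to hardware topology can mean (this is a sketch, not TorchTitan's API), the snippet below enumerates (data, tensor, pipeline) parallel degrees for a fixed GPU count, filters out layouts that would overflow memory under a crude fp16-weights model, and picks the lowest-scoring layout under a toy step-time formula.

```python
from itertools import product

def layouts(n_gpus):
    """All (data, tensor, pipeline) degrees whose product uses every GPU."""
    for dp, tp, pp in product(range(1, n_gpus + 1), repeat=3):
        if dp * tp * pp == n_gpus:
            yield dp, tp, pp

def fits(dp, tp, pp, params_b=70, gpu_mem_gb=80):
    """Crude memory check: fp16 weights sharded across tp * pp GPUs."""
    return 2 * params_b / (tp * pp) <= gpu_mem_gb

def step_time(dp, tp, pp, params_b=70):
    """Toy score: compute shrinks as we shard; comms and bubbles grow."""
    compute = params_b / (tp * pp)
    comms = 0.02 * params_b * (dp - 1) + 0.05 * params_b * (tp - 1)
    bubble = 0.1 * compute * (pp - 1) / pp
    return compute + comms + bubble

best = min((c for c in layouts(64) if fits(*c)), key=lambda c: step_time(*c))
print("chosen (dp, tp, pp):", best)
```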
Machines Write the Sentences.
Humans Ask Deeper Questions.
These results signal a future in which systems research increasingly focuses on crafting richer alphabets and grammars while machines write the sentences—freeing designers and researchers to pursue more profound questions.
Practitioners will dial in cost, latency, and accuracy with surgical precision. The era of hand-crafted systems is ending. The era of self-designing systems has begun.
Expanding the Grammar of Intelligence
Building on the foundations of self-designing systems, we are now extending these principles to the full stack of modern AI infrastructure.
RAG Agents
Applying self-designing principles to retrieval-augmented generation, enabling systems that automatically synthesize optimal retrieval strategies, index structures, and agent orchestration patterns tailored to specific knowledge domains.
Managing Context
Developing grammars for context management that allow systems to self-design how they store, compress, retrieve, and reason over long-range dependencies—optimizing the fundamental bottleneck of modern AI systems.
Large Model Compilers
Creating compilers that transform model specifications into optimized execution plans, automatically navigating the vast design space of hardware mappings, parallelization strategies, and memory hierarchies.
Model Fine-Tuning
Extending the calculator paradigm to model adaptation, synthesizing optimal fine-tuning recipes by reasoning over the design space of data selection, parameter-efficient methods, and training dynamics.
Where It All Started: Database Cracking
Systems That Learn from Their Workload
The ideas behind self-designing systems trace back to my PhD work on Database Cracking with my amazing advisors Martin Kersten and Stefan Manegold. Cracking is a paradigm in which data systems continuously adapt their physical storage layout in response to the queries they receive.
Rather than requiring administrators to manually create indexes upfront, cracking systems treat each query as an opportunity to incrementally reorganize data. Over time, the storage layout converges to one that is perfectly tailored to the actual workload—adapting to data properties, query patterns, and hardware characteristics.
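The core mechanism fits in a short sketch. In the minimal, in-memory version below (an illustration, not MonetDB's implementation), each range query partitions, or "cracks", the column around its bounds, much like a quicksort pivot step, and a small cracker index remembers the resulting pieces so later queries touch only the relevant region.

```python
import bisect

class CrackedColumn:
    def __init__(self, values):
        self.data = list(values)
        self.index = []          # sorted list of (pivot value, position)

    def _crack(self, pivot):
        """Partition the piece containing `pivot` so values < pivot come
        first, then record the split point in the cracker index."""
        i = bisect.bisect_left(self.index, (pivot,))
        if i < len(self.index) and self.index[i][0] == pivot:
            return                                    # already cracked here
        lo = self.index[i - 1][1] if i > 0 else 0
        hi = self.index[i][1] if i < len(self.index) else len(self.data)
        piece = self.data[lo:hi]
        left = [v for v in piece if v < pivot]
        self.data[lo:hi] = left + [v for v in piece if v >= pivot]
        self.index.insert(i, (pivot, lo + len(left)))

    def range_query(self, low, high):
        """Answer [low, high) and, as a side effect, refine the layout."""
        self._crack(low)
        self._crack(high)
        lo = next(p for v, p in self.index if v == low)
        hi = next(p for v, p in self.index if v == high)
        return self.data[lo:hi]                       # contiguous slice

col = CrackedColumn([7, 1, 9, 4, 8, 2, 6, 3, 5])
print(sorted(col.range_query(3, 7)))                  # -> [3, 4, 5, 6]
```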
Self-designing systems take this philosophy to its logical extreme: if a system can learn to optimize its storage layout, why not learn to optimize its entire architecture?
Stratos Idreos
I am a Professor at Harvard's John A. Paulson School of Engineering and Applied Sciences and Faculty Co-Director of the Harvard Data Science Initiative. I lead DASlab, the Harvard Data and AI Systems Laboratory, where my research pursues a "grammar of data systems": enabling machines to design and tune system architectures that are tailored to their context, faster, and more scalable.
Before Harvard, I was a researcher at CWI Amsterdam and earned my PhD from the University of Amsterdam. I have co-chaired ACM SIGMOD 2021 and IEEE ICDE 2022, co-founded the ACM/IMS Journal of Data Science, and currently serve as chair of the ACM SoCC Steering Committee.