Innovating a post-Moore's Law architecture for AI
It’s Monday, so once again here comes the end of Moore's Law: the observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years. How do we know it’s over? In recent years the doubling simply hasn't happened.
The traditional methods of scaling down transistor size are reaching physical limits, making it increasingly difficult to continue the rapid advancements in performance and power efficiency that we have seen in the past. As you pack more and more transistors into an IC, complexity goes up, and the issues go up with it. Fewer and fewer companies can afford to migrate to the latest manufacturing process nodes.
The optimisations we can deliver are now quite limited: you can only put so many transistors on an integrated circuit. As you move to smaller geometries for chip manufacture, certain key microprocessor functions become hard to scale – the gains of a smaller geometry are offset by the more complex physical layout it requires, so you end up adding transistors and giving back the latency and power benefits you went there for. You’ve hit the buffers.
Dennard scaling, which projected that power consumption per unit chip area would remain constant as transistor density increased, is also failing. To say this is a touchy subject for chip designers whose customers are used to regular performance increases is an understatement. One IP company recently went so far as to publish a blog post begging customers to stop asking them about PPA (Power, Performance & Area) – which tends to be the one thing most chip makers care about above all else.
As complexity and cost skyrocket at each new node, the economics of scaling no longer work for most designs; they hold up only for the highest-volume chips. Moore’s Law delivered integration and scaling on a scale nobody had imagined before. But chasing that integration also absorbed most of the industry’s talent into scaling and integration engineering rather than into new ideas. Confronting the end of Moore’s Law should, I hope, open a new era of creativity. What should we create? New architectures that deliver the benefits we once got from Moore’s Law, but through advances in software and co-optimised silicon rather than through improved manufacturing.
RED Semiconductor is founded on the belief that we need new approaches and innovations in computer architecture to overcome these challenges and continue driving progress in the industry.
We think the fundamental areas where we can design better architectures are:
- A more efficient instruction set philosophy: a small number of instructions, each of which can be used as scalar (performing a single operation on a single data item at a time) or loop-vectorised (operating on multiple data elements simultaneously) – the difference is illustrated in the sketch after this list.
- Complex repetitive tasks, such as the algorithms used for AI (Artificial Intelligence), security and so on, should be accomplished with far fewer instructions through an efficient loop-vector scheme.
- Fewer instructions result in smaller binaries, smaller chips, and the highest execution performance per clock cycle and per watt of power consumed.
- Reduced memory traffic: moving less data, especially out to external memory, cuts power consumption and the clock cycles wasted waiting for memory access. (It also removes an attack surface for cyber-security breaches.)
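To make the scalar versus loop-vector contrast concrete, here is a minimal sketch in plain C. It uses generic x86 AVX intrinsics purely for illustration – it assumes an AVX-capable CPU and compilation with `-mavx`, and it is not RED's VISC instruction set – but it shows how one vector operation can stand in for eight scalar iterations while the running total stays in a register rather than bouncing through memory.

```c
/*
 * Illustrative only: a scalar dot product next to a hand-vectorised one.
 * The vector version uses x86 AVX intrinsics (assumes an AVX-capable CPU
 * and `-mavx` when compiling); it is NOT RED's VISC scheme, just a generic
 * demonstration of one vector instruction replacing eight scalar iterations.
 */
#include <immintrin.h>
#include <stdio.h>

/* Scalar: one load pair, one multiply and one add per element, per iteration. */
static float dot_scalar(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Vectorised: each iteration handles 8 floats with a handful of instructions;
 * the partial sums stay in a register the whole time. */
static float dot_vector(const float *a, const float *b, int n)
{
    __m256 acc = _mm256_setzero_ps();
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float partial[8];
    _mm256_storeu_ps(partial, acc);
    float sum = 0.0f;
    for (int j = 0; j < 8; j++)
        sum += partial[j];
    for (; i < n; i++)              /* scalar tail for leftover elements */
        sum += a[i] * b[i];
    return sum;
}

int main(void)
{
    float a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f; }
    printf("scalar: %f  vector: %f\n", dot_scalar(a, b, 16), dot_vector(a, b, 16));
    return 0;
}
```

The aim of a loop-vector scheme, as described above, is to go further still: by folding the looping itself into the instruction semantics, even the setup and tail-handling boilerplate visible here shrinks away, which is where the "far fewer instructions" claim comes from.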
Architectural improvements have paved the way for new computing paradigms, such as heterogeneous chips, where different types of processors and accelerators work together synergistically to deliver superior performance for specific tasks. With our approach you can design highly heterogeneous systems with a single common ISA (Instruction Set Architecture, the instructions and programming conventions that define the interface between the hardware and software of a computer system).
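As a small illustration of what a single common ISA buys the software side, the sketch below pins one compiled function onto two different cores of a heterogeneous system. It assumes Linux and a hypothetical core layout (core 0 as a "big" core, core 4 as a "little" one – placeholders, not any real chip's topology); the point is simply that when the cores share an ISA, the identical machine code runs on either, with no second toolchain or separate binary.

```c
/*
 * A minimal sketch of the software upside of one common ISA across
 * heterogeneous cores: the same compiled routine can be scheduled onto any
 * core type. Assumes Linux; the core numbers below are hypothetical
 * placeholders for a "big" core and a "little" core.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* One compiled routine, identical machine code wherever it runs. */
static long work(long n)
{
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc += i * i;
    return acc;
}

static void run_on_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return;
    }
    printf("core %d -> %ld\n", core, work(1000000));
}

int main(void)
{
    run_on_core(0);   /* assumed "big" core    */
    run_on_core(4);   /* assumed "little" core */
    return 0;
}
```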
The father of RISC (Reduced Instruction Set Computing, a computer architecture that focuses on simplicity and efficiency), David Patterson, said: “It’s conventional wisdom among computer architects that the only thing we haven’t tried is domain-specific architectures. That idea is relatively simple: It’s that you design a computer that does one domain really well, and you don’t worry about the other domains. If we do that, we can get giant improvements.” Examples of domain-specific architectures include graphics processing units (GPUs) for graphics-intensive applications and chips for specialised tasks like cryptography or machine learning. He's not wrong: AI-specific domain processors have exceeded the performance gains predicted by Moore’s Law by a considerable margin, driven by their application-specific nature and a lot of low-hanging AI-related optimisations. But while domain specificity might deliver very fast implementations for specific applications, it does not deliver as much flexibility, and is therefore not the solution for devices that need to run a wider range of applications.
My personal favourite domain-specific processor is the GPU. Why? It pushes at the boundaries of what domain-specific means. GPUs want to be more, and they have already broken free of their original domain of pushing pixels to enable AI. I propose that we resurrect the CPU (Central Processing Unit) in a new form: one capable of running both general-purpose and algorithm-specific code with a very high degree of efficiency. No extensions, just a profoundly efficient and flexible architecture that does the best things that a CPU or GPU can, and that can take us through the decades to the next major paradigm shift, which is perhaps quantum.
This is why RED Semiconductor created VISC (Vector Instruction Set Computing), a new kind of processor that speeds up cryptographic and AI applications through the power of loop-vectorisation.
To continue driving progress, new approaches and innovations in architecture are needed. Designing architectures with fewer instructions, smarter execution through vectorisation, and reduced memory traffic can put us back on track and deliver the ongoing performance gains, beyond Moore’s Law, that AI has so far enjoyed.