Tiered architectures are all around us. You encountered one the last time you boarded an airplane. By dividing the aircraft cabin into multiple classes (or tiers) of service, airlines are attempting to maximize their profit. First Class provides the highest level of service, and consequently, is the most expensive. Economy Class provides a lower service level, but with a corresponding lower price. The classes are sized according to established expectations of customer demand…i.e. most air travelers will prioritize cost over comfort (hence the high-capacity Economy Class), with a small segment of travelers willing to pay more for premium service (hence the smaller First Class). The system works well and, over time, additional sub-classes (e.g. Premium Economy) have emerged to more closely align with customer preferences and extract additional revenue.
Similar tier-based maximization strategies are employed widely in the Engineering domain, and are especially prevalent within Computing architectures. From CPU caches (L1 => L2 => L3) to storage devices (solid state drives => hard disks => tape drives), tiered architectures optimize efficiency by closely matching resource allocation to system requirements. Within each Computing subsystem, there is typically a small allocation of a “premium” resource: for example, the L1 cache within a CPU, or the solid state drives within a storage subsystem. Just like the First Class aircraft cabin, these premium resources provide the highest level of service (where “better service” equates to faster data access speeds), are extremely costly, and are deployed with limited capacity. As you move down the respective hierarchies to L2 caches and hard disks, capacity increases while service level and cost both decrease. As with aircraft classes, this tiered strategy aligns differing resources with differing requirements to maximize efficiency within each subsystem.
Historically, however, one area within Computing has remained stubbornly homogeneous…main memory. Also known as “Random Access Memory (RAM)” or “system memory”, main memory holds the applications and data that are being actively used by the CPU. Since its inception, this crucial subsystem has been implemented with a rigid, single-tier architecture, serviced by a single memory technology known as Dynamic Random Access Memory (DRAM).
DRAM is a premium resource. It provides an extremely high level of service (access latencies in the nanoseconds), but has very high cost (tens of dollars per gigabyte), and can only be deployed in limited capacities (typically 512GB or less in each server). As a result, system architects developed a key workaround to cope with constrained amounts of main memory…swapping data into memory on-demand from storage devices. Loading data from the storage subsystem is slow, but the cost-vs-performance tradeoff made sense. Though sub-optimal, it was the best solution available and worked well enough…until recently.
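To make the swapping tradeoff concrete, here is a minimal sketch of the standard weighted-average estimate of access time. The function name and the latency figures (100 ns for DRAM, 100 µs for a storage access) are illustrative assumptions, not measurements of any particular system.

```python
def effective_latency_ns(hit_ratio: float, fast_ns: float, slow_ns: float) -> float:
    """Average access latency when a fraction `hit_ratio` of accesses is
    served from the fast tier and the remainder falls through to the slow tier."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


if __name__ == "__main__":
    # Assumed, illustrative latencies (not measured values).
    DRAM_NS = 100.0          # main memory access
    STORAGE_NS = 100_000.0   # one swap-in from a storage device

    for hit_ratio in (1.0, 0.999, 0.99, 0.9):
        avg = effective_latency_ns(hit_ratio, DRAM_NS, STORAGE_NS)
        print(f"hit ratio {hit_ratio:>5}: ~{avg:,.0f} ns average access")
```

Under these assumed numbers, even a small fraction of accesses that must be swapped in from storage dominates the average, which is why swapping only works well enough while the actively used data mostly fits in DRAM.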
In recent years, applications and data have dramatically increased in size and complexity to meet the needs of an ever-growing, never-satisfied, information-hungry user base. As a result, finding ways to feed CPUs more efficiently, thereby facilitating faster results for the end user, has become a major focus in the computing industry. This trend is evidenced by the ongoing rise of solid state storage. At its core, the demand for solid state drives (SSDs) is a symptom of the conflict between growing user demands and the constrained nature of main memory. As the size of CPU-relevant datasets increases, more and more data must be loaded from storage during active processing, which creates a need for faster and faster storage devices (i.e. SSDs) to swap from. Unfortunately, faster storage is only a band-aid solution. The pressure to process more data, more quickly, is also driving increasing usage of high-performance, in-memory applications. These applications rely on the data being permanently resident within main memory and they cannot be effectively serviced via data swaps from storage.
Thus far, however, although in-memory applications continue to gain popularity, main memory capacity, constrained by its reliance on DRAM technology, has been unable to scale accordingly.
That is now changing. The emergence of new memory technologies is catalyzing deployment of tiered architectures within main memory. As within other domains, multiple memory tiers will coexist with varying service and capacity levels.
It’s likely that these new, complementary technologies will be somewhat slower than DRAM. However, it should be noted that they need only perform sufficiently to meet the needs of the applications relying on them. Today’s single-tier memory subsystem can be likened to our early-stage airline system, when there was only one class of service and a very limited number of seats on each plane. To effectively meet a broad-ranging rise in the demand for air travel, it wouldn’t make sense to simply create larger, single-class planes. Instead, aircraft evolved to incorporate the more effective, multi-class system in use today. Similarly, to evolve main memory, simply finding ways to add more expensive DRAM (e.g. via networking) is not the answer. Instead, efficient, multi-tiered memory architectures are needed.
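The "good enough for the application" idea can be sketched as a simple placement rule: pick the cheapest tier whose latency still meets the application's requirement. The tier names, latencies, and relative costs below are hypothetical placeholders chosen for illustration; they do not describe any specific product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MemoryTier:
    name: str
    latency_ns: float    # typical access latency (placeholder value)
    cost_per_gb: float   # relative cost units (placeholder value)


# Hypothetical tiers; every number here is an illustrative assumption.
TIERS = (
    MemoryTier("dram", latency_ns=100.0, cost_per_gb=10.0),
    MemoryTier("complementary", latency_ns=1_000.0, cost_per_gb=3.0),
)


def cheapest_sufficient_tier(
    max_latency_ns: float, tiers: Sequence[MemoryTier] = TIERS
) -> Optional[MemoryTier]:
    """Return the lowest-cost tier that meets the latency requirement,
    or None if no tier is fast enough."""
    candidates = [t for t in tiers if t.latency_ns <= max_latency_ns]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.cost_per_gb)


if __name__ == "__main__":
    for requirement_ns in (200.0, 5_000.0, 10.0):
        tier = cheapest_sufficient_tier(requirement_ns)
        label = tier.name if tier else "no tier fast enough"
        print(f"needs <= {requirement_ns:,.0f} ns -> {label}")
```

In this sketch, a latency-sensitive workload lands in DRAM, while a workload that tolerates slower access is placed in the cheaper, higher-capacity tier, mirroring how passengers sort themselves across cabin classes.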
The first salvos in this (r)evolution have already been fired. Look no further than Diablo Technologies’ all-flash Memory1 DIMMs and Intel’s upcoming 3D XPoint DIMMs. The emergence of these solutions heralds the arrival of tiering within main memory. Moving forward, DRAM will remain the premium tier, with these complementary technologies offering lower cost, higher capacity, and service levels that closely align with the requirements of the targeted applications.
Ultimately, however, it all comes down to user satisfaction. As users, we’ve been conditioned to assume that a better, faster, and cheaper experience is always on the horizon:
• Once upon a time, the ability to download static, point-to-point driving directions was considered a major innovation. These days, we expect dynamic, audio-visual navigation, adjusted in real-time to account for traffic and weather conditions.
• Not so long ago, we were amazed by the ability to access music on portable devices. Today, we expect every song from every artist to be instantly available at the press of a button or at the sound of our voice.
• Sequencing the human genome, a major milestone in modern science, was a 10-year, multibillion-dollar international endeavor. Today, our genomes can be sequenced in a matter of hours for less than $1,000.
As applications evolve to meet our ever-increasing expectations of intelligence, speed, and convenience, the underlying computing infrastructure is being stressed to its limits. Most notably, the capacity-constrained, legacy memory architecture is struggling to keep pace. Change is needed.
Fortunately, a massive disruption is just around the corner. While the full impact cannot yet be predicted, it’s certain to be conspicuous and far-reaching. Tiered memory is on its way.