The Flash and the Furious

Kane Hsieh · Jan 17, 2013

If the recent exit of our portfolio company Nexsan is any indication, more and more companies are starting to take notice of low-level server technologies previously left to sys admins and IT wonks.

In 2012, the internet created more than 1,200 exabytes of new data. That’s 1,200,000,000,000 gigabytes, or 150,000,000,000 Blu-ray discs, which would make a stack reaching halfway to the moon. The advent of ubiquitous sensing, monitoring, and cloud computing has necessitated a lot of clever hardware and software advances to cope with the firehose of cat pictures and stupid comments flowing through the internet.

The ability to sift through this data and find meaningful content quickly can make or break a company. Some more than others: high-frequency traders hire physics Ph.D.s to optimize chips in order to squeeze every microsecond of performance out of their trading systems. That’s one extreme. For a company like Facebook, speed is the difference between unsynced social updates (a bug that existed even two years ago) and a seamless user experience. Twitter has had to add servers specifically for Justin Bieber. And what New Yorker working in tech hasn't stared blankly at a Tumblr systems maintenance page?

So how do you build a fast server? Consider a very simple model of how data moves through a server (sketched in code after the list):

    • A query comes from the internet or a piece of software and hits the processor (CPU)
    • CPU processes the query and retrieves relevant data
    • CPU processes the relevant data
    • CPU pushes data back to the requestor
 
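In code, those four steps might look something like the following minimal sketch. It assumes a hypothetical key-value lookup service; the names are illustrative, not any real API.

    # A minimal sketch of the four steps above, assuming a hypothetical
    # key-value service. Names here are illustrative, not a real API.

    def handle_query(query, datastore):
        key = query.strip().lower()                 # 1. query arrives at the CPU
        record = datastore.get(key)                 # 2. CPU retrieves relevant data
        result = record.upper() if record else ""   # 3. CPU processes that data
        return result                               # 4. CPU pushes it back to the requestor

    datastore = {"cats": "3 million pictures"}      # stand-in for memory or disk
    print(handle_query("  Cats ", datastore))       # -> 3 MILLION PICTURES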

Pretty straightforward model. However, there is a lot of nuance in the second point. How does a CPU get the data, and how does it do it quickly? To understand this, we consider the four places where data can live, in order of decreasing speed:

    • Registers
    • Static Random Access Memory (SRAM) <- “caches”
    • Dynamic Random Access Memory (DRAM) <- colloquially “RAM” or “Memory”
    • Disk storage
 

Just how much faster is a register than a disk? In terms of orders of magnitude: if you were a CPU and you needed a piece of information, register data would be on the screen in front of you. Data in SRAM would be a short walk away. Data in DRAM would require you to get up and take a 15-minute cab ride to retrieve. And if you wanted disk data, you would have to fly to Hong Kong and back.
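
To put rough numbers on that analogy, here is a sketch of ballpark access latencies. These are order-of-magnitude figures of the kind commonly quoted for modern hardware, not measurements of any particular machine.

    # Ballpark access latencies in nanoseconds (order-of-magnitude figures only;
    # the exact numbers depend on the specific CPU, DRAM, and drive).
    latency_ns = {
        "register":         0.5,         # roughly one CPU cycle
        "SRAM (L1 cache)":  1,
        "DRAM":             100,
        "SSD random read":  100_000,     # ~100 microseconds
        "spinning disk":    10_000_000,  # ~10 milliseconds (seek + rotation)
    }

    for tier, ns in latency_ns.items():
        print(f"{tier:>16}: {ns:>14,.1f} ns  ({ns / latency_ns['register']:,.0f}x a register)")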

Registers are incredibly fast, tiny pieces of memory on the CPU die (colloquially “the chip”) that hold the small pieces of data the CPU is immediately working on. This is the domain of chipmakers (Intel, Qualcomm, etc.) and essentially a black box to programmers. SRAM is a “cache” – slower than registers but faster than DRAM, it lets the processor temporarily store important data. SRAM caches usually sit on the CPU die, but can also live on the motherboard.

The CPU, its registers, and its caches are connected via a “north bridge” to DRAM (“RAM” or “main memory” from now on) and through a “south bridge” to disk storage. Memory and storage are where storage companies come into play.

Memory sits on motherboards and the hardware form factor is standardized, so companies focused on main-memory solutions tend to be software based. Memcached and MemSQL are examples of products that keep and serve data from main memory. The reason there’s value in software at the memory level is that caches and memory are both relatively small in terms of data capacity; if the data you want is NOT in a cache or in memory, you get a miss (repeated misses are known as “thrashing”), and your CPU has to go to storage to retrieve it. That might only take milliseconds, but for a processor, milliseconds are millions of wasted computations. Clever software makes sure the data you'll most likely need is already in main memory when the CPU asks for it. But it's not cost effective to store all data in memory, so we need cheaper, slower disk storage.
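
In application code, that idea usually shows up as the cache-aside pattern: check the in-memory cache first, and only go to disk-backed storage on a miss. Here is a minimal sketch; a plain Python dict stands in for something like memcached, and slow_disk_lookup() is a made-up placeholder, not a real API.

    # Cache-aside sketch. The dict stands in for an in-memory cache such as
    # memcached; slow_disk_lookup() stands in for a disk-backed database.
    cache = {}

    def slow_disk_lookup(key):
        # Placeholder for a query that has to go all the way to disk storage.
        return "value-for-" + key

    def get(key):
        if key in cache:                  # hit: served from DRAM in ~100 ns
            return cache[key]
        value = slow_disk_lookup(key)     # miss: the CPU waits on the disk
        cache[key] = value                # keep it in memory for next time
        return value

    print(get("user:42"))   # first call misses and goes to "disk"
    print(get("user:42"))   # second call is a hit, served from memory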

For a long time, disk storage meant spinning magnetic hard disks. While incredibly cheap relative to memory, they are much slower – which means expensive servers end up twiddling their thumbs (and burning through power) waiting for data to come back from storage. Historically, those who needed absolutely bleeding-edge performance spent a lot of money on main memory.

However, the recent decline in the price of flash storage (similar to the technology in thumb drives), sold as SSDs ("Solid State Disks"), has brought the speed of disk data retrieval a few orders of magnitude closer to that of main memory. SSD is a bit of a misnomer - there is no "disk," in the traditional sense, in an SSD - just chunks of flash storage.

You can’t just drop in flash storage and call it a day, though – with exponentially larger data sets, the time it takes to find data within a database (even on flash) can be a bottleneck. That’s why we see software plays in storage, such as 10gen and Aerospike; while these are agnostic to the storage medium (flash versus spinning disk), they work best with SSDs, since SSDs are strictly faster than spinning disks - and therefore software and hardware complement each other at the storage level.
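
To see why the software layer still matters on fast storage, compare scanning for a record with looking it up through an index. The sketch below is a toy in-memory illustration (a sorted Python list searched with bisect stands in for a database index; real databases use structures like B-trees), but the algorithmic gap it shows is one no storage medium closes on its own.

    import bisect

    # Toy illustration: finding one record among a million.
    records = [(i, "row-%d" % i) for i in range(1_000_000)]   # sorted by key
    keys = [k for k, _ in records]   # the "index": built once, reused per query

    def full_scan(key):
        for k, value in records:        # O(n): walks records until it finds the key
            if k == key:
                return value

    def index_lookup(key):
        i = bisect.bisect_left(keys, key)    # O(log n): ~20 comparisons here
        if i < len(records) and records[i][0] == key:
            return records[i][1]

    print(full_scan(987_654))       # slow: touches ~987,655 records
    print(index_lookup(987_654))    # fast: a handful of comparisons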

The long and short of it all is that performance is a delicate dance of hardware and software. More efficient performance allows servers to be run more cheaply, and faster performance often correlates directly with higher revenues. Understanding where the bottlenecks are in datacenters - and the various hardware and software solutions that can fix them - is valuable to any modern company.

Some of the above examples are based on the excellent course notes of Prof. Stephen Chong, for those of you who want a more technical reference. [source]
