Erasure Coding

Learn how the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) supports erasure coding, which preserves access to data in spite of failures.

Hi. I'm Praveen from Intel. In this video, we're going to talk about erasure coding in the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L).

A lot of people intuitively understand RAID, but erasure coding is a little more esoteric. Many clouds are using erasure coding, especially people who build systems that scale to many nodes, more than 10 or 20.

For these systems, erasure codes make a lot of sense because they give you the same redundancy guarantees as triple replication but with half the raw data footprint, or potentially less, depending on how you configure your erasure-coded system.

Triple replication is the process by which a single copy of data is mirrored in at least two other places, so if one or two whole nodes disappear off the network, you still haven't lost access to the data. Erasure coding gives you that same guarantee: it continues to provide access to the data even when there are failures.

If we can shrink the cost of providing those access guarantees (in this case, to roughly half the cost by using an erasure coding scheme instead of triple replication), that represents enormous savings in operating and capital expenditures. Just about anyone building systems beyond a few nodes, whether enterprises or hyperscalers, can start taking advantage.
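As a back-of-the-envelope illustration of that footprint claim (a sketch only; the 10-data, 4-parity layout below is hypothetical, and real deployments choose their own split):

```c
#include <stdio.h>

int main(void)
{
        /* Hypothetical erasure-coded layout: 10 data + 4 parity fragments. */
        double k = 10.0, p = 4.0;
        double ec_footprint = (k + p) / k;   /* raw bytes stored per user byte */
        double rep_footprint = 3.0;          /* triple replication             */

        printf("erasure coding (10+4): %.1fx raw footprint, survives %.0f fragment losses\n",
               ec_footprint, p);
        printf("triple replication:    %.1fx raw footprint, survives 2 copy losses\n",
               rep_footprint);
        return 0;
}
```

Even that conservative layout stores well under half the raw bytes of triple replication while tolerating more simultaneous losses, which is where the operating and capital savings come from.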

The reason erasure codes weren't ubiquitously adopted prior to Intel ISA-L is that they were computationally expensive. The performance delta is gigantic.

The reason for Intel ISA-L to implement these is to enable people to realize these economies of storage media as we look toward the transition to solid-state storage. To give a sense of scale for the performance: in the example I'm going to introduce in this playlist, we see roughly five gigabits per second per core of erasure coding calculations on the [? E5E4 ?] that we used in the example, which is quite substantial.
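For a flavor of what those calculations look like in code, here is a minimal encode-side sketch, assuming ISA-L's erasure code API from erasure_code.h (gf_gen_cauchy1_matrix, ec_init_tables, ec_encode_data); the fragment counts and sizes are illustrative, and the full walkthrough is in the code sample video later in this playlist.

```c
/* Minimal ISA-L erasure coding sketch (encode side).
 * Build against libisal, e.g.: gcc ec_sketch.c -lisal */
#include <stdlib.h>
#include <string.h>
#include <isa-l/erasure_code.h>

#define K 10            /* data fragments   */
#define P 4             /* parity fragments */
#define M (K + P)       /* total fragments  */
#define LEN (64 * 1024) /* bytes per fragment */

int main(void)
{
        unsigned char *frag[M];
        unsigned char encode_matrix[M * K];
        unsigned char g_tbls[K * P * 32];
        int i;

        /* Allocate fragments; error checking omitted for brevity. */
        for (i = 0; i < M; i++)
                frag[i] = malloc(LEN);
        for (i = 0; i < K; i++)
                memset(frag[i], i, LEN);   /* stand-in for real data */

        /* Build the encode matrix, expand its parity rows into the tables
         * ec_encode_data expects, then compute the P parity fragments. */
        gf_gen_cauchy1_matrix(encode_matrix, M, K);
        ec_init_tables(K, P, &encode_matrix[K * K], g_tbls);
        ec_encode_data(LEN, K, P, g_tbls, frag, &frag[K]);

        for (i = 0; i < M; i++)
                free(frag[i]);
        return 0;
}
```

Roughly speaking, the decode side inverts the rows of the encode matrix corresponding to the surviving fragments and calls ec_encode_data again with the resulting recovery tables; the playlist's code sample covers that path.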

The flip side of throughput is latency, especially software-induced latency, which is a key consideration for people building at larger scales. You want to minimize the latency you incur in software, because any given operation gets split up, parallelized, and touches thousands of systems.

That software latency compounds tremendously in those parallel systems. Removing that software latency, and as a side effect getting high throughput, is one of the main goals of Intel ISA-L.

Thanks for watching. Be sure to watch the code sample portion of this Intel ISA-L erasure coding example and the rest of the playlist. Don't forget to like this video and subscribe.