Software Bills of Materials (SBOM): The Basics

Modern software development is incredibly complex. In fact, probably the fastest way to catch someone in a lie is to hear them say: “I wrote this entire piece of software, it’s all mine.” Don’t believe it. Software nowadays is always composed of a combination of components. These components are typically modules and libraries called by other code, or even standalone programs used in conjunction with other programs.

Until a few years ago, the “80/20” rule was valid: in any significant piece of software, 80% of the content should not be yours! It makes no economic sense to try to develop more than 20% of any software, since it’s likely someone has already built components with the necessary functionality. Instead, focus on developing what gives you a competitive advantage. In recent years, this balance might have even shifted to 90/10. 

That’s where the Software Bill of Materials (SBOM) comes in: It’s a formal record containing details and supply-chain relationships of all the components used in building software. These components can be open source or proprietary, freely available or paid-for, widely available or access-restricted. The information present in an SBOM can be used in a multitude of ways, helping answer various contractual, legal, or technical queries about the software. 

Image: “Dependency,” by xkcd, licensed under CC BY-NC 2.5

Early efforts to provide SBOMs were mostly driven by the desire for legal compliance. Every software component is under a specific license, which might impose obligations on its use. To be legally compliant, one must satisfy all the obligations of all the licenses. This is simple to state but not easy to accomplish. An obvious first step is to have a record of all components and all their licenses, which is exactly what an SBOM is.

However, in the past couple of years, following software supply chain attacks like SolarWinds, security has become the driving force behind SBOM adoption and the need to know the exact components inside each piece of software. SBOMs are now expected to accompany some types of software delivery. For example, the United States Executive Order (EO) 14028 advises US government agencies to start requiring SBOMs for any hardware or software product they acquire.

At a conceptual level, an SBOM is like a simple table of contents: it’s a comprehensive list of software components, with each component’s name, version, and origin, and possibly additional information about licensing, vulnerabilities, provenance, or any other area of interest. Because it can be easily understood, this information can be expressed in several formats: as a table, as a text document, as a spreadsheet, etc. For the information to be useful, the format must be understood and agreed upon by both parties to an exchange.
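To make the “table of contents” idea concrete, here is a minimal sketch in Python that holds a component list and prints it as a plain-text table. The `Component` class, the `as_table` helper, and the two sample components are all hypothetical, invented for illustration; this is not any standard SBOM format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One row of a conceptual SBOM (hypothetical structure)."""
    name: str
    version: str
    origin: str
    license: str = "unknown"  # additional information is optional


def as_table(components):
    """Render the component list as an aligned plain-text table."""
    header = ("name", "version", "origin", "license")
    rows = [header] + [(c.name, c.version, c.origin, c.license) for c in components]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Made-up sample data.
components = [
    Component("libexample", "1.2.3", "https://example.com/libexample", "MIT"),
    Component("parser-demo", "0.9.0", "internal"),
]

print(as_table(components))
```

A table like this is readable by a person, but two parties exchanging it would still have to agree on the column names and their meaning, which is the problem a shared standard addresses.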

More than ten years ago, a group of interested individuals representing various companies started working on the problem of defining a common, standardized format, which they called Software Package Data Exchange* (SPDX*). Everyone agreed that this standard should not be a competitive advantage for any specific company, so the work fully followed open source principles, with open participation by anyone who wanted to contribute.

SPDX is an open standard for communicating SBOM information. In 2021 it was ratified as the international standard ISO/IEC 5962:2021. The SPDX specification is produced collaboratively by a large number of participants, organized into working groups according to their interests and expertise. Intel has been an active participant in many groups since the beginning, such as the Technical team defining the SPDX specification, the Legal team working on the SPDX License List, and the Outreach team promoting the use of SPDX.

The approach taken by SPDX is that the information present in an SBOM should be factual. For example, it simply records the license declared for each software component and avoids legal interpretations of license terms or obligations. Another important characteristic of SPDX is that the information can be encoded in a variety of formats, like pure text with minimal structure, JSON, XML, RDF or even spreadsheets.
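As an illustration of the same facts being encoded in more than one format, the sketch below writes one package record as JSON and as the minimally structured text form (often called tag-value). The field and tag names follow the SPDX 2.3 specification as best I understand it; the package data, the document namespace URL, and the tool name `example-sketch` are made up, and the output is not guaranteed to pass an SPDX validator.

```python
import json

# Hypothetical package facts; note that only the declared license is recorded,
# with no interpretation of its terms.
package = {
    "name": "libexample",
    "version": "1.2.3",
    "download": "https://example.com/libexample-1.2.3.tar.gz",
    "license_declared": "MIT",
}


def to_json(doc_name, namespace, created, pkg):
    """Encode the document as SPDX-style JSON (assumed 2.3 field names)."""
    doc = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": doc_name,
        "documentNamespace": namespace,
        "creationInfo": {"created": created, "creators": ["Tool: example-sketch"]},
        "packages": [{
            "name": pkg["name"],
            "SPDXID": "SPDXRef-Package-" + pkg["name"],
            "versionInfo": pkg["version"],
            "downloadLocation": pkg["download"],
            "filesAnalyzed": False,
            "licenseDeclared": pkg["license_declared"],
        }],
    }
    return json.dumps(doc, indent=2)


def to_tag_value(doc_name, namespace, created, pkg):
    """Encode the same facts as SPDX-style tag-value text (assumed 2.3 tags)."""
    lines = [
        "SPDXVersion: SPDX-2.3",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {doc_name}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: example-sketch",
        f"Created: {created}",
        "",
        f"PackageName: {pkg['name']}",
        f"SPDXID: SPDXRef-Package-{pkg['name']}",
        f"PackageVersion: {pkg['version']}",
        f"PackageDownloadLocation: {pkg['download']}",
        "FilesAnalyzed: false",
        f"PackageLicenseDeclared: {pkg['license_declared']}",
    ]
    return "\n".join(lines)


args = ("example-sbom", "https://example.com/spdx/example-sbom-1",
        "2023-01-01T00:00:00Z", package)
print(to_json(*args))
print(to_tag_value(*args))
```

Both outputs carry identical information; the choice between them is a matter of which tools and readers sit on the receiving end.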

The structure of an SPDX document is hierarchical: in addition to information relevant to the document itself, like author and date, the information is presented at levels of increasing granularity, corresponding to packages, files, or snippets. Almost all the information at every level is optional, so one can generate an SBOM giving a general view or one containing information in excruciating detail. The flexibility of the format makes it ideal for any number of real-world use cases. For example, a recipient of an SBOM might only be interested in security vulnerability information, while another might care about which licenses the different components are under and the legal obligations they impose. 
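The hierarchy and the optional fields can be sketched with a small dictionary shaped like an SPDX JSON document. The key names (`packages`, `files`, `snippets`, `licenseDeclared`, `versionInfo`) are assumed to match SPDX 2.3; the data and the two helper functions are hypothetical. One helper gives the general view, the other answers only a licensing question and tolerates missing fields.

```python
# A made-up document: document-level info first, then packages and files.
doc = {
    "name": "example-sbom",
    "creationInfo": {"created": "2023-01-01T00:00:00Z",
                     "creators": ["Tool: example-sketch"]},
    "packages": [
        {"name": "libexample", "versionInfo": "1.2.3", "licenseDeclared": "MIT"},
        {"name": "parser-demo"},  # sparse entry: only the name is known
    ],
    "files": [
        {"fileName": "./src/main.c", "licenseConcluded": "Apache-2.0"},
    ],
    # No "snippets" key at all: that level of detail was simply not recorded.
}


def summary(document):
    """Count the entries recorded at each level of granularity."""
    return {level: len(document.get(level, []))
            for level in ("packages", "files", "snippets")}


def license_view(document):
    """Map each package to its declared license, if one was recorded."""
    return {pkg["name"]: pkg.get("licenseDeclared", "NOASSERTION")
            for pkg in document.get("packages", [])}


print(summary(doc))
print(license_view(doc))
```

A recipient interested only in vulnerabilities could write a similar view over different fields and ignore the licensing data entirely.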

A number of tools can handle SPDX documents, and they can be classified by their functionality and by the point in the software supply chain where they operate. For example, the SPDX document might be produced while software is being built or it might be generated afterwards by analyzing the software already built. Other tools consume this information and can analyze, transform, compare, or merge SPDX documents.
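As a rough idea of what a consuming tool does, the sketch below compares the package lists of two SBOMs, for example from two releases of the same product. It assumes documents already loaded as dictionaries with SPDX-style `packages`, `name`, and `versionInfo` keys; the `compare` function and the sample data are hypothetical and do not describe any existing SPDX tool.

```python
def package_versions(document):
    """Map package name to recorded version (empty string if absent)."""
    return {pkg["name"]: pkg.get("versionInfo", "")
            for pkg in document.get("packages", [])}


def compare(old, new):
    """Report packages added, removed, or changed in version between two SBOMs."""
    before, after = package_versions(old), package_versions(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in before.keys() & after.keys()
                          if before[name] != after[name]),
    }


# Made-up SBOMs for two releases.
release_1 = {"packages": [
    {"name": "libexample", "versionInfo": "1.2.3"},
    {"name": "oldlib", "versionInfo": "0.1.0"},
]}
release_2 = {"packages": [
    {"name": "libexample", "versionInfo": "1.3.0"},
    {"name": "newlib", "versionInfo": "2.0.0"},
]}

print(compare(release_1, release_2))
```

Matching packages by name alone is a simplification; a real tool would likely need a more robust identity for each component.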

Working groups are currently designing the next major release. SPDX version 3 is a major effort, restructuring the SBOM information into modular, compartmentalized sections. This will make it possible, for example, to have an SBOM with special emphasis on security and vulnerability information and less content on licensing details. Given the ever-increasing number of use cases for SBOMs, this modular approach is expected to result in more widespread adoption.

Intel is planning to introduce generation of SBOMs to accompany its software offerings in 2023. Meanwhile, Intel will also continue to actively participate in the efforts of defining SPDX. 

This post is based on a recent talk given at the South Tyrol Free Software Conference (SFScon); you can catch the video and check out the slides here.

About the author

Alexios Zavras 

Chief Open Source Compliance Officer 

An open source and free software licensing expert, Alexios is an active participant in the TODO Group, Software Package Data Exchange (SPDX), and OpenChain*. He frequently speaks at industry and academic conferences, including the Open Source Leadership Summit, FOSDEM, and CopyleftConf.