Unlock the Power of DAOS in Python with PyDAOS

ID 751481
Updated 11/17/2021
Version: Latest
Public


Introduction

Distributed Asynchronous Object Storage (DAOS) is high-performance storage that pushes the limits of Intel hardware. It's based on Intel Xeon, Persistent Memory and NVMe SSDs. It has been awarded top spots in the IO500 and IO500 10-node Challenge multiple times in the past several years. For more information, please refer to DAOS: Revolutionizing High-Performance Storage with Intel® Optane™ Technology.

This article will show how to easily interface with DAOS from Python through the PyDAOS package. There are two key advantages of using a dictionary stored in DAOS as opposed to a regular Python dictionary. The first is that you can manipulate a gigantic Key-Value (KV) store, given that nothing is stored in local memory. The second is that your dictionary is persistent (no need to sync your data to disk), which means that if you quit your program and reload it, your data will still be there.

It is essential to point out that PyDAOS is a work in progress. For example, only keys and values of type string are currently supported. In addition, the only supported data structure is a dictionary (although arrays will be included in the near future).

Installing DAOS also installs PyDAOS automatically. The location (in Linux) is:

<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/

If Python does not find this path automatically, you can add it manually using sys:

import sys
sys.path.append("<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/")

This is usually not required if you install DAOS from repository packages.
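
As a small sketch of the manual approach, the helper below appends a directory to sys.path only when it exists and is not already listed. The function name add_site_packages is hypothetical, and the placeholder path is left exactly as in the text above; substitute your own installation directory.

```python
import os
import sys


def add_site_packages(path):
    """Append path to sys.path if it is an existing directory not already listed.

    Returns True when the path was added, False otherwise.
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path) or path in sys.path:
        return False
    sys.path.append(path)
    return True


# Hypothetical usage: replace the placeholder with your real installation directory.
# add_site_packages("<DAOS_INSTALLATION_DIR>/lib64/python3.6/site-packages/")
```

Checking for the directory first avoids silently adding a path that does not exist, which would otherwise only show up later as an ImportError.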

Pools and Containers

The first thing you will need to use PyDAOS is an existing DAOS pool and container. At the moment, both have to exist beforehand; it is not possible to create pools or containers from the Python API. To create a pool, you run:

$ dmg pool create --label=pydaos --size=1G
Creating DAOS pool with automatic storage allocation: 1.0 GB total, 6,94 tier ratio
Pool created with 100.00%,0.00% storage tier ratio
--------------------------------------------------
  UUID                 : 1a0ce47b-fb70-46e9-9564-c7dfbf43fdd7
  Service Ranks        : 0                                   
  Storage Ranks        : 0                                   
  Total Size           : 1.0 GB                              
  Storage tier 0 (SCM) : 1.0 GB (1.0 GB / rank)              
  Storage tier 1 (NVMe): 0 B (0 B / rank)

That command creates a 1 GB pool labeled pydaos (you can choose any other name). Next, we can create our container inside this pool by running:

$ daos cont create --type=PYTHON --pool=pydaos --label=kvstore
  Container UUID : a7e1fa95-89f0-4685-8a66-61abf20e57db
  Container Label: kvstore                             
  Container Type : PYTHON

We pass the type PYTHON to indicate that the container will be used from a client written in Python. The type designates a pre-defined layout of the data in terms of the underlying DAOS object model. For example, other available types are HDF5 and POSIX. The label kvstore is arbitrary: you can choose any other name.

PyDAOS Step by Step

First, we have to make sure that we import all the necessary classes:

from pydaos import (DCont, DDict, DObjNotFound)

DCont represents a container, DDict a dictionary, and DObjNotFound is used to catch exceptions raised when objects are not found in a container.

But before we can get or create objects, we have to create a Python container object by passing the pool and container labels:

daos_cont = DCont("pydaos", "kvstore", None)

We can also use the last parameter to create our object using the path to the container in unified namespace:

daos_cont = DCont(None, None, "daos://pydaos/kvstore")

Now we can get (or create) a dictionary object. We can use the DObjNotFound exception to create the dictionary if it doesn't exist:

daos_dict = None
try:
        daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
        daos_dict = daos_cont.dict("dict-0")

Again, the name dict-0 is arbitrary. Now that we have our dictionary object, we can start inserting, reading, and deleting keys against our DAOS container.
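
The get-or-create pattern above can be wrapped in a small helper. The sketch below is only an illustration: FakeCont and ObjNotFound are hypothetical in-memory stand-ins, so the code runs without a DAOS system, and the helper name get_or_create_dict is not part of PyDAOS.

```python
class ObjNotFound(Exception):
    """Hypothetical stand-in for pydaos.DObjNotFound."""


class FakeCont:
    """Hypothetical in-memory stand-in for a DCont (not the real class)."""

    def __init__(self):
        self._objs = {}

    def get(self, name):
        # Mirror the behavior described above: raise when the object is missing.
        try:
            return self._objs[name]
        except KeyError:
            raise ObjNotFound(name) from None

    def dict(self, name):
        # Create a new, empty dictionary object under the given name.
        self._objs[name] = {}
        return self._objs[name]


def get_or_create_dict(cont, name, not_found=ObjNotFound):
    """Return the dictionary called name, creating it if it does not exist."""
    try:
        return cont.get(name)
    except not_found:
        return cont.dict(name)


cont = FakeCont()
first = get_or_create_dict(cont, "dict-0")   # created on the first call
second = get_or_create_dict(cont, "dict-0")  # fetched on the second call
print(first is second)  # True
```

With the real package, the equivalent call would presumably be get_or_create_dict(daos_cont, "dict-0", DObjNotFound).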

Insert a New Key

To insert a new key, use put():

key   = "dog"
value = "perro"
daos_dict.put(key, value)

Get a Key

To get the value for a key, use the [] interface (as in native Python dictionaries):

try:
        value = str(daos_dict[key])
except KeyError:
        print("key not found")
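
Judging from the sample output later in this article (b'perro') and from the .decode('utf-8') calls in json_example.py, values appear to come back as bytes objects. Calling str() on bytes keeps the b'...' wrapper, so a small decoding helper can be handy. The helper name as_text is hypothetical.

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it first when it is a bytes object."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return str(value)


print(str(b"perro"))      # b'perro'
print(as_text(b"perro"))  # perro
```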

Delete a Key

To delete a key, use pop():

daos_dict.pop(key)

Iterate the Whole Dictionary

We can iterate the whole dictionary as we would do with a native Python dictionary:

for key in daos_dict:
        print("key=" + key + "  value=" + str(daos_dict[key]))

Bulk Insertion

PyDAOS dictionaries allow us to also insert and read in bulk. We can do bulk insertion by passing a Python dictionary to bput():

python_dict = {}
python_dict[key0] = value0
python_dict[key1] = value1
python_dict[key2] = value2
...
daos_dict.bput(python_dict)
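
Because only string keys and values are currently supported, it can help to normalize a regular Python dictionary before handing it to bput(). The following is a minimal sketch; the helper name to_str_dict is hypothetical, and str() may not be the right conversion for every data type.

```python
def to_str_dict(mapping):
    """Return a new dict whose keys and values are all converted with str()."""
    return {str(k): str(v) for k, v in mapping.items()}


batch = to_str_dict({"dog": "perro", 1: 2.5, "count": 3})
print(batch)  # {'dog': 'perro', '1': '2.5', 'count': '3'}

# With a real DDict, the result could then be passed on:
# daos_dict.bput(batch)
```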

Read in Bulk

To read in bulk, we pass a Python dictionary with the keys that we want to read to bget():

python_dict = {}
python_dict[key0] = None
python_dict[key1] = None
python_dict[key2] = None
...
daos_dict.bget(python_dict)

It is also possible to read all keys in bulk with dump():

python_dict = daos_dict.dump()
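
The request dictionary for bget() maps every key of interest to None. When the keys are already in a list, dict.fromkeys() builds that dictionary in one call. The helper name make_bget_request is hypothetical.

```python
def make_bget_request(keys):
    """Build a dictionary mapping every requested key to None."""
    return dict.fromkeys(keys)


request = make_bget_request(["dog", "cat", "bird"])
print(request)  # {'dog': None, 'cat': None, 'bird': None}

# With a real DDict, the request would then be filled in by:
# daos_dict.bget(request)
```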

Total Number of Keys

Finally, we can get the total number of keys stored in our dictionary with len():

print("dictionary has " + str(len(daos_dict)) + " keys")
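
To experiment with the calls described so far without access to a DAOS system, a rough in-memory stand-in can be useful. The class below is hypothetical and only mimics the interface shown in this article. Details such as returning bytes, leaving missing keys as None in bget(), and raising KeyError in pop() are assumptions, not documented PyDAOS behavior.

```python
class InMemoryDict:
    """Hypothetical stand-in mimicking the DDict calls shown in this article."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Assumption: values are returned as bytes, as the sample output suggests.
        self._data[key] = value.encode("utf-8")

    def pop(self, key):
        # Assumption: a missing key raises KeyError, like a native dict.
        return self._data.pop(key)

    def bput(self, mapping):
        for key, value in mapping.items():
            self.put(key, value)

    def bget(self, request):
        # Fill the caller's dictionary in place; missing keys stay None (assumption).
        for key in request:
            request[key] = self._data.get(key)
        return request

    def dump(self):
        return dict(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


kv = InMemoryDict()
kv.put("dog", "perro")
kv.bput({"cat": "gato", "bird": "pajaro"})
request = {"dog": None, "cat": None}
kv.bget(request)
print(len(kv), request)  # 3 {'dog': b'perro', 'cat': b'gato'}
kv.pop("bird")
print(sorted(kv))        # ['cat', 'dog']
```

Unlike a real DAOS dictionary, this stand-in keeps everything in local memory and loses its contents when the program exits.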

A Complete Example

Now that we have all the pieces, let’s put them together to create a complete example. The example is a simple program (kvmanage.py) to manage a DAOS KV store interactively through a command line interface (CLI):

from pydaos import (DCont, DDict, DObjNotFound)

print("==========================")
print("== KV STORE WITH PYDAOS ==")
print("==========================")

daos_cont = DCont("pydaos", "kvstore", None)
daos_dict = None
try:
        daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
        daos_dict = daos_cont.dict("dict-0")

while True:
        cmd = input("\ncommand (? for help)> ")
        if cmd == "?":
                print("?\t- print this help")
                print("r\t- read a key")
                print("ra\t- read all keys")
                print("i\t- insert new key")
                print("d\t- delete key")
                print("ib\t- insert new keys in bulk")
                print("rb\t- read keys in bulk")
                print("q\t- quit")
        elif cmd == "r":
                key     = input("key? ")
                try:
                        print("\tkey: " + key + "\tvalue: " + str(daos_dict[key]))
                except KeyError:
                        print("\tError! key not found")
        elif cmd == "ra":
                print("\ndict len = " + str(len(daos_dict)))
                for key in daos_dict:
                        print("\tkey: " + key + "\tvalue: " + str(daos_dict[key]))
        elif cmd == "i":
                print("\ninserting new key")
                print("(enter nothing for key to skip)")
                value   = ""
                key     = input("key? ")
                while key != "" and value == "":
                        value   = input("value? ")
                        if value != "":
                                daos_dict.put(key, value)
        elif cmd == "d":
                print("\ndeleting key")
                print("(enter nothing for key to skip)")
                key     = input("key? ")
                if key != "":
                        daos_dict.pop(key)
        elif cmd == "ib":
                print("\ninserting new keys in bulk")
                print("(enter nothing for key to finish)")
                python_dict     = {}
                value           = ""
                key             = input ("key[0]? ")
                i               = 0
                while key != "":
                        value   = input("value[" + str(i) + "]? ")
                        if value == "":
                                continue
                        python_dict[key] = value
                        i += 1
                        key     = input ("key[" + str(i) + "]? ")
                print("inserting ", end = " ")
                print(python_dict)                
                daos_dict.bput(python_dict)
                print("done")
        elif cmd == "rb":
                print("\nread keys in bulk")
                print("(enter nothing for key to finish)")
                python_dict     = {}
                key             = input ("key[0]? ")
                i               = 0
                while key != "":
                        python_dict[key] = None
                        i += 1
                        key     = input ("key[" + str(i) + "]? ")
                print("reading = ", end = " ")
                print(python_dict)
                daos_dict.bget(python_dict)
                print("result = ", end = " ")
                print(python_dict)
        elif cmd == "q":
                break
        print("---")
print("\nend")

The program accepts multiple commands to manage a KV store: read a key, read all keys, insert a new key, delete a key, insert new keys in bulk, read keys in bulk, and quit. The program runs an infinite loop until the user selects the quit command.

For example, to insert a new key:

$ python3 kvmanage.py 
==========================
== KV STORE WITH PYDAOS ==
==========================

command (? for help)> ?
?	- print this help
r	- read a key
ra	- read all keys
i	- insert new key
d	- delete key
ib	- insert new keys in bulk
rb	- read keys in bulk
q	- quit
---

command (? for help)> i

inserting new key
(enter nothing for key to skip)
key? dog
value? perro
---

command (? for help)>

Now we can read all keys and see our newly inserted key:

command (? for help)> ra

dict len = 1
	key: dog	value: b'perro'
---

command (? for help)> 

An Additional Example Using JSON Files

Below is a simple example demonstrating the use of PyDAOS with JSON files. Traditionally, a program would load such data and perform its read/write operations in local memory; with the PyDAOS API, we can store the data in DAOS and take advantage of its performance through simple KV store operations.

First, create your pool and container (as shown above), and then verify their creation with a couple of simple CLI commands.

$ daos pool query pydaos_json
Pool 31d7b053-b4c6-4c73-8d58-2c221d829815, ntarget=512, disabled=0, leader=1, version=1
Pool space info:
- Target(VOS) count:512
- Storage tier 0 (SCM):
  Total size: 2.0 TB
  Free: 2.0 TB, min:4.0 GB, max:4.0 GB, mean:4.0 GB
- Storage tier 1 (NVMe):
  Total size: 12 TB
  Free: 11 TB, min:22 GB, max:22 GB, mean:22 GB
Rebuild idle, 0 objs, 0 recs
$ daos cont query pydaos_json kvstore
  Container UUID             : 09b2bd13-fd38-4e80-be25-c9af6e4d7605
  Container Label            : kvstore
  Container Type             : PYTHON
  Pool UUID                  : 31d7b053-b4c6-4c73-8d58-2c221d829815
  Number of snapshots        : 0
  Latest Persistent Snapshot : 0
  Highest Aggregated Epoch   : 418681269376745486
  Container redundancy factor: 0
  Snapshot Epochs            :

In json_example.py, we take currency information from two JSON files, conversions.json and data.json, and use it to display simple exchange rates from the US Dollar to a requested currency. The file conversions.json contains the exchange rates from October 29, 2021. These rates are used together with data.json, which contains descriptive details for each of these currencies.

We begin by connecting to the pool and container created through the CLI, where the pool’s label is “pydaos_json” and the container’s label is “kvstore”. Then we open data.json and store each of its entries in a dictionary inside that container through put() operations.

data.json:

{
        "USD": {
                "symbol": "$",
                "name": "US Dollar",
                "symbol_native": "$",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "USD",
                "name_plural": "US dollars"
        },
        "CAD": {
                "symbol": "CA$",
                "name": "Canadian Dollar",
                "symbol_native": "$",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "CAD",
                "name_plural": "Canadian dollars"
        },
        "EUR": {
                "symbol": "€",
                "name": "Euro",
                "symbol_native": "€",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "EUR",
                "name_plural": "euros"
        },
        "AUD": {
                "symbol": "AU$",
                "name": "Australian Dollar",
                "symbol_native": "$",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "AUD",
                "name_plural": "Australian dollars"
        },
        "CNY": {
                "symbol": "CN¥",
                "name": "Chinese Yuan",
                "symbol_native": "CN¥",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "CNY",
                "name_plural": "Chinese yuan"
        },
        "SGD": {
                "symbol": "S$",
                "name": "Singapore Dollar",
                "symbol_native": "$",
                "decimal_digits": 2,
                "rounding": 0,
                "code": "SGD",
                "name_plural": "Singapore dollars"
        }

}

conversions.json:

{
    "provider": "https://www.exchangerate-api.com",
    "WARNING_UPGRADE_TO_V6": "https://www.exchangerate-api.com/docs/free",
    "terms": "https://www.exchangerate-api.com/terms",
    "base": "USD",
    "date": "2021-10-29",
    "time_last_updated": 1635510901,
    "rates": {
        "USD": 1,
        "CAD": 1.23,
        "AUD": 1.33,
        "CNY": 6.39,
        "EUR": 0.857,
        "SGD": 1.34
    }
}

json_example.py:

from pydaos import (DCont, DDict, DObjNotFound)
from random import randrange
from timeit import default_timer as timer
import ast
import json

daos_cont = DCont("pydaos_json", "kvstore", None )
daos_dict = None
try:
        daos_dict = daos_cont.get("dict-0")
except DObjNotFound:
        daos_dict = daos_cont.dict("dict-0")

with open('data.json') as json_file:
    data = json.load(json_file)

    for i in data:
        daos_dict.put(i,str(data[i]))

    while True:
        cmd = input("\ncommand (? for help)> ")
        if cmd == "?":
            print("\n? - print this help")
            print("\nlc - list currencies")
            print("\nlci - list a specific currency's details")
            print("\ngxr - get exchange rate for an inputted currency to the US Dollar ")
            print("\ncxr - convert exchange rate for an inputted currency to the US Dollar")
        elif cmd == "lc":
            for i in daos_dict:
                print(i)
        elif cmd == "lci":
            currency = input("Enter a currency: ")
            currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))

            print("The symbol for ", currency, " is ", currency_dict["symbol"])
            print("The name is ", currency_dict["name"], " or ", currency_dict["name_plural"], ".")
            print("The native symbol is ", currency_dict["symbol_native"])
        elif cmd == "gxr":
            currency = input("what currency? ")
            currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))

            with open('conversions.json') as usd_exchange_rates:
                usd_data = json.load(usd_exchange_rates)
                print("Requested", currency_dict["name_plural"], " whose current exchange rate is ", currency_dict["symbol"], usd_data["rates"][currency], "for $1 USD")
        elif cmd == "cxr":
            currency = input("What currency do you want to convert to? ")
            currency_amount = input("How much USD do you currently have? ")
            currency_dict = ast.literal_eval(daos_dict[currency].decode('utf-8'))

            with open('conversions.json') as usd_exchange_rates:
                usd_data = json.load(usd_exchange_rates)
                converted_currency = int(currency_amount) * usd_data["rates"][currency]
                print("Converted ", currency_amount  ,"US Dollars to", converted_currency, currency_dict["name_plural"])
        elif cmd == "q":
            break
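
The program above stores each record with str() and parses it back with ast.literal_eval(). A possible alternative, sketched here, is to store JSON text instead, which clients written in other languages could also parse. The .encode() calls below only simulate the bytes that the dictionary appears to return; no DAOS access is involved, and the record is a shortened copy of the EUR entry.

```python
import ast
import json

record = {"symbol": "€", "name": "Euro", "decimal_digits": 2}

# Approach used in json_example.py: str() to store, ast.literal_eval() to parse.
stored_repr = str(record).encode("utf-8")
from_repr = ast.literal_eval(stored_repr.decode("utf-8"))

# Alternative sketch: JSON text to store, json.loads() to parse.
stored_json = json.dumps(record, ensure_ascii=False).encode("utf-8")
from_json = json.loads(stored_json.decode("utf-8"))

print(from_repr == record, from_json == record)  # True True
```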

Running it:

$ python3 json_example.py 

command (? for help)> ?

? - print this help

lc - list currencies

lci - list a specific currency's details

gxr - get exchange rate for an inputted currency to the US Dollar

cxr - convert exchange rate for an inputted currency to the US Dollar

command (? for help)> lci
Enter a currency: EUR
The symbol for  EUR  is  €
The name is  Euro  or  euros .
The native symbol is  €

command (? for help)> gxr
what currency? EUR
Requested euros  whose current exchange rate is  € 0.857 for $1 USD

command (? for help)> cxr
What currency do you want to convert to? EUR
How much USD do you currently have? 150
Converted  150 US Dollars to 128.55 euros

command (? for help)>
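
One limitation of the cxr command as written is that int(currency_amount) rejects an input such as 12.50, and plain floating-point multiplication can produce long fractions. A possible refinement, sketched below with the standard decimal module, rounds the result to the decimal_digits value found in data.json. The function name convert_usd and the choice of ROUND_HALF_UP are assumptions, not part of the original program.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert_usd(amount, rate, decimal_digits=2):
    """Convert a USD amount using rate, rounded to decimal_digits places."""
    quantum = Decimal(1).scaleb(-decimal_digits)  # for 2 digits: Decimal('0.01')
    result = Decimal(str(amount)) * Decimal(str(rate))
    return result.quantize(quantum, rounding=ROUND_HALF_UP)


print(convert_usd("150", 0.857))   # 128.55
print(convert_usd("12.50", 1.23))  # 15.38
```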

Summary

In this introductory article, we showed how to easily interface with DAOS from Python through the PyDAOS package. The PyDAOS dictionary API was presented, describing each operation in detail with small code snippets to ease understanding. After that, two complete working examples were presented. As mentioned in the introduction, the PyDAOS package is still a work in progress and more features will be supported in the future. Stay tuned.