Try Federated Learning with OpenFL


Open Federated Learning (OpenFL) is a Python* 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information.  

Developed and hosted by Intel, OpenFL was recently accepted as an incubation project by the Linux Foundation’s LF AI & Data Foundation Technical Advisory Council.

Patrick Foley, Deep Learning Software Engineer at Intel and an OpenFL maintainer, gets you started in just a few minutes with a few commands in this video.

The demo is part of a wide-ranging 40-minute conversation about OpenFL with Prashant Shah, Global Head of Artificial Intelligence, Health and Life Sciences at Intel, and Arun Gupta, VP and GM of the Open Ecosystem at Intel. The trio talk about “hungry” AI models and the lack of diversity in data sets, explain the federated learning model, and discuss how OpenFL solves the data silo problem.

 

Getting started

Foley begins with a server in Azure and uses the command-line interface to create a template workspace for the Aggregator Workflow. There are a number of different ways to start your OpenFL workflow, but Foley says “a template is typically where we recommend people start.”

Install OpenFL into a fresh environment with:

pip install openfl

Next steps:

fx workspace create --prefix workspace --template torch_cnn_histology

Options include TensorFlow* and PyTorch* templates; adding --help to the create command will show the complete list of options.

This demo uses a PyTorch Convolutional Neural Network (CNN) model trained on a healthcare-specific histology data set, classifying different tissue samples.

Directory Dive

Once you’ve created a directory called “workspace,” a demo directory hierarchy sits underneath it.

There are a couple of key folders:
 

  • Source directory. Open it with:
    workspace$ vim src/
    Inside you’ll find a model definition:
    pt_cnn.py
    This is a standard PyTorch model built on a base class called PyTorchTaskRunner, which establishes how the model weights are extracted from the model and then sent back to it during rounds of federated training.
     

“Everything else about this model looks pretty similar to what data scientists are used to -- the model definition, the forward pass -- nothing too surprising if you're used to working with PyTorch,” Foley says.
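To give a sense of what that file contains, here is a minimal, hypothetical sketch of a histology-style CNN written as a plain PyTorch module. The layer sizes, class count, and input resolution are illustrative assumptions, and the real pt_cnn.py builds on PyTorchTaskRunner rather than nn.Module so OpenFL can move its weights in and out each round:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the model defined in src/pt_cnn.py.
# The real class inherits from OpenFL's PyTorchTaskRunner; a plain
# nn.Module keeps this sketch self-contained.
class HistologyCNN(nn.Module):
    def __init__(self, num_classes=8):  # class count is illustrative
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(32 * 37 * 37, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # Two conv/pool stages followed by a small classifier head.
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = HistologyCNN()
print(model(torch.randn(1, 3, 150, 150)).shape)  # torch.Size([1, 8])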
 

  • Data loader representation. The file:
    pthistology_inmemory.py
    is the interface for working with different types of deep learning frameworks and their data loaders. In this case, an internal function is called to load the histology data set:
    load_histology_shard
    The data set is downloaded to this directory and, for this demo, sharded across the different collaborators. Every collaborator gets a different slice of the data set; you can define the training features as well as the labels, and there’s a set of validation features and labels, too (sketched below).
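The sharding idea itself is easy to picture. Below is a hypothetical NumPy sketch of shard-style loading; the function name, arguments, and slicing scheme are illustrative, not the template’s actual load_histology_shard implementation:

import numpy as np

def load_shard_sketch(images, labels, shard_num, num_shards, valid_fraction=0.2):
    """Return (train_X, train_y, valid_X, valid_y) for one collaborator's shard."""
    # Each collaborator keeps every num_shards-th sample, so the shards are disjoint.
    X = images[shard_num::num_shards]
    y = labels[shard_num::num_shards]
    split = int(len(X) * (1 - valid_fraction))
    return X[:split], y[:split], X[split:], y[split:]

# Two collaborators (num_shards=2) each get half of a toy data set.
images = np.random.rand(100, 3, 150, 150).astype(np.float32)
labels = np.random.randint(0, 8, size=100)
train_X, train_y, valid_X, valid_y = load_shard_sketch(images, labels, shard_num=0, num_shards=2)
print(train_X.shape, valid_X.shape)  # (40, 3, 150, 150) (10, 3, 150, 150)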

Federated Learning Plan

The next step is to run:

fx plan initialize

 

This defines the initial plan to start with and downloads data into a local folder. The demo uses a single node, but the process looks very similar when run across distributed infrastructure. As it completes, you’ll see the model being initialized -- the initial model that all participants start with on the first round.

The aggregator starts with this base model and then distributes it to all the participants in the federation.  

Take a look at the plan file:

plan.yaml

in the plan directory for a list of settings that all participants agree to before they start the federation.

“This is pretty important because it defines the model structure, the different hyperparameters chosen as part of the experiment,” Foley says.  “Think of the plan as a contract -- people know what they'll be training their data on before they actually start running these experiments.”
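Because the plan is plain YAML, you can also inspect those agreed-upon settings programmatically. The small snippet below simply prints the plan’s sections; it assumes you’re inside the workspace directory, and the section names in the comment are typical examples rather than a guaranteed list:

# Print the top-level sections of the federated learning plan.
# Assumes the current working directory is the workspace created above.
import yaml

with open("plan/plan.yaml") as f:
    plan = yaml.safe_load(f)

for section, settings in plan.items():
    print(section)  # e.g. aggregator, collaborator, data_loader, task_runner, ...
    if isinstance(settings, dict):
        for key in settings:
            print("   ", key)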

Set up Security

Security is foundational for federated learning. OpenFL makes Transport Layer Security (TLS) the default for federated learning plans. There’s a built-in public key infrastructure (PKI) mechanism to make things easier for people who don’t have deep experience with TLS and setting up certificates for themselves.

Type:

fx workspace certify

to create a private certificate authority. This aggregator node represents the trusted entity that all the collaborators verify their certificates against. Now generate the aggregator’s certificate request by typing:

fx aggregator generate-cert-request

followed by:

fx aggregator certify

to self-sign the certificate. Repeat for each collaborator (collaborator one, collaborator two) in the demo:

fx collaborator generate-cert-request

Add a shard number for each collaborator. In the demo, each collaborator has half of the downloaded data set.

Signed, Sealed, Delivered

Now you have the certificate signing request (CSR). You should see a zip file packaged here that you can send to the aggregator. (If you were running OpenFL across multiple nodes, you’d transfer this CSR package to the aggregator node.) Whoever is acting as the certificate authority signs it, then sends it back. Because the demo runs on only one node, simply go to the aggregator window and run:

fx collaborator certify

Type 'yes' to verify that the hash is what you expect. You can check the hash out of band and confirm that everything looks exactly how you would expect. At this point, the setup is complete: TLS certificates, the data set configuration, the number of collaborators (two), and the initial model are all in place.

Ready Collaborator One

Start the aggregator by typing:

fx aggregator start

This will launch a gRPC* server. (Note: OpenFL uses gRPC across multiple hosts as well; that doesn’t change depending on where you’re running the application.) With the aggregator running, launch each collaborator from its own window with fx collaborator start so it can join the federation. In the logs, you’ll start seeing the different tasks performed by the different collaborators.

Every collaborator goes to the aggregator to ask, “What am I supposed to be doing for this round?” For this experiment, collaborators do three different things: download the model, make a forward pass validation of the aggregated model on their local data, then train on local data.
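Stripped of the gRPC plumbing, one collaborator round boils down to the loop sketched below. This is a simplified stand-in with a toy model and toy data, not OpenFL’s task runner code:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # toy stand-in for the histology CNN
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Toy local shard for this collaborator.
X, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

def one_round(aggregated_weights):
    # 1. Download: load the aggregated model sent by the aggregator.
    model.load_state_dict(aggregated_weights)

    # 2. Validate the aggregated model on local data (forward pass only).
    with torch.no_grad():
        acc = (model(X).argmax(dim=1) == y).float().mean().item()

    # 3. Train on local data, then return the updated weights.
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    return model.state_dict(), acc

new_weights, local_acc = one_round(model.state_dict())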

Next, each collaborator evaluates the model that’s been modified slightly by its local data set, running another forward pass before sending all of that information back to the aggregator for averaging, or aggregation, of those weights back into that single model. That model can then be redistributed to the collaborators for another round of the experiment.

Once collaborators perform a backward pass and have their locally trained model, they send that model back to the aggregator, where it’s combined with the others. The default implementation is a weighted average based on the amount of data present at each collaborator site. Because these collaborators divide the data equally, they’ll have equal weight in terms of their representation in that global model.
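The weighted average itself is straightforward to express. Here is an illustrative FedAvg-style sketch, not OpenFL’s internal implementation, combining collaborators’ weights in proportion to their local sample counts:

import torch

def weighted_average(state_dicts, sample_counts):
    # Average each parameter tensor, weighting by the collaborator's share of the data.
    total = sum(sample_counts)
    return {
        name: sum(sd[name].float() * (n / total) for sd, n in zip(state_dicts, sample_counts))
        for name in state_dicts[0]
    }

# Two collaborators with equal data get equal weight in the global model.
col_one = {"w": torch.tensor([1.0, 2.0])}
col_two = {"w": torch.tensor([3.0, 4.0])}
print(weighted_average([col_one, col_two], [500, 500]))  # {'w': tensor([2., 3.])}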

That’s not always the case: for The Federated Tumor Segmentation (FeTS) initiative, there were wildly different data representations between collaborators, Foley says.

At the end of the round, you can see the accuracy across all of the participants. After round four in the live demo, accuracy reached 74%. “It generally improves over time until it hits that upper bound where training can be stopped after that point,” Foley says. 

In the demo, the experiment runs a total of 20 rounds, “but this serves as an example of when your data is independently and identically distributed across different participants, the types of improvement you can expect to see,” Foley adds.

Check out the whole video on open.intel's YouTube channel.

Get Involved

If you’re ready to do more with OpenFL, here are some resources:
 

  • GitHub* OpenFL (if you’re already an expert, try solving some issues outside of those marked ‘hackathon’).
  • Try out more tutorials
  • Read this blog post explaining how to train a model with OpenFL
  • Check out the online documentation to launch your first federation.
  • Go to the virtual community meetings; here’s the calendar, with time zones for multiple regions.