Creating an AI-Based Customer Segmentation Model Using Intel-Optimized Workflows

Abstract

Segmenting potential customers by behavioral data such as email opens, content preferences, and engagement metrics is crucial to the success of digital marketing organizations. But this work can be time-consuming and inefficient. B2B marketing automation leader Act-On, working with Intel, created a solution that saves marketers time and effort: it uses machine learning (ML) to parse and ingest vast amounts of behavioral data to design and implement customized segmentation models.

Challenge: Removing the Guesswork from Customer Segmentation

Marketers can improve the effectiveness of their campaigns and better target content, cadence, and channel if they can group customers into appropriate segments based on a target’s proven behavioral preferences. Act-On offers a user-driven segmentation engine that allows marketers to select contacts based on profile and behavior data, but this requires marketers to use guesswork (otherwise known as “industry best practice”) to decide which criteria to use, experiment with different combinations of criteria, and determine how to define segments.

Convinced that machine learning could help make sense of the vast amounts of contact profile and behavior data to offer segmentation insights, Act-On turned to Intel to collaborate on enhancing “Act-On Audience Insights”, a feature that uncovers potential segments for marketers based on common profile and behavioral characteristics. 

The vision was to create models that analyze customer data and then create segments for consideration. Example customer segments include “contacts in the manufacturing industry with job title ‘Purchasing’ who’ve attended a webinar and filled out a form in the last 30 days” or “contacts with the lead source ‘Paid Search’ who have visited the website 10+ times and clicked on 5-10 emails in the past 90 days”. While marketers may define these segments themselves, the machine learning model can offer new insights and ideas to prompt their thinking and data exploration.

Solution: Leverage Intel’s AI & Machine Learning Software Portfolio

The project kicked off in March 2023, with Intel and Act-On working together to understand the available data and the requirements gathered from Act-On’s customers and stakeholders. The goal was to use Intel’s easy-to-use, modular AI software, with optimizations based on oneAPI, to create an AI-based customer segmentation model that could be incorporated into the Act-On application for use by marketers exploring their own datasets. The desired deployment timeframe was within 6 months, by August 2023.

The good news is that by working together closely, Act-On and Intel were able to meet that tight schedule and get the new feature into the hands of Act-On’s customers by leveraging Intel’s modular AI & Machine Learning software portfolio (Figure 1).

Let's look at how this was done, including how we were able to go from idea to production in such a short period.

Figure 1. Intel works to democratize AI and make AI Everywhere a reality by supercharging machine performance and developer productivity through its easy-to-use, modular Intel AI & Machine Learning Software Portfolio.

Wrangling the Data: Complexity & Scale

Act-On has thousands of customers in different industries across the world and close to tens of billions of rows of behavioral data (email opens and clicks, page visits, content downloads, webinar registrations and attendance, and form submissions), combined with hundreds of millions of rows of contact data.

The first step in this project was to wrangle the data, stored in Snowflake*, into a format that could be used for customer segmentation.

Each individual customer has contact data typically stored in a CRM (Customer Relationship Management) system and synced with the Act-On system. Many customers use common CRM systems (Salesforce*, Microsoft Dynamics*, NetSuite*, etc.) but store their customer data in a completely custom format. Act-On decided to focus on the common CRM data first and to limit the profile data to standard fields that most customers use. These include fields such as lead source, status, account type, industry, job title, state, and country.

One challenge is data uniformity and quality, especially as Act-On is relying on end-customers to maintain their own data. This led to some difficult data-cleansing exercises. The Act-On team made decisions for each field on how best to standardize data.

Next, Act-On aggregated individual behavior data (such as each individual form submission or email click) into new fields which contained the totals for each contact for 30, 90 and 180 days. For some behaviors, Act-On was able to provide a calculated rate or rate of change, such as % of emails opened or clicked on, and whether the contact’s open rate or click rate has increased or decreased over time. These measures are a much more relevant indicator of engagement patterns than simple counts of opens and clicks.
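The aggregation step can be illustrated with a minimal Python sketch. It is not Act-On's actual code: the event type names, the choice of comparing the 30-day open rate against the 180-day open rate, and the `aggregate_contact` helper are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

WINDOWS = (30, 90, 180)  # look-back windows in days, as described above

def aggregate_contact(events, as_of):
    """Summarize one contact's raw events into per-window totals and rates.

    `events` is a list of (event_date, event_type) tuples. The event type
    names ("email_sent", "email_open", ...) are hypothetical.
    """
    features = {}
    for days in WINDOWS:
        cutoff = as_of - timedelta(days=days)
        counts = Counter(kind for when, kind in events if cutoff < when <= as_of)
        for kind, total in counts.items():
            features[f"{kind}_{days}d"] = total
        sent = counts.get("email_sent", 0)
        # The open rate is only defined when at least one email was sent.
        features[f"open_rate_{days}d"] = (
            counts.get("email_open", 0) / sent if sent else None
        )
    # Rate of change (assumed definition): recent window vs. longer baseline.
    recent, baseline = features["open_rate_30d"], features["open_rate_180d"]
    if recent is None or baseline is None:
        features["open_rate_trend"] = "unknown"
    elif recent > baseline:
        features["open_rate_trend"] = "increasing"
    elif recent < baseline:
        features["open_rate_trend"] = "decreasing"
    else:
        features["open_rate_trend"] = "flat"
    return features

today = date(2023, 8, 1)
events = [
    (today - timedelta(days=5), "email_sent"),
    (today - timedelta(days=5), "email_open"),
    (today - timedelta(days=20), "form_submit"),
    (today - timedelta(days=60), "email_sent"),
    (today - timedelta(days=120), "email_sent"),
    (today - timedelta(days=120), "email_open"),
    (today - timedelta(days=150), "email_sent"),
]
summary = aggregate_contact(events, today)
```

In this toy example the contact opened one of one email in the last 30 days but only two of four over 180 days, so the trend is reported as increasing even though the raw open counts are small.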

Intel provided an unsupervised machine learning pipeline to ingest, process, cluster, and visualize the data and clusters. The pipeline was containerized and deployed on Act-On’s production environment. Unsupervised ML techniques extract insights from the data (i.e., cluster characteristics). The pipeline leverages different algorithms and Intel-optimized ML libraries for speed on Intel® CPUs.

Act-On is an AWS customer, so the model was deployed on AWS, where Act-On’s individual customers can be easily onboarded.

Workflow 

Intel has developed an unsupervised learning workflow. While the workflow is generic for unsupervised learning problems, Intel customized it to support Act-On’s specific problem where data is predominantly categorical. The insights are specific to Act-On’s dataset.

The data visualization and processing pipeline augments the tabular data analysis. The specific work with Act-On serves as a reference kit for unsupervised learning using these components.

The modular design of this architecture supports unsupervised learning algorithms, specification of clustering, and auto-search for the number of clusters. For select algorithms, performance accelerations can be achieved using Intel® Distribution of Modin* for data import and handling, and Intel® Extension for Scikit-learn* for fitting data into supported algorithms while running on Intel hardware. It takes a one-line code change to take advantage of the optimizations in each of those packages. 
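As a rough illustration of the one-line changes, the sketch below uses the publicly documented entry points of the two packages: `patch_sklearn()` from `sklearnex`, and `modin.pandas` as a drop-in replacement for `pandas`. The `try`/`except` fallbacks and the two status variables are additions for this example only, so that the script still runs where the Intel packages are not installed. Which algorithms actually benefit depends on each package's supported list.

```python
# Intel Extension for Scikit-learn: one call, made before any scikit-learn
# estimator is imported, reroutes supported algorithms to optimized versions.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    sklearn_patched = True
except ImportError:
    # Fallback added for this sketch: continue with stock scikit-learn.
    sklearn_patched = False

# Modin: the one-line change is the import itself, replacing
# "import pandas as pd" so that existing DataFrame code stays unchanged.
try:
    import modin.pandas as pd
    dataframe_backend = "modin"
except ImportError:
    try:
        import pandas as pd  # fallback added for this sketch
        dataframe_backend = "pandas"
    except ImportError:
        pd = None
        dataframe_backend = "none"
```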

The workflow is accelerated on AWS EC2 R7iz, which is powered by 4th Generation Intel® Xeon® Scalable Processors. 

Figure 2 shows the layers of Intel's modular AI software stack in the ML architecture diagram:

  • Starting at the foundational layer, Intel Distribution of Modin and Intel Extension for Scikit-learn accelerate computation for supported algorithms on Intel hardware.
  • The domain tool layer contains easy-to-use utilities such as Tabular Data Analyses functions.
  • The workflow layer describes the steps from data ingestion to deployment for the customer segmentation workflow.
  • Finally, this case, specific to Act-On, provides a customized pipeline with cluster insights for customer segmentation with visualization utilities.

Figure 2: Machine Learning Architecture Diagram for Unsupervised Learning.

Data Processing

The dataset contains both numerical and categorical features. These features provide information about the profile attributes of contacts and the interaction and behaviors of contacts. Categorical features such as “lead source,” “industry (of contact),” and “job title” are encoded. Numerical features require transformation and binning as additional steps before encoding due to the nature of the interactional data and how businesses interpret such interactions. 
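A minimal sketch of this preprocessing, assuming hypothetical bin edges and simple integer label encoding (the article does not state which bins or which encoding scheme the pipeline uses):

```python
def bin_count(value):
    """Map a raw interaction count to a coarse, business-readable label.

    The bin edges are hypothetical, chosen only for illustration.
    """
    if value == 0:
        return "none"
    if value <= 4:
        return "1-4"
    if value <= 9:
        return "5-9"
    return "10+"

def encode_column(values):
    """Label-encode a categorical column: each distinct value gets an integer."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

contacts = [
    {"lead_source": "Paid Search", "industry": "Manufacturing", "visits_90d": 12},
    {"lead_source": "Webinar", "industry": "Education", "visits_90d": 0},
    {"lead_source": "Paid Search", "industry": "Education", "visits_90d": 3},
]

# Numerical features are binned first, so that every column is categorical.
for row in contacts:
    row["visits_90d"] = bin_count(row["visits_90d"])

# Then every column is encoded the same way.
encoded, mappings = {}, {}
for column in ("lead_source", "industry", "visits_90d"):
    encoded[column], mappings[column] = encode_column([r[column] for r in contacts])
```

Binning before encoding means a contact with 12 visits and one with 40 visits land in the same "10+" category, which matches how a marketer would describe the segment.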

Clustering

For our data, the most suitable algorithm is K-Modes, a variation of K-Means designed specifically for categorical data. The clustering algorithm automatically searches for the most suitable number of clusters. Alternatively, users can specify the number of clusters.
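To make the idea concrete, here is a small pure-Python sketch of K-Modes. It replaces the Euclidean distance of K-Means with a count of mismatching attributes and replaces the mean with the per-attribute mode. This is not Intel's pipeline: the random initialization, the `choose_k` search rule (stop when an extra cluster reduces the cost by less than a fraction `min_gain` of the single-cluster cost), and all function names are assumptions for illustration.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value per attribute: the K-Modes analogue of a mean."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, k, seed=0, max_iter=20):
    """Run K-Modes once on a list of tuples; return (labels, modes, cost)."""
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(rows)), k)  # k distinct records as initial modes
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: hamming(r, modes[j])) for r in rows]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old mode if a cluster ends up empty
                modes[j] = column_modes(members)
    cost = sum(hamming(r, modes[lab]) for r, lab in zip(rows, labels))
    return labels, modes, cost

def choose_k(rows, k_max, min_gain=0.2):
    """Assumed search rule: grow k while each extra cluster still pays off."""
    baseline = k_modes(rows, 1)
    best_k, best = 1, baseline
    if baseline[2] == 0:  # all records identical, one cluster is enough
        return best_k, best
    for k in range(2, min(k_max, len(set(rows))) + 1):
        candidate = k_modes(rows, k)
        if (best[2] - candidate[2]) / baseline[2] < min_gain:
            break
        best_k, best = k, candidate
    return best_k, best

rows = [
    ("Paid Search", "Manufacturing", "10+"),
    ("Paid Search", "Manufacturing", "10+"),
    ("Paid Search", "Manufacturing", "5-9"),
    ("Webinar", "Education", "none"),
    ("Webinar", "Education", "none"),
    ("Webinar", "Education", "1-4"),
]
best_k, (labels, modes, cost) = choose_k(rows, k_max=4)
```

On this toy data the search settles on two clusters, one per lead source and industry pair. The modes themselves are readable descriptions of each cluster, which is what makes the approach useful for suggesting segments.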

Deployment

Deployment of the solution was done using Spring Boot and Spring Cloud Data Flow on Kubernetes*. An Audience Service acts as a controller, determining when and for which configurations to start each insights job.

Each job consists of a staging task followed by many different configurations of the clustering task that run in parallel and are wrapped up with a processor task to interpret all the results and post them back to the Audience Service. Here’s the breakdown:

  • The staging task is responsible for first fetching, analyzing, and preprocessing the data.
  • The clustering tasks are then done in parallel. This is the most computationally intensive portion of the workflow.
  • The processor task interprets the results from Intel’s clustering pipeline and segments them using statistical analysis. Business logic is then applied in both the processor task and the Audience Service to determine which segments are the most important to recommend to the user.
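The three-stage job described above follows a fan-out/fan-in pattern. The production system runs it with Spring Boot and Spring Cloud Data Flow on Kubernetes. The Python sketch below only mirrors the shape of the job: the function names are hypothetical, the clustering step is a placeholder that groups rows by feature values, and the `min_size` filter stands in for the real statistical analysis and business logic.

```python
from concurrent.futures import ThreadPoolExecutor

def staging_task(raw_rows):
    """Fetch/preprocess step: here it simply drops incomplete records."""
    return [row for row in raw_rows if all(v is not None for v in row.values())]

def clustering_task(staged, config):
    """Placeholder for one clustering configuration.

    Groups rows by the configured features instead of running a real
    clustering algorithm, so that the sketch stays self-contained.
    """
    groups = {}
    for row in staged:
        key = tuple(row[f] for f in config["features"])
        groups.setdefault(key, []).append(row)
    return {"config": config, "groups": groups}

def processor_task(results, min_size):
    """Collect every configuration's result and keep groups worth recommending."""
    segments = []
    for result in results:
        for key, members in result["groups"].items():
            if len(members) >= min_size:
                segments.append({
                    "features": result["config"]["features"],
                    "values": key,
                    "size": len(members),
                })
    return sorted(segments, key=lambda s: s["size"], reverse=True)

def run_insights_job(raw_rows, configs, min_size=2):
    staged = staging_task(raw_rows)                # 1. staging runs once
    with ThreadPoolExecutor() as pool:             # 2. configurations run in parallel
        results = list(pool.map(lambda c: clustering_task(staged, c), configs))
    return processor_task(results, min_size)       # 3. results are merged

raw = [
    {"industry": "Manufacturing", "lead_source": "Paid Search"},
    {"industry": "Manufacturing", "lead_source": "Webinar"},
    {"industry": "Education", "lead_source": "Paid Search"},
    {"industry": None, "lead_source": "Webinar"},
]
configs = [{"features": ("industry",)}, {"features": ("lead_source",)}]
segments = run_insights_job(raw, configs)
```

Staging runs once per job so that every clustering configuration works from the same cleaned data, and the processor sees all results together, which is what lets it rank segments across configurations.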

Results and Conclusions

The next challenge was to provide the recommended segments in a format within the Act-On user experience that was accessible and easy to understand. The AI Audience Insights page in the Act-On user interface provides marketers with suggested segments based on the AI-enhanced customer segmentation model’s results. This is shown below in Figure 3. The user can then create a segment from the suggested query parameters or discard the suggestion. Either way, Act-On gains valuable feedback to further refine the model.

Figure 3: The AI Audience Insights User Interface.


To make the feature more powerful, Act-On allows users to customize the settings for the next model run. They can choose which categorical and behavioral features to include. For instance, a marketer who is not interested in webinar attendance but wants to focus primarily on form submissions can exclude the former and include the latter in the model.

Figure 4: Settings for AI Audience Insights.


Even more exciting for marketers is the ability to limit the contact data used in the model. Rather than considering all contacts, the marketer can select an existing segment to focus on, such as accounts with revenue above $500M, or leads with an engagement score >40. By focusing initially on engaged leads, marketers can understand the similarities and differences between their leads and tailor their marketing in a more personalized way.

Another important consideration for marketers is knowing who not to target. Overly saturating unresponsive targets with emails is a risk for marketers because it can impair their overall delivery rates. Act-On users can take advantage of AI Insights to reveal and recommend segments with low engagement that should be suppressed from further marketing efforts.

Once the AI Audience Insights model has generated recommended segments that the marketer wishes to accept, they can create a segment. The resulting segment will appear alongside manually generated segments and can be used to target contacts in mailings and automated marketing programs.

Figure 5: Act-On Audience Insights User Interface.


Collaborating with Intel and leveraging its AI and machine learning frameworks greatly accelerated the project, which went from idea to deployment in a customer environment within 6 months, a comparatively short time window. This rapid time-to-market helped Act-On deliver a valuable AI feature to its customers quickly and with a much lower internal development cost than would have been required to develop the model from the ground up.

Specifically, the Intel-optimized clustering algorithm enabled Act-On to process large amounts of data in parallel, making this implementation practical. The flexibility of inputs into the model allowed for easy customization of the features and the resulting number of clusters. The modular toolkits were easily plugged into the model to enhance functionality without engineering effort on the Act-On side.

While Intel’s data scientists brought expertise in machine learning, Act-On’s understanding of the business problem was essential to ensure the right features were selected and the data transformation and binning met business needs. With this collaboration, the model could be optimized.

Future Work

After evaluating customer feedback, Act-On will consider future enhancements to the model. This may include expanding the type of contact profile data available, including custom fields for individual clients. More specific behavior data could also lead to more targeted insights. For example, instead of knowing that a contact submitted a form in the last 30 days, knowing they submitted the “Request A Price Quote” form would be extremely powerful. Automating the model configuration could also help Act-On surface more varied insights automatically. For instance, if the model runs on Engaged Leads on Tuesday and New Leads on Wednesday, the marketer will not need to tweak the inputs themselves.

About Act-On

Act-On is a B2B marketing automation company that serves thousands of businesses across multiple vertical industries such as manufacturing, education, health care and financial services. These companies use Act-On to create personalized, targeted, and automated marketing journeys and to track prospect and customer engagement with their content. 