Business Results

  • Better customer experience through faster and more accurate search results

  • Less compute time spent on inferencing means lower total cost of ownership (TCO)

  • Up to 158% faster batch encoding

  • Up to 72% faster real-time query search


Background

The COVID-19 pandemic caused a significant surge in data as businesses rushed to digitalize services and share knowledge remotely, making it increasingly difficult to find relevant, precise information quickly.

Semantic search systems improve on traditional keyword-based search by using the contextual meaning of a query to find matching documents more intelligently. These systems have been shown to deliver better results and user experiences, especially when queries are phrased in natural language. The key step in building an effective semantic search system is transforming queries and documents from text into a representation that captures semantic meaning. An AI-powered semantic vector search enables users to find answers, content, and products more accurately than traditional text search systems.

Solution

In collaboration with Accenture*, Intel developed this vertical search engine AI reference kit. Paired with Intel® software, this kit may help customers develop a semantic search system. It uses a BERT model to transform queries and documents from text into representations that capture semantic meaning, then computes distances between query and document representations to determine how semantically similar they are.

End-to-End Flow Using Intel® AI Software Products

Figure: Flowchart of the end-to-end use case

The dataset used for demonstration in this reference implementation is a question-answering dataset called HotpotQA¹, constructed of a corpus of short answers and a set of queries built on top of the corpus.

For demonstration purposes, we truncated the document corpus from approximately two million entries to 200,000 for the embedding experiments and 5,000 for the quantization experiments.

This reference solution starts by using the pretrained large language model to embed every document in the corpus as a dense vector. The resulting embeddings are stored in a file for use in the real-time semantic search component. Because the document corpus is not massive, the embeddings are stored as a NumPy array of dimension n_docs x embedding_dim. In larger production pipelines, they can be stored in a database that supports vector embeddings.
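
To make the offline step concrete, here is a minimal sketch of corpus embedding with a generic pretrained BERT model from the Hugging Face transformers library. The model name, mean-pooling strategy, and file name are illustrative assumptions, not the reference kit's exact implementation.

    import numpy as np
    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_NAME = "bert-base-uncased"  # placeholder; the kit's checkpoint may differ
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    def embed(texts):
        """Mean-pool the last hidden state into one dense vector per text."""
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state       # (batch, seq, dim)
        mask = batch["attention_mask"].unsqueeze(-1)        # (batch, seq, 1)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pooling
        return pooled.numpy()

    corpus = ["first short answer ...", "second short answer ..."]  # stand-in documents
    doc_embeddings = embed(corpus)                 # shape: n_docs x embedding_dim
    np.save("doc_embeddings.npy", doc_embeddings)  # consumed by the online component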

When a new query arrives in the system, it passes through the same pretrained large language model to create a dense vector embedding. After loading the NumPy array of document embeddings produced by the offline component, the solution computes the cosine similarity score between the query and each document and returns the K closest documents by similarity score.
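
A minimal sketch of this query path follows, reusing the illustrative embed() helper from the offline sketch above; the function and file names are assumptions, not the kit's actual API.

    import numpy as np

    doc_embeddings = np.load("doc_embeddings.npy")  # n_docs x embedding_dim

    def search(query, k=5):
        """Return indices of the K documents most similar to the query."""
        q = embed([query])[0]
        # Cosine similarity reduces to a dot product after L2 normalization.
        doc_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
        q_norm = q / np.linalg.norm(q)
        scores = doc_norm @ q_norm                  # one score per document
        return np.argsort(scores)[::-1][:k]         # top-K indices, best first

    top_k_ids = search("example natural language query", k=5)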

This reference kit includes:

  • Training data
  • An open source, trained model
  • Libraries
  • User guides
  • Intel® AI software products

At a Glance

  • Industry: Cross-industry
  • Task: Document encoding ingests raw document text and transforms it into a high-dimensional dense vector embedding that captures semantic meaning. Real-time semantic search ingests a raw user query, transforms it into a high-dimensional dense vector embedding, and computes distances between the query and document embeddings to determine how semantically similar they are.
  • Dataset: HotpotQA¹, a question-answering dataset consisting of a corpus of short answers and a set of queries built on top of the corpus. The corpus was truncated from approximately two million entries to 200,000 for embedding and 5,000 for quantization experiments.
  • Type of Learning: Deep learning
  • Models: BERT
  • Output:
    • For offline document embedding, the output is a dense vector embedding
    • For real-time semantic search, the output is the top K closest documents in the corpus
  • Intel AI Software Products:
    • Intel® Extension for PyTorch* with oneAPI Deep Neural Network Library (oneDNN)
    • Intel® Neural Compressor

Technology

Optimized with Intel oneAPI for Better Performance

The vertical search engine model was optimized with Intel Extension for PyTorch and Intel Neural Compressor for better performance across heterogeneous XPU and FPGA architectures.

Intel Extension for PyTorch and Intel Neural Compressor allow you to reuse your model development code with minimal code changes for training and inferencing.
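
As an illustration of that minimal-change pattern, the sketch below applies Intel Extension for PyTorch's optimize() entry point and Intel Neural Compressor's post-training quantization to a PyTorch model. The configuration details are assumptions; the reference kit's exact settings may differ.

    import intel_extension_for_pytorch as ipex
    from neural_compressor import PostTrainingQuantConfig, quantization

    # 'model' is the PyTorch model from the embedding sketch above.
    # One line applies oneDNN-backed operator optimizations for inference.
    model = ipex.optimize(model.eval())

    def quantize(model, calib_dataloader):
        """Post-training INT8 quantization; calib_dataloader is a hypothetical
        DataLoader yielding representative inputs for calibration."""
        return quantization.fit(
            model,
            PostTrainingQuantConfig(),
            calib_dataloader=calib_dataloader,
        )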

Performance benchmark tests were run on Microsoft Azure* Standard_D4_v5 using 3rd generation Intel® Xeon® processors to optimize the solution.

Benefits

Improving on traditional keyword-based search systems, semantic search systems enable the use of the contextual meaning within a query to find matching documents more intelligently. An AI-powered semantic vector search enables users to find answers, content, and products more accurately compared to traditional text search systems.

Faster encoding and real-time queries mean less compute time and lower costs to produce search results. The increased accuracy of the search results also drives higher productivity, a better customer experience, and greater user satisfaction.

With Intel® oneAPI toolkits, little to no code change is required to attain the performance boost.

Download Kit

Reference

1. Yang, Zhilin, et al. "HotpotQA: A dataset for diverse, explainable multi-hop question answering." arXiv preprint arXiv:1809.09600 (2018).
