Search Result Count Prediction

Naver Corp, 2021. 1. ~ Current

Project Summary

Although it varies by search system, almost 50% of search queries have no search results. These no-result queries still hit the search engine servers of every shard and consume system resources. The problem is even more severe when a search service has few documents.

If we can predict in advance whether a query has search results and forward the traffic only when results exist, we can save a significant amount of search engine resources. Since the search engine servers account for most of the search system, this is an effective way to reduce infrastructure cost.

[Figure: srcp-architecture]
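To make the gating idea concrete, here is a minimal sketch of the request flow implied by the diagram above. The function names, threshold, and return types are assumptions for illustration, not the actual service interfaces.

```python
from typing import List

THRESHOLD = 0.5  # assumed cutoff; see the Results section for the traffic/FN trade-off


def predict_has_results(query: str) -> float:
    """Hypothetical SRCP model call: returns P(query has search results)."""
    ...


def search_shards(query: str) -> List[dict]:
    """Hypothetical call to the actual search engine shards."""
    ...


def handle_query(query: str) -> List[dict]:
    # Forward traffic to the search engine only when the model predicts
    # that results exist; otherwise answer with an empty result set and
    # spare every shard the cost of a no-result query.
    if predict_has_results(query) < THRESHOLD:
        return []
    return search_shards(query)
```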

We built the architecture shown above and started researching how to predict precisely and efficiently whether a query has search results.

Role

Model

Predicting whether a query has search results can be framed as a binary text classification problem. There were many model candidates, such as deep learning and decision trees, but I selected Naive Bayes for the following reasons.

I omit the mathematical details here. Through extensive experiments across many search services, I verified that the Naive Bayes model met the accuracy requirements. In addition, Naive Bayes inference requires only key-value reads, so it was easy to serve.

[Figure: srcp-nb-prediction]
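As a rough sketch of why inference reduces to key-value reads: if per-token log-probabilities are precomputed offline and stored under the token as the key, prediction is just one lookup per token plus a sum. The store layout, priors, and smoothing constant below are assumptions, not the production values.

```python
import math
from typing import Dict, Tuple

# Hypothetical key-value store: token -> (log P(token | has_results), log P(token | no_results)).
# In production this would be an external KV store populated by the training pipeline.
token_log_probs: Dict[str, Tuple[float, float]] = {}

LOG_PRIOR_HAS = math.log(0.5)   # assumed class priors; the real ones come from training data
LOG_PRIOR_NO = math.log(0.5)
UNSEEN = (math.log(1e-6), math.log(1e-6))  # assumed smoothing for tokens not in the store


def predict_has_results(query: str) -> float:
    """Return P(has_results | query) under a multinomial Naive Bayes model."""
    log_has, log_no = LOG_PRIOR_HAS, LOG_PRIOR_NO
    for token in query.split():
        lp_has, lp_no = token_log_probs.get(token, UNSEEN)  # one key-value read per token
        log_has += lp_has
        log_no += lp_no
    # Normalize the two log joint probabilities into a posterior.
    m = max(log_has, log_no)
    p_has = math.exp(log_has - m)
    p_no = math.exp(log_no - m)
    return p_has / (p_has + p_no)
```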

Pipeline

To serve SRCP models in production search services, we needed a pipeline to continuously update the Naive Bayes models and track their performance, so I designed the pipelines using Airflow.

[Figure: srcp-pipeline]
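A minimal Airflow sketch of such a pipeline is shown below. The DAG id, schedule, and task breakdown are assumptions; the real pipeline's operators and steps may differ.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def build_training_data(**_):
    """Hypothetical: collect recent queries and label them by whether they returned results."""


def train_naive_bayes(**_):
    """Hypothetical: estimate token log-probabilities and write them to the key-value store."""


def evaluate_model(**_):
    """Hypothetical: track accuracy and FN ratio against held-out traffic."""


with DAG(
    dag_id="srcp_model_update",      # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",      # assumed update cadence
    catchup=False,
) as dag:
    build = PythonOperator(task_id="build_training_data", python_callable=build_training_data)
    train = PythonOperator(task_id="train_naive_bayes", python_callable=train_naive_bayes)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    build >> train >> evaluate
```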

Results

| Threshold | Traffic reduction | CPU utilization reduction | FN ratio |
|-----------|-------------------|---------------------------|----------|
| 0.1       | 30%               | 20%                       | 1%       |
| 0.5       | 55%               | 30%                       | 10%      |
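The threshold trades traffic reduction against the false-negative ratio. As a sketch of how such numbers could be measured offline, assuming logged pairs of (prediction score, whether the query actually had results); the exact FN-ratio denominator in the table is not specified, so all queries is used here as an assumption:

```python
from typing import List, Tuple


def evaluate_threshold(samples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """samples: (predicted P(has_results), whether the query actually had results).

    Returns (traffic_reduction, fn_ratio), both as fractions of all queries.
    """
    # Queries the gate would block, i.e. not forward to the search engine.
    blocked = [had_results for score, had_results in samples if score < threshold]
    traffic_reduction = len(blocked) / len(samples)
    # False negatives: blocked queries that actually had results.
    fn_ratio = sum(blocked) / len(samples)
    return traffic_reduction, fn_ratio
```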

Limitations

Although SRCP models can reduce traffic significantly, there is always a risk of false negatives. If the model predicts that a query has no search results when it actually does, the user will not see those results. For sensitive services such as shopping search, this can lead to serious problems, so we have tried to mitigate it in multiple ways.

Tech Stack