Normalised Discounted Cumulative Gain (NDCG): Complete How To Guide

by Neri Van Otten | Aug 8, 2024 | Data Science

What is Normalised Discounted Cumulative Gain (NDCG)?

Normalised Discounted Cumulative Gain (NDCG) is a popular evaluation metric used to measure the effectiveness of search engines, recommendation systems, and other information retrieval systems. The primary goal of NDCG is to quantify the quality of ranked lists by considering the relevance and position of the items within those lists. By doing so, NDCG helps determine how well the system returns relevant results higher up in the list, which is crucial for providing a good user experience.

Why is Normalised Discounted Cumulative Gain (NDCG) Important in Search Engine Optimisation?

In search engine optimisation (SEO) and information retrieval, NDCG is essential because it goes beyond merely checking if relevant items are present in the results. Instead, it assesses how those relevant items are ranked. Users typically pay the most attention to the top results, so ensuring that the most appropriate items appear at the top is critical. This makes NDCG a valuable metric for evaluating and improving the performance of search algorithms.


Normalised Discounted Cumulative Gain (NDCG) vs. Other Metrics

NDCG is often compared with other evaluation metrics such as Precision, Recall, and Mean Average Precision (MAP):

  • Precision and Recall: Precision measures the fraction of retrieved items that are relevant, while Recall measures the fraction of all relevant items that are retrieved. However, neither metric accounts for the order of the items, making them less informative about ranking quality.
  • Mean Average Precision (MAP): MAP considers the order of the items by averaging the precision at each relevant item across multiple queries. While MAP provides a good measure of ranking quality, it assumes binary relevance and does not apply an explicit position-based discount, which limits its usefulness in real-world scenarios where users prioritise higher-ranked results.

In contrast, NDCG incorporates items’ relevance and positions in the ranked list. By discounting the relevance scores based on their positions, NDCG emphasises placing relevant items higher, making it a more comprehensive measure for evaluating ranking performance.
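To make the contrast concrete, here is a small illustrative Python sketch (not from the original article): two rankings contain exactly the same relevant items, so an order-insensitive metric such as Precision@5 scores them identically, while NDCG separates them.

```python
# Illustrative sketch: order-insensitive metrics vs. NDCG.
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k items that are relevant (binary relevance)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k

def dcg(relevances):
    """Discounted Cumulative Gain with the standard log2(position + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG: DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

good_order = [1, 1, 1, 0, 0]   # relevant items ranked first
bad_order = [0, 0, 1, 1, 1]    # same items, relevant ones pushed to the bottom

print(precision_at_k(good_order, 5), precision_at_k(bad_order, 5))  # 0.6 and 0.6
print(round(ndcg(good_order), 3), round(ndcg(bad_order), 3))        # 1.0 and ~0.618
```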

Components of Normalised Discounted Cumulative Gain (NDCG)

Relevance Scores

Relevance scores are numerical values assigned to each item in search results, reflecting how pertinent each item is to the given query. These scores are typically based on user feedback, editorial judgments, or other relevant signals. Higher relevance scores indicate more pertinent results.

For instance, in a binary relevance model, an item might receive a score of 1 for relevance and 0 for non-relevance. In a graded relevance model, scores could range from 0 (irrelevant) to 3 (highly relevant).

Cumulative Gain (CG)

Cumulative Gain (CG) is the sum of the relevance scores of items in a ranked list. It provides a basic measure of the results’ relevance without considering each item’s position. The formula for CG at position p is:

$$\mathrm{CG}_p = \sum_{i=1}^{p} rel_i$$

where rel_i is the relevance score of the item at position i.
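As a quick illustration (not part of the original article), CG is just a sum; the relevance scores below are arbitrary placeholder values:

```python
# Minimal sketch: Cumulative Gain ignores where each item sits in the ranking.
relevances = [3, 2, 3, 0, 1]  # placeholder relevance scores, top of the list first

def cumulative_gain(relevances, p=None):
    """CG_p: sum of the relevance scores of the top-p items (all items if p is None)."""
    return sum(relevances[:p])

print(cumulative_gain(relevances))     # 9
print(cumulative_gain(relevances, 3))  # 8
```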

Discounted Cumulative Gain (DCG)

Discounted Cumulative Gain (DCG) refines CG by applying a discount factor that reduces the contribution of items as their positions in the ranking increase. This reflects the principle that items appearing lower in the ranking are less likely to be viewed by users. The discount factor typically used is logarithmic, which smoothly decreases the weight of relevance scores. The formula for DCG at position p is:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

In this formula:

  • rel_i is the relevance score of the item at position i.
  • log_2(i+1) is the discount factor, with the base-2 logarithm ensuring a gradual weight reduction.
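A minimal Python sketch of this formula might look as follows; the relevance scores are the same placeholder values as above, and the helper name dcg_at_p is ours rather than from any particular library:

```python
import math

def dcg_at_p(relevances, p=None):
    """DCG_p: each relevance score divided by log2 of its (1-based) position plus 1."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:p]))

relevances = [3, 2, 3, 0, 1]              # placeholder relevance scores
print(round(dcg_at_p(relevances), 4))     # 6.1487
print(round(dcg_at_p(relevances, 3), 4))  # DCG of only the top 3 items
```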

Normalisation

Normalisation is necessary to make DCG scores comparable across different queries or datasets, as the range of possible DCG scores can vary significantly depending on the number of results and their relevance scores. Normalised Discounted Cumulative Gain (NDCG) achieves this by dividing the DCG of a ranked list by the Ideal Discounted Cumulative Gain (IDCG), the maximum possible DCG for that list.

Ideal Discounted Cumulative Gain (IDCG)

IDCG is calculated by sorting the items in the list by their relevance scores in descending order and then computing the DCG for this ideal ranking. The formula for IDCG at position p is:

$$\mathrm{IDCG}_p = \sum_{i=1}^{p} \frac{rel_i^{\mathrm{ideal}}}{\log_2(i+1)}$$

where rel_i^{ideal} is the relevance score of the item at position i in the ideally sorted list.
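Continuing the sketch above, IDCG is simply the DCG of the same relevance scores after sorting them into descending order (helper names are our own):

```python
import math

def dcg_at_p(relevances, p=None):
    """DCG with the standard log2(position + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:p]))

def idcg_at_p(relevances, p=None):
    """IDCG_p: DCG of the best possible ordering of the same relevance scores."""
    return dcg_at_p(sorted(relevances, reverse=True), p)

relevances = [3, 2, 3, 0, 1]            # placeholder relevance scores
print(round(idcg_at_p(relevances), 4))  # 6.3235
```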

Calculating Normalised Discounted Cumulative Gain (NDCG)

NDCG is computed by dividing the DCG of the actual ranked list by the IDCG of the ideally ranked list. This normalises the DCG score to a value between 0 and 1, where 1 indicates a perfect ranking. The formula for NDCG at position p is:

$$\mathrm{NDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$

By normalising DCG in this way, NDCG provides a consistent measure of ranking quality that can be compared across different queries and systems. This makes it a powerful tool for evaluating and improving the performance of search and recommendation algorithms.
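Putting the pieces together, a compact self-contained sketch of the full NDCG computation (assuming the linear-gain, log2-discount formulation described above) could look like this:

```python
import math

def dcg(relevances):
    """DCG with the standard log2(position + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """NDCG = DCG of the actual ranking / DCG of the ideally sorted ranking."""
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

print(round(ndcg([3, 2, 3, 0, 1]), 4))  # 0.9724, close to a perfect score of 1.0
```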

Calculating Normalised Discounted Cumulative Gain (NDCG): Step-by-Step Guide

In this section, we will walk through calculating Normalised Discounted Cumulative Gain (NDCG) with a detailed, step-by-step example. This will help illustrate how each component of NDCG comes together to provide a measure of ranking quality.

Step 1: Assign Relevance Scores

First, we must assign relevance scores to each item in our ranked list. Let’s consider a search query with the following search results and their relevance scores:

Rank | Item | Relevance Score
---- | ---- | ---------------
1    | A    | 3
2    | B    | 2
3    | C    | 3
4    | D    | 0
5    | E    | 1

Step 2: Calculate DCG

Next, we calculate the ranked list’s Discounted Cumulative Gain (DCG). The formula for DCG at position p is:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

Using our example, we calculate DCG as follows:

$$\mathrm{DCG}_5 = \frac{3}{\log_2(2)} + \frac{2}{\log_2(3)} + \frac{3}{\log_2(4)} + \frac{0}{\log_2(5)} + \frac{1}{\log_2(6)}$$

Breaking this down:

For rank 1: 3 / log_2(2) = 3 / 1 = 3.0000

For rank 2: 2 / log_2(3) = 2 / 1.5850 ≈ 1.2619

For rank 3: 3 / log_2(4) = 3 / 2 = 1.5000

For rank 4: 0 / log_2(5) = 0 / 2.3219 = 0.0000

For rank 5: 1 / log_2(6) = 1 / 2.5850 ≈ 0.3869

Summing these values:

$$\mathrm{DCG}_5 = 3.0000 + 1.2619 + 1.5000 + 0.0000 + 0.3869 \approx 6.1487$$

Step 3: Calculate IDCG

To calculate the Ideal Discounted Cumulative Gain (IDCG), we first sort the items by their relevance scores in descending order:

Rank | Item | Relevance Score
---- | ---- | ---------------
1    | A    | 3
2    | C    | 3
3    | B    | 2
4    | E    | 1
5    | D    | 0

Now we calculate IDCG using the same formula as for DCG, but with the ideal ranking:

$$\mathrm{IDCG}_5 = \frac{3}{\log_2(2)} + \frac{3}{\log_2(3)} + \frac{2}{\log_2(4)} + \frac{1}{\log_2(5)} + \frac{0}{\log_2(6)}$$

Breaking this down:

For rank 1: 3 / log_2(2) = 3 / 1 = 3.0000

For rank 2: 3 / log_2(3) = 3 / 1.5850 ≈ 1.8928

For rank 3: 2 / log_2(4) = 2 / 2 = 1.0000

For rank 4: 1 / log_2(5) = 1 / 2.3219 ≈ 0.4307

For rank 5: 0 / log_2(6) = 0 / 2.5850 = 0.0000

Summing these values:

$$\mathrm{IDCG}_5 = 3.0000 + 1.8928 + 1.0000 + 0.4307 + 0.0000 \approx 6.3235$$

Step 4: Compute NDCG

Finally, we compute the NDCG by dividing the DCG by the IDCG:

$$\mathrm{NDCG}_5 = \frac{\mathrm{DCG}_5}{\mathrm{IDCG}_5} = \frac{6.1487}{6.3235} \approx 0.9724$$

This NDCG score indicates that our search results’ ranking is very close to the ideal ranking, with a score of approximately 0.9724 (out of a possible 1.0).
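For readers who want to verify the arithmetic, the following short Python sketch (not part of the original worked example) reproduces these numbers:

```python
import math

def dcg(relevances):
    """DCG with the standard log2(position + 1) discount (positions are 1-based)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

actual = [3, 2, 3, 0, 1]              # items A, B, C, D, E in their ranked order
ideal = sorted(actual, reverse=True)  # ideal order: A, C, B, E, D

print(round(dcg(actual), 4))               # 6.1487
print(round(dcg(ideal), 4))                # 6.3235
print(round(dcg(actual) / dcg(ideal), 4))  # 0.9724
```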

Let’s summarise our example:

  1. Relevance Scores: We assigned relevance scores to each item in the search results.
  2. Calculate DCG: We used the relevance scores to calculate the DCG.
  3. Calculate IDCG: We reordered the items by relevance scores and calculated the IDCG.
  4. Compute NDCG: We divided the DCG by the IDCG to get the NDCG score.

By following these steps, we can evaluate the quality of our search results and make informed decisions to improve our ranking algorithms.
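In practice, you rarely need to implement NDCG by hand. scikit-learn, for example, provides an ndcg_score function; a minimal usage sketch is shown below, where the model scores are hypothetical values chosen so that the induced ranking matches our example (A, B, C, D, E):

```python
import numpy as np
from sklearn.metrics import ndcg_score

# True relevance grades for one query (items A-E) and hypothetical model scores;
# a higher score means the model ranks that item higher.
true_relevance = np.asarray([[3, 2, 3, 0, 1]])
model_scores = np.asarray([[0.9, 0.7, 0.5, 0.3, 0.1]])

# scikit-learn's default uses the same linear-gain, log2-discount formulation,
# so this should agree with the hand calculation above (~0.9724).
print(round(ndcg_score(true_relevance, model_scores), 4))
```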

Advantages and Limitations of Normalised Discounted Cumulative Gain (NDCG)

Advantages

Accounts for Ranking Position
NDCG considers the position of relevant items in the ranked list. This is crucial because users are more likely to click on items at the top of the search results. By applying a discount factor to lower-ranked items, NDCG emphasises placing highly relevant items near the top.

Supports Graded Relevance
Unlike binary relevance metrics that only differentiate between relevant and non-relevant items, NDCG supports graded relevance. This means it can handle varying degrees of significance, making it more flexible and reflective of real-world scenarios where items might have different levels of usefulness.

Normalised for Comparability
NDCG is normalised, which makes it possible to compare scores across different queries or datasets. By dividing the DCG by the IDCG, NDCG scales the scores to a range between 0 and 1, where 1 indicates a perfect ranking. This normalisation allows for consistent and fair comparisons, even when the number of items or their relevance scores vary.

Widely Used and Recognised
NDCG is a widely accepted metric in both academia and industry. Its use in various research studies and practical applications has established it as a standard for evaluating the performance of search engines and recommendation systems. This widespread recognition adds to its credibility and reliability as an evaluation metric.

Limitations

Computational Complexity
Calculating NDCG can be computationally intensive, especially for large datasets with many items. The need to compute logarithms and perform sorting operations for the IDCG calculation can make the process time-consuming, particularly in large-scale systems.

Sensitivity to Relevance Scores
NDCG’s effectiveness relies heavily on the accuracy and consistency of the relevance scores assigned to items. If the relevance scores are not well-calibrated or are biased, the NDCG score may not accurately reflect the actual quality of the ranking. This can be challenging when obtaining reliable relevance judgments from users or other sources.

Logarithmic Discounting
While the logarithmic discount factor used in DCG is a reasonable approximation of user behaviour, it may not perfectly capture how users interact with search results in all scenarios. Different users or contexts might exhibit distinct interaction patterns that the standard logarithmic discounting does not fully capture.

Not Always Intuitive
Interpreting NDCG scores can be less intuitive than working with simpler metrics like Precision or Recall. Understanding the impact of the discounting and normalisation steps requires a deeper knowledge of the metric, which can be a barrier for some practitioners.

Handling of Ties
NDCG can sometimes struggle with handling ties in relevance scores. If multiple items have the same relevance score, their exact ordering might not significantly impact the NDCG score, potentially leading to ambiguities in the evaluation.

While NDCG has limitations, its advantages make it a powerful and widely used metric for evaluating the quality of ranked lists. By accounting for item position and graded relevance, NDCG provides a nuanced and comprehensive measure of ranking performance. However, to fully leverage its benefits, it is essential to know its computational demands and the need for accurate relevance scores.

Applications of Normalised Discounted Cumulative Gain (NDCG)

Search Engines

Evaluating Search Results

Search engines extensively use NDCG to evaluate and refine their ranking algorithms. By measuring how well the search results align with user expectations regarding relevance and ranking position, search engines can assess the quality of their results and make necessary adjustments to improve user satisfaction.


A/B Testing of Algorithms

Search engines frequently conduct A/B tests to compare ranking algorithms or updates to an existing algorithm. NDCG is a critical metric in these tests, indicating which version performs better in delivering relevant results to users.

Recommendation Systems

Optimising Recommendations

Recommendation systems, such as those used by e-commerce sites, streaming services, and social media platforms, leverage NDCG to optimise the order in which items are presented to users. These systems enhance user engagement and satisfaction by focusing on placing the most relevant recommendations at the top.

Personalised Content Delivery

NDCG is also used to evaluate the effectiveness of personalised content delivery. By measuring how well the recommendations match users’ individual preferences and interests, companies can fine-tune their recommendation algorithms to provide more personalised and relevant suggestions.

Information Retrieval Research

Benchmarking and Comparison

In academic research, NDCG is commonly used as a benchmark metric for comparing different information retrieval models and techniques. Researchers can evaluate new models against established baselines using NDCG, ensuring that improvements in ranking quality are rigorously assessed.

Relevance Feedback Experiments

Researchers often conduct experiments to gather relevance feedback from users or simulated environments. NDCG provides a quantitative measure of how well feedback-driven models deliver relevant results, helping to advance the field of information retrieval.

E-commerce Platforms

Product Search Optimisation

E-commerce platforms use NDCG to optimise their product search algorithms, ensuring that the most relevant products appear at the top of search results. This improves user experience and increases the likelihood of conversions and sales.


Category and Filter Navigation

NDCG helps e-commerce platforms evaluate and improve the effectiveness of category and filter navigation systems. By assessing how well the filtered and categorised results match user expectations, platforms can enhance their navigation systems to make product discovery more accessible and intuitive.

Online Advertising

Ad Ranking and Placement

In online advertising, NDCG is used to evaluate and optimise the ranking of ads. By ensuring that the most relevant ads are shown at the top, advertisers can increase click-through rates (CTR) and conversion rates, maximising the return on investment for ad campaigns.

Quality Score Measurement

Ad platforms often use NDCG in their quality score calculations, influencing ad placements and pricing. By incorporating NDCG, these platforms can ensure that higher-quality and more relevant ads are favoured, leading to a better user experience and more effective advertising.

Social Media Platforms

Feed Ranking Optimisation

Social media platforms utilise NDCG to optimise content ranking in users’ feeds. By placing the most relevant and engaging posts at the top, platforms can enhance user engagement and satisfaction, encouraging more active participation and longer time spent on the platform.

Trending Content Evaluation

NDCG is also used to evaluate the relevance of trending content and hashtags. By assessing how well the trends match user interests and current events, social media platforms can ensure that users are exposed to the most pertinent and timely content.

Academic and Industrial Research

Developing New Algorithms

Researchers in academia and industry use NDCG to develop and evaluate new ranking algorithms. By providing a rigorous and standardised measure of ranking quality, NDCG helps researchers ensure that their new algorithms offer genuine improvements over existing methods.

Comparative Studies

NDCG is often employed in comparative studies to assess the performance of different retrieval models, algorithms, and systems. These studies help identify the strengths and weaknesses of various approaches, driving innovation and improvement in information retrieval.

Conclusion

In this blog post, we have explored the concept of Normalised Discounted Cumulative Gain (NDCG), a powerful and widely used metric for evaluating the quality of ranked lists in search engines, recommendation systems, and other information retrieval applications. We discussed the components of NDCG, including relevance scores, Cumulative Gain (CG), Discounted Cumulative Gain (DCG), and Ideal Discounted Cumulative Gain (IDCG). We also provided a step-by-step guide to calculating NDCG and examined its advantages and limitations.

Summary of Key Points

  • NDCG Definition and Purpose: NDCG measures the effectiveness of ranking algorithms by considering the relevance and position of items in a ranked list.
  • Components of NDCG: The metric comprises relevance scores, CG, DCG, and IDCG, with normalisation making the scores comparable across different queries and datasets.
  • Calculation: We provided a detailed example to demonstrate how to calculate NDCG, including assigning relevance scores, calculating DCG, determining IDCG, and computing the final NDCG score.
  • Advantages and Limitations: NDCG accounts for ranking position and supports graded relevance, but it can be computationally complex and relies on accurate relevance scores.
  • Applications: NDCG is used in various domains, including search engines, recommendation systems, e-commerce, online advertising, social media, and academic research.

Importance of Normalised Discounted Cumulative Gain (NDCG)

NDCG is essential for evaluating and improving the performance of ranking systems. By providing a nuanced measure that considers both the relevance and order of items, NDCG helps ensure that the most pertinent results are prominently displayed, enhancing user satisfaction and engagement. Whether used for search engines, recommendation systems, or other applications, NDCG plays a critical role in optimising the quality of information retrieval.

Future Directions

As information retrieval and ranking systems continue to evolve, so will the methods for evaluating their performance. Future research and development may address NDCG’s limitations, such as improving computational efficiency and refining relevance scoring techniques. Additionally, new metrics and hybrid approaches may emerge to complement NDCG, providing even more comprehensive evaluations of ranking quality.

About the Author

Neri Van Otten

Neri Van Otten is the founder of Spot Intelligence, a machine learning engineer with over 12 years of experience specialising in Natural Language Processing (NLP) and deep learning innovation. Dedicated to making your projects succeed.
