Improving the Performance of NumPy Code

Experiments on the runtime of NumPy code

Aniruddha Karajgi
Better Programming


In a recent project, we decided to use AUC-Recall@K as our primary metric for evaluating our model. We defined it as the proportion of area under an ideal Recall@k curve that our model achieved.

My first implementation used NumPy’s functions, and the performance wasn’t bad at all, but it made me wonder how much better I could have done, and how much worse.

Table of Contents

  • The AUC-Recall@k Metric
  • The Baseline Implementation
  • Replacing the for-loop with NumPy’s functions
  • Removing the Dataframe
  • Using Numba

Updates

4.11.22: Added code gists

The AUC-Recall@k Metric

Our metric of choice calculates the proportion of the ideal area under the Recall@k curve our model achieves.

AUC Recall@k = Area under Recall@k / Area under Ideal Recall@k

This requires two computations: the ideal Recall@k curve and, of course, the Recall@k curve of our model. Let’s first define what Recall@k is.

Recall@k

This is a popular metric for evaluating recommender systems — and rankings in general — and is often computed in tandem with Precision@k.

Recall@k answers the question:

Out of all the correct recommendations, how many are present in the top-k recommendations of your model?

Recall@k = # correct recommendations in the top k / # total correct
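
For example, if there are 4 relevant documents in total and 3 of them appear in your model’s top 10 recommendations, then Recall@10 = 3/4 = 0.75.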

It’s clear that as k increases, Recall@k can only increase or stay the same, ultimately reaching a value of 1 when k equals the total number of documents. Note that it may reach 1 for a smaller k as well.

Computing the Recall@k curve

Now that we have a way to compute Recall@k, getting the curve is just a matter of computing Recall@k for every k from 0 to the total number of recommendations.

Recall@k Curve = [Recall@0, Recall@1, Recall@2, Recall@3, ...]

Computing the ideal Recall@k curve

The intuition behind the ideal Recall@k curve is that in an ideal world, every relevant document would be ranked higher than every non-relevant one. The result is a ranking order like this:

[1, 1, 1, ... 1, 0, 0 ... 0]

Every time we increase k, we pick up one more relevant document. And as soon as we’ve covered all the relevant documents, our Recall@k hits 1. After that point, it stays there, since there are no more relevant documents left to affect our recall.

Ideal Recall@k = k / total relevant documents   (when k <= # relevant docs)
               = 1                              (when k > # relevant docs)

More succinctly,

Ideal Recall@k = min(1, k / total relevant documents)

And since we’re interested in the ideal Recall@k curve, we’ll evaluate this for each k. For example, with 3 relevant documents among 5, the ideal curve is [1/3, 2/3, 1, 1, 1].

Computing AUC-Recall@k

Now that we have our two curves — Recall@k and the ideal Recall@k — the AUC score would just be the ratio of their areas.

AUC-Recall@k = Area under Recall@k / Area under the ideal Recall@k

The Baseline Implementation

Let’s start with a rudimentary implementation to appreciate the performance improvements in subsequent versions. Note that this implementation is intentionally slow and only serves as a representation of poorly written code.

The dataframe

We’ll first create a dataframe of our model’s confidence scores and the target, where 1 represents a relevant document while 0 represents an irrelevant one:
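
A minimal sketch of that setup, assuming random example data and a confidence column name of my own choosing (the expected column is the one referenced later in the article):

import numpy as np
import pandas as pd

# Hypothetical example data: 1 marks a relevant document, 0 an irrelevant one
y_true = np.random.randint(0, 2, 100)  # target relevance labels
y_conf = np.random.rand(100)           # model confidence scores

# Rank documents by confidence, highest first
conf_df = pd.DataFrame({"expected": y_true, "confidence": y_conf})
conf_df = conf_df.sort_values("confidence", ascending=False).reset_index(drop=True)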

Implementing the computation of the Recall@k curve

To calculate our model’s Recall@k curve, we iterate over the ranked list and compute the recall at each position:
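
A sketch of that loop, assuming the conf_df defined above (the helper name is hypothetical):

def recall_at_k_curve(conf_df):
    # Intentionally naive: recount the relevant documents in the
    # top k from scratch at every step
    total_relevant = (conf_df["expected"] == 1).sum()
    recall_at_k = []
    for k in range(1, len(conf_df) + 1):
        relevant_in_top_k = (conf_df["expected"].iloc[:k] == 1).sum()
        recall_at_k.append(relevant_in_top_k / total_relevant)
    return recall_at_k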

Implementing the computation of the ideal Recall@k curve

As we defined in the previous section, the ideal Recall@k curve is a piecewise function. We’ll implement it using NumPy’s numpy.minimum function:
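
A sketch of that piecewise computation (again, the helper name is hypothetical):

def ideal_recall_at_k_curve(conf_df):
    total_relevant = (conf_df["expected"] == 1).sum()
    k = np.arange(1, len(conf_df) + 1)
    # Ideal Recall@k = min(1, k / total relevant documents)
    return np.minimum(1.0, k / total_relevant)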

The complete implementation

Now that we have our two curves, we’ll calculate the ratio of their areas to get our final AUC-Recall@k metric. We’ll use NumPy’s numpy.trapz to compute this.

# Computing the final metric by getting the proportion of areas
return np.trapz(recall_at_k) / np.trapz(ideal_recall_at_k)

The entire implementation looks like this:
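
A sketch assembled from the pieces above (the function name is hypothetical):

def auc_recall_at_k_baseline(y_true, y_conf):
    # Rank documents by model confidence, highest first
    conf_df = pd.DataFrame({"expected": y_true, "confidence": y_conf})
    conf_df = conf_df.sort_values("confidence", ascending=False).reset_index(drop=True)

    recall_at_k = recall_at_k_curve(conf_df)
    ideal_recall_at_k = ideal_recall_at_k_curve(conf_df)

    # Computing the final metric by getting the proportion of areas
    return np.trapz(recall_at_k) / np.trapz(ideal_recall_at_k)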

Replacing the for-loop With NumPy’s functions

One of the best ways to speed up your code is to eliminate loops. NumPy’s performance gains come from its vectorized functions, which run as optimized compiled code instead of interpreted Python.

Computing the Recall@k curve

For example, to calculate the Recall@k curve for our model, we can use NumPy’s cumsum function.

Let’s first store our rankings as a NumPy array. Recall that conf_df was our dataframe containing the expected labels and the confidence scores our model assigned to them.

ranking = conf_df["expected"].to_numpy()

In our case, since 1 represents a relevant document, we can calculate the recall at each k by simply counting the number of 1s in the first k ranks. The cumsum logic can be implemented like this:

recall_at_k = (ranking == 1).cumsum() / (ranking == 1).sum()
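
To see why this works, take a hypothetical ranking of [1, 0, 1, 1, 0]: (ranking == 1).cumsum() gives [1, 1, 2, 3, 3], and dividing by the 3 relevant documents in total yields the curve [1/3, 1/3, 2/3, 1, 1].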

Computing the ideal Recall@k Curve

The computation of the ideal curve itself won’t change here since we’re already using NumPy’s functions, but we no longer have to use conf_df: we already have our rankings in a NumPy array (ranking).

We’ll substitute ranking into the existing implementation:
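
A sketch of what that substitution might look like:

total_relevant = (ranking == 1).sum()
k = np.arange(1, len(ranking) + 1)
ideal_recall_at_k = np.minimum(1.0, k / total_relevant)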

The complete implementation

Our function now looks like this:
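
A sketch (the function name is hypothetical):

def auc_recall_at_k_np(y_true, y_conf):
    # Still builds the dataframe, but the loop is replaced with cumsum
    conf_df = pd.DataFrame({"expected": y_true, "confidence": y_conf})
    conf_df = conf_df.sort_values("confidence", ascending=False)
    ranking = conf_df["expected"].to_numpy()

    total_relevant = (ranking == 1).sum()
    recall_at_k = (ranking == 1).cumsum() / total_relevant

    k = np.arange(1, len(ranking) + 1)
    ideal_recall_at_k = np.minimum(1.0, k / total_relevant)

    return np.trapz(recall_at_k) / np.trapz(ideal_recall_at_k)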

Removing the Dataframe

In our previous approach, though we started with a dataframe, we ended up working with NumPy arrays anyway. Removing the dataframe entirely should give us a decent improvement.

If we look at the main reason for creating the dataframe, it was simply to sort the predictions based on our model’s confidence scores. This can be easily accomplished using NumPy. All we need to do is use the ranked indices of one array to reorder the other.

In our case, we’ll sort the confidence scores array in descending order like before and then use those indices to reorder our documents array.

Let’s call the documents array y_true and the confidence scores array y_conf . The sorting can be done with a combination of NumPy’s argsort and indexing.

ranking = y_true[np.argsort(y_conf)[::-1]]

The complete implementation
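
Putting the argsort-based ranking together with the cumsum logic, a sketch of the dataframe-free version might look like this (the name is inferred from the timing command shown later):

def auc_recall_at_k_np_no_df(y_true, y_conf):
    # Sort the labels by model confidence, highest first: no dataframe needed
    ranking = y_true[np.argsort(y_conf)[::-1]]

    total_relevant = (ranking == 1).sum()
    recall_at_k = (ranking == 1).cumsum() / total_relevant

    k = np.arange(1, len(ranking) + 1)
    ideal_recall_at_k = np.minimum(1.0, k / total_relevant)

    return np.trapz(recall_at_k) / np.trapz(ideal_recall_at_k)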

Using Numba

While running these experiments, I was curious whether there was an easier way to get an additional boost in performance. I came across Numba, which, according to its documentation, is:

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

Their quickstart guide titled “A ~5-minute guide to Numba” mentions that Numba works best with loops and NumPy code but not with pandas. Since we were already using NumPy exclusively in the previous approach, it seemed a good idea to give it a go.

While I haven’t explored all that Numba has to offer, I found its jit decorator, which optimizes any function it’s applied to.

So, our final Numba approach entails simply tacking Numba’s jit decorator onto our previous function:
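
A sketch of what that might look like. The bare @jit decorator is an assumption based on the “four letters” mentioned in the results, and the boolean mask is cast to floats so the function compiles cleanly:

from numba import jit

@jit
def auc_recall_at_k_np_no_df_numba(y_true, y_conf):
    ranking = y_true[np.argsort(y_conf)[::-1]]

    # Cast the boolean mask to floats so sum/cumsum compile in nopython mode
    relevant = (ranking == 1).astype(np.float64)
    total_relevant = relevant.sum()
    recall_at_k = relevant.cumsum() / total_relevant

    k = np.arange(1, len(ranking) + 1)
    ideal_recall_at_k = np.minimum(1.0, k / total_relevant)

    return np.trapz(recall_at_k) / np.trapz(ideal_recall_at_k)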

The Results

We’ll compare the performance of these approaches over four dataset sizes:

  • 5 documents
  • 100 documents
  • 10k documents
  • 100k documents

The raw values

To run these experiments, I made use of the timeit magic command.

For example, on the 100k dataset, the Numba approach’s performance was measured like this:

%timeit -o auc_recall_at_k_np_no_df_numba(np.random.randint(0, 2, 100000), np.random.rand(100000))

The final raw times (in milliseconds) were:

Runtimes (ms) — smaller is better

Relative performance

Let’s first look at the relative performance of all our experiments compared to the one with the least time — Numba on five data points. In the graph below, we see that the relative performance of our experiments improves pretty significantly as we go from the baseline approach, which had a for loop, to the approach that used Numba’s performance boost.

Log-scaled Relative Performance over Numba on 5 data points — smaller is better

Key observation

Once we commit to using NumPy (the second experiment onwards), we see a very significant jump in performance. As we improve further, we get diminishing returns.

On 10k and 100k documents

If we zoom into the 10k and 100k dataset sizes, the trend is easier to visualize. The baseline implementation sees the most performance deterioration as we increase our dataset’s size. On the other side of the spectrum, adding Numba’s performance boost gives us a 30–40% faster implementation, which isn’t too bad considering that all we had to do was add four letters at the top of our function.

The data labels in the plots below are the raw times (ms):

Log-scaled Performance over 10k (left) and 100k (right) documents — smaller is better


Conclusion

This article isn’t meant to be a comprehensive guide on writing better code. Rather, it’s to show what’s possible. While I’m sure there’s further scope for improvement here — for example, by exploring Numba further or storing repeated function calls in variables — it’s clear how much of a difference replacing loops with NumPy’s built-in functions makes.

If nothing else, I hope readers see how something so trivial can save you hours.
