User:Jesse/NewFrecency - MozillaWiki (original) (raw)

Frecency is a measure that combines frequency and recency.

This page describes a frecency measure based on exponential decay, and a way to store and update this measure efficiently.

Contents

Problems with the old algorithm

https://developer.mozilla.org/en/The_Places_frecency_algorithm

The use of buckets and sampling cause the following problems:

Proposed new definition

See also http://mathb.in/708, which is the same idea, but perhaps a simpler way of going about it. See also also http://mathb.in/713, which describes how to do a frecency update without bignums.

The new definition is based on exponential decay. It simplifies the algorithm by removing the need for buckets, sampling, and recomputation. It improves the predictions by making them monotonic and continuous.

Both visit scores and total frecency scores decay with a half-life of 1 month. If typing a URL is worth 2 points, then a typed visit a month ago is worth 1 point.

That takes care of the buckets and sampling. It could be implemented by storing the URL frecency score, and having an idle-daily job that multiplies all scores by (e ^ (-(ln 2) / 30)) = 0.977.

Efficient computation

But with an additional trick, no recomputation is necessary. The trick is to store in the database something with units of date.

With this trick, we can just store now() + t in the database, and we're done.

What's kept from the current algorithm

Possible disadvantages of the new algorithm

Implementation