Engineering stories behind the Medium Daily Digest Algorithm

Bloom Filters are great tools for fast, cheap filtering. They also come with plenty of problems and can easily become expensive and cumbersome. We switched to user-based direct database queries, which made our filtering cheaper and easier to maintain. Here’s the full breakdown of that migration.


Intro: This is a 4-part series breaking down improvements to the algorithm behind Medium’s Daily Digest over the past year. When we started this work, the Digest was suboptimal, and since it’s a huge distribution surface, reaching millions of readers every day, we started working on incremental improvements.


By the end of these projects, the digest was 10% more likely to convert users to paying members, less expensive to run, more flexible, and easier to maintain. It now also provides higher-quality recommendations for all our users, including our “power readers”.


https://hackmd.io/@alexaa34/rkxphCej-x

https://medium.com/@alexharris59600/engineering-stories-behind-the-medium-daily-digest-algorithm-76c0ac828de2


This is told through the lens of our engineering team tackling a series of challenges one by one. Medium has a small team but we operate on a big scale. We’re working our way through some technical debt and at the same time, striving to provide the best experience for our readers. This is the source of many interesting challenges.


I hope this series helps you understand how the recommendations algorithm works, and that it helps others who are facing similar technical challenges.


This is probably the most technical story in the series, but I will keep it as simple as possible, and hopefully it will be interesting for non-technical readers too.


Some Concepts

Here’s a little cheat sheet with some concepts you may need to follow along with this story.


Bloom Filters at Medium

A lot of the filters I mention in this series are backed by Bloom Filters (I’ve described some of those filtering rules in Part 1 if you haven’t read it already). We use Bloom filters to remove stories we think won’t interest readers from their feeds and other recommendations. For example:


  • our “muted” filter removes all stories from writers that you have muted
  • our “read” filter removes all stories that you have already read
  • our “presentation” filter removes all stories that have already been presented to you 3 times or more

These all rely on Bloom Filters.


So what’s a Bloom Filter?

I asked Claude to summarize Bloom filters in a really simple way and it went with a funny analogy I’m going to try here.


A Bloom filter is like a super-efficient bouncer at a club who has a really good memory but isn’t perfect. It lets you do two things:

  • let someone into the club
    → in code that would be an add(string) function
  • ask if someone is in the club. There are two possible answers to this:
    → “yes, probably”
    → “no, definitely not”
    → in code that would be a check(string) --> bool function
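To make the bouncer a bit more concrete, here is a minimal sketch of those two operations in Python. This is not Medium’s implementation: the class name, the default sizes (1024 bits, 3 hashes) and the choice of salted SHA-256 as the hash are all assumptions picked to keep the example short.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a fixed-size bit array plus k hash positions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are arbitrary assumptions for the example, not production values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting a single hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # "Let someone into the club": set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def check(self, item: str) -> bool:
        # False means "no, definitely not"; True means "yes, probably",
        # because other items may have set the same bits.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


club = BloomFilter()
club.add("alice")
print(club.check("alice"))  # True: an added item is never reported as missing
```

The “probably” comes from the fact that different strings can land on the same bits. The more items you add to a filter of a given size, the more often check will say yes for something that was never added.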

That doesn’t sound super useful on its own, but we’ll see next that it’s actually kinda well suited for recommendation systems.


At Medium we’re using it to store information such as “user x read story y” or “user a muted user b”. We add those to the “club” as strings, like read|user_x|story_y. Later on, when we want to know if user x has already read story y, we just ask our “bouncer”: is read|user_x|story_y in the club?
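Here is a rough sketch of how those keys could be used to filter a list of candidate stories. The helper names (read_key, unread_stories) are hypothetical, and the filter below is an exact, set-backed stand-in with the same add/check shape, used only so the snippet runs on its own. A real Bloom filter would behave the same way, except that check could occasionally say yes for a story the user never read.

```python
from typing import Iterable, List


class SetBackedFilter:
    """Stand-in exposing add/check like a Bloom filter, but exact (no false positives)."""

    def __init__(self) -> None:
        self._items = set()

    def add(self, item: str) -> None:
        self._items.add(item)

    def check(self, item: str) -> bool:
        return item in self._items


def read_key(user_id: str, story_id: str) -> str:
    # Key format taken from the example in the text: read|user_x|story_y
    return f"read|{user_id}|{story_id}"


def unread_stories(flt, user_id: str, candidates: Iterable[str]) -> List[str]:
    # Keep only the stories the filter says this user has "definitely not" read.
    return [s for s in candidates if not flt.check(read_key(user_id, s))]


reads = SetBackedFilter()
reads.add(read_key("user_x", "story_y"))
print(unread_stories(reads, "user_x", ["story_y", "story_z"]))  # ['story_z']
```

Note that a false positive here only means a story gets filtered out when it didn’t need to be, which is usually an acceptable trade-off for recommendations.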
