Searching with Bloom Filters



BOLT Talks · March 18, 2021

Luís Rodrigues

Twitter @goblindegook

The problem


Letting readers search my static sites.

The solution



That works, but...


Visitors had to download an 8 MB index file.

Could I have a search index for less?

Bloom filters

“Is <this thing> in <that group>?”

You’ve likely used them already...

“Is this site in the naughty list?”

“Is this word in that document?”

  • “Definitely not.”
  • “Possibly yes.”

Building blocks

  • An array of n bits.
  • m independent hash functions.

Hash function


data (any size)value (fixed size)

Where am I supposed to find
m independent hash functions?

gi(x) = h1(x) + ih2(x) + i2 mod n

Absurdly simple example

(10 bits, 1 hash function)

Add DOG to the filter.

Adding DOG to a simple Bloom filter.

Add CAT to the filter.

Adding CAT to a Bloom filter.

Is DOG in the filter?

Looking up DOG in a Bloom filter: found.

Is MOUSE in the filter?

Looking up MOUSE in a Bloom filter: not found.

Is COW in the filter?

Looking up COW in a Bloom filter: false positive.

Reducing the error rate

(100 bits, 4 hash functions)

Adding DOG to a larger Bloom filter.

Counting Bloom filters


Relevant search results by term frequency.

Adding COW to a counting Bloom filter.

Searching the site


One Bloom filter per document.

Searching for CAT in the document set.



Try it

That is all.

Thank you

Twitter @goblindegook