Searching with Bloom Filters

 

 


BOLT Talks · March 18, 2021

Luís Rodrigues

 

goblindegook.com

Twitter @goblindegook

The problem

 

Letting readers search my static sites.

The solution

 

Lunr
lunrjs.com

That works, but...

 

Visitors had to download an 8 MB index file.

Could I have a search index for less?

Bloom filters

“Is <this thing> in <that group>?”

You’ve likely used them already...

“Is this site in the naughty list?”

“Is this word in that document?”

  • “Definitely not.”
  • “Possibly yes.”

Building blocks

  • An array of n bits.
  • m independent hash functions.

Hash function

 

data (any size)value (fixed size)

Where am I supposed to find
m independent hash functions?

gi(x) = h1(x) + ih2(x) + i2 mod n

Absurdly simple example

(10 bits, 1 hash function)

Add DOG to the filter.

Adding DOG to a simple Bloom filter.

Add CAT to the filter.

Adding CAT to a Bloom filter.

Is DOG in the filter?

Looking up DOG in a Bloom filter: found.

Is MOUSE in the filter?

Looking up MOUSE in a Bloom filter: not found.

Is COW in the filter?

Looking up COW in a Bloom filter: false positive.

Reducing the error rate

(100 bits, 4 hash functions)

Adding DOG to a larger Bloom filter.

Counting Bloom filters

 

Relevant search results by term frequency.

Adding COW to a counting Bloom filter.

Searching the site

 

One Bloom filter per document.

Searching for CAT in the document set.

NPM

@pacote/bloom-search

Try it

 

fantasticmetropolis.com

goblindegook.com

That is all.

Thank you

 

goblindegook.com

Twitter @goblindegook