Searching with Bloom Filters
BOLT Talks · March 18, 2021
The problem
Letting readers search my static sites.
The solution
That works, but...
Visitors had to download an 8 MB index file.
Could I have a search index for less?
Bloom filters
“Is <this thing> in <that group>?”
You’ve likely used them already...
“Is this site in the naughty list?”
“Is this word in that document?”
- “Definitely not.”
- “Possibly yes.”
Building blocks
- An array of n bits.
- m independent hash functions.
Hash function
data (any size) → value (fixed size)
Where am I supposed to find
m independent hash
functions?
gi(x) = h1(x) + ih2(x) + i2
mod n
Absurdly simple example
(10 bits, 1 hash function)
Reducing the error rate
(100 bits, 4 hash functions)
Counting Bloom filters
Relevant search results by term frequency.
Searching the site
One Bloom filter per document.