‘How Cuckoo Filter Can Improve Existing Approximate Matching Techniques’ by Vikas Gupta (Netskope Inc., USA) & Frank Breitinger (University of New Haven, USA)
Best Paper award at ICDF2C 2015, the 7th EAI International Conference on Digital Forensics & Cyber Crime.
—
Gupta and Breitinger focused on a process widely used in digital forensics – approximate matching (chances are that you have heard of this being referred to as “similarity hashing” or “fuzzy hashing”) – a process that helps investigators filter out irrelevant data from large datasets. And these can go up to hundreds of gigabytes, since entire OS structures can be involved. Add to that the rate at which the sizes of our files keep growing, and the need for an efficient and reliable way to sift through the mundane, and quickly get to the suspect files, becomes obvious.
To do this, approximate matching identifies similarities between digital artifacts (do note that we are talking about “similarities”, not 1:1 matches, hence the often-used term “fuzzy”), essentially stripping down files to extract identifying features, and compressing them into similarity digests, a.k.a. fingerprints. These (instead of the actual files) are then used for comparisons to determine their overlap and the final similarity score.
What makes Gupta and Breitinger’s work stand out is that they exchanged the widely accepted and well documented data structure of the Bloom filter for a new and promising Cuckoo filter, testing its relative runtime efficiency, memory usage, and compression. In this first ever effort to use a new fingerprint representation for approximate matching, the authors tried to figure out a better approximate matching algorithm.
And they did.
Want the nitty-gritty? Get the full paper at EUDL.
Learn more about the 2016 edition of ICDF2C here.