Subject (LCSH)

Cyber forensics, Computer forensics


Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security


Over the past few years, approximate matching algorithms (a.k.a. fuzzy hashing) have become increasingly popular. Especially in the area of bytewise approximate matching, several algorithms have been published, tested, and improved. These algorithms have been shown to be powerful; however, they are sometimes too precise for real-world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper, we show that simple pre-processing can significantly influence the outcome. Although our test set consists of text-based file types (because they are easy to process), the technique can be applied to other well-documented file types as well. Our results show that, depending on the use case, it can be beneficial to focus on the content of files only. In addition, we present a small, self-created dataset that can be used to evaluate approximate matching algorithms in the future, since it is labeled (we know which files are similar and how).
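The abstract does not detail the pre-processing or the matching algorithms, so the following is only a rough illustration of the idea under stated assumptions. It uses a toy similarity measure (the Jaccard index over byte 4-grams), which is not ssdeep, sdhash, or any other published approximate matching algorithm. It also uses a hypothetical `content_only` step that strips HTML-style markup. The two sample documents are invented: they share only boilerplate header and footer markup. On raw bytes they score fairly high. After pre-processing, their extracted text shares almost nothing.

```python
import re

# Hypothetical pre-processing: any <...> tag counts as markup, not content.
TAG = re.compile(rb"<[^>]*>")


def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of overlapping byte n-grams in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Toy bytewise similarity: Jaccard index of the n-gram sets (0.0 to 1.0).

    This is an illustrative stand-in, not a real fuzzy hashing algorithm.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        # Both inputs are shorter than n bytes; treat them as not comparable.
        return 0.0
    return len(ga & gb) / len(union)


def content_only(html: bytes) -> bytes:
    """Hypothetical pre-processing step: drop all tags and collapse whitespace,
    keeping only the textual content of an HTML-like file."""
    text = TAG.sub(b" ", html)
    return b" ".join(text.split())


# Invented sample files: identical boilerplate, unrelated content.
HEADER = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<meta name="generator" content="ExampleWriter 1.0">'
    b'<link rel="stylesheet" href="style.css"></head><body>'
)
FOOTER = b"</body></html>"

doc_a = HEADER + b"<p>The quarterly report covers sales in the northern region.</p>" + FOOTER
doc_b = HEADER + b"<p>Meeting notes: discuss the new hiring plan for spring.</p>" + FOOTER

if __name__ == "__main__":
    print(f"raw bytes:    {similarity(doc_a, doc_b):.2f}")
    print(f"content only: {similarity(content_only(doc_a), content_only(doc_b)):.2f}")
```

The raw comparison is dominated by the shared markup, while the content-only comparison reflects the unrelated text. Real approximate matching tools and real format-aware pre-processing would behave differently in detail.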


Copyright (c) 2016 Journal of Digital Forensics, Security and Law. This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher Citation

Jeong, D., Breitinger, F., Kang, H., & Lee, S. (2016). Towards Syntactic Approximate Matching-A Pre-Processing Experiment. Journal of Digital Forensics, Security and Law, 11(2), 97-110.
