Electrical & Computer Engineering and Computer Science Faculty Publications

Towards Syntactic Approximate Matching-A Pre-Processing Experiment

Doowon Jeong, Korea University - Korea
Frank Breitinger, University of New Haven
Hari Kang, Korea University - Korea
Sangjin Lee, Korea University - Korea

Author URLs

Professor Breitinger's Faculty Profile

Professor Breitinger's web page

Professor Breitinger's full bibliography

Document Type

Article

Publication Date

2016

Subject: LCSH

Cyber forensics, Computer forensics

Disciplines

Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security

Abstract

Over the past few years, the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has increased. Within the area of bytewise approximate matching in particular, several algorithms have been published, tested, and improved. These algorithms have been shown to be powerful; however, they are sometimes too precise for real-world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper, we show that simple pre-processing can significantly influence the outcome. Although our test set is based on text-based file types (because they are easy to process), this technique can be used for other, well-documented types as well. Our results show that, depending on the use case, it can be beneficial to focus on the content of files only. In addition to the experiment, which utilized text files, we present a small, self-created dataset that can be used in future work on approximate matching algorithms because it is labeled (we know which files are similar and how).
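The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or tooling: Python's `difflib.SequenceMatcher` stands in for a bytewise approximate matching algorithm, HTML is assumed as the text-based file type, and the names `TextExtractor`, `content_only`, `similarity`, `doc_a`, and `doc_b` are hypothetical. Two files share an identical header but have unrelated content; comparing them as-is gives a high score, while comparing only the extracted content gives a much lower one.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <head>, <script>, or <style>."""

    SKIPPED = ("head", "script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def content_only(html):
    """Pre-processing step (assumed): keep visible text, normalize whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; NOT a fuzzy hash such as those in the paper."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Two made-up files: same header and markup, unrelated content.
HEADER = (
    "<html><head><title>Report</title>"
    "<style>body{font-family:serif;margin:2em;color:#222}</style></head>"
)
doc_a = HEADER + "<body><p>Quarterly sales rose in the north region.</p></body></html>"
doc_b = HEADER + "<body><p>The hiking trail closes after heavy snowfall.</p></body></html>"

raw_score = similarity(doc_a, doc_b)
content_score = similarity(content_only(doc_a), content_only(doc_b))

print(f"raw files:    {raw_score:.2f}")
print(f"content only: {content_score:.2f}")
```

In this toy case the shared header dominates the raw comparison, so the two files look similar even though their content is unrelated. Whether stripping such structure is desirable depends on the use case, as the abstract notes.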

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Repository Citation

Jeong, Doowon; Breitinger, Frank; Kang, Hari; and Lee, Sangjin, "Towards Syntactic Approximate Matching-A Pre-Processing Experiment" (2016). Electrical & Computer Engineering and Computer Science Faculty Publications. 60.
https://digitalcommons.newhaven.edu/electricalcomputerengineering-facpubs/60

Publisher Citation

Jeong, D., Breitinger, F., Kang, H., & Lee, S. (2016). Towards Syntactic Approximate Matching-A Pre-Processing Experiment. Journal of Digital Forensics, Security and Law, 11(2), 97-110.

Included in

Computer Engineering Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, Information Security Commons
