Document Type

Article

Publication Date

2017

Subject: LCSH

Data curation, Computer crimes--Investigation, Digital libraries

Disciplines

Computer Engineering | Computer Sciences | Electrical and Computer Engineering | Forensic Science and Technology | Information Security

Abstract

This paper targets two main goals. First, we want to provide an overview of available datasets that can be used by researchers and where to find them. Second, we want to stress the importance of sharing datasets to allow researchers to replicate results and improve the state of the art. To answer the first goal, we analyzed 715 peer-reviewed research articles from 2010 to 2015 with focus and relevance to digital forensics to see what datasets are available and focused on three major aspects: (1) the origin of the dataset (e.g., real world vs. synthetic), (2) if datasets were released by researchers and (3) the types of datasets that exist. Additionally, we broadened our results to include the outcome of online search results.We also discuss what we think is missing. Overall, our results show that the majority of datasets are experiment generated (56.4%) followed by real world data (36.7%). On the other hand, 54.4% of the articles use existing datasets while the rest created their own. In the latter case, only 3.8% actually released their datasets. Finally, we conclude that there are many datasets for use out there but finding them can be challenging.

Comments

© 2017 The Author(s). Published by Elsevier Ltd. on behalf of DFRWS. This is an open access article under CC-BY-NC-ND 4.0

Dr. Baggili was appointed to the University of New Haven's Elder Family Endowed Chair in 2015.

DOI

10.1016/j.diin.2017.06.004

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Publisher Citation

Grajeda, C., Breitinger, F., & Baggili, I. (2017). Availability of datasets for digital forensics–And what is missing. Digital Investigation, 22, S94-S105.

 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.