NICT Darknet Data Set 2019



A new version of the dataset is now available! (NICT Darknet Dataset 2022)

This dataset is of limited use because its statistics were computed per host. NICT Darknet Dataset 2022 applies no per-host statistical processing, only hashing, and is therefore expected to be more versatile.

Outline of Data Set

Reference

[1] C. Han, J. Shimamura, T. Takahashi, D. Inoue, M. Kawakita, J. Takeuchi, and K. Nakao. Real-Time Detection of Malware Activities by Analyzing Darknet Traffic Using Graphical Lasso. IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom): Security Track, 2019.


How to Use


Collect Darknet PCAP Data

Sensor ID   No. of Observed IP Addresses (prefix)   No. of Alerts
A           29,182 (/17)                            122
B           14,593 (/18)                            199
C            4,098 (/20)                            146
D            4,096 (/20)                            460
E            8,188 (/19)                            198
F           16,384 (/18)                            115
G            2,044 (/21)                            118
H            2,045 (/21)                            276

Darknet Statistical Data

Preprocessing

Statistical Data Processing

  1. Split the preprocessed one-month PCAP data into 10-minute segments.

    • 144 PCAP files per day × 31 days × 8 darknet sensors = 35,712 PCAP files in total
  2. Split each 10-minute PCAP file into 50-second intervals, group the packets by source host, and count the packets in each interval. The figure below shows an example of one darknet statistical data file.

    • Each file is a two-dimensional array of 12 unit time samples (50-second intervals) × the number of source hosts, and each element is a packet count.
    • The IP addresses of the source hosts are hidden.
    • The first column holds the UNIX timestamp.
      Figure: Darknet statistical data for one 10-minute interval
  3. Apply step 2 to every PCAP file and save each result in CSV format.

    • 144 CSV files per day × 31 days × 8 darknet sensors = 35,712 CSV files in total
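
The three processing steps above can be illustrated with a short Python sketch. It is not NICT's tooling: it assumes the packets have already been read from PCAP and preprocessed into hypothetical (UNIX timestamp, source IP) pairs, that 10-minute windows are aligned to multiples of 600 seconds since the epoch, and that each file has one row per 50-second slot with the slot's UNIX timestamp in the first column and one packet-count column per source host. The function names, the sorted host order, and the absence of a header row are all assumptions.

```python
import csv
import io
from collections import defaultdict

WINDOW_SEC = 600                            # step 1: 10-minute windows
SLOT_SEC = 50                               # step 2: 50-second unit time
SLOTS_PER_WINDOW = WINDOW_SEC // SLOT_SEC   # 600 / 50 = 12 rows per file


def split_into_windows(packets):
    """Step 1: group (unix_ts, src_ip) records by 10-minute window start.

    Assumes windows are aligned to multiples of 600 s since the UNIX epoch.
    """
    windows = defaultdict(list)
    for ts, src in packets:
        start = int(ts) - int(ts) % WINDOW_SEC
        windows[start].append((ts, src))
    return dict(windows)


def window_matrix(window_start, packets):
    """Step 2: count packets per 50-second slot and per source host.

    Returns 12 rows of the form [slot_start_unix_ts, count_host_0, count_host_1, ...].
    Source IPs only determine the column order and are not written out.
    """
    hosts = sorted({src for _, src in packets})    # hypothetical column order
    col = {src: i for i, src in enumerate(hosts)}
    counts = [[0] * len(hosts) for _ in range(SLOTS_PER_WINDOW)]
    for ts, src in packets:
        slot = int(ts - window_start) // SLOT_SEC  # 0..11 within the window
        counts[slot][col[src]] += 1
    return [[window_start + k * SLOT_SEC] + counts[k] for k in range(SLOTS_PER_WINDOW)]


def write_csv(rows, fh):
    """Step 3: write one window's matrix as CSV (no header row assumed)."""
    csv.writer(fh).writerows(rows)


if __name__ == "__main__":
    # Hypothetical packets using documentation address ranges.
    packets = [
        (1546300800.0, "198.51.100.7"),
        (1546300855.5, "198.51.100.7"),
        (1546300860.0, "203.0.113.9"),
    ]
    for start, pkts in split_into_windows(packets).items():
        buf = io.StringIO()
        write_csv(window_matrix(start, pkts), buf)
        print(buf.getvalue().splitlines()[:2])  # ['1546300800,1,0', '1546300850,1,1']
```

Keeping source IPs only as dictionary keys means they never reach the CSV, in line with hiding the source host addresses; how the released files actually order or anonymize hosts is not specified here. When writing to real files, open them with newline="" as the csv module recommends.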

Summary Data (data.json)


Analysis Result Data


Notes

  1. Darknet observation may lose data because of temporary operational problems. Therefore, fewer CSV files are released than the 35,712 stated under Statistical Data Processing.
  2. The darknet statistical data used in paper [1] and the darknet statistical data released here were preprocessed differently, so they are not identical.
  3. The darknet statistical data is two-dimensional data over observation points (source hosts) and time, and it does not include destination TCP port information. The analysis result data, by contrast, was produced using destination TCP port information, following the method adopted in paper [1].
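
Because of the data loss described in note 1, a user may want to know which 10-minute windows are absent for each sensor. The Python sketch below enumerates the expected window start times for one calendar month and compares them with the (sensor, window start) pairs actually available. It assumes UTC-aligned windows and uses January 2019 only as a hypothetical 31-day month; how to derive those pairs from the released file names is not described on this page, so that step is left out.

```python
from datetime import datetime, timezone

WINDOW_SEC = 600        # 10-minute windows
SENSORS = "ABCDEFGH"    # sensor IDs A-H from the sensor table


def expected_windows(year, month):
    """Start timestamps of every 10-minute window in a calendar month.

    Assumes UTC alignment; the release's actual time zone is not stated here.
    """
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    nxt = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return list(range(int(start.timestamp()), int(nxt.timestamp()), WINDOW_SEC))


def missing_windows(available, year, month):
    """Map each sensor to the window starts absent from `available`.

    `available` is an iterable of (sensor_id, window_start) pairs supplied by the user.
    """
    have = set(available)
    expected = expected_windows(year, month)
    return {s: [t for t in expected if (s, t) not in have] for s in SENSORS}


if __name__ == "__main__":
    expected = expected_windows(2019, 1)     # hypothetical 31-day month
    print(len(expected) * len(SENSORS))      # 4464 * 8 = 35712
    # Simulate one missing file for sensor D.
    available = [(s, t) for s in SENSORS for t in expected if (s, t) != ("D", expected[0])]
    missing = missing_windows(available, 2019, 1)
    print({s: len(v) for s, v in missing.items() if v})  # {'D': 1}
```

For a 31-day month this reproduces the 144 × 31 × 8 = 35,712 figure, so any shortfall in the released files shows up as non-empty lists per sensor.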

Person in Charge

Cybersecurity Laboratory, Cybersecurity Research Institute, National Institute of Information and Communications Technology (NICT), Japan.

Contact for Inquiries

Contact details for inquiries regarding use of the Data Set are as follows.


Last updated on Aug 15, 2022
© NICT, Japan.
Chansu Han