NICT Darknet Data Set 2019
A new version of the dataset is now available (NICT Darknet Dataset 2022).
This dataset has limited usability because its statistical processing was performed per host. NICT Darknet Dataset 2022 performs no per-host statistical processing, only hashing, so it is considered more versatile.
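Outline of Data Set
This data set consists of the darknet statistical data and the analysis result data used for analysis in the paper [1].
Reference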
[1] C. Han, J. Shimamura, T. Takahashi, D. Inoue, M. Kawakita, J. Takeuchi, and K. Nakao. Real-Time Detection of Malware Activities by Analyzing Darknet Traffic Using Graphical Lasso. IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom): Security Track, 2019.
Method to Use
- Please read the Notes first.
- The User may use the Data Set by sending the Institute an e-mail stating agreement to these Terms and the Common Terms, listed below.
- NICT Darknet Data Set 2019 Terms of Use
- AI Data Testbed Common Terms of Use
- The Institute’s e-mail address
- ✉ csl-ai(at “@” symbol)ml(dot “.”)nict(dot “.”)go(dot “.”)jp
- The subject/title of the e-mail should be “[NICT Cyber Repository] – NICT Darknet Dataset 2019”.
Collect Darknet PCAP Data
- We collected PCAP data of the traffic observed by eight NICTER darknet sensors in October 2018. The following table shows each sensor's ID, the number of IP addresses it observes (with the corresponding prefix length), and the number of alerts obtained in the analysis of the paper [1].
Sensor ID | #Observed IP Addresses (prefix) | #Alerts
A | 29,182 (/17) | 122
B | 14,593 (/18) | 199
C | 4,098 (/20) | 146
D | 4,096 (/20) | 460
E | 8,188 (/19) | 198
F | 16,384 (/18) | 115
G | 2,044 (/21) | 118
H | 2,045 (/21) | 276
Darknet Statistical Data
Preprocessing
- Preprocess the PCAP data collected above.
- Use only TCP-SYN packets.
- Source IP addresses are aggregated by their first two octets (upper 16 bits): all addresses sharing the same /16 prefix are counted as one source host.
- Exclude destination TCP ports that constantly receive a large number of packets from many source hosts over a long period (e.g., more than one week). The excluded ports are:
TCP Port: 22, 23, 80, 81, 445, 2323, 3389, 5431, 5555, 8080, 50382, 50390, 52869
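The preprocessing rules above can be sketched in Python as follows. This is a minimal illustration, not NICT's tool: the `Packet` record type and the `preprocess`/`source_key` helpers are hypothetical, and treating "TCP-SYN packet" as "SYN set, ACK clear" is an assumption.

```python
import ipaddress
from typing import Iterable, Iterator, NamedTuple

# Destination TCP ports excluded during preprocessing (list above).
EXCLUDED_PORTS = frozenset({22, 23, 80, 81, 445, 2323, 3389, 5431, 5555,
                            8080, 50382, 50390, 52869})

# TCP header flag bits.
SYN = 0x02
ACK = 0x10


class Packet(NamedTuple):
    """Hypothetical decoded packet record; the data set ships no such type."""
    ts: float    # capture time in UNIX seconds
    src: str     # source IPv4 address
    dport: int   # destination TCP port
    flags: int   # TCP flags byte


def source_key(src: str) -> str:
    """Collapse a source address to its /16 prefix (upper 16 bits)."""
    return str(ipaddress.ip_network(f"{src}/16", strict=False))


def preprocess(packets: Iterable[Packet]) -> Iterator[tuple[float, str]]:
    """Yield (timestamp, /16 source key) for pure SYNs to non-excluded ports."""
    for p in packets:
        # Assumption: "TCP-SYN packet" means SYN set and ACK clear.
        if not (p.flags & SYN) or (p.flags & ACK):
            continue
        if p.dport in EXCLUDED_PORTS:
            continue
        yield p.ts, source_key(p.src)


pkts = [
    Packet(0.0, "203.0.113.5", 8081, SYN),          # kept
    Packet(1.0, "203.0.200.9", 23, SYN),            # port 23 is excluded
    Packet(2.0, "198.51.100.7", 8081, SYN | ACK),   # SYN-ACK, not a pure SYN
]
print(list(preprocess(pkts)))  # [(0.0, '203.0.0.0/16')]
```

Decoding real PCAP files into such records would need a separate packet-parsing library, which is out of scope here.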
Statistical Data Processing
1. Split the preprocessed one-month PCAP data into 10-minute segments.
   - 144 PCAP files per day × 31 days × 8 darknet sensors = 35,712 PCAP files in total
2. Split each 10-minute PCAP file into 50-second bins, group the packets by source host, and count them. The following figure shows an example of one darknet statistical data file.
   - Each data file is two-dimensional: 12 unit time samples × the number of source hosts, where each element is a packet count.
   - The IP addresses of the source hosts are hidden.
   - The UNIX timestamp is placed in the first column.
3. Apply step 2 to all PCAP files and save the results in CSV format.
   - 144 CSV files per day × 31 days × 8 darknet sensors = 35,712 CSV files in total
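The binning and counting steps might look roughly like this sketch, which consumes `(timestamp, source_key)` pairs. The row layout is an assumption: one row per 50-second sample, the bin-start UNIX time in the first column, then one count column per source host in a sorted order, with no header. The helper names `build_rows` and `to_csv` are hypothetical.

```python
import csv
import io
from collections import defaultdict

WINDOW = 600               # seconds covered by one CSV file (10 minutes)
BIN = 50                   # seconds per unit time sample
N_BINS = WINDOW // BIN     # 12 samples per file

# File-count arithmetic from the text: 144 windows/day * 31 days * 8 sensors.
WINDOWS_PER_DAY = 24 * 60 * 60 // WINDOW   # 144
TOTAL_FILES = WINDOWS_PER_DAY * 31 * 8     # 35,712


def build_rows(records, window_start):
    """Turn (ts, source_key) pairs into 12 rows of packet counts.

    Row i = [bin start time, count for host 1, count for host 2, ...].
    Source keys only group packets; they are not written out.
    """
    counts = defaultdict(lambda: [0] * N_BINS)
    for ts, src in records:
        offset = ts - window_start
        if 0 <= offset < WINDOW:           # ignore packets outside this window
            counts[src][int(offset // BIN)] += 1
    hosts = sorted(counts)                 # deterministic column order
    return [[window_start + i * BIN] + [counts[h][i] for h in hosts]
            for i in range(N_BINS)]


def to_csv(rows):
    """Serialize rows as CSV text without a header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


recs = [(1000, "a"), (1049, "a"), (1050, "b"), (1599, "b"), (1600, "a")]
rows = build_rows(recs, 1000)
print(to_csv(rows).splitlines()[:2])   # ['1000,2,0', '1050,0,1']
```

The packet at 1600 falls in the next 10-minute window and is therefore dropped from this file.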
Summary Data (data.json)
- A JSON file (data.json) summarizes information about all CSV files so that the darknet statistical data can be looked up easily.
- Objects
- Timestamp(UNIX, JST): UNIX/JST timestamp
- File: file name
- Error: true if the traffic data shows obvious data loss; otherwise false.
- #Host: the number of source hosts
- Size (Byte): file size in bytes
- Alert: true if the analysis in the paper [1] issued an alert for that time period; otherwise false.
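One way to use data.json is to pick out the CSV files that carry an alert and have no obvious data loss. The sketch below assumes data.json is a JSON array of objects; the key names (`timestamp`, `file`, `error`, `alert`) and the sample entries are hypothetical placeholders for the fields described above and should be adjusted to the actual file.

```python
import json
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))

# Hypothetical JSON keys: the text lists the fields, but the exact key
# spellings and the top-level layout of data.json are assumptions.
KEY_TS, KEY_FILE, KEY_ERROR, KEY_ALERT = "timestamp", "file", "error", "alert"


def alert_files(entries):
    """File names with an alert and no obvious data loss, in time order."""
    picked = [e for e in entries if e[KEY_ALERT] and not e[KEY_ERROR]]
    return [e[KEY_FILE] for e in sorted(picked, key=lambda e: e[KEY_TS])]


def to_jst(ts):
    """Render a UNIX timestamp as an aware datetime in JST (UTC+9)."""
    return datetime.fromtimestamp(ts, tz=JST)


# Illustrative entries only; not taken from the data set.
sample = json.loads("""[
  {"timestamp": 1538320200, "file": "b.csv", "error": false, "alert": true},
  {"timestamp": 1538319600, "file": "a.csv", "error": false, "alert": true},
  {"timestamp": 1538320800, "file": "c.csv", "error": true,  "alert": true}
]""")
print(alert_files(sample))                # ['a.csv', 'b.csv']
print(to_jst(1538319600).isoformat())     # 2018-10-01T00:00:00+09:00
```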
Analysis Result Data
- The darknet statistical data were analyzed with the method of the paper [1], and the resulting alert information is provided in JSON format (alert.json).
- In the paper [1], alerts are issued for time periods in which the cooperative behavior among source hosts shows a high degree of anomaly.
- Objects
- Timestamp(UNIX, JST): UNIX/JST timestamp
- Port: target destination TCP port
- Although the darknet statistical data do not include destination TCP ports, the paper [1] identified the target destination TCP ports directly from the PCAP data for evaluation.
- Type: type of alert. For more details, please refer to the paper [1].
- 1: Cyberattack, 2: Survey scan, 3: Sporadically-focused traffic
- #Host: the number of source hosts that sent packets to the target TCP port
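A small aggregation over alert.json could tally alerts per destination port and alert type. This sketch maps the Type codes listed above to their labels; the key names (`port`, `type`, `host`) and the sample alerts are hypothetical illustrations, not values from the data set.

```python
from collections import Counter

# Alert type codes as listed above.
ALERT_TYPES = {1: "Cyberattack", 2: "Survey scan", 3: "Sporadically-focused traffic"}

# Hypothetical key names for alert.json entries (assumption).
KEY_PORT, KEY_TYPE, KEY_HOSTS = "port", "type", "host"


def alerts_by_port_and_type(alerts):
    """Count alerts per (destination port, alert type label)."""
    return Counter((a[KEY_PORT], ALERT_TYPES[a[KEY_TYPE]]) for a in alerts)


def max_hosts_per_port(alerts):
    """Largest number of source hosts in any single alert, per port."""
    best = {}
    for a in alerts:
        best[a[KEY_PORT]] = max(best.get(a[KEY_PORT], 0), a[KEY_HOSTS])
    return best


# Illustrative alerts only; not taken from the data set.
alerts = [
    {"port": 8081, "type": 1, "host": 40},
    {"port": 8081, "type": 1, "host": 25},
    {"port": 60001, "type": 2, "host": 3},
]
print(alerts_by_port_and_type(alerts)[(8081, "Cyberattack")])  # 2
print(max_hosts_per_port(alerts))   # {8081: 40, 60001: 3}
```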
Notes
- Darknet observation can lose data because of temporary operational problems. Therefore, fewer than the 35,712 CSV files stated in Statistical Data Processing are released.
- The darknet statistical data used in the paper [1] and the data published here were preprocessed differently and are therefore not identical.
- The darknet statistical data are two-dimensional (source hosts × time) and do not include destination TCP port information. The analysis result data, however, were produced using destination TCP port information, following the method adopted in the paper [1].
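Because some 10-minute windows are missing, it can help to list which ones are absent for a sensor. This sketch assumes the windows are aligned to 10-minute boundaries starting at 2018-10-01 00:00 JST and that the observed window start times are gathered from data.json; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))
WINDOW = 600  # seconds per CSV file


def expected_windows(start=datetime(2018, 10, 1, tzinfo=JST),
                     end=datetime(2018, 11, 1, tzinfo=JST)):
    """Start times (UNIX seconds) of every 10-minute window in October 2018."""
    return list(range(int(start.timestamp()), int(end.timestamp()), WINDOW))


def missing_windows(observed):
    """Expected window start times with no corresponding CSV file."""
    seen = set(observed)
    return [t for t in expected_windows() if t not in seen]


full = expected_windows()
print(len(full))                        # 4464 (= 144 * 31) per sensor
print(missing_windows(full[1:]))        # [1538319600]
```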
Person in Charge
Cybersecurity Laboratory, Cybersecurity Research Institute, National Institute of Information and Communications Technology (NICT), Japan.
- Research Manager, Takeshi Takahashi
- Senior Researcher, Tao Ban
- Researcher, Chansu Han
Contact for Inquiries
Contact details for inquiries regarding use of the Data Set are as follows.
- ✉ csl-ai(at “@” symbol)ml(dot “.”)nict(dot “.”)go(dot “.”)jp
Last updated on Aug 15, 2022
© NICT, Japan.