NICT Darknet Dataset 2022
Dataset overview
This dataset contains darknet traffic data observed by NICTER.
It is a per-packet dataset built under the following conditions:
- Only TCP SYN packets are included, to support the analysis of indiscriminate scanning attacks.
- Each packet is recorded as one CSV row with the following fields.
field (CSV column) | details |
timestamp (UNIXTIME) | time the packet was received (UNIX time in seconds, with a fractional part) |
hash[ip.src.upper16] | hash of the upper 16 bits of the source IP address |
hash[ip.src.32] | hash of the full 32-bit source IP address |
ip.dst.lower16 | lower 16 bits of the darknet destination IP address |
tcp.dstport (tcp.dport) | 16-bit TCP destination port number |
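As an illustration of these fields, here is a minimal Python sketch that parses one CSV row into a typed record. It assumes the column order of the pseudo data shown later on this page (UNIXTIME, hash[ip.src.upper16], hash[ip.src.32], ip.dst.lower16, tcp.dport) and that ip.dst.lower16 is written as two dotted octets; `PacketRecord`, `parse_row`, and `lower16_to_int` are hypothetical names, not part of any NICT tooling.

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """One released TCP SYN packet (hypothetical container type)."""
    timestamp: float       # UNIX time of reception (UNIXTIME column)
    src_upper16_hash: str  # hash[ip.src.upper16]
    src_hash: str          # hash[ip.src.32]
    dst_lower16: int       # ip.dst.lower16 as an integer in 0..65535
    dst_port: int          # tcp.dport


def lower16_to_int(dotted: str) -> int:
    """Convert a dotted octet pair such as '100.101' to a 16-bit integer."""
    high, low = (int(part) for part in dotted.split("."))
    if not (0 <= high <= 255 and 0 <= low <= 255):
        raise ValueError(f"invalid octet pair: {dotted!r}")
    return (high << 8) | low


def parse_row(row: list[str]) -> PacketRecord:
    """Build a PacketRecord from one CSV row in the assumed column order."""
    ts, upper16_hash, src_hash, dst_lower16, dport = row
    port = int(dport)
    if not 0 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {dport!r}")
    return PacketRecord(float(ts), upper16_hash, src_hash,
                        lower16_to_int(dst_lower16), port)


# "h16" and "h32" are placeholders for the 64-character hash strings.
line = "1640962800.21,h16,h32,100.101,2323"
record = parse_row(next(csv.reader([line])))
print(record.dst_lower16, record.dst_port)  # 25701 2323
```

Converting ip.dst.lower16 to an integer (100 × 256 + 101 = 25701) makes it easy to index the darknet addresses of a sensor as an array.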
Contents of this site
- How to use
If you wish to use this dataset, please see the “How to use” section.
- Pseudo data
We present pseudo data in CSV format.
- List of data periods and sensor IDs
Check this list for the period and sensor ID of the data you want to use.
To ensure the reproducibility of our research, we have released the data for the periods used in that research. We plan to keep adding the most recent data as our research activities continue.
Differences from NICT Darknet Data Set 2019
In NICT Darknet Data Set 2019, statistics were computed per host, which limited the ways the data could be used. This dataset applies no per-host statistical processing; it keeps every packet and only hashes the source addresses, so we expect it to support a wider range of analyses.
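Because packets are kept individually, per-host statistics can still be computed by the user. The Python sketch below groups records by hash[ip.src.32] and counts packets and distinct destination ports for each hashed source. It assumes the same source address always maps to the same hash value within the data being analyzed; the input as (hash, port) tuples and the function name `per_host_stats` are illustrative choices.

```python
from collections import defaultdict


def per_host_stats(records):
    """Aggregate (src_hash, dst_port) pairs into per-source statistics.

    Returns {src_hash: (packet_count, number_of_distinct_ports)}.
    """
    packets = defaultdict(int)
    ports = defaultdict(set)
    for src_hash, dst_port in records:
        packets[src_hash] += 1
        ports[src_hash].add(dst_port)
    return {h: (packets[h], len(ports[h])) for h in packets}


# Placeholder strings standing in for hash[ip.src.32] values.
sample = [("h1", 23), ("h1", 2323), ("h1", 23), ("h2", 80)]
print(per_host_stats(sample))  # {'h1': (3, 2), 'h2': (1, 1)}
```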
How to use
- To use this dataset, send us an e-mail and agree to the Terms of Use below.
- NICT Darknet Dataset 2022 – Terms of Use
- Our e-mail address
- ✉ csl-ai(at “@” symbol)ml(dot “.”)nict(dot “.”)go(dot “.”)jp
- Please send your request with the subject line “[NICT Cyber Repository] – NICT Darknet Dataset 2022”.
- Please specify the data period and sensor ID.
- About data
- The data are split into one CSV file per day, with day boundaries based on JST (Japan Standard Time, UTC+9).
- The data are compressed with bzip2. You can use pbzip2 for faster decompression on multi-core machines:
tar -I pbzip2 -xf XXX.tar.bz2
- About analysis
- This dataset cannot be analyzed using raw IP addresses, because the source IP addresses are hashed.
- If you wish to perform analysis using raw darknet traffic data including IP addresses and other header information, please consider installing a NICTER Darknet Sensor in your organization and contact us by e-mail.
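For users who prefer to process the archives without extracting them first, the following Python sketch streams CSV rows directly out of a `.tar.bz2` archive with the standard `tarfile` module and maps UNIX timestamps to JST dates, matching the per-day JST split described above. It is single-threaded, so it is likely slower than pbzip2. It assumes each daily file is an archive member named `*.csv` that starts with a column-name line like the pseudo data; `iter_rows` and `jst_date` are hypothetical helpers.

```python
import csv
import tarfile
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time (UTC+9, no DST)


def jst_date(unix_time: float) -> str:
    """Return the JST calendar date (YYYY-MM-DD) of a UNIX timestamp."""
    return datetime.fromtimestamp(unix_time, tz=JST).strftime("%Y-%m-%d")


def iter_rows(archive_path: str, has_header: bool = True):
    """Yield (member_name, row) for each CSV row inside a .tar.bz2 archive.

    Mode "r|bz2" reads the archive as a forward-only stream, so each member
    is decompressed on the fly instead of being extracted to disk.
    """
    with tarfile.open(archive_path, mode="r|bz2") as archive:
        for member in archive:
            # Assumption: the daily files are regular members named *.csv.
            if not member.isfile() or not member.name.endswith(".csv"):
                continue
            raw = archive.extractfile(member)
            lines = (line.decode("utf-8") for line in raw)
            reader = csv.reader(lines)
            if has_header:
                next(reader, None)  # skip the column-name line (assumed to exist)
            for row in reader:
                yield member.name, row


print(jst_date(1640962800.12))  # 2022-01-01
```

A typical loop would be `for name, row in iter_rows("XXX.tar.bz2"): ...`, where the archive name is the placeholder used above. The first pseudo-data timestamp, 1640962800.12, falls just after midnight JST on 2022-01-01, which is 15:00 UTC on 2021-12-31.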
Pseudo data
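The following pseudo data (fictitious values) illustrates the format. The table shows the records with the source addresses in plain form; the CSV after it shows the same records as they are released, with both source-address fields replaced by hash values (64 hexadecimal characters each).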
UNIXTIME | ip.src.upper16 | ip.src.32 | ip.dst.lower16 | tcp.dport |
1640962800.12 | 111.111 | 111.111.1.2 | 100.100 | 23 |
1640962800.21 | 222.222 | 222.222.2.4 | 100.101 | 2323 |
1640962800.21 | 123.123 | 123.123.123.123 | 101.100 | 80 |
1640962800.33 | 121.121 | 121.121.123.123 | 101.101 | 8080 |
1640962800.36 | 2.2 | 2.2.2.2 | 100.102 | 443 |
UNIXTIME,hash[ip.src.upper16],hash[ip.src.32],ip.dst.lower16,tcp.dport
1640962800.12,2fe1ec63c455bd46152926d283e91a8cc4a5fe4f471c27a56f825d046cdf8185,457d5c7b1a91d24d7747179ea793c009f509378781b7aaaa0c1748791b0108e0,100.100,23
1640962800.21,2d9e8afbdd75fd5a3be91f1fa290d4e43c90486a29519ceecd1ca5fd39dce22f,39825211c3134d68dd26708eb73fcad7c7fc3cf65b7a75e7fa8f9ab7c0c0c38e,100.101,2323
1640962800.21,1f5f57cbe46c479aef35f4dcb66d618c38d68fdc3739abe8b5e6fc0a5484c8fb,2ee37d765230eaa9f69a0508f0fc43589111b9e7c1a8ec26cd768d572defc1f6,101.100,80
1640962800.33,cf31089c853c78cfde5c57687cd3613288bd6ffc6c18dcf61a3a8cde7786d8bf,ea9eb9ad3e94e59103d4554332374c2fb19339ff5ba9e263e489edc6ce739f49,101.101,8080
1640962800.36,7f10d3eecd32bfb1c83b81238d42673b5c21b3c5533a6fa7ba7b5e2cf607430f,717aecfa766c462729db6b7443dbf928b61247142e3e575f9f4ba72a04420ff3,100.102,443
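The hashed upper-16-bit column lets sources be grouped by /16 network without exposing the addresses. As a sketch, the Python code below reads CSV text with the column names shown above and counts, for each hash[ip.src.upper16], the distinct hash[ip.src.32] values, plus the packets per destination port. The rows are made up for illustration and use short placeholder strings instead of real hash values; `summarize` is a hypothetical name.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up rows in the released column layout; u1/u2 and s1..s3 are
# placeholders for the 64-character hash values.
SAMPLE = """\
UNIXTIME,hash[ip.src.upper16],hash[ip.src.32],ip.dst.lower16,tcp.dport
1640962800.12,u1,s1,100.100,23
1640962800.21,u1,s2,100.101,23
1640962800.21,u2,s3,101.100,80
1640962800.33,u2,s3,101.101,8080
"""


def summarize(csv_text: str):
    """Return (distinct sources per hashed /16, packet count per port)."""
    sources = defaultdict(set)
    ports = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        sources[row["hash[ip.src.upper16]"]].add(row["hash[ip.src.32]"])
        ports[int(row["tcp.dport"])] += 1
    return {net: len(srcs) for net, srcs in sources.items()}, ports


per_net, per_port = summarize(SAMPLE)
print(per_net)                  # {'u1': 2, 'u2': 1}
print(per_port.most_common(1))  # [(23, 2)]
```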
List of data periods and sensor IDs
Please check the period and sensor ID of the data you wish to use.
- To ensure the reproducibility of our research, we have released the data for the periods used in our research.
For this reason, each data period below lists the references in which that data was used.
As stated in the Terms of Use, please cite paper [1] when you publish work that uses this dataset. Please also cite the other references as appropriate.
- We plan to continuously add and update the most recent data according to our research activities.
Data list
period | darknet sensor ID (scale) | data size | references |
Oct. 2018 (1 month) | A (/17 subnet) | 63GB | [1--4] |
Oct. 2018 (1 month) | B (/18 subnet) | 40GB | [1--4] |
Oct. 2018 (1 month) | C (/20 subnet) | 9.5GB | [1--4] |
Oct. 2018 (1 month) | D (/20 subnet) | 11GB | [1--4] |
Oct. 2018 (1 month) | E (/19 subnet) | 18GB | [1--4] |
Oct. 2018 (1 month) | F (/18 subnet) | 35GB | [1--4] |
Oct. 2018 (1 month) | G (/21 subnet) | 5.4GB | [1--4] |
Oct. 2018 (1 month) | H (/21 subnet) | 5.1GB | [1--4] |
Jun. 2019 -- Oct. 2020 (*1) | A - H | ----- | [1 and 5] |
09/01/2022 (1 day) | D (/20 subnet) | 633MB | [6] |
Jul. 1st - 10th 2023 (9 days) | D (/20 subnet) | 7.1GB | [7] |
- (*1) Data for Jun. 2019 - Oct. 2020 can be provided for some periods upon request.
- For large-scale sensor data (sensors A, B, E, and F), data for up to one week is available.
- For small-scale sensor data (sensors C, D, G, and H), more than one month of data is available.
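The sensor scales in the data list relate directly to the ip.dst.lower16 field. A /p subnet contains 2^(32−p) addresses, so even the largest sensor (A, a /17 with 2^15 = 32,768 addresses) fits within the 65,536 values that 16 bits can express, and the lower 16 bits are enough to tell every destination address of a sensor apart. The short Python check below only restates the prefix lengths from the table; `SENSOR_PREFIX` and `subnet_size` are illustrative names.

```python
# Prefix lengths of the darknet sensors, copied from the data list above.
SENSOR_PREFIX = {"A": 17, "B": 18, "C": 20, "D": 20,
                 "E": 19, "F": 18, "G": 21, "H": 21}


def subnet_size(prefix_len: int) -> int:
    """Number of IPv4 addresses in an aligned /prefix_len block."""
    return 2 ** (32 - prefix_len)


for sensor, prefix in sorted(SENSOR_PREFIX.items()):
    size = subnet_size(prefix)
    # A block of at most 2**16 addresses is fully distinguished by the
    # lower 16 bits of the destination address.
    assert size <= 2 ** 16
    print(f"{sensor}: /{prefix} -> {size} addresses")
```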
List of references
- [1] C. Han, J. Takeuchi, T. Takahashi, and D. Inoue, “Dark-TRACER: Early Detection Framework for Malware Activity Based on Anomalous Spatiotemporal Patterns,” IEEE Access, 2022.
- [2] C. Han, J. Shimamura, T. Takahashi, D. Inoue, M. Kawakita, J. Takeuchi, and K. Nakao, “Real-Time Detection of Malware Activities by Analyzing Darknet Traffic Using Graphical Lasso,” IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2019.
- [3] C. Han, J. Shimamura, T. Takahashi, D. Inoue, J. Takeuchi, and K. Nakao, “Real-time Detection of Global Cyberthreat Based on Darknet by Estimating Anomalous Synchronization Using Graphical Lasso,” IEICE Transactions on Information and Systems, vol. E103-D, no. 10, pp. 2113-2124, Oct. 2020.
- [4] C. Han, J. Takeuchi, T. Takahashi, and D. Inoue, “Automated Detection of Malware Activities Using Nonnegative Matrix Factorization,” IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021.
- [5] C. Han, A. Tanaka, and T. Takahashi, “Darknet Analysis-Based Early Detection Framework for Malware Activity: Issue and Potential Extension,” IEEE International Conference on Big Data (Workshop on Big Data for Cybersecurity), 2022.
- [6] C. Han, A. Tanaka, J. Takeuchi, T. Takahashi, T. Morikawa, and T. Lin, “Towards Long-Term Continuous Tracing of Internet-Wide Scanning Campaigns Based on Darknet Analysis,” International Conference on Information Systems Security and Privacy (ICISSP), 2023.
- [7] C. Han, A. Tanaka, T. Takahashi, S. Dadkhah, A. Ghorbani, and T. Lin, “Traceability Measurement Analysis of Sustained Internet-Wide Scanners via Darknet,” IEEE Conference on Dependable and Secure Computing (DSC), Nov. 2024.
Person in charge
Cybersecurity Laboratory, Cybersecurity Research Institute, National Institute of Information and Communications Technology (NICT), Japan.
- Senior Manager, Takeshi Takahashi
- Researcher, Chansu Han
Contact for inquiries
For inquiries regarding the use of this dataset, please contact the following.
- ✉ csl-ai(at “@” symbol)ml(dot “.”)nict(dot “.”)go(dot “.”)jp
Acknowledgment
This work was conducted under the “MITIGATE” contract within “Research and Development for Expansion of Radio Wave Resources (JPJ000254),” supported by the Ministry of Internal Affairs and Communications, Japan.
Last updated on Nov 4, 2024
© NICT, Japan.