This anonymized data set encompasses 9 continuous months and represents 708,304,516 successful authentication events from users to computers collected from the Los Alamos National Laboratory (LANL) enterprise network.
Description
Each authentication event is on a separate line in the form of “time,user,computer” and represents a successful authentication by a user to a computer at the given time. The values are comma delimited.
As an example, here are the first 10 lines of the data set:
1,U1,C1
1,U1,C2
2,U2,C3
3,U3,C4
6,U4,C5
7,U4,C5
7,U5,C6
8,U6,C7
11,U7,C8
12,U8,C9
There are 11,362 users within the data set, each represented as U plus an anonymized, unique number, and 22,284 computers, each represented as C plus an anonymized, unique number. Timestamps have a resolution of 1 second and start at an epoch of 1; all subsequent times are offsets from this epoch. The time frame of the actual data collection is withheld to strengthen the anonymization of the data.
Some centralized computers (the Active Directory Servers) and the associated authentication events have been removed.
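The line format described above can be parsed with a few lines of Python. This is a minimal sketch, not an official loader: the names AuthEvent and parse_line are illustrative assumptions, and the prefix check simply mirrors the "U plus number" and "C plus number" convention stated in the description.

```python
from typing import NamedTuple


class AuthEvent(NamedTuple):
    time: int      # offset in seconds from the epoch (the first event is at time 1)
    user: str      # anonymized user identifier, e.g. "U1"
    computer: str  # anonymized computer identifier, e.g. "C1"


def parse_line(line: str) -> AuthEvent:
    """Parse one 'time,user,computer' line into an AuthEvent (hypothetical helper)."""
    time_str, user, computer = line.strip().split(",")
    # Sanity check based on the documented U.../C... naming convention.
    if not user.startswith("U") or not computer.startswith("C"):
        raise ValueError(f"unexpected field format: {line!r}")
    return AuthEvent(int(time_str), user, computer)


# The first three example lines from the data set.
sample = ["1,U1,C1", "1,U1,C2", "2,U2,C3"]
events = [parse_line(s) for s in sample]
```

Keeping the timestamp as an integer preserves the 1-second resolution and makes it easy to compute differences between events.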
Data
The data is available either as a single file with 708,304,516 text lines or as 9 files, each with 30 days of events. All of the files are compressed with the bzip2 compression algorithm (http://www.bzip.org/).
The data is currently available as a single file for each data source.
Single File:
- lanl-auth-dataset-1.bz2 (2.3G)
- checksums.txt (4.0K)
Multiple files:
- lanl-auth-dataset-1-00.bz2 (215M)
- lanl-auth-dataset-1-01.bz2 (179M)
- lanl-auth-dataset-1-02.bz2 (228M)
- lanl-auth-dataset-1-03.bz2 (177M)
- lanl-auth-dataset-1-04.bz2 (245M)
- lanl-auth-dataset-1-05.bz2 (247M)
- lanl-auth-dataset-1-06.bz2 (273M)
- lanl-auth-dataset-1-07.bz2 (249M)
- lanl-auth-dataset-1-08.bz2 (252M)
- checksums.txt (4.0K)
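Because the files are bzip2-compressed and the full data set has more than 700 million lines, it is usually more practical to stream a file than to decompress it to disk first. The sketch below uses Python's standard bz2 module to read a file line by line; the function name count_events_per_user is a hypothetical example, and the code assumes the file is plain ASCII text in the "time,user,computer" format.

```python
import bz2
from collections import Counter


def count_events_per_user(path: str) -> Counter:
    """Stream a bzip2-compressed file and count authentication events per user."""
    counts: Counter = Counter()
    # mode="rt" decompresses on the fly and yields text lines,
    # so the whole file is never held in memory.
    with bz2.open(path, mode="rt", encoding="ascii") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines, if any
            _time, user, _computer = line.split(",")
            counts[user] += 1
    return counts
```

For example, calling count_events_per_user("lanl-auth-dataset-1-00.bz2") on a downloaded copy of the first 30-day file would return a per-user event count for that period.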