Example Log

Julius Davies, July 30th, 2008

This log is a 100% fictional attempt to create a log that conforms to the Chuvakin checklist. For the purposes of the example, please pretend that the Foo application is some kind of central gateway/portal that also has the ability to edit people in LDAP.


Priority format:  *=ERROR, !=IMPORTANT, _=LOW, and space=REGULAR

Log Format:
timestamp                 |priority| delay|event-type     |user/host/IP:port                       |verb           |verb modifier|object (user or system involved)        |reason and/or general notes
2008-01-28/15:00:00.000000000/PDT| | 114ms|VERSION-LINE   |-                                       |VERSION-LINE   |-            |-|"APP=FooApp-1.2.3","WHOAMI=root","HOSTNAME=foo-qa.mycompany/","STARTTIME=2006-10-14/03:45:59.982514000/PST","UPTIME=382-Days","VM=Java HotSpot(TM) Client VM 1.4.2_16-b05","OS=Linux 2.6.25 amd64","CONTAINER=Apache Tomcat/5.0.30","HEAP=69MB/281MB 24.6%","CPU-IDLE=98.2%","DISK-AVAIL=/var=4.2GB 18%; /var/log=523MB 4%"
2008-01-28/15:07:14.232173000/PDT|*|   0ms|SYSTEM         |foo/foo-qa.mycompany/  |DISCONNECTED   |CLIENT-SOCKET|directory/ldap.mycompany/|read timeout
2008-01-28/15:08:14.288111000/PDT|*|   4ms|SYSTEM         |foo/foo-qa.mycompany/  |CONNECTED      |CLIENT-SOCKET|directory/ldap.mycompany/|connect successful, soTimeout=5000
2008-01-28/15:12:37.320676000/PDT|!| 232ms|USER           |slaskin/slot81.vpn.mycompany.ca/|LOGIN-FAILED   |INACTIVITY   |slaskin|password correct, but last activity was over 6 months ago
2008-01-28/15:25:19.953622000/PDT|*|   0ms|SYSTEM         |-                                       |SHUTDOWN       |-            |foo/foo-qa.mycompany/|"APP=FooApp-1.2.3","WHOAMI=root"
2008-01-28/15:28:03.117654000/PDT|*|   0ms|SYSTEM         |-                                       |STARTUP        |-            |foo/foo-qa.mycompany/|"APP=FooApp-1.3.0","WHOAMI=root"
2008-01-28/15:28:03.117654000/PDT| | 334ms|VERSION-LINE   |-                                       |VERSION-LINE   |-            |-|"APP=FooApp-1.3.0","WHOAMI=root","HOSTNAME=foo-qa.mycompany/","STARTTIME=2008-01-28/15:28:03.117654000/PDT","UPTIME=000-Days","VM=Java HotSpot(TM) Client VM 1.4.2_16-b05","OS=Linux 2.6.25 amd64","CONTAINER=Apache Tomcat/5.0.30","HEAP=69MB/281MB 24.6%","CPU-IDLE=98.2%","DISK-AVAIL=/var=4.2GB 18%; /var/log=523MB 4%"
2008-01-28/15:28:05.004592000/PDT|*|   0ms|SYSTEM         |foo/foo-qa.mycompany/  |CONNECTED      |CLIENT-SOCKET|directory/ldap.mycompany/|connect successful, soTimeout=5000
2008-01-28/15:33:17.512318000/PDT| |  82ms|USER           |juliusd/somehost.ca/             |LOGIN-SUCCEEDED|W2K-KERBEROS |juliusd|has memberOf={all,staff,reception,managers,property-mgmt}
2008-01-28/15:33:20.000871000/PDT| |  17ms|USER           |juliusd/somehost.ca/             |ACCESSED       |THRU-GATEWAY |payroll/payroll-qa.mycompany/|allowed because juliusd has memberOf={staff}
2008-01-28/15:33:20.089333000/PDT| | 383ms|TRUSTED_SYSTEM |payroll/payroll-qa.mycompany/|READ           |LDAP         |juliusd|{mail,phone,address,title,department} - allowed because payroll can read={ALL}
2008-01-28/15:38:08.983118000/PDT| |2056ms|ADMIN          |chuvakin/otherhost.ca/           |LOGIN-FAILED   |BAD-PASSWORD |chuvakin|
2008-01-28/15:38:12.223122000/PDT| |2041ms|ADMIN          |chuvakin/otherhost.ca/           |LOGIN-FAILED   |BAD-PASSWORD |chuvakin|
2008-01-28/15:38:18.509854000/PDT|!|2025ms|ADMIN          |chuvakin/otherhost.ca/           |LOGIN-FAILED   |BAD-PASSWORD-LOCKOUT|chuvakin|too many attempts
2008-01-28/15:38:18.509854000/PDT|!|2003ms|ADMIN          |chuvakin/otherhost.ca/           |LOGIN-FAILED   |LOCKOUT-MAINTAINED|chuvakin|old IP, old session
2008-01-28/15:40:00.661450000/PDT|*|   0ms|USER           |juliusd/somehost.ca/             |APP-ERROR      |-            |-|java.lang.NullPointerException (com/mycompany/foo/ldap/Escalate.java:336)
2008-01-28/15:40:01.002110000/PDT|!| 102ms|USER           |juliusd/somehost.ca/             |ACCESS-DENIED  |THRU-GATEWAY |jira/issues.mycompany/|denied because juliusd does not have memberOf={tech OR support}
2008-01-28/15:40:01.212318000/PDT| | 530ms|TRUSTED_SYSTEM |jira/issues.mycompany/      |READ           |LDAP         |juliusd|{mail,groups} - allowed because jira can read={mail,groups,department}
2008-01-28/15:40:01.213333000/PDT| |2005ms|UNKNOWN        |root/otherhost.ca/               |LOGIN-FAILED   |UNKNOWN-USER |root
2008-01-28/15:40:05.662998000/PDT| |2007ms|UNKNOWN        |admin/otherhost.ca/              |LOGIN-FAILED   |UNKNOWN-USER |admin
2008-01-28/15:40:10.001000000/PDT|!|2025ms|UNKNOWN        |guest/otherhost.ca/              |LOGIN-FAILED   |UNKNOWN-USER-LOCKOUT|guest|too many attempts,lockout ip/browser
2008-01-28/15:50:02.090113000/PDT| | 118ms|USER           |juliusd/somehost.ca/             |READ           |LDAP         |henri|{mail,phone,address,title,department,memberOf} - allowed because juliusd is memberOf={staff} and thus can read={mail,phone,address,title,department,memberOf}
2008-01-28/15:50:05.122338000/PDT| | 170ms|USER           |juliusd/somehost.ca/             |EDITED         |LDAP         |henri|{department}[OLD=Payments][NEW=Technology] AS TRUSTED_SYSTEM foo/foo-qa.mycompany/ - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:05.722412000/PDT| | 133ms|USER           |juliusd/somehost.ca/             |GROUP-ADDED    |LDAP         |henri|TO {memberOf=tech} AS TRUSTED_SYSTEM foo/foo-qa.mycompany/ - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:05.992555000/PDT| | 263ms|USER           |juliusd/somehost.ca/             |GROUP-REMOVED  |LDAP         |henri|FROM {memberOf=payments} AS TRUSTED_SYSTEM foo/foo-qa.mycompany/ - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:55.000005000/PDT| |800123ms|USER           |juliusd/somehost.ca/             |CREATED-CONTACT|LDAP         |test@test.com|{name,phone,mail}["Test Contact","604-222-2222","test@test.com"] - allowed because juliusd has memberOf={reception}
2008-01-28/15:51:09.361298000/PDT|!|  22ms|USER           |juliusd/somehost.ca/             |EDIT-FAILED    |LDAP         |henri|{telephone}[OLD=604-541-1118][NEW=604-541-'; DELETE FROM CONTACTS;] - pathologically invalid input
2008-01-28/15:53:17.555329000/PDT| |   3ms|USER           |fred/somehost.ca/                |LOGOFF         |TIMEOUT      |fred|session timeout 4 hrs
2008-01-28/15:55:58.999999000/PDT|!|2004ms|ADMIN          |chuvakin/newhost.ca/             |LOGIN-FAILED   |LOCKOUT-MAINTAINED|chuvakin|new IP, new session
2008-01-28/15:59:00.442111000/PDT| |  77ms|ADMIN          |henri/henrihost.ca/              |LOGIN-SUCCEEDED|PASSWORD     |henri|has memberOf={all,staff,sysadmins,network,payments,vpn-users}
2008-01-28/15:59:32.123456000/PDT| |  41ms|ADMIN          |henri/henrihost.ca/              |UNLOCKED       |-            |chuvakin|henri unlocked chuvakin
2008-01-28/15:59:59.999999000/PDT| |   2ms|ADMIN          |henri/henrihost.ca/              |LOGOFF         |CLICKED      |henri|clicked on logoff button

How does it fare - General Logging Goals

Tell you exactly what happened: when, where, and how.
Where:APP=[FooApp-1.2.3] WHOAMI=[root] HOSTNAME=[foo-qa.mycompany/]
How:ACCESSED [app] - allowed because [user] has memberOf={[group]}
Suitable for manual, semi-automated, an automatated analysis.
Semi-automated:Fixed positions for timestamp, level (! and *), and user/host/ip. The rest of the log should be amenable to full-text indexing.
Automated:Only the final descriptive entry contains free-form text (everything after status verb). All other fields, including the first occurence of "user/host/ip" as well as the status verb (e.g. DENIED, GROUP-ADD) can be reliably extracted.
Can be analyzed without having the application that produced them at hand.
Don't slow the system down.
 Not Sure. Need to be careful with any DNS reverse-lookups - they might block.
Can be proven reliable (if used as evidence).
 Not Sure.

How does it fare - The Checklist

1. Authentication/Authorization Decisions (including logoff)
AuthenticationUSER juliusd/somehost.ca/ LOGIN
AuthorizationLOGIN-FAILED INACTIVITY ALERT - password correct, but last activity was over 6 months ago
Access Controlallowed because juliusd has memberOf={staff}
LogoffUSER fred/somehost.ca/ LOGOFF - session timeout 4 hrs
2. System Access, Data Access
SystemACCESSED payroll/payroll-qa.mycompany/
DataREAD juliusd's {mail,phone,address,title,department}
3. System/Application Changes (especially privilege changes)
a. SystemHOSTNAME=[foo-qa.mycompany/] OS=[Linux 2.6.25 amd64] CONTAINER=[Apache Tomcat/5.0.30]
b. ApplicationAPP=[FooApp-1.2.3] WHOAMI=[root]
c. Privilege ChangesEDITED henri's [...ldap fields...] AS TRUSTED_SYSTEM foo/foo-qa.mycompany/
Note: a. and b. both come from VERSION-LINE, which is logged every hour, so changes in these, while awkward to parse, will show up.
4. Data Changes
a. Add DataCREATED-CONTACT {name,phone,mail}["Test Contact","604-222-2222","test@test.com"]
b. Edit DataEDITED henri's {department}[OLD=Payments][NEW=Technology]
c. Delete DataGROUP-REMOVED henri FROM {memberOf=payments}
5. Invalid Input (possible badness/threats)
EDIT-FAILED henri's {telephone}[OLD=604-541-1118][NEW=604-541-'; DELETE FROM CONTACTS;] - pathalogically invalid input
6. Resources (RAM, Disk, CPU, Bandwidth, any other hard or soft limits)
UPTIME=[382 Days, 10:51:11.231] HEAP=[69MB/281MB 24.6%] CPU-IDLE=[98.2%] DISK-AVAIL=[/var=4.2GB 18%, /var/log=523MB 4%]
It's probably a good idea to keep something like dstat running in the background as well. For example:

dstat -T -m -l -a --output /var/log/dstat/dstat.csv

It looks like this, but in CSV format:
--epoch--- ------memory-usage----- ---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
  epoch   | used  buff  cach  free| 1m   5m  15m |usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
1217445809| 526M 8376k 2418M   92M| 0.4  0.5  0.6|  3   1  89   7   0   0|1284k 1906k|   0     0 | 3.5B   72B| 223  2324
1217445810| 526M 8376k 2418M   92M| 0.4  0.5  0.6|  2   0  98   0   0   0|   0     0 |   0     0 |   0     0 |  48  1463
1217445811| 526M 8376k 2418M   92M| 0.4  0.5  0.6|  1   0  98   0   0   0|   0     0 |   0     0 |   0     0 |  62  1469
This way performance problems can be correlated to the second, rather than to the hour. Use "epoch" instead of "time" so that you have the timezone (epoch is always UTC).
7. Health/Availibility
a. Startups/Shutdowns 2008-01-28/15:25:19.953622000/PDT*SYSTEM foo/foo-qa.mycompany/ SHUTDOWN
2008-01-28/15:28:03.117654000/PDT*SYSTEM foo/foo-qa.mycompany/ STARTUP
b. Faults/Errors*USER juliusd/somehost.ca/ APPLICATION-ERROR java.lang.NullPointerException
c. Delays
d. Backups success/failure
8. Timestamp + TZ (when)
9. System, Application, or Component (where)
a. IP's and contemporaneous DNS lookups of involved parties
b. Names/Roles of involved systems
(what servers are we talking to?)
c. Name/Role of local application
(what is this server?)
10. User (who)
11. Action (what)
12. Status (result)
13. Priority (severity, importance, rank, level, etc)
14. Reason