Julius Davies, July 30th, 2008
This log is a 100% fictional attempt to create a log that conforms to the Chuvakin checklist. For the purposes of the example, please pretend that the Foo application is some kind of central gateway/portal that also has the ability to edit people in LDAP.
/var/log/foo/foo.log.2008-01-28.15PDT
Priority format: *=ERROR, !=IMPORTANT, _=LOW, and space=REGULAR
Log Format:
timestamp |priority| delay|event-type |user/host/IP:port |verb |verb modifier|object (user or system involved) |reason and/or general notes
2008-01-28/15:00:00.000000000/PDT| | 114ms|VERSION-LINE |- |VERSION-LINE |- |-|"APP=FooApp-1.2.3","WHOAMI=root","HOSTNAME=foo-qa.mycompany/192.168.1.5","STARTTIME=2006-10-14/03:45:59.982514000/PST","UPTIME=382-Days","VM=Java HotSpot(TM) Client VM 1.4.2_16-b05","OS=Linux 2.6.25 amd64","CONTAINER=Apache Tomcat/5.0.30","HEAP=69MB/281MB 24.6%","CPU-IDLE=98.2%","DISK-AVAIL=/var=4.2GB 18%; /var/log=523MB 4%"
2008-01-28/15:07:14.232173000/PDT|*| 0ms|SYSTEM |foo/foo-qa.mycompany/192.168.1.5:33912 |DISCONNECTED |CLIENT-SOCKET|directory/ldap.mycompany/192.168.1.2:636|read timeout
2008-01-28/15:08:14.288111000/PDT|*| 4ms|SYSTEM |foo/foo-qa.mycompany/192.168.1.5:33913 |CONNECTED |CLIENT-SOCKET|directory/ldap.mycompany/192.168.1.2:636|connect successful, soTimeout=5000
2008-01-28/15:12:37.320676000/PDT|!| 232ms|USER |slaskin/slot81.vpn.mycompany.ca/3.1.2.81|LOGIN-FAILED |INACTIVITY |slaskin|password correct, but last activity was over 6 months ago
2008-01-28/15:25:19.953622000/PDT|*| 0ms|SYSTEM |- |SHUTDOWN |- |foo/foo-qa.mycompany/192.168.1.5|"APP=FooApp-1.2.3","WHOAMI=root"
2008-01-28/15:28:03.117654000/PDT|*| 0ms|SYSTEM |- |STARTUP |- |foo/foo-qa.mycompany/192.168.1.5|"APP=FooApp-1.3.0","WHOAMI=root"
2008-01-28/15:28:03.117654000/PDT| | 334ms|VERSION-LINE |- |VERSION-LINE |- |-|"APP=FooApp-1.3.0","WHOAMI=root","HOSTNAME=foo-qa.mycompany/192.168.1.5","STARTTIME=2008-01-28/15:28:03.117654000/PDT","UPTIME=000-Days","VM=Java HotSpot(TM) Client VM 1.4.2_16-b05","OS=Linux 2.6.25 amd64","CONTAINER=Apache Tomcat/5.0.30","HEAP=69MB/281MB 24.6%","CPU-IDLE=98.2%","DISK-AVAIL=/var=4.2GB 18%; /var/log=523MB 4%"
2008-01-28/15:28:05.004592000/PDT|*| 0ms|SYSTEM |foo/foo-qa.mycompany/192.168.1.5:33915 |CONNECTED |CLIENT-SOCKET|directory/ldap.mycompany/192.168.1.2:636|connect successful, soTimeout=5000
2008-01-28/15:33:17.512318000/PDT| | 82ms|USER |juliusd/somehost.ca/2.2.2.2 |LOGIN-SUCCEEDED|W2K-KERBEROS |juliusd|has memberOf={all,staff,reception,managers,property-mgmt}
2008-01-28/15:33:20.000871000/PDT| | 17ms|USER |juliusd/somehost.ca/2.2.2.2 |ACCESSED |THRU-GATEWAY |payroll/payroll-qa.mycompany/192.168.1.100|allowed because juliusd has memberOf={staff}
2008-01-28/15:33:20.089333000/PDT| | 383ms|TRUSTED_SYSTEM |payroll/payroll-qa.mycompany/192.168.1.100|READ |LDAP |juliusd|{mail,phone,address,title,department} - allowed because payroll can read={ALL}
2008-01-28/15:38:08.983118000/PDT| |2056ms|ADMIN |chuvakin/otherhost.ca/2.2.2.3 |LOGIN-FAILED |BAD-PASSWORD |chuvakin|
2008-01-28/15:38:12.223122000/PDT| |2041ms|ADMIN |chuvakin/otherhost.ca/2.2.2.3 |LOGIN-FAILED |BAD-PASSWORD |chuvakin|
2008-01-28/15:38:18.509854000/PDT|!|2025ms|ADMIN |chuvakin/otherhost.ca/2.2.2.3 |LOGIN-FAILED |BAD-PASSWORD-LOCKOUT|chuvakin|too many attempts
2008-01-28/15:38:18.509854000/PDT|!|2003ms|ADMIN |chuvakin/otherhost.ca/2.2.2.3 |LOGIN-FAILED |LOCKOUT-MAINTAINED|chuvakin|old IP, old session
2008-01-28/15:40:00.661450000/PDT|*| 0ms|USER |juliusd/somehost.ca/2.2.2.2 |APP-ERROR |- |-|java.lang.NullPointerException (com/mycompany/foo/ldap/Escalate.java:336)
2008-01-28/15:40:01.002110000/PDT|!| 102ms|USER |juliusd/somehost.ca/2.2.2.2 |ACCESS-DENIED |THRU-GATEWAY |jira/issues.mycompany/192.168.1.18|denied because juliusd does not have memberOf={tech OR support}
2008-01-28/15:40:01.212318000/PDT| | 530ms|TRUSTED_SYSTEM |jira/issues.mycompany/192.168.1.18 |READ |LDAP |juliusd|{mail,groups} - allowed because jira can read={mail,groups,department}
2008-01-28/15:40:01.213333000/PDT| |2005ms|UNKNOWN |root/otherhost.ca/2.2.2.3 |LOGIN-FAILED |UNKNOWN-USER |root
2008-01-28/15:40:05.662998000/PDT| |2007ms|UNKNOWN |admin/otherhost.ca/2.2.2.3 |LOGIN-FAILED |UNKNOWN-USER |admin
2008-01-28/15:40:10.001000000/PDT|!|2025ms|UNKNOWN |guest/otherhost.ca/2.2.2.3 |LOGIN-FAILED |UNKNOWN-USER-LOCKOUT|guest|too many attempts,lockout ip/browser
2008-01-28/15:50:02.090113000/PDT| | 118ms|USER |juliusd/somehost.ca/2.2.2.2 |READ |LDAP |henri|{mail,phone,address,title,department,memberOf} - allowed because juliusd is memberOf={staff} and thus can read={mail,phone,address,title,department,memberOf}
2008-01-28/15:50:05.122338000/PDT| | 170ms|USER |juliusd/somehost.ca/2.2.2.2 |EDITED |LDAP |henri|{department}[OLD=Payments][NEW=Technology] AS TRUSTED_SYSTEM foo/foo-qa.mycompany/192.168.1.5 - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:05.722412000/PDT| | 133ms|USER |juliusd/somehost.ca/2.2.2.2 |GROUP-ADDED |LDAP |henri|TO {memberOf=tech} AS TRUSTED_SYSTEM foo/foo-qa.mycompany/192.168.1.5 - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:05.992555000/PDT| | 263ms|USER |juliusd/somehost.ca/2.2.2.2 |GROUP-REMOVED |LDAP |henri|FROM {memberOf=payments} AS TRUSTED_SYSTEM foo/foo-qa.mycompany/192.168.1.5 - allowed because juliusd has memberOf={reception}
2008-01-28/15:50:55.000005000/PDT| |800123ms|USER |juliusd/somehost.ca/2.2.2.2 |CREATED-CONTACT|LDAP |test@test.com|{name,phone,mail}["Test Contact","604-222-2222","test@test.com"] - allowed because juliusd has memberOf={reception}
2008-01-28/15:51:09.361298000/PDT|!| 22ms|USER |juliusd/somehost.ca/2.2.2.2 |EDIT-FAILED |LDAP |henri|{telephone}[OLD=604-541-1118][NEW=604-541-'; DELETE FROM CONTACTS;] - pathologically invalid input
2008-01-28/15:53:17.555329000/PDT| | 3ms|USER |fred/somehost.ca/2.2.2.2 |LOGOFF |TIMEOUT |fred|session timeout 4 hrs
2008-01-28/15:55:58.999999000/PDT|!|2004ms|ADMIN |chuvakin/newhost.ca/8.8.8.8 |LOGIN-FAILED |LOCKOUT-MAINTAINED|chuvakin|new IP, new session
2008-01-28/15:59:00.442111000/PDT| | 77ms|ADMIN |henri/henrihost.ca/8.8.9.9 |LOGIN-SUCCEEDED|PASSWORD |henri|has memberOf={all,staff,sysadmins,network,payments,vpn-users}
2008-01-28/15:59:32.123456000/PDT| | 41ms|ADMIN |henri/henrihost.ca/8.8.9.9 |UNLOCKED |- |chuvakin|henri unlocked chuvakin
2008-01-28/15:59:59.999999000/PDT| | 2ms|ADMIN |henri/henrihost.ca/8.8.9.9 |LOGOFF |CLICKED |henri|clicked on logoff button
| Tell you exactly what happened: when, where, and how. | |
|---|---|
| When: | 2008-01-28/15:33:17.512318000/PDT |
| Where: | APP=[FooApp-1.2.3] WHOAMI=[root] HOSTNAME=[foo-qa.mycompany/192.168.1.5] |
| How: | ACCESSED [app] - allowed because [user] has memberOf={[group]} |
| Suitable for manual, semi-automated, an automatated analysis. | |
| Manual: | Yes. |
| Semi-automated: | Fixed positions for timestamp, level (! and *), and user/host/ip. The rest of the log should be amenable to full-text indexing. |
| Automated: | Only the final descriptive entry contains free-form text (everything after status verb). All other fields, including the first occurence of "user/host/ip" as well as the status verb (e.g. DENIED, GROUP-ADD) can be reliably extracted. |
| Can be analyzed without having the application that produced them at hand. | |
| Yes. | |
| Don't slow the system down. | |
| Not Sure. Need to be careful with any DNS reverse-lookups - they might block. | |
| Can be proven reliable (if used as evidence). | |
| Not Sure. | |
| 1. Authentication/Authorization Decisions (including logoff) | |
|---|---|
| Authentication | |
| Authorization | |
| Access Control | allowed because juliusd has memberOf={staff} |
| Logoff | USER fred/somehost.ca/2.2.2.2 LOGOFF - session timeout 4 hrs |
| 2. System Access, Data Access | |
| System | ACCESSED payroll/payroll-qa.mycompany/192.168.1.100 |
| Data | READ juliusd's {mail,phone,address,title,department} |
| 3. System/Application Changes (especially privilege changes) | |
| a. System | |
| b. Application | APP=[FooApp-1.2.3] WHOAMI=[root] |
| c. Privilege Changes | EDITED henri's [...ldap fields...] AS TRUSTED_SYSTEM foo/foo-qa.mycompany/192.168.1.5 |
| Note: a. and b. both come from VERSION-LINE, which is logged every hour, so changes in these, while awkward to parse, will show up. | |
| 4. Data Changes | |
| a. Add Data | CREATED-CONTACT {name,phone,mail}["Test Contact","604-222-2222","test@test.com"] |
| b. Edit Data | EDITED henri's {department}[OLD=Payments][NEW=Technology] |
| c. Delete Data | GROUP-REMOVED henri FROM {memberOf=payments} |
| 5. Invalid Input (possible badness/threats) | |
| EDIT-FAILED henri's {telephone}[OLD=604-541-1118][NEW=604-541-'; DELETE FROM CONTACTS;] - pathalogically invalid input | |
| 6. Resources (RAM, Disk, CPU, Bandwidth, any other hard or soft limits) | |
| UPTIME=[382 Days, 10:51:11.231] HEAP=[69MB/281MB 24.6%] CPU-IDLE=[98.2%] DISK-AVAIL=[/var=4.2GB 18%, /var/log=523MB 4%] | |
| It's probably a good idea to keep something like dstat running in the background as well. For example:
dstat -T -m -l -a --output /var/log/dstat/dstat.csv
It looks like this, but in CSV format: --epoch--- ------memory-usage----- ---load-avg--- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- epoch | used buff cach free| 1m 5m 15m |usr sys idl wai hiq siq| read writ| recv send| in out | int csw 1217445809| 526M 8376k 2418M 92M| 0.4 0.5 0.6| 3 1 89 7 0 0|1284k 1906k| 0 0 | 3.5B 72B| 223 2324 1217445810| 526M 8376k 2418M 92M| 0.4 0.5 0.6| 2 0 98 0 0 0| 0 0 | 0 0 | 0 0 | 48 1463 1217445811| 526M 8376k 2418M 92M| 0.4 0.5 0.6| 1 0 98 0 0 0| 0 0 | 0 0 | 0 0 | 62 1469This way performance problems can be correlated to the second, rather than to the hour. Use "epoch" instead of "time" so that you have the timezone (epoch is always UTC). | |
| 7. Health/Availibility | |
| b. Faults/Errors | |
| c. Delays | |
| d. Backups success/failure | |
| 8. Timestamp + TZ (when) | |
| 9. System, Application, or Component (where) | |
| a. IP's and contemporaneous DNS lookups of involved parties | |
| b. Names/Roles of involved systems (what servers are we talking to?) | |
| c. Name/Role of local application (what is this server?) | |
| 10. User (who) | |
| 11. Action (what) | |
| 12. Status (result) | |
| 13. Priority (severity, importance, rank, level, etc) | |
| 14. Reason | |