TA19 VI3 Advanced Log Analysis

TA19 VI3 Advanced Log Analysis Mostafa Khalil VCP, VMware Product Support Engineering


Transcript of TA19 VI3 Advanced Log Analysis

Page 1: TA19 VI3 Advanced Log Analysis

TA19

VI3 Advanced Log Analysis

Mostafa Khalil, VCP, VMware Product Support Engineering

Page 2: TA19 VI3 Advanced Log Analysis

Housekeeping

Please turn off your mobile phones, blackberries and laptops

Your feedback is valued: please fill in the session evaluation form (specific to that session) & hand it to the room monitor / the materials pickup area at registration

Each delegate who returns their completed event evaluation form to the materials pickup area will be eligible for a free evaluation copy of VMware’s ESX 3i

Please leave the room between sessions, even if your next session is in the same room, as you will need to be rescanned

Page 3: TA19 VI3 Advanced Log Analysis

Agenda

ESX Server Boot process
Locating logs
How to read the logs
Interpreting log entries
Making sense of it all

Page 4: TA19 VI3 Advanced Log Analysis

ESX Server 3.0.x/3.5 Boot Process

Boot loader in MBR points to the boot device
Grub.conf lists the boot menu options
Selected menu provides:
Location of root partition. Uses UUID instead of device name (e.g. /dev/sda)
RAM Disk file name relative to /boot location

Diagram: boot loader, initrd, VMkernel, vmnix, /sbin/init, init scripts, VMware init scripts

Page 5: TA19 VI3 Advanced Log Analysis

ESX Server 3i Boot Process

Boot loader in MBR points to the boot device
initrd (initial RAM disk) is loaded

Diagram: boot loader, initrd, VMkernel, /sbin/init, VMware init scripts

Page 6: TA19 VI3 Advanced Log Analysis

Collecting Logs – UI

Log on to VI Client as an Administrator
Select: File > Export > Export Diagnostics Data
Or: Administration > Export Diagnostics Data
Select servers from which to collect the logs, including VC Server
Select the “Include information ..” checkbox
Specify location for storing the files

Page 7: TA19 VI3 Advanced Log Analysis

ESX Server 3.0.x Logs (collected via vm-support)

Logs are located mostly under /var/log directory. Locations listed here are relative to that directory

vmkernel
messages
dmesg
boot.log
initrdlogs/*
vmksummary
vmware/hostd.log
vmware/vpx/vpxa.log
vmware/esxcfg-boot.log
vmware/esxcfg-firewall.log
vmware/vmware-cim.log
vmware/esxcfg-linuxnet.log
vmware/esxupdate.log
oldconf/esx.conf.*
rpmpkgs
vmkernel-version

Page 8: TA19 VI3 Advanced Log Analysis

ESX 3i Log Files

config.log
messages
slpd.log
wsmand.log
configRP.log
vmware/hostd.log
vmware/aam/*
vmware/vpx/vpxa.log

Page 9: TA19 VI3 Advanced Log Analysis

vmkernel Log (3.0.x/3.5)

Located in /var/log directory
Contains all events generated by vmkernel
vmkwarning log is a subset of this one and contains only the warning events
Rotated with a numeric extension. The current log has no extension and the next newest one has a “.1” extension
All events since last vmkernel load are also in memory in /proc/vmware/log

Page 10: TA19 VI3 Advanced Log Analysis

messages log files (3i)

Located in /var/log directory
Contains all events generated by vmkernel
Rotated with a numeric extension. The current log has no extension and the next newest one has a “.0.gz” extension (rotated and compressed)

Page 11: TA19 VI3 Advanced Log Analysis

vmkernel Log - Components

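Jun 19 09:12:54 giza vmkernel: 14:22:31:50.009 cpu3:1033)scsi-qla0: Scheduling SCAN for new luns....

System Date/Time: Jun 19 09:12:54
Host name: giza
Message source: vmkernel
Uptime: 14:22:31:50.009
CPU:World ID: cpu3:1033
Device: scsi-qla0
Message: Scheduling SCAN for new luns....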
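The component breakdown above can be sketched as a small parser. This is only an illustration: the regular expression is an assumption derived from the single sample line on this slide, the field names are made up for the example, and entries that omit the world ID (such as lines starting with "cpu3)") are not handled.

```python
import re

# Pattern inferred from the one sample line on the slide (an assumption,
# not an official format definition).
VMKERNEL_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"  # system date/time
    r"(?P<host>\S+)\s+"                                         # host name
    r"(?P<source>\w+):\s+"                                      # message source
    r"(?P<uptime>[\d:.]+)\s+"                                   # uptime
    r"cpu(?P<cpu>\d+):(?P<world_id>\d+)\)"                      # CPU:World ID
    r"(?P<message>.*)$"                                         # device and message
)


def parse_vmkernel_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = VMKERNEL_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["cpu"] = int(fields["cpu"])
    fields["world_id"] = int(fields["world_id"])
    return fields


sample = ("Jun 19 09:12:54 giza vmkernel: 14:22:31:50.009 "
          "cpu3:1033)scsi-qla0: Scheduling SCAN for new luns....")
entry = parse_vmkernel_line(sample)
print(entry["host"], entry["uptime"], entry["cpu"], entry["world_id"])
```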

Page 12: TA19 VI3 Advanced Log Analysis

Sample Rescan Event vmkernel Log Entries

cpu3:1033)<6>scsi-qla0: Scheduling SCAN for new luns....
cpu1:1034)SCSI: 8244: Starting rescan of adapter vmhba0

The beginning of a SAN Rescan event for vmhba0

Page 13: TA19 VI3 Advanced Log Analysis

Rescan Event – LUN Discovery

Vendor: IBM Model: 1722-600 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03
cpu3:1033)LinSCSI: 4625: Device vmhba0:0:0 has appeared
VMWARE SCSI Id: Supported VPD pages for vmhba0:0:0 : 0x0 0x80 0x83 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc6 0xc7 0xc8 0xc9 0xca 0xd0

Storage Vendor’s ID, Array Model and Microcode Rev.
Reported ANSI version is 3 = SCSI-3
LUN 0 on target 0 on vmhba0 discovered
Array sends supported Vital Product Data (VPD) pages for LUN 0
Array supports VPD pages 0x80 and 0x83

Page 14: TA19 VI3 Advanced Log Analysis

Rescan Event – LUN Discovery – Cont.

VMWARE SCSI Id: Device id info for vmhba0:0:0: 0x1 0x3 0x0 0x10 0x60 0xa 0xb 0x80 0x0 0x17 0x4e 0x84 0x0 0x0 0x12 0x84 0x43 0x78 0xa3 0x9f
VMWARE SCSI Id: Id for vmhba0:0:0 0x60 0x0a 0x0b 0x80 0x00 0x17 0x4e 0x84 0x00 0x00 0x12 0x84 0x43 0x78 0xa3 0x9f 0x31 0x37 0x32 0x32 0x2d 0x36

Device ID for the LUN reported. If VPD page 0x83 were not supported, this line would not show in the log
LUN Id is reported. This matches what is in the proc node
/proc/vmware/scsi/vmhba0/0:0:
Id: 60 a b 80 0 17 4e 84 0 0 12 84 43 78 a3 9f 31 37 32 32 2d 36
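The match between the logged Id and the proc node can be checked mechanically. The following is a minimal sketch using the two byte strings from this slide; the helper name is hypothetical. As a side observation (not stated on the slide), the last six bytes decode as ASCII text that resembles the start of the array model string.

```python
def parse_id_bytes(text):
    """Turn space-separated hex tokens ("0x0a" or "a") into integers."""
    return [int(token, 16) for token in text.split()]


# Id as printed in the vmkernel log (with 0x prefixes and zero padding).
vmkernel_id = ("0x60 0x0a 0x0b 0x80 0x00 0x17 0x4e 0x84 0x00 0x00 0x12 "
               "0x84 0x43 0x78 0xa3 0x9f 0x31 0x37 0x32 0x32 0x2d 0x36")
# Id as shown by the proc node (no prefixes, no padding).
proc_id = "60 a b 80 0 17 4e 84 0 0 12 84 43 78 a3 9f 31 37 32 32 2d 36"

log_bytes = parse_id_bytes(vmkernel_id)
proc_bytes = parse_id_bytes(proc_id)
ids_match = log_bytes == proc_bytes

# The trailing six bytes are printable ASCII characters.
trailing_text = bytes(log_bytes[16:]).decode("ascii")
print(ids_match, trailing_text)
```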

Page 15: TA19 VI3 Advanced Log Analysis

Rescan Event – LUN Discovery - Conclusion

cpu3:1033)SCSI: 1424: Device vmhba0:0:0 is attached to a V53 FAStT SAN.
cpu3:1033)SCSI: 640: Dual Controllers active for adapter vmhba0
cpu3:1033)SCSI: 1450: The IBM FAStT device on vmhba0:0:0 is not configured in Auto-Volume Transfer mode. ESX will handle path failover to passive controllers as necessary.
cpu3:1033)SCSI: 2044: Setting default path policy to MRU on target vmhba0:0:0

The LUN is identified as attached to a V53 FAStT SAN
vmhba0 can access both controllers on the FAStT
AVT is identified as “Disabled” and path failover will be handled by ESX
Default Path Policy is set to MRU for that LUN
If AVT were enabled, the policy would have been set to Fixed

Page 16: TA19 VI3 Advanced Log Analysis

Path Failover Event

cpu1:1038)WARNING: SCSI: 1785: Manual switchover to path vmhba0:1:5 begins.
cpu1:1038)WARNING: SCSI: 1110: Did not switchover to vmhba0:1:5. Check Unit Ready Command returned READY instead of NOT READY for standby controller.
cpu1:1038)WARNING: SCSI: 1820: Manual switchover to vmhba0:1:5 completed successfully.
cpu1:1038)SCSI: 1789: Changing active path to vmhba0:1:5

Starting a manual switchover (done on ESX side)
No need to move the LUN on the array to target 1 since it returned a “READY” state on that target
Switchover (failover) completed
The active path is now changed from vmhba0:0:5 to vmhba0:1:5
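Path names such as vmhba0:1:5 are easier to compare once split into fields. The sketch below assumes the layout adapter:target:LUN, which follows the earlier annotation "LUN 0 on target 0 on vmhba0" for vmhba0:0:0. Treating an optional fourth field (as in vmhba2:2:2:1) as a partition number is an assumption not stated on these slides, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ScsiPath(NamedTuple):
    adapter: str
    target: int
    lun: int
    partition: Optional[int]  # assumed meaning of the optional fourth field


PATH = re.compile(r"^(vmhba\d+):(\d+):(\d+)(?::(\d+))?$")


def parse_path(name):
    """Split a vmhbaA:T:L[:P] name into its numeric fields."""
    match = PATH.match(name)
    if match is None:
        raise ValueError("not a vmhba path: %r" % name)
    adapter, target, lun, partition = match.groups()
    return ScsiPath(adapter, int(target), int(lun),
                    int(partition) if partition is not None else None)


old = parse_path("vmhba0:0:5")
new = parse_path("vmhba0:1:5")
# A failover to the other controller keeps adapter and LUN, changes target.
same_lun_new_target = (old.adapter == new.adapter
                       and old.lun == new.lun
                       and old.target != new.target)
print(old, new, same_lun_new_target)
```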

Page 17: TA19 VI3 Advanced Log Analysis

Snapshot LUN Detection (ESX 3.0.x)

LVM: 5739: Device vmhba2:2:2:1 is a snapshot:
LVM: 5745: disk ID: <type 3, len 15, lun 2, devType 0, scsi 3, h(id) 1771423412675533879>
LVM: 5747: m/d disk ID: <type 3, len 15, lun 2, devType 0, scsi 3, h(id) 9219142619163180480>
LVM: 5739: Device vmhba2:2:2:1 is a snapshot:
LVM: 5745: disk ID: <type 3, len 15, lun 2, devType 0, scsi 3, h(id) 1771423412675533879>
LVM: 5747: m/d disk ID: <type 3, len 15, lun 2, devType 0, scsi 3, h(id) 9219142619163180480>
ALERT: LVM: 4903: vmhba2:2:2:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

• This logging appears in the /var/log/vmkernel log file.
• The line containing m/d is the metadata.
• In this case it is the h(id) data in the LVM header which is mismatched.

Page 18: TA19 VI3 Advanced Log Analysis

Snapshot LUN Detection (ESX 3.5 and 3i)

LVM: 5573: Device vml.010044000044363048313739443030343420202020444636303046:1 detected to be a snapshot:
LVM: 5580: queried disk ID: <type 1, len 22, lun 68, devType 0, scsi 4, h(id) 3084339621621410734>
LVM: 5587: on-disk disk ID: <type 1, len 22, lun 68, devType 0, scsi 4, h(id) 3661551745314019942>
ALERT: LVM: 4469: vml.010044000044363048313739443030343420202020444636303046:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

• This logging appears in the /var/log/vmkernel log file on ESX 3.5
• This logging appears in the /var/log/messages log file on ESX 3i
• The “on-disk disk ID” line (labelled “m/d disk ID” on ESX 3.0.x) is the metadata
• In this case it is the h(id) data in the LVM header which is mismatched
• The “type” field identifies the Disk ID type

Type value Disk ID Type
1 Serial Number
2 NAA
3 Symm6
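To see which field of the disk ID is mismatched, the two disk ID lines can be parsed and compared field by field. This is a rough sketch built on the ESX 3.5 sample lines above; the regular expression and function names are assumptions made for the example.

```python
import re

# Matches the <type ..., h(id) ...> block as it appears in the sample lines.
DISK_ID = re.compile(
    r"<type (?P<type>\d+), len (?P<len>\d+), lun (?P<lun>\d+), "
    r"devType (?P<devType>\d+), scsi (?P<scsi>\d+), h\(id\) (?P<hid>\d+)>"
)


def parse_disk_id(line):
    """Return the disk ID fields as integers, or None if absent."""
    match = DISK_ID.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def mismatched_fields(queried, on_disk):
    """List the field names whose values differ between the two IDs."""
    return sorted(name for name in queried if queried[name] != on_disk[name])


queried = parse_disk_id(
    "LVM: 5580: queried disk ID: <type 1, len 22, lun 68, devType 0, "
    "scsi 4, h(id) 3084339621621410734>")
on_disk = parse_disk_id(
    "LVM: 5587: on-disk disk ID: <type 1, len 22, lun 68, devType 0, "
    "scsi 4, h(id) 3661551745314019942>")
diff = mismatched_fields(queried, on_disk)
print(diff)
```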

Page 19: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings

Format: Device/Host Sense_buffer[2] 12 13
Abbrev: D/H S ASC ASCQ
Extended: “Device Status”/“Host Status” “Sense Key” “Additional Sense Code” “Additional Sense Code Qualifier”
Example: 2/0 0x6 0x29 0x0

Page 20: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – Device Status

Device Status: (Displayed in decimal values)

Code Meaning

0 No errors

2 Check Condition

8 Device Busy

24 Reservation Conflict

Page 21: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – Host Status

Host Status (displayed in decimal values)

Code Meaning

0 Host_OK

1 Host No_Connect

2 Host_Bus_Busy

3 Host_Timeout

4 Host_Bad_Target

5 Host_Abort

Page 22: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – Host Status –cont.

Host Status (displayed in decimal values)

Code Meaning

6 Host_Parity

7 Host_Error

8 Host_Reset

9 Host_Bad_INTR

10 Host_PassThrough

11 Host_Soft_Error

Page 23: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – Sense Key

Sense Key (displayed in hex)

Code Meaning

0x0 No Sense Information

0x1 Last command completed but used error correction

0x2 Unit Not Ready

0x3 Medium Error

0x4 Hardware error

0x5 ILLEGAL_REQUEST (Passive SP)

0x6 LUN Reset

Page 24: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – Sense Key – cont.

Sense Key (displayed in hex)

Code Meaning

0x7 Data_Protect – Access to data is blocked

0x8 Blank_Check – Reached an unexpected region

0xa Copy_Aborted

0xb Aborted_Command – Target aborted command

0xc Comparison for SEARCH DATA unsuccessful

0xd Volume_Overflow – Medium is full

0xe Source and Data on Medium do not agree

Page 25: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings – ASC/ASCQ

ASC and ASCQ are always in pairs (in hex) (ASCQ usually 0)

Code Meaning

0x4 Unit Not Ready

0x3 Unit Not Ready – Manual Intervention Required

0x2 Unit Not Ready - Initializing Command Required

0x29 Device Power on or SCSI Reset

0x8b ASC_QUIESCENCE_HAS_BEEN_ACHIEVED (IBM FAStT)

0x94 ASC_Invalid_Req_due_To_Current_LU_Ownership (IBM FAStT)

0x01 ASCQ_Invalid_Req_due_To_Current_LU_Ownership (IBM FAStT)

0x02 ASCQ_QUIESCENCE_HAS_BEEN_ACHIEVED (IBM FAStT)

Page 26: TA19 VI3 Advanced Log Analysis

Understanding SCSI Error Strings - Examples

2/0 0x6 0x29 0x0 (Device Check Condition - Lun Reset)
24/0 0x0 0x0 0x0 (SCSI Reservation Conflict)
0/1 0x0 0x0 0x0 (Device OK/Host No_Connect)

cpu3)WARNING: SCSI: 5663: vmhba1:0:10:1 status = 2/0 0x6 0x29 0x0

cpu3)WARNING: SCSI: 5663: vmhba2:1:5:0 status = 24/0 0x0 0x0 0x0

cpu0)SCSI: 8879: vmhba2:1:5:0 status = 24/0 0x0 0x0 0x0

cpu0)WARNING: SCSI: 8760: returns error: "SCSI reservation conflict". Code: 0xbad0023.

cpu3)WARNING: SCSI: 5663: vmhba1:0:9:1 status = 0/1 0x0 0x0 0x0
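The status strings in these examples can be decoded with lookup tables copied from the preceding slides. The sketch below is illustrative only: the meanings are exactly the ones listed in this deck, ASC/ASCQ values are returned as numbers rather than translated, and the function name is hypothetical.

```python
# Tables transcribed from the Device Status, Host Status and Sense Key slides.
DEVICE_STATUS = {0: "No errors", 2: "Check Condition", 8: "Device Busy",
                 24: "Reservation Conflict"}
HOST_STATUS = {0: "Host_OK", 1: "Host No_Connect", 2: "Host_Bus_Busy",
               3: "Host_Timeout", 4: "Host_Bad_Target", 5: "Host_Abort",
               6: "Host_Parity", 7: "Host_Error", 8: "Host_Reset",
               9: "Host_Bad_INTR", 10: "Host_PassThrough",
               11: "Host_Soft_Error"}
SENSE_KEY = {0x0: "No Sense Information",
             0x1: "Last command completed but used error correction",
             0x2: "Unit Not Ready", 0x3: "Medium Error",
             0x4: "Hardware error", 0x5: "ILLEGAL_REQUEST (Passive SP)",
             0x6: "LUN Reset", 0x7: "Data_Protect", 0x8: "Blank_Check",
             0xa: "Copy_Aborted", 0xb: "Aborted_Command",
             0xc: "Comparison for SEARCH DATA unsuccessful",
             0xd: "Volume_Overflow",
             0xe: "Source and Data on Medium do not agree"}


def decode_status(status):
    """Decode a 'D/H S ASC ASCQ' string such as '2/0 0x6 0x29 0x0'."""
    device_host, sense, asc, ascq = status.split()
    device, host = (int(part) for part in device_host.split("/"))  # decimal
    return {
        "device": DEVICE_STATUS.get(device, "unknown"),
        "host": HOST_STATUS.get(host, "unknown"),
        "sense_key": SENSE_KEY.get(int(sense, 16), "unknown"),     # hex
        "asc": int(asc, 16),
        "ascq": int(ascq, 16),
    }


for text in ("2/0 0x6 0x29 0x0", "24/0 0x0 0x0 0x0", "0/1 0x0 0x0 0x0"):
    print(text, "->", decode_status(text))
```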

Page 27: TA19 VI3 Advanced Log Analysis

Translating vmkernel Error Codes

In VI3, errors are already listed in English along with the code
You can find the codes in:
VMware-esx-drivers-public-source-<ver>-<build>.tar.gz
At: http://www.vmware.com/download/vi/open_source.html
File: return_status.h
In: /src/include/vmware/vmklinks/vmkernel/public
Codes are listed sequentially starting from 0x0 (hex)
Codes get renumbered with new releases

Page 28: TA19 VI3 Advanced Log Analysis

Translating vmkernel Error Codes – Examples (3.0.2U1)

Line# Hex Code Meaning
27 1b 0xBAD001b Corrupt Redo Log
34 22 0xBAD0022 SCSI Reservation Conflict
35 23 0xBAD0023 File System Locked
51 33 0xBAD0033 VMFS volume missing physical extents
65 41 0xBAD0041 Error parsing MPS Table
129 81 0xBAD0081 No Swap File
138 8A 0xBAD008a SCSI LUN is in snapshot state
142 8E 0xBAD008e Exceed maximum number of files on the filesystem

Line numbering starts from “0”. Convert the line number to hex to get the last 2 digits of the hex code, which starts with 0xBAD00
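The line-number arithmetic above can be written out as a pair of helper functions. This is a small sketch under the slide's stated rule (zero-based line number, converted to two hex digits after 0xBAD00); the function names are hypothetical, and since codes are renumbered between releases the result only applies to the matching return_status.h.

```python
def code_from_line(line_number):
    """Build the vmkernel error code for a zero-based line number."""
    if not 0 <= line_number <= 0xFF:
        raise ValueError("sketch only covers two hex digits (0-255)")
    return "0xBAD00{:02x}".format(line_number)


def line_from_code(code):
    """Recover the zero-based line number from a code like 0xbad0023."""
    return int(code, 16) & 0xFF


print(code_from_line(27))           # Corrupt Redo Log row in the table above
print(line_from_code("0xbad0023"))  # code seen in the reservation conflict log
```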

Page 29: TA19 VI3 Advanced Log Analysis

Messages Log (3.0.x/3.5)

Console events
Logon events
iSCSI Authentication events

Jul 24 19:13:33 giza sshd[18915]: Connection from 10.16.112.24 port 1396
Jul 24 19:13:36 giza sshd[18915]: Accepted password for root from 10.16.112.24 port 1396 ssh2
Jul 24 19:13:36 giza sshd(pam_unix)[18915]: session opened for user root by (uid=0)

Jul 29 01:01:03 giza iscsid[32725]: cannot make connection to 10.16.95.161:3260: No route to host

Page 30: TA19 VI3 Advanced Log Analysis

initrdlogs

Located in /var/log/initrdlogs
Events during initial boot from RAM Disk
Logs include:

vmklog.vmk

messages

vmklog.<storage-driver-name> (e.g. vmklog.qla2300_7xx)

Page 31: TA19 VI3 Advanced Log Analysis

hostd.log

Located in /var/log/vmware
Sym-linked to the current rotated hostd log file
Hostd events

VI Client communications when directly connected to ESX

Events done on behalf of VPXA

System Services

Firewall System

HA services

VMware Converter

Page 32: TA19 VI3 Advanced Log Analysis

vpxa.log

Located in /var/log/vmware/vpx
Sym-linked to the current rotated vpxa.log
Events of interactions with VirtualCenter Server

Log for VMware VirtualCenter, pid=4470, version=2.0.2, build=build-50618, option=Release, section=2
[2007-07-19 11:41:11.172 'App' 3076436896 info] Current working directory: /var/log/vmware/vpx
[2007-07-19 11:41:11.172 'App' 3076436896 info] Initializing SSL context

[2007-07-19 11:41:11.216 'App' 3076436896 info] Starting VMware VirtualCenter Agent Daemon 2.0.2 build-50618

[2007-07-19 11:41:11.221 'App' 3076436896 info] [VpxaInvtHost] Manager IP: :902 Host IP:
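The bracketed prefix of each vpxa.log entry can be split into fields for filtering or sorting. The sketch below is based only on the sample lines on this slide. The slides do not explain the number after the component name, so it is kept under the neutral, hypothetical name `ident`, and the pattern itself is an assumption.

```python
import re
from datetime import datetime

# Pattern inferred from the sample entries (an assumption).
VPXA_LINE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"'(?P<component>[^']*)' (?P<ident>\d+) (?P<level>\w+)\] ?"
    r"(?P<message>.*)$"
)


def parse_vpxa_line(line):
    """Return the entry's fields, or None for lines without the prefix."""
    match = VPXA_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%Y-%m-%d %H:%M:%S.%f")
    entry["ident"] = int(entry["ident"])  # meaning not given in the slides
    return entry


sample = ("[2007-07-19 11:41:11.216 'App' 3076436896 info] Starting VMware "
          "VirtualCenter Agent Daemon 2.0.2 build-50618")
entry = parse_vpxa_line(sample)
print(entry["time"], entry["level"], entry["message"])
```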

Page 33: TA19 VI3 Advanced Log Analysis

esxcfg-firewall.log

Located in /var/log/vmware directory
All VMware Firewall rules events

2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A icmp-out -p icmp --icmp-type echo-reply -j ACCEPT"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A icmp-out -j DROP"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -N log-and-drop"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A log-and-drop -j LOG --log-ip-options --log-tcp-options --log-level debug"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A log-and-drop -j DROP"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -N valid-source-address"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A valid-source-address -s 127.0.0.1 -j DROP"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A valid-source-address -s 0.0.0.0/8 -j DROP"
2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A valid-source-address -d 255.255.255.255 -j DROP"

2007-07-14 13:28:30 (30269) INFO : Setting service AAMClient to 1
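Entries in this log can be separated into iptables commands and other messages. The following is a minimal sketch based on the sample lines above. Reading the number in parentheses as a process ID is an assumption, as is the pattern, and the function name is hypothetical.

```python
import re
import shlex

# Pattern inferred from the sample entries (an assumption).
FIREWALL_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<pid>\d+)\) (?P<level>\w+)\s*: (?P<message>.*)$"
)


def parse_firewall_line(line):
    """Return the entry's fields; 'command' holds argv for quoted commands."""
    match = FIREWALL_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["pid"] = int(entry["pid"])  # assumed to be the process ID
    message = entry["message"]
    entry["command"] = None
    if len(message) >= 2 and message.startswith('"') and message.endswith('"'):
        # Quoted messages in the samples are the commands that were run.
        entry["command"] = shlex.split(message[1:-1])
    return entry


rule = parse_firewall_line(
    '2006-10-09 13:57:02 (15463) INFO : "/sbin/iptables -A icmp-out -j DROP"')
service = parse_firewall_line(
    "2007-07-14 13:28:30 (30269) INFO : Setting service AAMClient to 1")
print(rule["command"], service["message"])
```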

Page 34: TA19 VI3 Advanced Log Analysis

oldconf Files

Backup copies of /etc/vmware/esx.conf file
Located in /var/log/oldconf directory

Created prior to updating the existing file

Only when changes are made via VC, VI Client or esxcfg-* scripts

Date and time of backup used as the extension of the file name

esx.conf.2007-06-30_03:56:58
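Because the extension encodes the backup date and time, the backups can be ordered without looking at file metadata. This is a small sketch assuming the extension always follows the format of the example above; the function names are hypothetical, and the second file name in the usage lines is made up for illustration.

```python
from datetime import datetime

PREFIX = "esx.conf."


def backup_time(filename):
    """Parse the date/time extension of an oldconf backup file name."""
    if not filename.startswith(PREFIX):
        raise ValueError("not an esx.conf backup: %r" % filename)
    return datetime.strptime(filename[len(PREFIX):], "%Y-%m-%d_%H:%M:%S")


def newest_backup(filenames):
    """Return the file name with the most recent timestamp extension."""
    return max(filenames, key=backup_time)


names = [
    "esx.conf.2007-06-30_03:56:58",  # example from the slide
    "esx.conf.2007-05-01_10:00:00",  # hypothetical older backup
]
print(backup_time(names[0]), newest_backup(names))
```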

Page 35: TA19 VI3 Advanced Log Analysis

esxupdate.log

In /var/log/vmware directory
History of all updates done via esxupdate tool
Date and PID
Packages installed
Results of the installation

Page 36: TA19 VI3 Advanced Log Analysis

vmkernel-version

In /var/log directory
Lists current and all previous kernel build numbers

Found vmkernel version 27701
Found vmkernel version 32039
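The build history can be pulled out of this file with a one-line pattern. The sketch below uses the two sample lines above. Treating the last entry as the current build is an assumption, since the slide does not state the ordering, and the function name is hypothetical.

```python
import re


def vmkernel_builds(text):
    """Return the build numbers in the order they appear in the file."""
    return [int(number)
            for number in re.findall(r"Found vmkernel version (\d+)", text)]


sample = "Found vmkernel version 27701\nFound vmkernel version 32039\n"
history = vmkernel_builds(sample)
# Assumption: entries are appended, so the last one is the current build.
assumed_current = history[-1] if history else None
print(history, assumed_current)
```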

Page 37: TA19 VI3 Advanced Log Analysis

Questions?

TA19
VI3 Advanced Log Analysis
Mostafa Khalil, VCP
VMware Product Support Engineering

For more information: http://www.vmware.com
