HP-UX Swap and Dump Unleashed by Dusan Baljevic
Aug 2011
HP-UX Swap and Dump Unleashed
By Unix/Linux Apprentice with 26 Years of Experience
Dusan Baljevic, Sydney, Australia
Why This Document? *
• Frequent “abuse” of good design principles.
• A “friend in need is a friend indeed” – why standard swap/dump design fails in real scenarios.
• Everyone has a different opinion, so why not help system administrators and architects stop implementing bad practices?
• Especially important on large-RAM servers.
• Based on 26 years of practical experience in Unix/Linux.
This Document is Not:
• A replacement for HP’s official statements.
• A written manual to learn HP-UX and its design principles in detail.
• Glorified personal experience to prove that I “know best” (rather the opposite).
HP-UX Current Official Recommendations* - Part 1
Use the following guidelines when configuring swap logical volumes:
• Interleave device swap areas for better performance.
• Two swap areas on different disks perform better than one swap area with the equivalent amount of space. This configuration allows interleaved swapping, which means the swap areas are written to concurrently, thus enhancing performance.
• When using LVM, set up secondary swap areas within logical volumes that are on different disks using lvextend.
• If you have only one disk and must increase swap space, try to move the primary swap area to a larger contiguous region.
HP-UX Current Official Recommendations* - Part 2
• Similar-sized device swap areas work best. Device swap areas must have similar sizes for best performance. Otherwise, when all space in the smaller device swap area is used, only the larger swap area is available, making interleaving impossible.
• By default, primary swap is located on the same disk as the root file system. The kernel configuration file contains the configuration information for primary swap.
• If you are using logical volumes as secondary swap, allocate the secondary swap to reside on a disk other than the root disk for better performance.
• Disable mirror consistency checking for a mirrored primary swap device (no need to recover after a failure).
• Use Priority 0 device swap to bypass swap on root disk.
HP-UX How Much Swap is Enough?
• Every admin and architect has a different opinion.
• Traditional views typically use the formula:
SWAP = 1 or 2 x RAM
• Some old designs and applications required even 3 x RAM (or more).
• Old HP-UX releases had a serious issue with the (now obsolete) kernel parameter swapmem_on (see next slide).
HP-UX How Much Dump is Enough?* - Part 1
The vast majority of problems are found in the kernel area. Only rarely do the program data areas need to be examined; even more rarely, the shared memory areas; and virtually never the buffer/file cache and shared libraries.
If a full crash dump is taken, the total space needed will be as high as RAM (and a bit more).
With a compressed dump, the overall time taken is reduced by about 1/3, and the disk space required should also be reduced by at least 1/3 for the default selection of page classes (the default page class selection usually covers around 20% of memory).
HP-UX How Much Dump is Enough? – Part 2
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 9572004 no, by default unused pages
USERPG 1341553 no, by default user process pages
BCACHE 1980 no, by default buffer cache pages
KCODE 9142 no, by default kernel code pages
USTACK 1567 yes, by default user process stacks
FSDATA 12 yes, by default file system metadata
KDDATA 1492949 yes, by default kernel dynamic data
KSDATA 8816 yes, by default kernel static data
SUPERPG 128677 no, by default unused kernel super pages
Total pages on system: 12556700
Total pages included in dump: 1503344
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
1:0x000005 2349920 4194304 64:0x000002 /dev/vg00/lvol2
------------
4194304
# getconf PAGESIZE
4096
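As a rough cross-check of the listing above, the page counts can be converted to kilobytes using the reported page size. This is only a sketch in Python (not part of the original slides); the helper name pages_to_kb is hypothetical, and all figures are copied from the crashconf -v and getconf PAGESIZE output shown.

```python
# Figures copied from the crashconf -v / getconf PAGESIZE output above.
PAGE_SIZE_BYTES = 4096
TOTAL_PAGES = 12556700
INCLUDED_PAGES = {
    "USTACK": 1567,
    "FSDATA": 12,
    "KDDATA": 1492949,
    "KSDATA": 8816,
}
DUMP_DEVICE_KB = 4194304  # size of /dev/vg00/lvol2 in the listing


def pages_to_kb(pages: int, page_size: int = PAGE_SIZE_BYTES) -> int:
    """Convert a page count to kilobytes (hypothetical helper)."""
    return pages * page_size // 1024


dump_pages = sum(INCLUDED_PAGES.values())
dump_kb = pages_to_kb(dump_pages)
fraction = dump_pages / TOTAL_PAGES

print(f"pages included in dump : {dump_pages}")
print(f"uncompressed dump size : {dump_kb} KB")
print(f"share of all pages     : {fraction:.1%}")
print(f"fits uncompressed      : {dump_kb <= DUMP_DEVICE_KB}")
```

On these figures the selected classes come to roughly 5.7 GB uncompressed, which is more than the 4 GB dump device, so an uncompressed dump of these classes would not fit on this device.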
HP-UX Pseudoswap
With Pseudoswap (swapmem_on=1):
2GB Device Swap + 6GB Pseudo Swap (75% of 8GB) = 8GB Reservable Swap
Without Pseudoswap (swapmem_on=0):
2GB Device Swap + 0GB Pseudo Swap = 2GB Reservable Swap
Question: I have 2GB of swap and 8GB of available memory. Can I start a 4GB process on an idle server?
With pseudoswap: Yes!
Without pseudoswap: No!
Pseudoswap allows the kernel to treat a portion of physical memory as if it were swap space in order to satisfy the swap reservation policy. Pseudoswap is enabled by default in all current versions of HP-UX, and the swapmem_on kernel parameter was removed in 11i v3.
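The arithmetic on this slide can be sketched as follows. This is a simplified Python model, not kernel code: the 75% fraction is the figure quoted on the slide, and the function names are hypothetical.

```python
def reservable_swap_gb(device_swap_gb: float, ram_gb: float,
                       pseudoswap: bool, pseudo_fraction: float = 0.75) -> float:
    """Reservable swap = device swap + pseudoswap.

    pseudo_fraction=0.75 is taken from the slide; treat it as an assumption.
    """
    pseudo_gb = ram_gb * pseudo_fraction if pseudoswap else 0.0
    return device_swap_gb + pseudo_gb


def can_reserve(process_gb: float, device_swap_gb: float, ram_gb: float,
                pseudoswap: bool) -> bool:
    """True if an idle system could reserve swap for a process of this size."""
    return process_gb <= reservable_swap_gb(device_swap_gb, ram_gb, pseudoswap)


# The slide's scenario: 2 GB device swap, 8 GB RAM, 4 GB process.
for enabled in (True, False):
    total = reservable_swap_gb(2, 8, enabled)
    print(f"pseudoswap={enabled}: reservable={total} GB, "
          f"4 GB process starts: {can_reserve(4, 2, 8, enabled)}")
```

The model ignores swap already reserved by other processes, which is why the slide specifies an idle server.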
Example of an Application Swap Requirements
• Please see SAP note 1112627 for a detailed explanation of swap sizing and pseudo-swap.
• In general device swap configurations of 1.5 or 2 x RAM have proven appropriate for the majority of SAP installations. The recommendation is to set device swap to 2 x RAM (minimum 20 GB).
• Please refer to SAP note 153641 for a detailed explanation of swap requirements on a per SAP instance basis.
Basics of Crash Dumps
Bad Example of Swap Design
# /usr/sbin/swapinfo -tm
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 30464 0 30464 0% 0 - 1 /dev/vg00/lvol2
dev 30464 0 30464 0% 0 - 1 /dev/vg00/swap1
dev 30464 0 30464 0% 0 - 1 /dev/vg00/swap2
dev 30464 0 30464 0% 0 - 1 /dev/vg00/swap3
reserve - 46202 -46202
memory 98292 2278 96014 2%
total 220148 48480 171668 22% - 0 -
HP-UX Maximum Swap
• Swap space in the kernel is managed using 'chunks' of physical device space. These chunks contain one or more (usually more) pages of memory, but provide another layer of indexing (similar to inodes in file systems) to keep the global swap table relatively small, as opposed to a large table indexed by swap page.
• swchunk controls the size in physical disk blocks (which are defined as 1 KB) for each chunk.
Maximum Swap on HP-UX Before 11i V3
• The total bytes of swap space manageable by the system on older HP-UX 11i releases is:
swchunk x 1KB x 16384
where 16384 is the system maximum number of swap chunks in the swap table, as defined by kernel parameter maxswapchunks.
swchunk has allowed values between 2048 and 65536 blocks.
Maximum Swap on HP-UX 11i V3
• The total bytes of swap space manageable by the system on HP-UX 11i v3 is:
swchunk x 1KB x 2147483648
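To give a feel for the two limits, the formulas above can be evaluated directly. This is a minimal Python sketch; the constant and function names are hypothetical, the chunk limits and the 2048 to 65536 swchunk range are the ones quoted on the slides, and using swchunk = 2048 for 11i v3 is only an illustrative assumption.

```python
KB = 1024
MAX_CHUNKS_PRE_V3 = 16384       # maxswapchunks limit quoted for older 11i releases
MAX_CHUNKS_V3 = 2147483648      # chunk limit quoted for 11i v3


def max_swap_bytes(swchunk_blocks: int, max_chunks: int) -> int:
    """Total manageable swap: swchunk (1 KB blocks) x 1 KB x number of chunks."""
    return swchunk_blocks * KB * max_chunks


def to_gib(n_bytes: int) -> float:
    return n_bytes / 2**30


# Smallest and largest swchunk values quoted for older releases.
for swchunk in (2048, 65536):
    limit = max_swap_bytes(swchunk, MAX_CHUNKS_PRE_V3)
    print(f"pre-v3, swchunk={swchunk}: {to_gib(limit):.0f} GiB")

# 11i v3, with swchunk=2048 assumed purely for illustration.
limit_v3 = max_swap_bytes(2048, MAX_CHUNKS_V3)
print(f"11i v3, swchunk=2048: {to_gib(limit_v3):.0f} GiB")
```

With these inputs the older releases top out between 32 GiB and 1 TiB depending on swchunk, while the 11i v3 limit is far beyond any practical swap configuration.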
Dump Terms
• Dump unit: A thread of execution during dump. A dump unit requires its own set of CPUs, dump devices, and other resources, which are non-overlapping with other dump units.
• Reentrancy: Capability of a dump driver to issue multiple I/Os simultaneously, one I/O per HBA port, during dump.
• Concurrency: Capability of a dump driver to issue multiple I/Os simultaneously per HBA port, during dump. In HP-UX 11i v3 this capability means that the driver can issue I/Os simultaneously to multiple devices under a given HBA port, one I/O per device.
• Parallel Dump: HP-UX 11i v3 dump infrastructure which enables the parallelism features.
• Reentrant HBA port or device: An HBA port or device controlled by a reentrant driver.
• Concurrent HBA port or device: An HBA port or device controlled by a concurrent driver.
Dump Unit - Part 1 *
• A Dump Unit is an independent sequential unit of execution within the dump process.
• Each dump unit is assigned an exclusive subset of the system resources needed to perform the dump, including CPUs, a portion of the physical memory to be dumped, and a subset of the configured dump devices. The dump infrastructure in HP-UX 11i v3 automatically partitions system resources at dump time into dump units.
• Each dump unit operates sequentially.
• Parallelism is achieved by multiple dump units executing in parallel.
Dump Unit - Part 2 *
• A dump device cannot be shared across multiple dump units.
• Multiple “reentrant devices” can be accessed in parallel only if the devices are configured through separate HBA ports. Thus all “reentrant devices” on the same HBA port will be assigned to a single dump unit.
• Each “concurrent device” can be accessed in parallel. Each can therefore be assigned to a separate dump unit, even if configured through a single HBA port.
• Multiple dump volumes on a single physical volume will not allow for parallelism. Parallelism at dump time can only be achieved across multiple physical devices (LUNs).
• Logical volumes configured as dump devices: all logical volumes which reside on the same physical device (LUN) are assigned to the same dump unit.
Dump Options Overview
− Selective
• Based on classes/uses of memory
− Compressed
• >=5 CPUs per dump unit
• Mixed compressed/non-compressed images
− Parallel (concurrent)
• Faster dump with multiple “monarchs”
• Influenced by memory availability and dump devices
• HP Integrity Servers only
− Live dump
• Crashdump a live system without forced shutdown or panic
• System stays up, running & stable
• Offline analysis of system
• Memory image -> file (extra load during this save)
• HP Integrity Servers only
Dump Parallelism
I/O support during dump is provided via dump drivers, and each configured dump driver reports its parallelism capabilities to the dump infrastructure:
Legacy: new parallelism feature is not supported
Reentrant: supports parallelism per HBA port
Concurrent: supports parallelism per dump device
These requirements can be distilled into the following formulas for calculating the number of dump units that can be achieved:
• CPU Parallelism = (number of CPUs available at dump time) / (1 or 5, depending on whether or not compression is enabled)
• Device Parallelism = (number of reentrant dump HBA ports) + (number of concurrent dump devices) + (1 if there are any legacy dump devices)
• Number of Dump Units = Minimum (CPU Parallelism, Device Parallelism)
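The three formulas above can be sketched in Python as follows. The function and parameter names are hypothetical, and rounding the CPU quotient down to a whole number is an assumption; the slides do not say how a fractional result is handled.

```python
def dump_units(cpus_at_dump_time: int, compression: bool,
               reentrant_hba_ports: int, concurrent_devices: int,
               legacy_devices: int) -> int:
    """Number of dump units = min(CPU parallelism, device parallelism)."""
    cpus_per_unit = 5 if compression else 1
    # Assumption: a partial set of CPUs cannot form a dump unit, so round down.
    cpu_parallelism = cpus_at_dump_time // cpus_per_unit
    # All legacy devices together contribute at most one unit.
    device_parallelism = (reentrant_hba_ports
                          + concurrent_devices
                          + (1 if legacy_devices > 0 else 0))
    return min(cpu_parallelism, device_parallelism)


# Hypothetical server: 8 CPUs, 2 reentrant HBA ports, 3 concurrent devices.
print(dump_units(8, True, 2, 3, 0))   # compression on: CPU-bound
print(dump_units(8, False, 2, 3, 0))  # compression off: device-bound
```

In this hypothetical case compression limits the dump to a single unit because only one group of five CPUs is available, whereas the uncompressed dump is limited by the five device paths.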
Dump Driver Parallelism Capability
Examples of HP-provided dump drivers on HP-UX 11.31:
fcd Concurrent
td, mpt, c8xx, ciss, sasd, fclp Reentrant
# crashconf -l
DEVICE LOGICAL VOL. NAME LUNPATH HANDLE *
------------ --------------- ------------------- -----------------------
1:0x000002 64:0x000002 /dev/vg00/lvol2 40/0/2/0/0/0/0/4/0/0/0.0x247000c0ffdb3fb9.0x4001000000000000
# ioscan -fNk | grep "40/0/2/0/0/0/0/4/0/0/0 "
fc 0 40/0/2/0/0/0/0/4/0/0/0 fclp CLAIMED INTERFACE HP AD222-60001 PCIe Fibre Channel 2-port 4Gb FC/2-port 1000B-T Combo Adapter
Dump Driver Capability
# scsimgr get_attr -a capability -H 40/0/2/0/0/0/0/4/0/0/0
SCSI ATTRIBUTES FOR CONTROLLER : 40/0/2/0/0/0/0/4/0/0/0
name = capability
current = "Boot Dump"
default =
saved =
Uncompressed vs. Compressed Dump –One Dump Device *
Uncompressed vs. Compressed Dump – Three Dump Devices *
Uncompressed vs. Compressed Dump – Legacy Devices *
Uncompressed Dump – Reentrant Devices *
Uncompressed vs. Compressed Dump – Complex Example *
Compressed Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 1514754 no, by default unused pages
USERPG 112614 no, by default user process pages
BCACHE 26235 no, by default buffer cache pages
KCODE 10389 yes, forced kernel code pages
USTACK 1136 yes, by default user process stacks
FSDATA 40 no, forced file system metadata
KDDATA 386358 yes, by default kernel dynamic data
KSDATA 6933 yes, by default kernel static data
SUPERPG 21546 no, by default unused kernel super pages
Total pages on system: 2080005
Total pages included in dump: 404816
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ---------- ---------- ------------ -------------------------
3:0x000000 2349920 8388608 64:0x000002 /dev/vg00/lvol2
3:0x000000 30677856 114688 64:0x000009 /dev/vg00/v3Dump
----------
8503296
Dump device configuration mode is config_deprecated_mode.
Use crashconf -s option to change the mode.
To turn compression off until reboot:
# crashconf -c off
To turn compression on until reboot:
# crashconf -c on
# kctune dump_compress_on
Tunable Value Expression Changes
dump_compress_on 1 Default Immed
To change the tunable to 0:
# crashconf -tc off
# kctune dump_compress_on=0
To set the tunable to 1:
# crashconf -tc on
# kctune dump_compress_on=1
Compressed Dump Algorithm
• HP-UX uses one processor to do all disk writes and four processors for compression.
• The algorithm for compression is Lempel–Ziv–Welch (LZW).
• LZW is a universal lossless data compression algorithm, simple to implement, and has the potential for very high throughput in hardware implementations.
• One of the reasons for selecting LZW:
HP has a license to use it, andIt achieves pretty good compression ratio.
Concurrent Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 1514754 no, by default unused pages
USERPG 112614 no, by default user process pages
BCACHE 26235 no, by default buffer cache pages
KCODE 10389 yes, forced kernel code pages
USTACK 1136 yes, by default user process stacks
FSDATA 40 no, forced file system metadata
KDDATA 386358 yes, by default kernel dynamic data
KSDATA 6933 yes, by default kernel static data
SUPERPG 21546 no, by default unused kernel super pages
Total pages on system: 2080005
Total pages included in dump: 404816
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ---------- ---------- ------------ -------------------------
3:0x000000 2349920 8388608 64:0x000002 /dev/vg00/lvol2
3:0x000000 30677856 114688 64:0x000009 /dev/vg00/v3Dump
----------
8503296
Dump device configuration mode is config_deprecated_mode.
Use crashconf -s option to change the mode.
To turn concurrent dump off until reboot:
# crashconf -p off
To turn concurrent dump on until reboot:
# crashconf -p on
# kctune dump_concurrent_on
Tunable Value Expression Changes
dump_concurrent_on 1 1 Immed
To change the tunable to 0:
# crashconf -tp off
# kctune dump_concurrent_on=0
To set the tunable to 1:
# crashconf -tp on
# kctune dump_concurrent_on=1
HP-UX Kernel Parameters alwaysdump and dontdump
On rare occasions, the system may panic before crashconf(1M) is run
during the boot process. On those occasions, the configuration can be
set using the alwaysdump and dontdump tunables.
# kctune -v -q alwaysdump
Tunable alwaysdump
Description Bitmap of memory page classes to include in a crash dump
Module dump
Current Value 0 [Default]
Value at Next Boot 1024
Value at Last Boot 0
Default Value 0
Can Change Immediately or at Next Boot
HP-UX Typical Crash Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 9571877 no, by default unused pages
USERPG 1340875 no, by default user process pages
BCACHE 2309 no, by default buffer cache pages
KCODE 9142 no, by default kernel code pages
USTACK 1567 yes, by default user process stacks
FSDATA 12 yes, by default file system metadata
KDDATA 1493845 yes, by default kernel dynamic data
KSDATA 8816 yes, by default kernel static data
SUPERPG 128257 no, by default unused kernel super pages
Total pages on system: 12556700
Total pages included in dump: 1504240
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
1:0x000005 2349920 4194304 64:0x000002 /dev/vg00/lvol2
------------
4194304
Dump device configuration mode is config_deprecated_mode.
Use crashconf -s option to change the mode.
HP-UX Savecrash Locking
Dump devices are often used as paging devices (primary swap is one
such example). If savecrash determines that a dump device is already
enabled for paging, and that paging activity has already taken place on
that device, a warning message will indicate that the dump may be invalid.
If a dump device has not already been enabled for paging, savecrash
prevents paging from being enabled to the device by creating the file
/var/adm/crash/.savecrash.LCK.
swapon does not enable the device for paging if the device is locked in /var/adm/crash/.savecrash.LCK. As savecrash finishes saving the image from each dump device, it updates the /var/adm/crash/.savecrash.LCK file and optionally executes swapon to enable paging on the device.
HP-UX Dump Device in Non-Root VGs
• As of HP-UX 11.00 it is possible to configure additional dump devices online (without the need for a reboot). These dump LVs must not be configured using lvlnboot -d but with crashconf(1M).
• We are no longer restricted to choose a dump LV from the root VG only. The configuration of such dump devices is similar to the configuration of secondary swap devices.
Example of Classical Swap/Dump Design on HP-UX
Potential Issues
• If there is a shortage of RAM, boot disks experience severe I/O performance problems due to swap usage.
• If more RAM is added, it is not easy to resize primary swap (contiguous blocks).
• Long reboot due to savecrash(1M) export to /var/adm/crash.
• More swap is added in other VGs, often different in size from the primary.
• A large amount of disk space is wasted on swap.
[Diagram: RAID-1 for boot disk; 32 GB RAM; Swap = 1 or 2 x RAM; swap/dump shared. Primary PV: /stand, primary swap/dump, other LVs. Alternate PV: /stand, primary swap/dump mirror, other LVs.]
Example of Different Swap/Dump Design on HP-UX with Internal Boot Disks *
[Diagram: RAID-1 for boot disks; 32 GB RAM; Primary Swap = 4-8 GB; Total Swap = 1 x RAM **; swap NOT shared with dump. Primary PV: /stand, primary swap, other LVs. Alternate PV: /stand, primary swap mirror, other LVs. Secondary swap areas on SAN-based LUNs or LVs. Dump areas set up on different LUNs or PVs in non-root VGs (dump PVs are NEVER RAID-1 in LVM).]
Example of Different Swap/Dump Design on HP-UX with SAN Boot Disk *
[Diagram: 32 GB RAM; Primary Swap = 4-8 GB; Total Swap = 1 x RAM **; swap NOT shared with dump. Boot PV: /stand, primary swap, other LVs. Secondary swap areas on SAN-based LUNs or LVs. Dump areas set up on different LUNs or PVs in non-root VGs (dump PVs are NEVER RAID-1 in LVM).]
HP-UX Persistent Dump Devices – Part 1
• Persistent Dump Devices are those that are configured automatically after a reboot. Persistent dump device information is maintained in the kernel registry services (KRS, see krs(5)).
• To mark the dump devices as persistent, there are two configuration modes available.
config_crashconf_mode
In this mode crashconf(1M) and crashconf(2) are the only mechanisms available to mark dump devices as persistent. Logical volumes marked for dump using lvlnboot(1M) or vxvmboot(1M) and devices marked in /stand/system for dump will be ignored during boot-up. This is the preferred method for dump device configuration and will be used from this HP-UX release onwards. This mode can be enabled using the crashconf -s option. VxVM stores extent information of persistent dump logical volumes in lif(4). Up to ten VxVM logical volumes can be marked persistent. The logical volumes which are not part of the root volume group cannot be configured as persistent dump devices.
HP-UX Persistent Dump Devices – Part 2
config_deprecated_mode
The logical volumes marked for dump using lvlnboot(1M) or vxvmboot(1M) and devices marked in /stand/system for dump will be configured as dump devices during boot-up. Devices marked as persistent, using crashconf -s, will be ignored during boot-up. Marking devices using lvlnboot(1M), vxvmboot(1M), and /stand/system will be obsolete in the next HP-UX release. This mode is deprecated on HP-UX 11.31 and will be obsolete in the next HP-UX release. This is the default mode for dump and can be enabled using the crashconf -o option.
HP-UX Dump Devices and Bad Block Relocation
• From HP-UX 11.23 release onwards, the LVM bad block relocation feature is obsolete. However, for compatibility reasons the value is maintained as a logical volume attribute.
• If BBRA is not disabled when dump device is created, HP-UX complains about “unsupported disk layout”.
• Hence, the correct procedure to create a dump device in LVM is:
# lvcreate -C y -r n -L 16000 -n dump2 /dev/vgdump
HP-UX Crashconf Fails with Unsupported Disk Layout Error - VxVM
The volume dumpvol was added to the /etc/fstab file and crashconf was issued to increase the total dump area, but crashconf failed with the message below:
/dev/vx/dsk/rootdg/dumpvol: error: unsupported disk layout
The crashconf error is due to the dump area not being contiguous:
# vxprint -g rootdg -ht
v dumpvol - ENABLED ACTIVE 204800 SELECT - swap
pl dumpvol-01 dumpvol ENABLED ACTIVE 204800 CONCAT - RW
sd rootdisk01-07 dumpvol-01 rootdisk01 1081344 102400 0 c1t4d0 ENA
sd rootdisk01-17 dumpvol-01 rootdisk01 5702418 102400 102400 c1t4d0 ENA
The dumpvol volume has two areas on c1t4d0. The first is rootdisk01-07 which starts at 1081344 and is
102400 kb in size and the second is rootdisk01-17 which starts at 5702418 and is also 102400 kb in
size. The volume dumpvol needs to be contiguous so the last 102400 kb should be reduced from
dumpvol. To reduce dumpvol:
# vxassist shrinkby dumpvol 102400
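The contiguity problem described above can be checked mechanically from the subdisk offsets and lengths. This is an illustrative Python sketch, not a VxVM tool: the helper is hypothetical, it assumes all subdisks live on the same disk, and the numbers are copied from the vxprint output.

```python
from typing import List, Tuple

# (subdisk name, offset on disk, length), in the units reported by vxprint.
Subdisk = Tuple[str, int, int]


def is_contiguous(subdisks: List[Subdisk]) -> bool:
    """True if each subdisk starts exactly where the previous one ends.

    Assumes every subdisk is on the same physical disk.
    """
    ordered = sorted(subdisks, key=lambda sd: sd[1])
    return all(prev[1] + prev[2] == cur[1]
               for prev, cur in zip(ordered, ordered[1:]))


dumpvol = [
    ("rootdisk01-07", 1081344, 102400),
    ("rootdisk01-17", 5702418, 102400),
]

print(is_contiguous(dumpvol))      # layout before the shrink
print(is_contiguous(dumpvol[:1]))  # layout after dropping the second subdisk
```

The first subdisk ends at 1081344 + 102400 = 1183744, while the second starts at 5702418, which is why crashconf rejects the original layout.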
HP-UX Crashconf Fails with Unsupported Disk Layout Error - LVM
/dev/vg01/lvswap: error: unsupported disk layout
# lvdisplay /dev/vg01/lvswap
....
Bad block on
Allocation strict
Dump is required to be contiguous and have bad block relocation turned off:
# lvchange -C y -r n /dev/vg01/lvswap
HP-UX VxVM Dump Device Creation* – Part 1
With Volume Manager 5.0 on HP-UX 11.31, the disk must be initialized with the vxdisksetup -ifB <disk> command; vxdiskadm is unable to initialize the disk correctly for use with crashconf. Please note that CDS diskgroups are not affected. Those can still be initialized via vxdiskadm.
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c2t0d0s2 auto:none - - online invalid
c2t1d0s2 auto:hpdisk rootdisk01 rootdg online
# vxdisk -f init c2t0d0s2 format=hpdisk
# vxdg init dumpdg c2t0d0s2 cds=off
# vxassist -g dumpdg -U swap make dumpvol 3g
HP-UX VxVM Dump Device Creation – Part 2
# crashconf -s /dev/vx/dsk/dumpdg/dumpvol
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 10197 no, by default unused pages
USERPG 115131 no, by default user process pages
BCACHE 14359 no, by default buffer cache pages
KCODE 10819 no, by default kernel code pages
USTACK 890 yes, by default user process stacks
FSDATA 26 yes, by default file system metadata
KDDATA 100591 yes, by default kernel dynamic data
KSDATA 7238 yes, by default kernel static data
SUPERPG 1100 no, by default unused kernel super pages
Total pages on system: 260351
Total pages included in dump: 108745
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
3:0x000001 2350176 2097152 4:0x000001 /dev/vx/dsk/rootdg/swapvol
3:0x000000 544896 3145728 4:0x414ad8 /dev/vx/dsk/dumpdg/dumpvol
------------
5242880
HP-UX Better Swap and Dump Design – Part 1
• Set up primary swap between 4 and 8 GB ONLY, no matter how large the RAM is!
• Primary swap device should NOT be shared with dump.
• Initially, set up primary swap only. In the pre-production testing, verify if that is enough and avoid creating other swap areas unless absolutely necessary.
• Secondary swaps (if you need to have them!) are created as 4-8 GB LUNs (could be LVs in LVM or Plexes in VxVM) on SAN (if practicable). Ensure that secondary swaps match the size of primary swap. That way, if server ever needs to use swap, the performance of swap devices will be excellent and boot disk I/O will never “suffer”.
• If primary swap is left at 4-8 GB, then allocate separate dump areas in other volume groups to match the size of physical memory if compression is disabled or not possible (due to lack of available CPUs), or less if compression is enabled and possible.
HP-UX Better Swap and Dump Design – Part 2
• Disable savecrash(1M) at boot (/etc/rc.config.d/savecrash):
SAVECRASH=0
If you do it, make sure not to forget to run savecrash(1M) after the reboot.
• A dedicated dump device will not shorten the time required to write from memory to the dump volume during the crash, but will shorten the reboot time. This is because the crash image is not at risk of being overwritten by paging or swap activity, and savecrash(1M) can run in the background to save the crash files into the crash dump directory.
• If the dump device is also configured as one of the swap devices, the device cannot be enabled for paging until savecrash(1M) has finished saving the image from the device to the crash dump directory. Therefore, the boot time will be longer if savecrash is run in foreground. This extra time will be even greater if vPars are configured because multiple dump images may have to be saved.
HP-UX Better Swap and Dump Design – Part 3
• When dump and swap areas are separated, there is no need to save the crash images at boot time. Therefore, savecrash(1M) at (re)boot can be disabled!
• The reduction in reboot time achieved by configuring a separate dump device (close to 50% over classical design with savecrash running in foreground) is likely to provide a worthwhile return on investment when system availability is a priority.
• Using identical sizes and types of dump devices and HBAs in the dump configuration is one way to avoid inequalities in dump speeds or times across the dump units. This tends to produce more predictable results and better overall parallelism.
HP-UX Better Swap and Dump Design – Part 4
• It is recommended that shared swap and dump devices or volumes not be used with parallel dump. Using a shared swap/dump device can significantly increase the subsequent reboot time because such devices result in swap being disabled while saving the corresponding dump data (for example, in /var/adm/crash).
• Avoid file system swap altogether if possible.
• Set priorities of SAN-based secondary swaps to lower value than the primary swap (and let it be identical value across all secondary swaps). That way, if there is a serious shortage of RAM, swap will perform as “perfectly striped” volume.
HP-UX Better Swap and Dump Design – Part 5
• If compressed dumps are required, ensure that there are five CPUs for each dump unit.
• Set up multiple dump units on SAN (non-root volume groups), and enable parallel dumps. Note that, currently, the logical volumes which are not part of the root volume group in LVM cannot be configured as persistent dump devices. * However, non-root data group with VxVM can be used for persistent dump devices. **
HP-UX Better Swap and Dump Design – Part 6
• For a kernel dump, the usual requirement:
Kernel text/static data
Kernel dynamic data in use
User-space kernel thread stacks (UAREA)
Kernel dynamic memory, which is free-and-cached (Super Page Pool), is only needed when there is a problem in the SPP itself (pretty rare). User data is very rarely needed; in addition, most users do not want HP support reading their application private data for security reasons (classified data, customer sensitive, and so on). The default configuration for crashconf is good enough for most situations.
• If enough disk space is available and no other constraints are imposed, you might enable all crash classes in dumps (check crashdump(1M)).
Guidelines for Selecting Device Swap
• Two swap areas on different disks are better than one single swap area
• Only configure one swap area per disk
• Device swap areas should be of similar size
• Consider the speed of the disks
[Diagram: swap LV placement examples labelled No! and Yes!]
HP-UX Post-crash Manual Dump Export
• If the dump was not saved completely due to lack of space in the crash directory, you can save the dump again. The -r option (resave) needs to be included when this is not the first time that savecrash runs.
# savecrash -v [-r] <crash directory>
• It is also possible to save the dump directly to a tape:
# savecrash -v [-r] -t <tapedevice>
HP-UX Manual Dump Export from a Specific Dump Device
To manually extract the dump, type either the persistent DSF or the legacy
DSF of the whole disk along with the offset:
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -----------------
3:0x000000 2612064 8364032 64:0x000002 /dev/vg00/lvol2
3:0x000001 18168692 40956 64:0x02000b /dev/vg01/dump_3
3:0x000001 18127732 40956 64:0x02000a /dev/vg01/dump_2
3:0x000001 18086772 40956 64:0x020009 /dev/vg01/dump_1
# savecrash -D /dev/rdisk/disk4 -O 18086772 -r -v .
or
# savecrash -D /dev/rdsk/c2t1d0 -O 18086772 -r -v .
Swapoff
• Available with HP-UX 11.31.
• The swapoff(1M) command disables swapping on the specified swap device(s) for the current boot. The term swap refers to an obsolete implementation of virtual memory; HP-UX actually implements virtual memory by way of paging rather than swapping. This command and others retain names derived from swap for historical reasons.
• Does not remove swap device from /etc/fstab.
• Will not succeed if the swap space on the device is still needed, for example to cover the reserve space reported by swapinfo(1M).
• Example:
# /usr/sbin/swapoff /dev/vg00/lvol2
Swapoff – Real Life Example – Part 1
• Remove primary swap and move it into another volume group. To remove the primary swap, we need to ensure that the new swap device has at least enough space that “reserve” requires. Otherwise, swapoff(1M) command will fail!
# lvcreate -C y -r n -L 8192 -n lvswap2 /dev/vgswap
# swapon -f /dev/vgswap/lvswap2
# swapinfo -tm Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 8192 0 8192 0% 0 - 1 /dev/vg00/lvol2
dev 8192 0 8192 0% 0 - 1 /dev/vgswap/lvswap2
reserve - 1301 -1301
memory 3876 963 2913 25%
total 20260 2264 17996 11% - 0 -
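The precondition described in this step (the swap that remains must cover the "reserve" figure) can be expressed as a simple check. This Python sketch is a deliberate simplification with hypothetical names: it looks only at device swap and ignores the memory line, and the values are copied from the swapinfo -tm output above.

```python
from typing import Dict


def can_swapoff(device_avail_mb: Dict[str, int], reserve_mb: int,
                device: str) -> bool:
    """True if the other swap devices alone can cover the reserved space.

    Simplified model: only device swap is counted.
    """
    remaining_mb = sum(avail for name, avail in device_avail_mb.items()
                       if name != device)
    return remaining_mb >= reserve_mb


# After swapon of the new device (values from the swapinfo listing).
devices = {"/dev/vg00/lvol2": 8192, "/dev/vgswap/lvswap2": 8192}
print(can_swapoff(devices, 1301, "/dev/vg00/lvol2"))

# Before the new device was added, nothing would be left to hold the reserve.
print(can_swapoff({"/dev/vg00/lvol2": 8192}, 1301, "/dev/vg00/lvol2"))
```

With the second 8192 MB device in place, the 1301 MB reserve fits comfortably, which matches the successful swapoff shown in the next part.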
Swapoff – Real Life Example – Part 2
• Remove primary swap on-line:
# swapoff /dev/vg00/lvol2
# lvrmboot -s vg00
# swapinfo -tm Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 8192 0 8192 0% 0 - 1 /dev/vgswap/lvswap2
reserve - 1291 -1291
memory 3876 963 2913 25%
total 12068 2254 9814 19% - 0 -
Swapoff – Real Life Example – Part 3
• Add line into /etc/fstab for the new primary swap and reboot the server:
/dev/vg00/lvol3 / vxfs delaylog 0 1
/dev/vg00/lvol1 /stand vxfs tranflush 0 1
/dev/vg00/lvol4 /home vxfs delaylog 0 2
/dev/vg00/lvol5 /tmp vxfs delaylog 0 2
/dev/vg00/lvol6 /usr vxfs delaylog 0 2
/dev/vg00/lvol7 /var vxfs delaylog 0 2
/dev/vg00/lvol8 /var/tmp vxfs delaylog 0 2
#/dev/vg00/lvdump3 / dump defaults 0 0
/dev/vgswap/lvswap2 / swap defaults 0 0
Swapoff – Real Life Example – Part 4
• After the reboot, check swap status and confirm that non-root volume is now the primary swap:
# swapinfo -tm Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 8192 0 8192 0% 0 - 1 /dev/vgswap/lvswap2
reserve - 1283 -1283
memory 3876 950 2926 25%
total 12068 2233 9835 19% - 0 -
Swapoff – Real Life Example – Part 5
• However, because we did not initialize the disk in vgswap with “-B” option, it does not contain the Boot Area, and cannot be added with “lvlnboot -s /dev/vgswap/lvswap2”. As a result, this is reported:
# lvlnboot -v
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/disk/disk6_p2 -- Boot Disk
Boot: lvol1 on: /dev/disk/disk6_p2
Root: lvol3 on: /dev/disk/disk6_p2
No Swap Logical Volume configured
No Dump Logical Volume configured
Swapoff – Real Life Example – Part 6
• We still have one persistent dump device, which is NOT listed in /etc/fstab*:
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 584997 no, by default unused pages
USERPG 171077 no, by default user process pages
BCACHE 7529 no, by default buffer cache pages
KCODE 11892 no, by default kernel code pages
USTACK 1128 yes, by default user process stacks
FSDATA 16 yes, by default file system metadata
KDDATA 238003 yes, by default kernel dynamic data
KSDATA 10563 yes, by default kernel static data
SUPERPG 18286 no, by default unused kernel super pages
Total pages on system: 1043491
Total pages included in dump: 249710
Dump compressed: ON
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
1:0x000002 57023328 4096000 64:0x000009 /dev/vg00/lvdump3
------------
4096000
Persistent dump device list:
/dev/vg00/lvdump3
Crash Dump – Two Dump Unit Example – Part 1
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 4264207 no, by default unused pages
USERPG 185052 no, by default user process pages
BCACHE 45250 no, by default buffer cache pages
KCODE 11859 no, by default kernel code pages
USTACK 1271 yes, by default user process stacks
FSDATA 16 yes, by default file system metadata
KDDATA 581797 yes, by default kernel dynamic data
KSDATA 10569 yes, by default kernel static data
SUPERPG 107834 no, by default unused kernel super pages
Total pages on system: 5207855
Total pages included in dump: 593653
Dump compressed: OFF
Dump Parallel: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
1:0x000004 2612064 8388608 64:0x000002 /dev/vgroot/lvol2
1:0x000003 2496 1048576 64:0x010001 /dev/vgdump/dump2
1:0x000003 16386496 1048576 64:0x010002 /dev/vgdump/dump3
------------
10485760
Persistent dump device list:
/dev/vgroot/lvol2
Crash Dump – Two Dump Unit Example – Part 2
*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.
*** The dump will be a SELECTIVE dump with
compression OFF and concurrency ON: 2320 of 20344 megabytes.
Primary Dump Header Location :
Device details:
Major number: 0x1f Minor number: 0xb0000
Offset: 16386496.
*** Dumping: 100% complete (2320 of 2320 MB)
time: 84 seconds, Number of Dump units: 2
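The page counts from crashconf -v in Part 1 can be related to the megabyte figures in this console message. The sketch below is an illustration only: the function name pages_to_mb is hypothetical and it assumes the default 4 KB base page size. With that assumption, 5207855 total pages give 20344 MB, matching the message, and 593653 included pages give 2319 MB, within a megabyte of the 2320 MB reported.

```shell
# Sketch: convert a page count to megabytes, rounded up.
# $1 = number of pages, $2 = page size in KB (assumed default: 4).
pages_to_mb() {
    awk -v p="$1" -v k="${2:-4}" 'BEGIN { printf "%d\n", (p * k + 1023) / 1024 }'
}

pages_to_mb 5207855   # total pages on system
pages_to_mb 593653    # pages included in dump
```

The same arithmetic gives a quick sanity check that the configured dump space (10485760 kB in Part 1) is comfortably larger than a selective, uncompressed dump of this size.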
Crash Dump Without Primary Swap, No Persistent Devices, and No Dump Devices in /etc/fstab
Console logs at boot time after a crash:
No crash dump devices defined.
Persistent dump device list is empty.
All subsequent crashes will fail to collect data into dump volumes:
Swap device table: (start & size given in 512-byte blocks)
entry 0 - auto-configured on root device; ignored - no room
WARNING: No swap device configured, so dump cannot be defaulted to
primary swap.
WARNING: No dump devices are configured. Dump is disabled.
Message buffer contents after system crash:
These messages are the contents of msgbuf, which should have been saved
in the dump. They are output to the console, as the dump was not taken.
How to Set the Dump Order for Saving System Crash – Part 1
• The current dump configuration first saves the crash to dump2, dump1, then to lvol2:
# crashconf
Crash dump configuration is changed after boot:
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 570458 no, by default unused pages
USERPG 136677 no, by default user process pages
BCACHE 10426 no, by default buffer cache pages
KCODE 7764 yes, forced kernel code pages
USTACK 1172 yes, by default user process stacks
FSDATA 8 yes, by default file system metadata
KDDATA 192353 yes, by default kernel dynamic data
KSDATA 3641 yes, by default kernel static data
SUPERPG 120995 no, by default unused kernel super pages
Total pages on system: 4173976
Total pages included in dump: 1253843
Dump compressed: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ---------- ---------- ------------ -------------------------
31:0x021000 924532 4194300 64:0x000002 /dev/vg00/lvol2
31:0x021000 27843444 2097150 64:0x00000a /dev/vg00/dump1
31:0x021000 27859828 1048575 64:0x00000b /dev/vg00/dump2
----------
7340025
How to Set the Dump Order for Saving System Crash – Part 2
SOLUTION:
• /etc/fstab does not list vg00/lvol2, because it is the default dump volume.
/dev/vg00/dump1 ... dump defaults 0 0
/dev/vg00/dump2 ... dump defaults 0 0
• Edit the /etc/fstab file to set the new order of the dump LVs. Dump LVs are used in the reverse of the order in which they appear in the file, so vg00/lvol2 must be listed last to be used as the first dump lvol.
New listing of dump areas in /etc/fstab
-------------------------------------------------------------
/dev/vg00/dump2 ... dump defaults 0 0 # last dump area used
/dev/vg00/dump1 ... dump defaults 0 0 # second dump area used
/dev/vg00/lvol2 ... dump defaults 0 0 # first dump area used
• Edit /etc/rc.config.d/crashconf:
CRASHCONF_ENABLED=1
CRASHCONF_READ_FSTAB=1
CRASHCONF_REPLACE=1
How to Set the Dump Order for Saving System Crash – Part 3
• Put the new dump configuration in place (when a crash is saved, the first dump area is lvol2, followed by dump1, then by dump2):
# /sbin/rc1.d/S080crashconf start
• Check the new configuration:
# crashconf
CLASS PAGES INCLUDED IN DUMP DESCRIPTION
-------- ---------- ---------------- -------------------------------------
UNUSED 169224 no, by default unused pages
USERPG 500811 no, by default user process pages
BCACHE 10412 no, by default buffer cache pages
KCODE 7764 yes, forced kernel code pages
USTACK 1218 yes, by default user process stacks
FSDATA 20 yes, by default file system metadata
KDDATA 241200 yes, by default kernel dynamic data
KSDATA 3641 yes, by default kernel static data
SUPERPG 109204 no, by default unused kernel super pages
Total pages on system: 4173976
Total pages included in dump: 1253843
Dump compressed: ON
DEVICE OFFSET(kB) SIZE (kB) LOGICAL VOL. NAME
------------ ---------- ---------- ------------ -------------------------
31:0x021000 27859828 1048575 64:0x00000b /dev/vg00/dump2
31:0x021000 27843444 2097150 64:0x00000a /dev/vg00/dump1
31:0x021000 924532 4194300 64:0x000002 /dev/vg00/lvol2
----------
7340025
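The reverse-order rule described in this example is easy to get wrong when editing /etc/fstab by hand. The sketch below (hypothetical function name dump_use_order, not from the original slides) reads fstab-style text and prints the dump volumes in the order they would be used according to that rule, that is, last entry first. It assumes the file system type is the third field and skips commented lines.

```shell
# Sketch: print dump volumes in the order they would be used, assuming
# (as these slides describe) that the LAST dump entry in /etc/fstab is used FIRST.
# Reads fstab-style text on stdin; lines starting with '#' are ignored.
dump_use_order() {
    awk '$1 !~ /^#/ && $3 == "dump" { dev[++n] = $1 }
         END { for (i = n; i >= 1; i--) print dev[i] }'
}

# Example using the new listing from Part 2 (mount field elided as in the slide):
dump_use_order <<'EOF'
/dev/vg00/dump2 ... dump defaults 0 0
/dev/vg00/dump1 ... dump defaults 0 0
/dev/vg00/lvol2 ... dump defaults 0 0
EOF
```

For the sample input this prints lvol2, dump1, dump2, which is the order Part 3 expects. On a live system the input would be /etc/fstab itself.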
Example of Distributed Swap Design
# /usr/sbin/swapinfo -t
Kb Kb Kb PCT START/ Kb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 4194304 0 4194304 0% 0 - 1 /dev/vgroot/lvol2 (4096MB)
dev 4194304 0 4194304 0% 0 - 0 /dev/vgswap1/swap1 (4096MB)
dev 4194304 0 4194304 0% 0 - 0 /dev/vgswap2/swap2 (4096MB)
reserve - 13417244 -13417244
memory 25135192 5363876 19771316 21%
I am also very passionate about naming volume groups and logical volumes in a meaningful manner. *
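HP's guidance quoted earlier in this document says that similar-sized device swap areas work best for interleaving. A minimal sketch of a check for that, assuming the swapinfo column layout shown above (the function name swap_sizes_equal is hypothetical):

```shell
# Sketch: succeed (exit 0) only if every device swap area has the same AVAIL size.
# Reads `swapinfo -t` style output on stdin; fails if no "dev" lines are found.
swap_sizes_equal() {
    awk '$1 == "dev" { if (n == 0) first = $2; else if ($2 != first) diff = 1; n++ }
         END { if (n == 0 || diff) exit 1 }'
}

# Example using the three device lines from this slide:
if swap_sizes_equal <<'EOF'
dev 4194304 0 4194304 0% 0 - 1 /dev/vgroot/lvol2
dev 4194304 0 4194304 0% 0 - 0 /dev/vgswap1/swap1
dev 4194304 0 4194304 0% 0 - 0 /dev/vgswap2/swap2
EOF
then
    echo "device swap areas are equally sized"
fi
```

The check is deliberately strict (exact equality); a tolerance could be added if areas are only approximately the same size.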
Paginglist Command
# /usr/sam/lbin/paginglist
/dev/vg00/lvol2|dev|4194304|4.0 GB|0|0.0 KB|4194304|4.0 GB|0%|0|-|1|no|now|
reserve|reserve|0|0.0 KB|2019848|1.9 GB|-2019848|-2019848.`KB||0||0|no|now|
total|total|4194304|4.0 GB|2019848|1.9 GB|2174456|2.1 GB|48%|0|0|0|no|now|
Patch Servers Regularly
Some HP-UX 11.31 dump patches:
PHKL_41977: HANG OTHER crashconf(1M) hangs when trying to configure more than 32 dump devices. The patch allows a logical volume to be configured as primary swap and provides support for FCD and FCLP NPIV (N_Port ID Virtualization) enablement.
PHKL_41257: HANG During MCA handling, the system hangs in the process of generating a crash dump.
PHKL_39740: OTHER System fails to dump memory into dump devices.
PHKL_38628: PANIC
PHKL_38414: ABORT If the kernel base page size is configured greater than 4k, the dump may be aborted prematurely, which affects debugging of the crash.
Add Timestamps to RC scripts – Part 1
• If there are RC startup problems, /etc/rc.log is usually the first place to check. The output from RC scripts is recorded there, but rc.log has no timestamp for each RC script.
• One way to give rc.log a timestamp for each RC script is to put a date command into every RC script, but that is a poor choice because there are so many files to update. A better option is to modify /sbin/rc.utils. The rc.utils script intercepts the output of RC scripts and logs it to /etc/rc.log, so it can log timestamps as well.
• Back up /sbin/rc.utils before making changes, preserving its permissions:
# cp -p /sbin/rc.utils /sbin/rc.utils.bak
• Edit /sbin/rc.utils, find the two lines echo >> $LOGFILE (one is in the routine do_screen_mode, the other in do_line_mode), and insert a new line at each:
date >> $LOGFILE
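The edit can also be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: the function name add_log_timestamps is hypothetical, it places the date line directly after each matching echo line (the slide does not say before or after), and it only matches lines consisting of exactly echo >> $LOGFILE. It writes the result to standard output and never modifies the input file.

```shell
# Sketch: print a copy of the given file with "date >> $LOGFILE" inserted
# after every line that is exactly "echo >> $LOGFILE" (leading blanks allowed).
# The new line reuses the indentation of the line it follows.
# $1 = path to a COPY of rc.utils; the file itself is left untouched.
add_log_timestamps() {
    awk '{ print }
         /^[ \t]*echo >> \$LOGFILE[ \t]*$/ {
             match($0, /^[ \t]*/)
             printf "%sdate >> $LOGFILE\n", substr($0, 1, RLENGTH)
         }' "$1"
}
```

A cautious way to use it would be to run it against the backup copy, review the result with diff, and only then replace /sbin/rc.utils, for example: add_log_timestamps /sbin/rc.utils.bak > /tmp/rc.utils.new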
Add Timestamps to RC scripts – Part 2
/etc/rc.log reports:
Thu Aug 18 12:22:28 EST 2011
Configure system crash dumps
Output from "/sbin/rc1.d/S080crashconf start":
----------------------------
EXIT CODE: 0
...
Thu Aug 18 12:22:33 EST 2011
Save system crash dump if needed
Output from "/sbin/rc1.d/S440savecrash start":
----------------------------
savecrash directory not set; defaulting to: /var/adm/crash
*EXIT: parse_args
ENTER: open_source
ENTER: read_header
ENTER: get_hdr_loc
*EXIT: get_hdr_loc
savecrash: Finished Reading Header From: device : /dev/rdsk/c11t0d0 offset:16386496
Crash Dump Scenarios – Part 1
• If there are no crash dump devices on HP-UX, the server will, by design, default to primary swap for saving crash dumps!
• Persistent crash dump devices must be in the root volume group in LVM, but can be in any data group in VxVM (Symantec documentation confirms this too).
• If the crash dump devices are not persistent, are not listed in /etc/fstab, and swap is not in the root volume group, HP-UX will happily use non-persistent dump devices from other volume groups AS LONG as they are defined in the currently running kernel configuration (check with the crashconf(1M) command).
Crash Dump Scenarios – Part 2
• To enable non-persistent dump devices permanently, add them to /etc/fstab and “switch them on” via the crashconf(1M) command and/or /etc/rc.config.d/crashconf BEFORE a crash happens. Otherwise, if there is no fall-back to primary swap, the crash dump will FAIL.
• A dump can be saved to both non-persistent and persistent dump devices*.
• If there are persistent crash dump devices (they must be in the root volume group in LVM, but can be in any data group in VxVM), they will be used for saving crash dumps even if they are not listed in /etc/fstab.
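The scenarios above can be condensed into a small decision function. This is a simplified sketch of the rules as stated on these two slides, not a description of the kernel's actual logic; the function name dump_target and its yes/no arguments are hypothetical.

```shell
# Sketch: where would a crash dump go, per the rules on these slides?
# $1 = persistent dump devices exist (yes/no)
# $2 = non-persistent dump devices are defined in the running kernel config (yes/no)
# $3 = primary swap is configured (yes/no)
dump_target() {
    if [ "$1" = yes ] || [ "$2" = yes ]; then
        echo "configured dump devices"
    elif [ "$3" = yes ]; then
        echo "primary swap"          # default fall-back when no dump devices exist
    else
        echo "dump disabled"         # no devices and no fall-back: the dump fails
    fi
}

dump_target no no yes
dump_target no no no
```

The last call corresponds to the "No dump devices are configured. Dump is disabled." console warning shown earlier.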
Aug 2011
Thank You!
Dusan Baljevic, Sydney, Australia