Disk Group Cannot Be Imported. Serial Split Brain Detected
Uploaded by: mohaideen-abdul-kader
Description : Disk Group Cannot be imported. Serial Split Brain Detected
Platform : Solaris
Software : Veritas Volume Manager
Category : How-To
Procedure
This document describes how to fix a serial split brain (SSB) condition that prevents a VxVM disk group from being imported.
When you try to import the disk group, you get an error similar to the following:
vxvm:vxconfigd: [ID 457036 daemon.notice]
V-5-1-9576 Split Brain. da id is 0.2, while dm id is 0.1 for dm A0D5
May 10 16:34:33 tncdx15 vxvm:vxconfigd: [ID 220643 daemon.error]
V-5-1-569 Disk group datadg, Disk c3t21d0s2: Cannot auto-import group:
The disk group is not imported automatically.
Cause : A disk was being replaced in the array.
A serial split brain condition arises when the "SSB_ID" parameter stored in the private region of each disk in a disk group does not match across the disks.
This can happen if a disk was taken out of the disk group (because of a failure, or to move data to another host).
Solution
First, run the "vxsplitlines" command on the disk group. Its output shows which disks are affected by the serial split brain.
# vxsplitlines -g <disk group name>
For example:
# vxsplitlines -g datadg
VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c3t0d0s2 A0D1
To see the configuration copy from this disk issue
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c3t0d0s2
To import the diskgroup with config copy from this disk use the following command
# /usr/sbin/vxdg -o selectcp=1141218744.29.tncdx15 import datadg
The following are the disks whose ssb ids don't match in this config copy
A0D3
A0D5
The output above shows that disks A0D3 and A0D5 are affected by the split brain. To verify this, run the following command:
# vxsplitlines -g <disk group> -c <disk access name>
For example:
# vxsplitlines -g datadg -c c3t0d0s2
VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME) || Actual SSB || Expected SSB
VxVM vxsplitlines INFO V-5-2-2700 c3t0d0s2( A0D1 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t1d0s2( A0D2 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t2d0s2( A0D3 ) || 0.2 || 0.1 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c3t3d0s2( A0D4 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t4d0s2( A0D5 ) || 0.2 || 0.1 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c3t5d0s2( A0D6 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t6d0s2( A0D7 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t7d0s2( A0D8 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t9d0s2( A0D10 ) || 0.1 || 0.1 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c3t16d0s2( A0D12 ) || 0.1 || 0.1 ssb ids match
The output above shows that A0D3 and A0D5 have an actual ssb_id (0.2) that differs from the expected ssb_id (0.1).
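On a disk group with many disks the same comparison can be scripted. The following is a minimal POSIX shell sketch, assuming the per-disk "vxsplitlines -c" line layout shown above; the function name ssb_mismatches is hypothetical and not part of VxVM. It prints the disk access name, disk media name, actual SSB and expected SSB of every disk whose ids differ.

```shell
# Sketch: read "vxsplitlines -g <dg> -c <da>" output on stdin and print
# "<da> <dm> <actual> <expected>" for every disk whose SSB ids differ.
# Assumes the per-disk line layout shown in this document, e.g.:
#   ... V-5-2-2700 c3t2d0s2( A0D3 ) || 0.2 || 0.1 ssb ids don't match
ssb_mismatches() {
    awk '
    /V-5-2-2700/ {
        first = 0; second = 0
        # Locate the two "||" separators rather than hard-coding field numbers.
        for (i = 1; i <= NF; i++) {
            if ($i != "||") continue
            if (first == 0) first = i
            else if (second == 0) second = i
        }
        if (second == 0) next              # not a per-disk line
        actual = $(first + 1)
        expected = $(second + 1)
        # Compare as strings so that 0.10 and 0.1 are not treated as equal.
        if (actual "" != expected "") {
            da = $(first - 3)              # e.g. "c3t2d0s2("
            sub(/\($/, "", da)             # strip the trailing "("
            print da, $(first - 2), actual, expected
        }
    }'
}
```

Usage: vxsplitlines -g datadg -c c3t0d0s2 | ssb_mismatches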
The ssb_id can also be checked by running "vxdisk list" on a disk:
# vxdisk list c3t0d0s2
devicetag: c3t0d0
type: auto
hostid: tncdx15
disk: name= id=1141218744.29.tncdx15
group: name=datadg id=1141218774.31.tncdx15
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig autoimport
pubpaths: block=/dev/vx/dmp/c3t0d0s2 char=/dev/vx/rdmp/c3t0d0s2
version: 3.1
iosize: min=512 (bytes) max=2048 (blocks)
public: slice=2 offset=2304 len=71124864 disk_offset=0
private: slice=2 offset=256 len=2048 disk_offset=0
update: time=1178813226 seqno=0.123415
ssb: actual_seqno=0.1
Compare the "vxdisk list" outputs of the various disks in the disk group. Several disks may well share the same ssb_id, but that does not mean those disks hold the latest configuration copy.
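To compare the ssb line of several disks at once, a loop over "vxdisk list" is enough. This is a minimal sketch, assuming the "ssb: actual_seqno=<id>" line format shown in the output above; ssb_of and show_ssb_ids are hypothetical helper names.

```shell
# Print the actual SSB id from "vxdisk list <disk>" output read on stdin.
# Assumes a line of the form "ssb: actual_seqno=0.1", as shown above.
ssb_of() {
    awk -F'=' '/^ssb:/ { print $2 }'
}

# Print "<disk> <ssb id>" for each disk access name given as an argument.
show_ssb_ids() {
    for d in "$@"; do
        printf '%s %s\n' "$d" "$(vxdisk list "$d" | ssb_of)"
    done
}
```

Usage: show_ssb_ids c3t0d0s2 c3t2d0s2 c3t4d0s2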
To find out which disk has the latest configuration copy, run the following command on several disks in the disk group:
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c3t0d0s2 >dump_c3t0d0s2
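(Make sure the path points to the slice that holds the private region, otherwise dumpconfig will not give proper output. Here the private region is on slice 2, as shown by privslice=2 in the "vxdisk list" output above.)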
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c3t2d0s2 >dump_c3t2d0s2
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c3t7d0s2 >dump_c3t7d0s2
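When many disks have to be dumped, the commands above can be wrapped in a loop. This sketch assumes, as in the example, that slice 2 of each disk holds the private region. The VXPRIVUTIL variable is a hypothetical override hook that defaults to the path used above, and dump_configs is a hypothetical name.

```shell
# Hypothetical override hook; defaults to the vxprivutil path used above.
VXPRIVUTIL=${VXPRIVUTIL:-/etc/vx/diag.d/vxprivutil}

# Dump the configuration copy of each given disk into dump_<disk>.
# Each argument must name the slice holding the private region (s2 here).
dump_configs() {
    for d in "$@"; do
        if "$VXPRIVUTIL" dumpconfig "/dev/rdsk/$d" > "dump_$d"; then
            echo "dumped $d -> dump_$d"
        else
            echo "dumpconfig failed for $d" >&2
        fi
    done
}
```

Usage: dump_configs c3t0d0s2 c3t2d0s2 c3t7d0s2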
From the various dumpconfig outputs, note the following information:
                dump_c3t0d0s2   dump_c3t2d0s2   dump_c3t7d0s2
update_tid      0.1027          0.1027          0.1027
config_tid      0.1355          0.1357          0.1355
ssb_id          0.1             0.2             0.1
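When comparing config_tid values, avoid treating them as decimal numbers: read that way, 0.999 would look newer than 0.1000. The sketch below assumes, as the values in this document suggest, that each tid is two integers separated by a dot, and compares the two parts numerically one after the other. It reads the notes you made (one "dump-file config_tid" pair per line) rather than parsing the vxprivutil output directly, since the exact dump format is not shown here; newest_config is a hypothetical name. The newest config_tid only identifies which copy to examine first; it does not by itself decide which copy to import.

```shell
# Read "<name> <config_tid>" lines on stdin and print the name with the
# highest config_tid, comparing the parts before and after the dot as integers.
newest_config() {
    awk '{
        split($2, p, "[.]")
        hi = p[1] + 0
        lo = p[2] + 0
        if (NR == 1 || hi > best_hi || (hi == best_hi && lo > best_lo)) {
            best = $1; best_hi = hi; best_lo = lo
        }
    } END { if (NR) print best }'
}
```

Usage: printf 'dump_c3t0d0s2 0.1355\ndump_c3t2d0s2 0.1357\ndump_c3t7d0s2 0.1355\n' | newest_config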
This is where it gets confusing: dump_c3t2d0s2 has the latest config_tid (0.1357), but its ssb_id is 0.2, which does not match the expected ssb_id of 0.1.
To resolve this, reconstruct a vxprint output from the "vxprivutil" dump above:
# cat dump_c3t2d0s2 | vxprint -ht -D -
Disk group: datadg
DG NAME NCONFIG NLOG MINORS GROUP-ID
ST NAME STATE DM_CNT SPARE_CNT APPVOL_CNT
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
CO NAME CACHEVOL KSTATE STATE
VT NAME NVOLUME KSTATE STATE
V NAME RVG/VSET/CO KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
SC NAME PLEX CACHE DISKOFFS LENGTH [COL/]OFF DEVICE MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO
dg datadg default default 55000 1141218774.31.tncdx15
dm A0D1 - - - - -
dm A0D2 - - - - -
dm A0D3 - - - - -
dm A0D4 - - - - -
dm A0D5 - - - - -
dm A0D6 - - - - -
dm A0D7 - - - - -
dm A0D8 - - - - -
dm A0D9 - - - - -
dm A0D10 - - - - -
dm A0D11 - - - - -
dm A0D12 - - - - -
dm A0D13 - - - - -
dm A0D14 - - - - SPARE
dm A0D15 - - - - SPARE
dm A0D16 - - - - REMOVED
dm A0D17 - - - - -
dm A0D18 - - - - -
dm A0D19 - - - - -
dm A0D20 - - - - -
dm A0D21 - - - - -
dm A0D22 - - - - -
v db001_v - DISABLED ACTIVE 25165824 SELECT - fsgen
pl db001_v-01 db001_v DISABLED RECOVER 25165824 CONCAT - RW
sd A0D1-01 db001_v-01 A0D1 0 20971520 0 - DIS
sd A0D1-08 db001_v-01 A0D1 20974688 4194304 20971520 - DIS
pl db001_v-02 db001_v DISABLED RECOVER 25165824 CONCAT - RW
sd A0D16-01 db001_v-02 A0D16 0 25165824 0 - DIS
pl db001_v-03 db001_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-02 db001_v-03 A0D1 20971520 528 LOG - DIS
v db002_v - DISABLED ACTIVE 6291456 SELECT - fsgen
pl db002_v-01 db002_v DISABLED RECOVER 6291456 STRIPE 2/128 RW
sd A0D2-01 db002_v-01 A0D2 0 3145728 0/0 - RLOC
sd A0D3-01 db002_v-01 A0D3 0 3145728 1/0 - DIS
pl db002_v-02 db002_v DISABLED ACTIVE 6291456 STRIPE 2/128 RW
sd A0D17-01 db002_v-02 A0D17 0 3145728 0/0 - DIS
sd A0D18-01 db002_v-02 A0D18 0 3145728 1/0 - DIS
pl db002_v-03 db002_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-03 db002_v-03 A0D1 20972048 528 LOG - DIS
v db003_v - DISABLED ACTIVE 8388608 SELECT - fsgen
pl db003_v-01 db003_v DISABLED RECOVER 8388608 CONCAT - RW
sd A0D4-01 db003_v-01 A0D4 0 8388608 0 - DIS
pl db003_v-02 db003_v DISABLED RECOVER 8388608 CONCAT - RW
sd A0D19-UR-001 db003_v-02 A0D19 0 8388608 0 - RLOC
pl db003_v-03 db003_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-04 db003_v-03 A0D1 20972576 528 LOG - DIS
v db004_v - DISABLED ACTIVE 6291456 SELECT - fsgen
pl db004_v-01 db004_v DISABLED ACTIVE 6291456 CONCAT - RW
sd A0D5-01 db004_v-01 A0D5 0 6291456 0 - DIS
pl db004_v-02 db004_v DISABLED ACTIVE 6291456 CONCAT - RW
sd A0D20-01 db004_v-02 A0D20 0 6291456 0 - DIS
pl db004_v-03 db004_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-05 db004_v-03 A0D1 20973104 528 LOG - DIS
v db005_v - DISABLED ACTIVE 12582912 SELECT - fsgen
pl db005_v-01 db005_v DISABLED ACTIVE 12582912 CONCAT - RW
sd A0D6-01 db005_v-01 A0D6 0 12582912 0 - DIS
pl db005_v-02 db005_v DISABLED ACTIVE 12582912 CONCAT - RW
sd A0D21-01 db005_v-02 A0D21 0 12582912 0 - DIS
pl db005_v-03 db005_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-06 db005_v-03 A0D1 20973632 528 LOG - DIS
v db006_v - DISABLED ACTIVE 10485760 SELECT - fsgen
pl db006_v-01 db006_v DISABLED ACTIVE 10485760 CONCAT - RW
sd A0D7-01 db006_v-01 A0D7 0 10485760 0 - DIS
pl db006_v-02 db006_v DISABLED ACTIVE 10485760 CONCAT - RW
sd A0D22-01 db006_v-02 A0D22 0 10485760 0 - DIS
pl db006_v-03 db006_v DISABLED RECOVER LOGONLY CONCAT - RW
sd A0D1-07 db006_v-03 A0D1 20974160 528 LOG - DIS
v repository - DISABLED ACTIVE 20971520 SELECT - fsgen
pl repository-01 repository DISABLED ACTIVE 20971776 STRIPE 6/128 RW
sd A0D8-01 repository-01 A0D8 0 3495296 0/0 - DIS
sd A0D9-01 repository-01 A0D9 0 3495296 1/0 - DIS
sd A0D10-01 repository-01 A0D10 0 3495296 2/0 - DIS
sd A0D11-01 repository-01 A0D11 0 3495296 3/0 - DIS
sd A0D12-01 repository-01 A0D12 0 3495296 4/0 - DIS
sd A0D13-01 repository-01 A0D13 0 3495296 5/0 - DIS
#
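To review several candidate copies one after another, the same reconstruction can be looped over the dump files. A minimal sketch; review_configs is a hypothetical name, and it uses the same "vxprint -ht -D -" invocation shown above, feeding each dump on standard input.

```shell
# Print a vxprint-style view of each dump file, with a banner per file,
# so that candidate configuration copies can be compared side by side.
review_configs() {
    for f in "$@"; do
        echo "===== $f"
        vxprint -ht -D - < "$f"
    done
}
```

Usage: review_configs dump_c3t0d0s2 dump_c3t2d0s2 dump_c3t7d0s2 | more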
Check with the customer whether the generated output looks correct. If it does, import the disk group using the configuration copy on this disk:
# vxdg -o selectcp=<disk id> import <diskgroup>
For example:
# /usr/sbin/vxdg -o selectcp=1141219312.37.tncdx15 import datadg
[ Note: the disk group may still fail to import at this point. If it does, add the -Cf options to vxdg:
# /usr/sbin/vxdg -Cf -o selectcp=1141219312.37.tncdx15 import datadg ]
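The import and the -Cf fallback can be combined in one function. This is a minimal sketch; import_with_copy is a hypothetical name. Because -C clears the import locks written by another host and -f forces the import, only let it reach the fallback once you have confirmed that the disk group is not imported on any other host.

```shell
# Import <diskgroup> using the configuration copy on <disk id>; if that
# fails, retry with -Cf as described in the note above.
import_with_copy() {
    if [ $# -ne 2 ]; then
        echo "usage: import_with_copy <diskgroup> <disk id>" >&2
        return 2
    fi
    dg=$1
    diskid=$2
    # First attempt: plain import with the selected configuration copy.
    if vxdg -o selectcp="$diskid" import "$dg"; then
        return 0
    fi
    # Fallback: clear import locks (-C) and force the import (-f).
    echo "import of $dg failed; retrying with -Cf" >&2
    vxdg -Cf -o selectcp="$diskid" import "$dg"
}
```

Usage: import_with_copy datadg 1141219312.37.tncdx15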
Confirm that the disk group is imported:
# vxdisk list
Start the volume:
# vxvol -g <diskgroup> start <volume name>
(If the plexes are in the RECOVER state, follow the plex recovery procedure first.)
Mount the volume (you may be asked to run fsck first):
# mount -F <fs type> /dev/vx/dsk/<diskgroup>/<volume name> /mount-point
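Starting and mounting a volume can be sketched as a single function. bring_up_volume is a hypothetical name. It deliberately does not run fsck itself: if the mount fails, it only prints the fsck command to run by hand, since a file system check should be reviewed before it repairs anything. The -F option selects the file system type for both mount and fsck on Solaris.

```shell
# Start <volume> in <diskgroup> and mount it; report, but do not run, fsck.
bring_up_volume() {
    if [ $# -ne 4 ]; then
        echo "usage: bring_up_volume <diskgroup> <volume> <fs type> <mount point>" >&2
        return 2
    fi
    dg=$1; vol=$2; fstype=$3; mnt=$4
    if ! vxvol -g "$dg" start "$vol"; then
        # For example, plexes that still need recovery; inspect them first.
        echo "vxvol start failed; inspect plex states: vxprint -g $dg -ht $vol" >&2
        return 1
    fi
    if ! mount -F "$fstype" "/dev/vx/dsk/$dg/$vol" "$mnt"; then
        echo "mount failed; check the file system first: fsck -F $fstype /dev/vx/rdsk/$dg/$vol" >&2
        return 1
    fi
}
```

Usage: bring_up_volume datadg db001_v vxfs /mount-point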
Note : When the disk group is imported, the ssb_id parameter on all disks is reset to 0.0.