All in One- Clariion Blogs
Clariion Blogs
Welcome to my EMC CLARiiON blog central. Here, I will share my knowledge of the Clariions, as well as any new information that I come across in my teachings.
http://clariionblogs.blogspot.in
Clariion Disk Format
Disk Format
The Clariion formats the disks in Blocks. Each Block written out to the disk is
520 bytes in size. Of the 520 bytes, 512 bytes are used to store the actual
DATA written to the block. The remaining 8 bytes per block are used by the
Clariion to store System Information, such as a Timestamp, Parity
Information, and Checksum Data.
Element Size – The Element Size of a disk is determined when a LUN is
bound to the RAID Group. In previous versions of Navisphere, a user could
configure the Element Size from 4 blocks per disk to 256 blocks per disk. Now,
the default Element Size in Navisphere is 128. This means that the Clariion
will write 128 blocks of data to one physical disk in the RAID Group before
moving to the next disk in the RAID Group, where it writes another 128 blocks,
and so on.
Chunk Size – The Chunk Size is the amount of Data the Clariion writes to a
physical disk at a time. The Chunk Size is calculated by multiplying the
Element Size by the amount of Data per block written by the Clariion.
128 blocks x 512 bytes of Data per block = 65,536 bytes of Data per Disk.
That is equal to 64 KB. So, the Chunk Size, the amount of Data the Clariion
writes to a single disk before writing to the next disk in the RAID Group, is 64
KB.
LUNs
LUNs
As stated in the Host Configuration Slide, a LUN is the disk space that is
created on the Clariion. The LUN is the space that is presented to the host.
The host will see the LUN as a “Local Disk.”
In Windows, the Clariion LUN will show up in Disk Manager as Drive #, which
the Windows Administrator can now format, partition, assign a Drive Letter,
etc…
In UNIX, the Clariion LUN will show up as a c_t_d_ address, which the UNIX
Administrator can now mount.
A LUN is owned by a single Storage Processor at a time. When creating a
LUN, you assign the LUN to a Storage Processor, SPA, SPB or let the Clariion
choose by selecting AUTO. The Auto option lets the Clariion assign the next
LUN to the Storage Processor with the fewest number of LUNs.
The Properties/Settings of a LUN during creation/binding are (a CLI example follows this list):
1. Selecting which RAID Group the LUN will be bound to.
2. If it is the first LUN created on a RAID Group, the first LUN will set the RAID
Type for the entire RAID Group. Therefore, when creating/binding the first
LUN on a RAID Group, you can select the RAID Type.
3. Select a LUN Id or number for the LUN.
4. Specify a REBUILD PRIORITY for the LUN in the event of a Hot Spare
replacing a failed disk.
5. Specify a VERIFY PRIORITY for the LUN to determine the speed in which
the Clariion runs a “SNIFFER” in the background to scrub the disks.
6. Enable or Disable Read and Write Cache at the LUN level. An example
might be to disable Read/Write Cache for a LUN that is given to a
Development Server. This ensures that the Development LUNs will not use
the Cache that is needed for Production Data.
7. Enable Auto Assign. By default this box is unchecked in Navisphere. That
is because you will have some sort of Host Based Software that will manage
the trespassing and failing back of a LUN.
8. Number of LUNs to Bind. You can bind up to 128 LUNs on a single RAID
Group.
9. SP Ownership. You can select if you want your LUN(s) to belong to SP A, SP
B, or the AUTO option in which the Clariion decides LUN ownership based on
the Storage Processor with the fewest number of LUNs.
10. LUN Size. You specify the size of a LUN by entering the numbers, and
selecting MB (MegaBytes), GB (GigaBytes), TB (TeraBytes), or Block Count to
specify the number of blocks a LUN will be. This is critical for SnapView
Clones, and MirrorView Secondary LUNs.
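For reference, the same binding settings can be applied from the Navisphere Command Line Interface. The command below is only a sketch that reuses the bind syntax listed later in this document; the management IP address, LUN number, RAID Group, Storage Processor, and capacity are placeholder values you would change for your own environment.
naviseccli -h 10.124.23.128 bind r5 0 -rg 0 -rc 1 -wc 1 -sp a -sq gb -cap 10
This binds LUN 0 as a RAID 5 LUN on RAID Group 0, with Read and Write Cache enabled, owned by SP A, and 10 GB in size.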
The number of LUNs a Clariion can support is going to be Clariion specific.
CX 300 – 512 LUNs
CX3-20 – 1024 LUNs
CX3-80 – 2048 LUNs
RAID 1_0
Order of Disks in RAID Group for RAID 1_0.
When creating a RAID 1_0 Raid Group, it is important to know and
understand that the order of the drives as they are put into the RAID Group will
absolutely make a difference in the Performance and Protection of that RAID
Group. If left to the Clariion, it will simply choose the next disks in the order
in which it sees the disks to create the RAID Group. However, this may not
be the best way to configure a RAID 1_0 RAID Group. Navisphere will take
the next disks available, which are usually right next to one another in the
same enclosure.
In a RAID 1_0 Group, we want the RAID Group to span multiple enclosures as
illustrated above. The reason for this is as we can see, the Data Disks will be
on Bus 1_Enclosure 0, and the Mirrored Data Disks will be on Bus
2_Enclosure 0. The advantage of creating the RAID Group this way is that we
place the Data and Mirrors on two separate enclosures. In the event of an
enclosure failure, the other enclosure could still be alive and maintaining
access to the data or the mirrored data. The second advantage is
Performance. Performance could be gained through this configuration
because you are spreading the workload of the application across two
different buses on the back of the Clariion.
Notice the order in which the disks were placed into the RAID 1_0 Group. In
order for the disks to be entered into the RAID Group in this order, they
must be manually entered into the RAID Group this way via Navisphere or
the Command Line.
The first disk into the RAID Group receives Data Block 1.
The second disk into the RAID Group receives the Mirror of Data Block 1.
The third disk into the RAID Group receives Data Block 2.
The fourth disk into the RAID Group receives the Mirror of Data Block 2.
The fifth disk into the RAID Group receives Data Block 3.
The sixth disk into the RAID Group receives the Mirror of Data Block 3.
If we let the Clariion choose these disks in its particular order, it would select
them:
First disk – 1_0_0 (Data Block 1)
Second disk – 1_0_1 (Mirror of Data Block 1)
Third disk – 1_0_2 (Data Block 2)
Fourth disk – 2_0_0 (Mirror of Data Block 2)
Fifth disk – 2_0_1 (Data Block 3)
Sixth disk – 2_0_2 (Mirror of Data Block 3)
This defeats the purpose of having the Mirrored Data on a different enclosure
than the Data Disks.
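If you create the RAID Group from the Command Line, the disks become members in the order you list them, which is how you force the Data/Mirror layout described above. The command below is a sketch only; the IP address and RAID Group number are placeholders, and the disks follow the Bus 1/Bus 2 example from this slide (the full command list appears later in this document).
naviseccli -h 10.124.23.128 createrg 1 1_0_0 2_0_0 1_0_1 2_0_1 1_0_2 2_0_2 -raidtype r1_0
Each Data disk on Bus 1 is immediately followed by its Mirror on Bus 2.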
RAID Groups and Types
RAID GROUPS and RAID Types
The above slide illustrates the concept of creating a RAID Group and the
supported RAID types of the Clariions.
RAID Groups
The concept of a RAID Group on a Clariion is to group together a number of
disks on the Clariion into one big group. Let’s say that we need a 1 TB LUN.
The disks we have are 200 GB in size. We would have to group together five
(5) disks to get to the 1 TB size needed for the LUN. I know we haven’t taken
into account parity and what the RAW capacity of a drive is, but that is
just a very basic idea of what we mean by a RAID Group. RAID Groups also
allow you to configure the Clariion in a way so that you will know what LUNs,
Applications, etc… live on what set of disks in the back of the Clariion. For
instance, you wouldn’t want an Oracle Database LUN on the same RAID
Group (Disks) as a SQL Database running on the same Clariion. This allows
you to create a RAID Group of one set of disks for the Oracle Database, and
another RAID Group of a different set of disks for the SQL Database.
RAID Types
Above are the supported RAID types of the Clariion.
RAID 0 – Striping Data with NO Data Protection. The Clariion’s Cache
will write the data out to disk in blocks (chunks) that we will discuss later. For
RAID 0, the Clariion writes/stripes the data across all of the disks in the RAID
Group. This is fantastic for performance, but if one of the disks fails in the
RAID 0 Group, then the data will be lost because there is no protection of
that data (i.e. mirroring, parity).
RAID 1 – Mirroring. The Clariion will write the Data out to the first disk in
the RAID Group, and write the exact data to another disk in that RAID 1
Group. This is great in terms of data protection because if you were to lose
the data disk, the mirror would have the exact copy of the data disk, allowing
the user to access the disk.
RAID 1_0 – Mirroring and Striping Data. This is the best of both worlds if
set up properly. This type of RAID Group will allow the Clariion to stripe data
and mirror the data onto other disks. However, the illustration above of RAID
1_0, is not the best way of configuring that type of RAID Group. The next
slide will go into detail as to why this isn’t the best method of configuring
RAID 1_0.
RAID 3 – Striping Data with a Dedicated Parity Drive. This type of RAID
Group allows the Clariion to stripe data across the first X number of disks in the
RAID Group, and dedicate the last disk in the RAID Group to Parity for the
data stripe. In the event of a single drive failure in this RAID Group, the failed
disk can be rebuilt from the remaining disks in the RAID Group.
RAID 5 – Striping Data with Distributed Parity. RAID type 5 allows the
Clariion to distribute the Parity information to rebuild a failed disk across the
disks that make up the RAID Group. As in RAID 3, in the event of a single
drive failure in this RAID Group, the failed disk can be rebuilt from the
remaining disks in the RAID Group.
RAID 6 – Striping Data with Double Parity. This is new to the Clariion world
starting in Flare Code 26 of Navisphere. The simplest explanation of RAID 6
is that the RAID Group uses striping, as in RAID 5, with
double the parity. This allows a RAID 6 RAID Group to sustain two
drive failures in the RAID Group while maintaining access to the LUNs.
HOT SPARE – A Dedicated Single Disk that Acts as a Failed Disk. A
Hot Spare is created as a single disk RAID Group, and is bound/created as a
HOT SPARE in Navisphere. The purpose of this disk is to act as the failed disk
in the event of a drive failure. Once a disk is set as a HOT SPARE, it is always
a HOT SPARE, even after the failed disk is replaced. In the slide above, we
list the steps of a HOT SPARE taking over in the event of a disk failure in the
Clariion.
1. A disk fails – a disk fails in a RAID Group somewhere in the back of the
Clariion.
2. Hot Spare is Invoked – a Clariion dedicated HOT SPARE acts as the failed
disk in Navisphere. It will assume the identity of the failed disk’s
Bus_Enclosure_Disk Address.
3. Data is REBUILT Completely onto the Hot Spare from the other disks in the
RAID Group – The Clariion begins to recalculate and rebuild the failed disk
onto the Hot Spare from the other disks in the RAID Group, whether it be
copying from the MIRRORed copy of the disk, or through parity and data
calculations of a RAID 3 or RAID 5 Group.
4. Disk is replaced – Somewhere throughout the process, the failed drive is
replaced.
5. Data is Copied back to new disk – The data is then copied back to the new
disk that was replaced. This will take place automatically, and will not begin
until the failed disk is completely rebuilt onto the Hot Spare.
6. Hot Spare is back to a Hot Spare – Once the data is written from the Hot
Spare back to the failed disk, the Hot Spare goes back to being a Hot Spare
waiting for another disk failure.
Hot Spares are going to be size and drive type specific.
Size. The Hot Spare must be at least the same size as the largest disk in
the Clariion. A Hot Spare will replace a drive that is the same size or a
smaller size. The Clariion does not allow multiple smaller Hot Spares to
replace a failed disk.
Drive Type Specific. If your Clariion has a mixture of Drive Types, such as
Fibre and S.ATA disks, you will need Hot Spares of those particular Drive
Types. A Fibre Hot Spare will not replace a failed S.ATA disk and vice versa.
Hot Spares are not assigned to any particular RAID Group. They are used by
the Clariion in the event of any failure of that Drive Type. The
recommendation for Hot Spares is one (1) Hot Spare for every thirty (30)
disks.
There are multiple ways to create a RAID Group. One is via the Navisphere
GUI, and the other is through the Command Line Interface. In later slides we
will list the commands to create a RAID Group.
The VAULT Drives
Vault Drives
All Clariions have Vault Drives. They are the first five (5) disks in all Clariions.
Disks 0_0_0 through 0_0_4. The Vault drives on the Clariion are going to
contain some internal information that is pre-configured before you start
putting data on the Clariion. The diagram will show what information is
stored on the Vault Disks.
The Vault.
The vault is a ‘save area’ across the first five disks to store write cache from
the Storage Processors in the event of a Power Failure to the Clariion, or a
Storage Processor Failure. The goal here is to place write cache on disk
before the Clariion powers off, therefore ensuring that you don’t lose the
data that was committed to the Clariion and acknowledged to the host. The
Clariions have the Standby Power Supplies that will keep the Storage
Processors running as well as the first enclosure of disks in the event of a
power failure. If there is a Storage Processor Failure, the Clariion will go into
a ‘panic’ mode and fear that it may lose the other Storage Processor. To
ensure that it does not lose write cache data, the Clariion will also dump
write cache to the Vault Drives.
The PSM Lun.
The Persistent Storage Manager Lun stores the configuration of the Clariion,
such as Disks, Raid Groups, Luns, and Access Logix information, as well as
SnapView, MirrorView, and SanCopy configuration. When this LUN
was first introduced on the Clariions back on the FC4700s, it used to appear
in Navisphere under the Unowned Luns container as Lun 223-PSM Lun. Users
have not been able to see it in Navisphere for a while. However, you can grab
the information of the Array’s Configuration by executing the following
command.
naviseccli -h 10.127.35.42 arrayconfig -capture -output c:\arrayconfig.xml -format XML -schema clariion
Example of Information retrieved from the File:
For a Disk:
<CLAR:Disk type="Category">
  <CLAR:Bus type="Property">1</CLAR:Bus>
  <CLAR:Enclosure type="Property">0</CLAR:Enclosure>
  <CLAR:Slot type="Property">12</CLAR:Slot>
  <CLAR:State type="Property">3</CLAR:State>
  <CLAR:UserCapacityInBlocks type="Property">274845</CLAR:UserCapacityInBlocks>
</CLAR:Disk>
For a LUN:
<CLAR:LUN type="Category">
  <CLAR:Name type="Property">LUN 23</CLAR:Name>
  <CLAR:WWN type="Property">60:06:01:60:06:C4:1F:00:B1:51:C4:1B:B3:A2:DC:11</CLAR:WWN>
  <CLAR:Number type="Property">6142</CLAR:Number>
  <CLAR:RAIDType type="Property">1</CLAR:RAIDType>
  <CLAR:RAIDGroupID type="Property">13</CLAR:RAIDGroupID>
  <CLAR:State type="Property">Bound</CLAR:State>
  <CLAR:CurrentOwner type="Property">1</CLAR:CurrentOwner>
  <CLAR:DefaultOwner type="Property">2</CLAR:DefaultOwner>
  <CLAR:Capacity type="Property">2097152</CLAR:Capacity>
</CLAR:LUN>
Flare Database LUN.
The Flare Database LUN will contain the Flare Code that is running on the
Clariion. I like to say that it is the application that runs on the Storage
Processors that allows the SPs to create the Raid Groups, Bind the LUNs,
setup Access Logix, SnapView, MirrorView, SanCopy, etc…
Operating System.
The Operating System of the Storage Processors is stored to the first five
drives of the Clariion.
Now, please understand that this information is NOT in any way, shape, or
form laid out this way across these disks. We are only seeing that this
information is built onto these first five drives of the Clariion. This
information does take up disk space as well. The amount of disk space that it
takes up per drive is going to depend on what Flare Code the Clariion is
running. Clariions running Flare Code 19 and lower will lose approximately 6
GB of space per disk. Clariions running Flare Code 24 and up will lose
approximately 33 GB of space per disk. So, on a 300 GB fibre drive, the
actual usable capacity of the drive is 268.4 GB. From that, you would subtract
another 33 GB per disk for this Vault/PSM LUN/Flare Database LUN/Operating
System information. That would leave you with about 235 GB per disk on the
first five disks.
Enclosure Types
Enclosure Types
The above page diagrams the back-end structure of a Clariion and how the disks
are laid out. Before we discuss the back-end bus structure, we should discuss
the different types of enclosures that the Clariion contains.
1.DAE. The Disk Array Enclosure. Disk Array Enclosures exist in all Clariions.
DAE’s are the enclosures that house the disks in the Clariion. Each DAE
holds fifteen (15) disks. The disks are in slots that are numbered 0 to 14.
2.DPE. The Disk Processor Enclosure. The Disk Processor Enclosure is in the
Clariion Models CX300, CX400, CX500. The DPE is made up of two
components. It contains the Storage Processors, and the first fifteen (15)
disks of the Clariion.
3.SPE. The Storage Processor Enclosure. The Storage Processor Enclosure is
in the Clariion Models CX700 and the CX-3 Series. The SPE is the enclosure
that houses the Storage Processors.
The diagrams above lay out the DAE’s back-end bus structure. Data that
leaves Cache and is written to disk, or data that is read from disk and placed
into Cache travels along these back-end buses or loops. Some Clariions have
one back-end bus/loop to get data from enclosure to enclosure. Others have
two and four back-end buses/loops to push and pull data from the disks. The
more buses/loops, the more expected throughput for data on the back-end of
the Clariion.
The Clariion Model on the left is a diagram of a CX300/CX3-10 and CX3-20.
These models have a single back-end bus/loop to connect all of the
enclosures. The CX300 will have one back-end bus/loop running at a speed of
2 Gb/sec, while the CX3-Series Clariions have the ability to run up to 4
Gb/sec on the back-end.
The Clariion Model in the middle is a diagram of a CX500. The CX500 has two
back-end buses/loops. This gives the CX500, twice the amount of potential
throughput for I/Os than the CX300.
The Clariion Model on the right is a diagram of a CX700, CX3-40 and CX3-80.
These Clariions contain four back-end buses/loops. The CX3-80 will have
the maximum back-end throughput, with all four buses having the ability to
run at a 4 Gb/sec speed.
Each enclosure has a redundant connection for the bus that it is connected to.
This is in the event that the Clariion loses a Link Control Card (LCC), which
allows the enclosures to move data, or loses a Storage Processor. You
will see one bus cabled out of SP A and SP B, allowing both SP’s access to
each enclosure.
Enclosure Addresses
To determine the address of an enclosure, we need to know two things: what
bus it is on, and what number enclosure it is on that bus. On the Clariions in
the left diagram, there is only one back-end bus/loop. Every enclosure on
these Clariions will be on Bus 0. The enclosure numbers start at zero (0) for
the first enclosure and work their way up. On these Clariions, the first
enclosure of disks is labeled Bus 0_Enclosure 0 (0_0). The next enclosure of
disks is going to be Bus 0_Enclosure 1 (0_1). The next enclosure of disks is 0_2,
and so on.
The CX500, with two back-end buses, will alternate enclosures between the
buses. The first enclosure of disks will be the same as on the Clariions on the
left, Bus 0_Enclosure 0 (0_0). The next enclosure of disks will utilize the
other back-end bus/loop, Bus 1. This enclosure is Bus 1_Enclosure 0 (1_0). It
is Enclosure 0 because it is the first enclosure of disks on Bus 1. The third
enclosure of disks is going to be back on Bus 0, 0_1. The next one up is on
Bus 1, 1_1. The enclosures will continue to alternate until the Clariion has all
of the supported enclosures. You might ask why it is cabled this way,
alternating buses. The reason is that most companies don’t purchase
Clariions fully populated. Most companies buy disks on an as-needed basis.
By alternating enclosures, you are using all of the back-end resources
available for that Clariion.
The Clariions on the right show the four bus structure. The first enclosure of
disks is going to be Bus 0_Enclosure 0 (0_0), as on all other Clariions. The next
enclosure of disks is Bus 1_Enclosure 0 (1_0), again using the next available
back-end bus and being the first enclosure of disks on that bus. The third
DAE is going to be Bus 2_Enclosure 0 (2_0). The fourth DAE is on the fourth
and last back-end bus. It is Bus 3_Enclosure 0 (3_0). From here, we are back
to Bus 0 for the next enclosure of disks, Bus 0_Enclosure 1 (0_1). The next
DAE is 1_1. The next would be 2_1 if we had one, then 3_1, 0_2, and so on until
the Clariion is fully populated.
Disk Address
The last topic for this page is the disks themselves. To find a specific disk’s
address, we use the Enclosure Address and add the Slot number the disk is
in. This gives us the address that is called the B_E_D: Bus_Enclosure_Disk.
The Clariion on the left has a disk in slot number 13 of Enclosure 0_2. The
address of that disk would be 0_2_13. The Clariion in the middle has a disk in
slot number 10 of Enclosure 1_1. This disk address would be 1_1_10. And the
Clariion on the right has a disk in slot 6 of Bus 2_Enclosure 0. Its address is
2_0_6. And the disk in Bus 1_Enclosure 1 is in slot 9. Address = 1_1_9.
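The same B_E_D notation is what the Command Line expects when you query a single disk. As a sketch, reusing the getdisk command listed later in this document (the IP address and disk address are placeholders):
naviseccli -h 10.124.23.128 getdisk 2_0_6
This reports the state, capacity, and other properties of the disk in Bus 2, Enclosure 0, Slot 6.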
Finally, each Clariion has a limit to the number of disks that it will support.
The chart below the diagrams provides the number of how many disks each
model can contain. The CX300 can have a maximum of 60 disks, whereas
the CX3-80 can have up to 480 disks.
The importance of this page is to know where the disks live in the back of the
Clariion in the event of disk failures, and more importantly how you are going
to lay out the disks. Meaning, what applications are going to be on certain
disks. In order to put that data onto disks, we have to create LUNs (we will get
to it), which are carved out of RAID Groups (again, getting there shortly). RAID
Groups are a grouping of disks. To have a nice balance and to achieve as
much performance and throughput on the Clariion as possible, we have to know
how the Clariion labels the disks and how the DAE’s are structured.
Cache WaterMarks
WaterMarks
The WaterMarks are what control writing data out of Cache to disk. They are used to
manage how long data stays in Write Cache before it is written to disk.
This diagram is used to describe the types of “Flushing” data to disk, or
writing data out of Cache to disk.
The first type of Write Cache Flushing is Idle Flushing.
Idle Flushing is when the Clariion has the ability to take the "writes" into
cache and send the acknowledgement back to the host that the data is on
"disk." While this is happening, the Clariion can also write data out to disk.
The Clariion will try to write to disk in a 64 KB "Chunk." The cache is
absorbing the writes, grouping them together, and writing them to disk. This
will come into play later when we discuss how the Clariion formats the disks.
This is the perfect case scenario. The Cache takes in the writes, and the Clariion
has the resources to write the blocks to disk.
The second type of Flushing is WaterMark Flushing.
This is maintained by percentages that you can configure in Cache. The goal
with WaterMark Flushing is to keep the Write Cache level between these two
percentages. We are using the default Low WaterMark Setting of 60%, and
High WaterMark Setting of 80%. These can be changed, and we will discuss
that later. With WaterMark Flushing, Cache is going to do its best to keep
Write Cache between these two levels. As Write Cache hits the High
WaterMark, the Clariion tries to flush down to the Low WaterMark. If the
amount of Write Cache is constantly between these two levels, the Clariion is
doing its job.
The last type of flushing is the “Forced Flush.”
A Forced Flush of Cache occurs when Write Cache reaches capacity. The
Clariion will no longer accept data into write cache, as there is no more
room.
When a Forced Flush occurs, the following take place:
1. The Clariion disables Write Cache.
2. The Clariion begins to destage/flush the write data in Cache out to disk.
3. Now comes the performance issue. With the Clariion disabling Write
Cache, any new writes that come in from a host will bypass cache and be
written directly to disk. The host/application is now waiting for the
acknowledgement to return after the data was written to disk.
4. The Clariion will keep Write Cache disabled until it flushes to the Low
WaterMark.
5. Once Write Cache is flushed to the Low WaterMark level, Write Caching is
automatically re-enabled.
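The WaterMark levels themselves can be changed from the Command Line with the same setcache command listed later in this document. A sketch, using the default 60/80 values from this example (the IP address is a placeholder):
naviseccli -h 10.124.23.128 setcache -l 60 -h 80
This sets the Low WaterMark to 60% and the High WaterMark to 80%.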
Cache Page Size
Cache Page Size
Here we are discussing the use of the Cache Page Size. We say that it is the
same as saying Cache Block Size. Each "Page" or block in Cache is a fixed
size, and in the Clariion, every Page in Cache is the same fixed size. Therefore,
we feel that this is one of the areas in Cache where knowing your
environment (applications, etc.) can make a difference. In the diagram above,
we are illustrating the use of Cache with three different applications: Oracle,
SQL, and Exchange. Next to the applications is a Block Size. We are using
these three applications in this diagram because these seem to be the most
common applications people come to class with.
Next to the applications is a default Block Size. Again, we are only using
these as examples. You want to verify the applications running on the
Clariion and their Block Sizes.
There are four different Page Size Settings in Cache for the Clariion: 2 KB, 4
KB, 8 KB, and 16 KB. Let’s start with the default Clariion Page Size of 8 KB.
Again, every "Page" in Cache will be 8 KB in size. If we have an application
like Oracle running on this Clariion, and Oracle uses a default Block Size of
16 KB, that would mean that every Oracle Block of data sent to the Clariion would
be broken into two separate Pages in Cache. With SQL writing to this 8 KB
Page Size, it is a one to one ratio, as it is with Exchange; however, with every
Exchange Block of data, there is a 4 KB waste of space per block, which
could be filling up Cache more rapidly with this "wasted space."
The next Page Size down shows a 4 KB Page Size for Cache. The nice thing
about this size in Cache is that there is no wasted space. Exchange is still in
a 1:1 ratio of blocks. However, SQL now has to split into two separate Cache
Pages, and Oracle splits into four separate Cache Pages. The good thing
about this size is “No Wasted Space.” The down side to this is now we have
to listen to the Oracle and SQL admins complain about performance.
So, we set the Page Size to 16 KB to appease the Oracle and SQL admins.
Here comes the problem again of wasted space in cache, which, depending
on your Clariion, you don’t have a lot of. With the 16 KB Page Size, all of the
applications write to one Cache Page. The applications are happy because of
this, but we are back to the wasted space. For every Exchange block written
to the Clariion, there is a waste of 12 KB Cache space. For every SQL Block,
there is a waste of 8 KB Cache Space.
If you are only using one of these applications on the Clariion, great, match
the Cache Page Size to that application. If that is not the case, you as the
Storage Administrator, will have to decide the Winners and Losers. Next to
each of the different page sizes, we have listed the Winners, and the Losers.
In the 8 KB Page Size, SQL and Exchange are winners because from the
application point of view, they are a 1:1 ratio. Oracle is a Loser because it is
split across two separate blocks in Cache. Another loser in this setting is the
Clariion Cache because of the wasted space.
In the 4 KB Page Size, Exchange and Cache are winners because Exchange
is again a ratio of 1:1, and no wasted space in Cache. Oracle and SQL are
losers because they are written to separate Pages in Cache.
With the 16 KB Page Size, the applications all win. Oracle, SQL and
Exchange are all a 1:1 ratio. The big loser in this setting is Cache. Cache is a
loser with all of the wasted space.
This, again is one of the places to look at for performance of Cache in a
Clariion. Knowing your environment plays a big piece in how things are
written to Cache.
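If you do decide to change the Cache Page Size, it is the -p option of the setcache command listed later in this document. The line below is a sketch only; the IP address is a placeholder, and as with the other Cache settings, Write and Read Cache must be disabled before the change and re-enabled afterwards.
naviseccli -h 10.124.23.128 setcache -p 16
This sets the Cache Page Size to 16 KB; the valid values are 2, 4, 8, and 16.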
Cache Allocation
Cache Allocation
In the illustration above, we are seeing again that if data is written to one
Storage Processor, it is MIRRORed to the other Storage Processor.
A host that writes data to SP A will have that data mirrored to SP B, and vice versa. So, you
will be losing some Cache space to this mirroring. In this example, we are
setting SP A’s Write Cache to 1 ½ GB, which means that over on SP B, 1 ½ GB
of Cache space will be taken for the Mirroring of SP A’s Write Cache. The
same scenario is set for SP B. The same values are transferred across SPs for
Write Cache.
SP Usage
SP Usage is pre-allocated Cache Space that is used by the Clariion for things
like pointers/deltas, SnapView, and MirrorView. The amount of space that is lost
per Storage Processor for SP Usage depends on a couple of things. First is
the type of Clariion you have. Second is what Flare Code you are running on
the Clariion. We’ll talk later about where to find the Flare Code your Clariion is
running.
In this example, we are using 750 MB per Storage Processor as the value for
SP Usage. To give you some real numbers:
Type of Clariion – Flare Code – SP Usage
CX3-80 – Flare 26 – 1464 MB
CX3-80 – Flare 24 – 1464 MB
CX700 – Flare 26 – 884 MB
CX700 – Flare 24 – 832 MB
After Write Cache is allocated and SP Usage is taken into account, this leaves
us with 250 MB of Cache for Reads.
The nice thing about the Clariion though is that it allows you to change those
cache values. Let’s say for instance, that this initial setup above works for
you in the mornings when people are writing to a database, but later in the
day, the database has more reads. You can take from Write Cache and give
the rest to Read Cache. The other nice thing about it is that it can be scripted
from the Command Line Interface. Below the chart are the three commands
that you can use to change cache.
Command One
Before we can change the values of Cache, we must first disable Cache. This
command is the command to disable Write Cache, Read Cache of SP A and
SP B. Not only does this disable Cache, it also forces a Flush of Cache to disk.
This means that the command prompt will not return immediately. There will
be a delay in the command prompt returning until Cache is flushed. As I
always say, I cannot give you an amount of time that this will take (two
weeks). The answer is going to be….”it depends, you’ll have to test it.”
Command Two
This is the actual setting of Cache command. By default, the setting of Cache
is allocated in MegaBytes. By setting Write Cache to 2048 MB (2 GB), we are
telling the Clariion to take that number, and divide half of it for SP A Write
Cache, and half for SP B Write Cache. We don’t calculate into this the
Mirroring of Write Cache, just the actual usable space. Next, we specify the
amount for the Read Cache Size of SP A of 1250 MB (1.25 GB) and the Read
Cache Size of SP B of 1250 MB (1.25 GB). Read Caching is not Mirrored, so
we must specify both SPs Read Cache. Notice how by simply taking ½ GB
away from SP A and SP B Write Cache, we can allocate 1 GB more of Cache
space to the SPs for Reads.
Command Three
Finally, we have to re-enable Cache. The ones (1) next to –wc, -rca, and –rcb
stand for Enabling.
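As a sketch of what those three commands might look like for the example above (the IP address is a placeholder, the sizes are the example values from Command Two, and the full command list appears later in this document):
naviseccli -h 10.124.23.128 setcache -wc 0 -rca 0 -rcb 0 (disable Write and Read Cache, forcing a flush to disk)
naviseccli -h 10.124.23.128 setcache -wsz 2048 -rsza 1250 -rszb 1250 (2 GB of Write Cache, 1250 MB of Read Cache per SP)
naviseccli -h 10.124.23.128 setcache -wc 1 -rca 1 -rcb 1 (re-enable Write and Read Cache)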
Changing the values of Cache could be done at any time, all day long if you
want to, though I wouldn’t recommend it. But, it could prove to be extremely
beneficial to performance of the Clariion. Acknowledgements from Writes,
and Reading from Cache is going to happen in Nanoseconds as opposed to
milliseconds coming from disk.
Another example of why to change Cache could be when Backups are going
to occur. Since you will be reading data from Clariion Luns, you could
allocate as much Cache to Reads as possible so that the Backup Host could
be retrieving data from Cache rather than disk. When the Backups are
complete, you could script that the Cache values go back to Production
Levels.
Caching
From the chart above, the amount of Cache that a Clariion contains is based
on the model.
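To see the Cache that is actually configured on your own Clariion, the getcache command from the command list later in this document reports the current sizes and settings; the IP address is a placeholder.
naviseccli -h 10.124.23.128 getcache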
Read Caching
First, we will describe the process of when a host issues a request for data
from the Clariion.
1. The host issues the request for data to the Storage Processor that owns the
LUN.
2. If that data is sitting in Cache on the Storage Processor, the SP sends the
data back to the host.
If however, the data is not in Cache, the Storage Processor must go to disk
to retrieve the data (Step 1 ½). It reads the data from the LUN into the Read
Cache of the owning Storage Processor (Step 1 ¾) before it sends the data to
the host.
Write Caching
1. The host writes a block of data to the LUN’s owning Storage Processor.
2. The Storage Processor MIRRORs that data to the other Storage Processor.
3. The owning Storage Processor then sends the Acknowledgement back to
the host, that the data is "on disk."
4. At a later time, the data will be "flushed" from Cache on the SP out to the
LUN.
Why does Write Cache MIRROR the data to the other Storage Processor
before it sends the acknowledgement back to the host?
This is done to ensure that both Storage Processors have the data in Cache
in the event of an SP failure. Let’s say that the owning Storage Processor
crashed (again, never happens). If that data was not written to the other
Storage Processor’s Cache, that data would be lost. But, because it was
written to the other SP Cache, that Storage Processor can now write that
data out to the LUN.
This MIRRORing of Write Cache is done through the CMI (Clariion Messaging
Interface) Channel which lives on the Clariion.
Zoning
On this page, we are going to discuss how a Host might be zoned through
switches to a Clariion. This host has two (2) Host Bus Adapters. From the
previous page, we know that the host must have at least one connection to
SP A and one connection to SP B. What we are illustrating here is, from the
"Host to Clariion Configuration" page, Configuration Three. We are also going
to look at what is meant by "Single Initiator Zoning". Single Initiator Zoning
means that you create a zone with one HBA entry. We don’t want to have a
zone that would contain HBAs from two (2) Hosts.
HBA1 is connected to Port 0 on the switch. SP A port 0 is connected to the
same switch at Port 14. Based on the World Wide Names of HBA1 and SP A
port 0, we can now create a zone through the switch software. The zone
could look as follows:
Zone HBA1 to SP A port 0
10:00:00:00:07:36:55:86
50:06:01:60:10:60:08:74
We also want to connect HBA1 to SP B. We connect SP B port 0 to Port 15 on
the same switch. That zone could look as follows:
Zone HBA1 to SP B port 0
10:00:00:00:07:36:55:86
50:06:01:68:10:60:08:74
HBA1 is now zoned and connected to both Storage Processors on the
Clariion.
We would repeat the same steps for HBA2 and the switch that it is connected
to. HBA2 is connected to Port 0 on the switch. SP A port 1 is connected to the
same switch at Port 14. Based on the World Wide Names of HBA2 and SP A
port 1, we can now create a zone through the switch software. The zone
could look as follows:
Zone HBA2 to SP A port 1
10:00:00:00:66:87:35:20
50:06:01:61:10:60:08:74
We also want to connect HBA2 to SP B. We connect SP B port 1 to Port 15 on
the same switch. That zone could look as follows:
Zone HBA2 to SP B port 1
10:00:00:00:66:87:35:20
50:06:01:69:10:60:08:74
Another way in which the zoning could have been done is:
Zone HBA1 to SP A port 0 and SP B port 0
10:00:00:00:07:36:55:86
50:06:01:60:10:60:08:74
50:06:01:68:10:60:08:74
Again, there is only one HBA in that zone. The preferred method is simply up
to you and how you want to manage the switches. The advantage of doing it
this way is that it cuts the number of zones on the switch in half, but could
be a little confusing (which could be nice for job security).
Now, what do we do if there is an HBA failure? First of all, that never
happens. (Kidding) This is where we go to the four (4) steps listed under HBA
Failure: the three R’s and a D. Let’s say that HBA1 were to fail. The first
thing we would do is replace that failed HBA. Next, because we did our
zoning on the switch based on the World Wide Names of the HBAs, we would
have to rezone the switch for the new HBA because it would have a new
World Wide Name. The third step is to go to Navisphere, and using
Connectivity Status, Register the new HBA with the Clariion. And finally, the
Clariion does not automatically clean itself up. You would have to, again in
Connectivity Status, Deregister the failed HBA.
Storage Processor Ports WWNs
Each Storage Processor Port will have a unique World Wide Name associated
with it. What we are doing on this page is to "break down" what makes up
the SP Port WWN. What I am showing here are the three (3) pieces that make
up the WWN. The three (3) pieces are what I am calling the ‘EMC Flag’, the SP
Port Identifier, and the Array ID. All SP Port WWNs on Clariions start with the
same ‘EMC Flag’ of 50:06:01. When you are looking at the Switch Software
that shows the ports on the switch and what is plugged into those ports,
anytime you see a World Wide Name that starts with 50:06:01, you will
know that a Clariion SP Port is connected there.
The next “piece” to the World Wide Name, is the SP Port Identifier. On all
Clariions, these numbers are the same as well. For instance, if you have 3
Clariions in your environment, every one of those Clariion’s SPA Port 0 World
Wide Name would start off 50:06:01:60. And every Clariion’s SP B Port 1
would start off 50:06:01:69. These SP Port Identifiers will not change from
Clariion to Clariion.
The last "piece" to the puzzle is the Array ID. This is related to the Unique ID
of the Clariion itself. Every Clariion has a unique World Wide Name
associated with it. But, that Array ID belongs to every port on that Clariion as
it shows above. Now, if you have two (2) Clariions in your environment, you
will see two (2) sets of Array IDs. Let’s say you have a Production Clariion and
a Development Clariion (I know, no one has that), the Production Clariion
could have an Array ID of 10:60:08:74, and the Development Clariion could
have an Array ID of 10:60:06:23. So, the Production Clariion’s SP A Port 0
would be 50:06:01:60:10:60:08:74, and the Development Clariion’s SP A Port
0 would be 50:06:01:60:10:60:06:23.
Host Connectivity Limitations
This page is going to discuss how many hosts can connect to a Clariion. The
deciding factor in this is going to be the number of times you connect your
host(s) to the Clariion. We are going to use the three configurations that
were discussed in the previous blog. The chart above lists the number of
ports each Storage Processor contains based on the model, as well as the
number of Initiator Registration Records each port supports. An Initiator
Registration Record (IRR) is used every time a host, via an HBA, is connected
and "Registered" with the Clariion. The Clariion now recognizes that this HBA
belongs to a specific host attached to the Clariion, and will now allow the
host to "talk" with the Clariion. The more times you connect and register a
host, the more IRRs it uses, thus taking away potential connections for other
or more hosts.
With Configuration One, even though it only has one HBA, that HBA must be
connected at least once to SP A and once to SP B. Again, this goes back to
the previous blog about access to the Clariion if a LUN were to trespass.
Therefore, this host is using two IRRs.
With Configuration Two, this host has one connection from each HBA to one
SP Port on each Storage Processor. Even though this host has two HBAs, it is
still only using two IRRs. One connection to SPA, one connection to SP B.
With Configuration Three, this host has two connections to the Clariion from
each HBA. HBA1 is connected once to SPA and once to SP B. HBA2 is
connected once to SP A and once to SP B. This host is using four IRRs
because it is connected four times to the Clariion.
In the chart, we are trying to illustrate the maximum number of hosts that
can connect to a Clariion based on the host configurations. Again, the more
times you connect a host, the more IRRs you use, and the fewer hosts that
can be attached to the Clariion. If you are using a CX700, CX3-40 or
CX3-80, you have the possibility of hooking up 256 hosts based on each host
only having one connection to SP A and one connection to SP B. However, if
every host were connected four (4) times, as in Configuration Three, that
number is cut in half to 128 hosts. If every host were connected to the
Clariion eight (8) times, the number is cut again to 64 hosts.
Host to Clariion Configurations
Here we are looking at only three possible ways in which a host can be
attached to a Clariion. From talking with customers in class, these seem to
be the three most common ways in which the hosts are attached.
The key points to the slide are:
1. The LUN, the disk space that is created on the Clariion, that will eventually
be assigned to the host, is owned by one of the Storage Processors, not both.
2. The host needs to be physically connected via fibre, either directly
attached, or through a switch.
CONFIGURATION ONE
In Configuration One, we see a host that has a single Host Bus Adapter
(HBA), attached to a single switch. From the Switch, the cables run once to
SP A, and once to SP B. The reason this host is zoned and cabled to both SPs
is in the event of a LUN trespass. In Configuration One, if SP A would go
down, reboot, etc...the LUN would trespass to SP B. Because the host is
cabled and zoned to SP B, the host would still have access to the LUN via SP
B. The problem with this configuration is the list of Single Point(s) of Failure.
In the event that you would lose the HBA, the Switch, or a connection
between the HBA and the Switch (the fibre, GBIC on the switch, etc...), you
lose access to the Clariion, thereby losing access to your LUNs.
CONFIGURATION TWO
In Configuration Two, we have a host with two Host Bus Adapters. HBA1 is
attached to a switch, and from there, the host is zoned and cabled to SP B.
HBA2 is attached to a separate switch, and from there, the host is zoned
and cabled to SP A. The path from HBA2 to SP A is shown as the "Active
Path" because that is the path data will take from the host to get to the
LUN, as the LUN is owned by SP A. The path from HBA1 to SP B is shown as the
"Standby Path" because the LUN doesn't belong to SP B. The only time that
the host would use the "Standby Path" is in the event of a LUN Trespass. The
advantage of using Configuration Two over Configuration One is that there is
no single point of failure.
Now, let's say we install PowerPath on the host. With PowerPath, the host
has the potential to do two things. First, it allows the host to initiate the
Trespass of the LUN. With PowerPath on the host, if there is a path failure
(HBA gone bad, switch down, etc...), the host will issue the trespass
command to the SPs, and the SPs will move the LUN, temporarily, from SP A
to SP B. The second advantage of PowerPath on a host, is that it allows the
host to 'Load Balance' data from the host. Again, this has nothing to do with
load balancing the Clariion SPs. We will get there later. However, in
Configuration Two, we only have one connection from the host to SP A. This
is the only path the host has and will use to move data for this LUN.
CONFIGURATION THREE
In Configuration Three, hardware wise, we have the same as Configuration
Two. However, notice that we have a few more cables running from the
switches to the Storage Processors. HBA1 is into the switch and zoned and
cabled to SP A and SP B. HBA2 is into the switch and zoned and cabled to SP
A and SP B. What this does now is to give HBA1 and HBA2 an 'Active Path' to
SP A, and HBA1 and HBA2, 'Standby Paths' to SP B. Because of this, the Host
now can route data down each active path to the Clariion, allowing the host
"Load Balancing" capabilities. Also, the only time a LUN should trespass from
one SP to another is if there is a Storage Processor failure. If the host were to
lose HBA1, it still has HBA2 with an active path to the Clariion. The same
goes for a switch failure and connection failure.
General Commands for Navisphere CLI
Physical Container - Front End Ports Speeds
naviseccli –h 10.124.23.128 port –list -sfpstate
naviseccli –h 10.124.23.128 –set sp a –portid 0 2
naviseccli –h 10.124.23.128 backendbus –get –speeds 0
SP Reboot and Shutdown GUI
naviseccli –h 10.124.23.128 rebootsp
naviseccli –h 10.124.23.128 resetandhold
Disk Summary
naviseccli –h 10.124.23.128 getdisk
naviseccli –h 10.124.23.128 getdisk 0_0_9 (Bus_Enclosure_Disk - specific
disk)
Storage System Properties- Cache Tab
naviseccli –h 10.124.23.128 getcache
naviseccli –h 10.124.23.128 setcache –wc 0 –rca 0 –rcb 0 (to disable Write
and Read Cache)
naviseccli –h 10.124.23.128 setcache –p 4 –l 50 –h 70 (Set Page Size to 4 KB,
Low WaterMark to 50%, and High WaterMark to 70%)
naviseccli –h 10.124.23.128 setcache –wc 1 –rca 1 –rcb 1 (to enable Write
and Read Cache)
Storage System Properties- Memory Tab
naviseccli –h 10.124.23.128 setcache –wsz 2500 –rsza 100 –rszb 100
naviseccli –h 10.124.23.128 setcache –wsz 3072 –rsza 3656 –rszb 3656
(maximum amount of cache for CX3-80)
Creating a RAID Group
naviseccli –h 10.124.23.128 createrg 0 1_0_0 1_0_1 1_0_2 1_0_3 1_0_4 –rm
no –pri med (same Enclosure)
-rm (remove/destroy the Raid Group after the last LUN is unbound from the Raid Group)
-pri (priority/rate of expansion/defragmentation of the Raid Group)
naviseccli –h 10.124.23.128 createrg 1 2_0_0 3_0_0 2_0_1 3_0_1 2_0_2 3_0_2
-raidtype r1_0 (for RAID 1_0 across enclosures)
RAID Group Properties - General
naviseccli –h 10.124.23.128 getrg 0
RAID Group Properties - Disks
naviseccli –h 10.124.23.128 getrg 0 –disks
Binding a LUN
naviseccli –h 10.124.23.128 bind r5 0 –rg 0 –rc 1 –wc 1 –sp a –sq gb –cap 10
bind raid type (r0, r1, r1_0, r3, r5, r6)
-rg (raid group)
-rc / -wc (read and write cache)
-sp (storage processor)
-sq (size qualifier - mb, gb, tb, bc (block count))
-cap (size of the LUN)
LUN Properties
naviseccli –h 10.124.23.128 getlun 0
naviseccli –h 10.124.23.128 chglun –l 0 –name Exchange_Log_Lun_0
RAID Group Properties - Partitions
naviseccli –h 10.124.23.128 getrg 0 –lunlist
Destroying a RAID Group
naviseccli –h 10.124.23.128 removerg 0
Creating a Storage Group
navicli –h 10.127.24.128 storagegroup –create –gname ProductionHost
Storage Group Properties - LUNs with Host ID
navicli –h 10.127.24.128 storagegroup –addhlu –gname ProductionHost –alu
6 –hlu 6
navicli –h 10.127.24.128 storagegroup –addhlu –gname ProductionHost –alu
23 –hlu 23
Storage Group Properties - Hosts
navicli –h 10.127.24.128 storagegroup –connecthost –host ProductionHost –
gname ProductionHost
Destroying Storage Groups
navicli –h 10.127.24.128 storagegroup –destroy –gname ProductionHost
Verifying RAID Group Disk Order
Verifying RAID Group Disk Order
The examples above are from the output of running the get Raid Group
command from the Navisphere Command Line Interface.
Both RAID Groups are configured as Raid type 1_0.
In an earlier blog we discussed the importance of configuring RAID 1_0 by
separating the Data disks and Mirrored Disks across multiple buses and
enclosures on the back of the Clariion. This diagram is to show how you
could verify if a RAID 1_0 Group is configured correctly or incorrectly.
The reason we are showing the output of the RAID Groups from the
command line is this is the only place to truly see if the RAID Groups were
configured properly.
The GUI will show the disks as the Clariion sees them in the order of the Bus
and Enclosure, not the order you have placed the disks in the RAID Group.
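The output above comes from a command of the following form, which is also listed later in this document. The IP address and RAID Group number are placeholders; the disks are reported in the order they were placed into the RAID Group.
naviseccli -h 10.124.23.128 getrg 1 -disks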
LUN Layouts
LUN Layout
This diagram shows three different ways in which the same 6 LUNs could be
laid out on a RAID Group.
In Example 1, the two heavily utilized LUNs have been placed at the
beginning and end of the LUNs in the RAID Group, meaning they were the
first and last LUNs created on the RAID Group, with lightly utilized LUNs
between them. Why this could be a disadvantage to the LUNs, RAID Group,
and Disks is that Example 1 would see a much higher rate of Seek Distances at
the Disk Level. With a higher Seek Distance rate comes greater latency and
longer response times for the data. The head has to travel, on average, a
greater distance between the two busiest LUNs across the disks.
Example 2 has the two heavily utilized LUNs adjacent to each other at the
beginning of the RAID Group. While this is the best case scenario for the two
busiest LUNs, it could also result in high Seek Distances at the Disk Level
because the head would be traveling between the busiest LUNs and then
seeking a great distance on the disk when access is needed to the less
needed LUNs.
Example 3 shows the heavily utilized LUNs placed in the center of the RAID
Group. The advantage to this configuration is the head of the disk would
remain between the two busiest LUNs, and then would have a much shorter
seek distance to the less utilized LUNs on the outer and inner edge of disks.
The problem with these types of configurations, is that for the most part, it is
too late to configure the LUNs in such a way. However, with the use of LUN
Migrations in Navisphere, and enough unallocated Disk Space, this could be
accomplished while having the LUNs online to the hosts. You will however
see an impact on the performance of these LUNs during this Migration
process.
But, if performance is an objective, it could be worth it in the long run to
make the changes. When LUNs and RAID Groups are initially configured, we
usually don’t know what type of Throughput to expect. After monitoring and
using Navisphere Analyzer, we could at a later time, begin to move LUNs
with heavier needs off of the same Raid Groups, and onto Raid Groups with
LUNs not so heavily accessed.
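For reference, a LUN Migration can also be started from the Command Line. The command below is a sketch from memory rather than from this document, so verify the exact syntax against the naviseccli help for your Flare Code; the LUN numbers and rate are placeholders.
naviseccli -h 10.124.23.128 migrate -start -source 6 -dest 7 -rate low
This moves the contents of LUN 6 onto destination LUN 7 at a low priority rate while the LUN stays online to the host.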
Stripe Size of a LUN
Calculating the Stripe Size of a LUN
To calculate the size of a stripe of data that the Clariion writes to a LUN, we
must know how many disks make up the Raid Group, as well as the Raid
Type, and how big a chunk of data is written out to a disk. In the illustration
above, we have two examples of Stripe Size of a LUN.
The top example shows a Raid 5, five disk Raid Group. We usually hear this
referred to as 4 + 1. That means that of the five disks that make up the Raid
Group, four of the disks are used to store the data, and the remaining disk is
used to store the parity information for the stripe of data in the event of a
disk failure and rebuild. Let’s base this on the Clariion settings of a disk
format in which it formats the disk into 128 blocks for the Element Size
(amount of blocks written to a disk before writing/striping to the next disk in
the Raid Group), which is equal to the 64 KB Chunk Size of data that is
written to a disk before writing/striping to the next disk in the Raid Group.
(see blog titled DISK FORMAT)
To determine the Data Stripe Size, we simply calculate the number of disks
in the Raid Group for Data (4) x the amount of data written per disk (64 KB),
and get the amount of data written in a Raid 5, Five disk Raid Group (4 + 1)
as 256 KB of data. To get the Element Stripe Size, we calculate the number
of disks in the Raid Group (4) x the number of blocks written per disk (128
blocks) and get the Element Stripe Size of 512 blocks.
The bottom example illustrates another Raid 5 group, however the number
of disks in the Raid Group is nine (9). This is often referred to as 8 + 1. Again,
eight (8) disks for data, and the remaining disk is used to store the parity
information for the stripe of data.
To determine the Data Stripe Size, we simply calculate the number of disks
in the Raid Group for Data (8) x the amount of data written per disk (64 KB),
and get the amount of data written in a Raid 5, nine disk Raid Group (8 + 1)
as 512 KB of data. To get the Element Stripe Size, we calculate the number
of disks in the Raid Group for Data (8) x the number of blocks written per disk (128
blocks) and get the Element Stripe Size of 1024 blocks.
The confusion usually comes across in the terminology. The Stripe Size again
is the amount of data written to a stripe of the Raid Group, and the Element
Stripe Size is the number of blocks written to a stripe of a Raid Group.
Setting the Alignment Offset on ESX Server and a (Virtual) Windows Server
Setting the Alignment Offset on ESX Server and a (Virtual) Windows
Server
To add to the layer of confusion, we must discuss what needs to be done
when assigning a LUN to an ESX Server, and then creating the (virtual) disk
that will be assigned to the (Virtual) Windows Server.
As stated in the previous blog titled Disk Alignment, we must align the data
on the disks before any data is written to the LUN itself. We align the LUN on
the ESX Server because of the way in which a Clariion Formats the Disks in
the 128 blocks per disk (64 KB Chunk) and the metadata written to the LUN
from the ESX Server. Although, it is my understanding that ESX Server v.3.5
takes care of the initial offset setting of 128.
The following are the steps to align a LUN for Linux/ESX Server:
Execute the following steps to align VMFS
1. On the service console, execute "fdisk /dev/sdX", where sdX is the device on
which you would like to create the VMFS
2. Type “n” to create a new partition
3. Type “p” to create a primary partition
4. Type “1” to create partition #1
5. Select the defaults to use the complete disk
6. Type “x” to get into expert mode
7. Type “b” to specify the starting block for partitions
8. Type “1” to select partition #1
9. Type “128” to make partition #1 to align on 64KB boundary
10.Type “r” to return to main menu
11.Type “t” to change partition type
12. Type “1” to select partition 1
13. Type “fb” to set type to fb (VMFS volume)
14. Type “w” to write label and the partition information to disk
Now that the ESX Server has aligned its disk, when the cache on the
Clariion starts writing data to the disk, it will start writing data to the first
block on the second disk, or block number 128. And, because the Clariion
formats the disks in 64 KB Chunks, it will write one Chunk of data to a disk.
If we create a (Virtual) Windows Server on the ESX Server, we must take into
account that when Windows is assigned a LUN, it will also want to write a
signature to the disk. We know that it is a Virtual Machine, but Windows
doesn’t know that. It believes it is a real server. So, when Windows grabs the
LUN, it will write its signature to the disk. See the blog titled DISK ALIGNMENT.
Again, the problem is that the Windows Signature will take up 63 blocks.
Starting at the first block (Block # 128) on the second disk in the RAID
Group, the Signature will write halfway across the second disk in the raid
group. When Cache begins to write the data out to disk, it will write to the
next available block, which is the 64th block on the second disk. In the top
illustration, we can see that a 64 KB Data Chunk that is written out to disk as
one operation will now span two disks, a Disk Cross. And from here on out for
that LUN, we will see a Disk Cross because there was no offset set on the
(Virtual) Windows Server.
In the bottom example, we see how the offset was set for the ESX Server,
the offset was also set on the (Virtual) Windows Server, and now Cache will
write out to a single disk in 64 KB Data Chunks, therefore limiting the
number of Disk Crosses.
Again, from the (Virtual) Windows Server we can set the offset for the LUNs
using either Diskpart or Diskpar.
To set the alignment using Diskpart, see the earlier Blog titled Setting the
Alignment Offset for 2003 Windows Servers(sp1).
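That blog is not reproduced here, but as a rough sketch of the Diskpart approach (this assumes Windows 2003 SP1 or later, that disk 1 is the raw Clariion LUN, and that the align value is given in KB, so align=64 produces the same 128-block offset):
C:\> diskpart
DISKPART> select disk 1
DISKPART> create partition primary align=64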
To set the alignment using Diskpar:
C:\ diskpar –s 1
Set partition can only be done on a raw drive.
You can use Disk Manager to delete all existing partitions
Are you sure drive 1 is a raw device without any partition? (Y/N) y
----Drive 1 Geometry Information ----
Cylinders = 1174
TracksPerCylinder = 255
SectorsPerTrack = 63
BytesPerSector = 512
DiskSize = 9656478720 (Bytes) = 9209 (MB)
We are going to set the new disk partition.
All data on this drive will be lost. Continue (Y/N) ? Y
Please specify the starting offset (in sectors) : 128
Please specify the partition length (in MB) (Max = 9209) : 5120
Done setting partition
---- New Partition information ----
StatringOffset = 65536
PartitionLength = 5368709120
HiddenSectors = 128
PartitionNumber = 1
PartitionType = 7
As it shows in the bottom illustration from above, the ESX server has set an
offset, the (Virtual) Windows Machine has written its signature, and it has set
the offset to start writing data to the first block on the third disk in the Raid
Group.
Setting Raid Group Command Parameters
Setting Cache Command Parameters
Posted by san guy at 12:26 PM 1 comments
Tuesday, February 12, 2008
Setting the Alignment Offset
Posted by san guy at 11:13 AM 13 comments
Disk Alignment
Disk Alignment
This is one of the most crucial pieces we can talk about so far regarding
performance. Having the disks that make up the LUN misaligned can cost an
application up to 30% in performance. The reason this occurs is because of
the “Signature” or MetaData information that a host writes to the beginning
of a LUN/Disk. To understand this we must first look at how the Clariion
formats the LUNs.
In an earlier blog, we described how the Clariion formats the disks. The
Clariion formats the disks in blocks of 128 per disk, which is equivalent to
64 KB of data written to a disk from Cache. This becomes a problem when an
Operating System like Windows grabs the LUN and wants to initialize the
disk, or write a disk signature. The size of this disk signature is 63
blocks, or 31 ½ KB of disk space. Because the Clariion formats the disks in
128 blocks, or 64 KB of disk space, that leaves 65 blocks, or 32 ½ KB of
disk space remaining on the first disk for the host to write data. The
problem is that the host writes to Cache in whatever block size it uses.
Cache then holds the data and writes it out to disk in a 64 KB Data Chunk.
Because of the “Signature”, the 64 KB Data Chunk now has to go across two
physical disks on the Clariion. Usually, we say that hitting more disks is
better for performance. However, with this DISK CROSS, performance will go
down on a LUN because Cache is now waiting for an acknowledgement from two
disks instead of one. If one disk is overloaded with I/O, a disk is failing,
etc., this will cause a delay in the acknowledgement back to the Storage
Processor. This will be the case every time Cache writes a chunk of data out
to this LUN, and it will impact not only the LUN Cache is writing to, but
potentially every LUN on the Raid Group.
By using an offset on a LUN from a Host Based Utility, i.e. Diskpart or
Diskpar for Windows, we are allowing the Clariion to write a 64 KB Data
Chunk to one physical disk at a time. Essentially, what we are doing is
giving up the remaining disk space on the first physical disk in the Raid
Group, as the illustration above shows. Windows still writes its “Signature”
to the first 63 blocks, but we use Diskpart or Diskpar to offset where the
Clariion writes, sacrificing the remaining space on the first disk. When
Cache writes out to disk now, it will begin writing at the first block on
the second disk in the Raid Group, thereby giving the Clariion the full
128-block/64 KB chunk it hopes to write out to one physical disk.
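The same math, spelled out as a few lines of Python (illustrative only,
using the block counts from this post):

SIGNATURE_BLOCKS = 63     # the Windows disk signature
ELEMENT_BLOCKS = 128      # blocks per disk before the Clariion moves to the next disk
BYTES_PER_BLOCK = 512

left_on_first_disk = ELEMENT_BLOCKS - SIGNATURE_BLOCKS        # 65 blocks
print(left_on_first_disk * BYTES_PER_BLOCK / 1024)            # 32.5 -> 32 1/2 KB left on disk one
print(ELEMENT_BLOCKS - left_on_first_disk)                    # 63 blocks of every chunk spill onto the next disk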
The problem with all of this is that this offset or alignment needs to be set
on a Windows Disk/LUN before any data is written to the LUN. Once there is
data on the LUN, this cannot be done without destroying the existing LUN/data.
The only way to now fix this problem is to create a new LUN on the Clariion,
assign it to the host, set the offset/alignment, and do a host-based
copy/migration. Again, a Clariion LUN Migration is a block for block LUN
copy/move. All you are doing with a LUN Migration is moving the problem to
a new location on the Clariion.
Windows has two utilities from the Command prompt that can be run to set
the offset/alignment, Diskpar and Diskpart.
Diskpar is used for systems running Windows 2000, or Windows 2003 without at
least Service Pack 1. Diskpar can be downloaded as part of the Resource Kit,
and through its command line interface the offset should be set to 128.
Diskpar sets the offset in blocks. Since the Clariion formats the disks in
128 blocks, the Clariion will now offset writing to the LUN to block number
128, which is the first block on the second disk.
Diskpart is for Windows systems running Windows 2003 Service Pack 1 and up.
Diskpart sets the alignment in kilobytes. Since the Clariion formats the disk
in 64 KB, the Clariion will now align the writing to the LUN in 64 KB Chunks,
starting at the first full 64 KB chunk, which is on the second physical disk
in the Raid Group.
This is also an issue with Linux servers, as an offset will need to be set as
well. Here again, the number to use is 128, because fdisk uses the number of
blocks, not kilobytes.
The following blog entry will list the steps for setting the offset for Windows
2003, as well as Linux servers.
Posted by san guy at 11:12 AM 12 comments
LUN Migration
LUN Migration
LUN Migration has been available in Navisphere as of FLARE Code Release 16.
A LUN Migration is a move of a LUN within a Clariion from one location to
another. It is a two step process. First is a block by block copy of a
“Source LUN” to its new location, the “Destination LUN”. After the copy is
complete, the “Source” LUN's location is then moved to its new place in the
Clariion.
The Process of the Migration.
Again, this type of LUN Migration is an internal move of a LUN, not like a
SANCopy where a Data Migration occurs between a Clariion and another
storage device. In the illustration above, we are showing that we are moving
Exchange off of the Vault drives onto Raid Group 10 on another Enclosure in
the Clariion. We will first discuss the process of the Migration, and then the
Rules of the Migration.
1. Create a Destination LUN. This is going to be the Source LUN’s new
location in the Clariion on the disks. The Destination LUN is a LUN which can
be on a different Raid Group, on a different BUS, on a different Enclosure.
The reason for a LUN Migration might be an instance where we may want to
offload a LUN from a busy Raid Group for performance issues. Or, we want to
move a LUN from Fibre Drives to ATA Drives. This we will discuss in the
RULES portion.
2. Start the Migration from the Source LUN. From the LUN in
Navisphere, we simply right-click and select Migrate. Navisphere gives us a
window that displays the current information about the Source LUN, and a
selection window of the Destination LUN. Once we select the Destination LUN
and click Apply, the migration begins. The migration process is actually a two
step process. It is a copy first, then a move. Once the migration begins, it is a
block for block copy from the Source LUN (Original Location) to the
Destination LUN (New Location). This is important to know because the
Source LUN does not have to be offline while this process is running. The
host will continue to read and write to the Source LUN, which writes to
Cache, with Cache then writing out to the disk. Because it is a copy, any new
write to the Source LUN will also be written to the Destination LUN. At any
time during this process, you may cancel the Migration if the wrong LUN was
selected, or wait until a later time. A priority level is also available to
speed up or slow down the process.
3. Migration Completes. When the migration completes, the Source LUN
will then MOVE to its new location in the Clariion. Again, there is nothing
that needs to be done from the host, as it is still the same LUN it was to
begin with, just in a new space on the Clariion. The host doesn't even know
that the LUN is on a Clariion. It thinks the LUN is a local disk. The
Destination LUN ID that you gave the LUN when creating it will disappear; to
the Clariion, that LUN never existed. The Source LUN will occupy the space of
the Destination LUN, taking with it the same LUN ID, SP Ownership, and host
Destination LUN, taking with it the same LUN ID, SP Ownership, and host
connectivity. The only things that may or may not change based on your
selection of the Destination might be the Raid type, Raid Group, size of the
LUN, or Drive Type. The original space that the Source LUN once occupied is
going to show as FREE Space in Navisphere on the Clariion. If you were to
look at the Raid Group where the Source LUN used to live, under the
Partitions tab, you will see the space the original LUN occupied as Free.
The Source LUN is still in the same Storage Group, assigned to the Host as it
was before.
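As a rough model of the two steps (this is conceptual Python, not anything
Navisphere exposes; the Lun class and migrate() function are invented for
illustration), writes arriving during the copy land on both LUNs, and at
completion the Destination simply takes over the Source's identity:

class Lun:
    def __init__(self, lun_id, blocks):
        self.lun_id = lun_id
        self.blocks = blocks              # pretend block store: block number -> data

def migrate(source, destination, writes_during_copy):
    destination.blocks = dict(source.blocks)        # block for block copy
    for block, data in writes_during_copy:          # host keeps writing meanwhile
        source.blocks[block] = data
        destination.blocks[block] = data            # every new write goes to both
    destination.lun_id = source.lun_id              # the Destination's own LUN ID disappears
    return destination                              # same LUN ID, new location

src = Lun(lun_id=6, blocks={0: "a", 1: "b"})
dst = Lun(lun_id=99, blocks={})
new_lun = migrate(src, dst, writes_during_copy=[(1, "b2"), (2, "c")])
print(new_lun.lun_id, new_lun.blocks)    # 6 {0: 'a', 1: 'b2', 2: 'c'}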
Migration Rules
The rules of a Migration as illustrated above are as follows.
The Destination LUN can be:
1. Equal in size or larger. You can migrate a LUN to a LUN that is the
exact same block count size, or to a LUN that is larger in size, so long as
the host has the ability to see the additional space once the migration has
completed. Windows would need a rescan or reboot to see the additional
space, and could then use Diskpart to extend the Volume on the host. A host
that doesn't have the ability to extend a volume would need Volume Manager
software to grow a filesystem, etc.
2. The same or a different drive type. A destination LUN can be on the
same type of drives as the source, or a different type of drive. For instance,
you can migrate a LUN from Fibre Drives to ATA Drives when the Source LUN
no longer needs the faster type drives. This is a LUN to LUN copy/move, so
again, disk types will not stop a migration from happening, although they may
slow the process of completing.
3. The same or a different raid type. Again, because it is a LUN to LUN
copy, raid types don’t matter. You can move a LUN from Raid 1_0 to Raid 5
and reclaim some of the space on the Raid 1_0 disks. Or find that Raid 1_0
better suits your needs for performance and redundancy than Raid 5.
4. A Regular LUN or MetaLUN. The destination LUN only has to be at least
equal in size, so whether it is a regular LUN on a 5 disk Raid 5 group or a
Striped MetaLUN spread across multiple enclosures, buses, and raid groups
for performance is completely up to you.
However, the Destination LUN cannot be:
1. Smaller in size. There is no way on a Clariion to shrink a LUN to allow a
user to reclaim space that is not being used.
2. A SnapView, MirrorView, or SanCopy LUN. Because these LUNs are
being used by the Clariion to replicate data for local recoveries, replicate
data to another Clariion for Disaster Recovery, or to move the data to/from
another storage device, they are not available as a Destination LUN.
3. In a Storage Group. If a LUN is in a Storage Group, it is believed to
belong to a Host. Therefore, the Clariion will not let you write over a LUN that
potentially belongs to another host.
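Those rules can be summed up in a short eligibility check (a sketch with
made-up field names, not the Navisphere API):

def valid_destination(source, destination):
    if destination["block_count"] < source["block_count"]:
        return False, "Destination is smaller than the Source"
    if destination["replication_use"] in ("SnapView", "MirrorView", "SANCopy"):
        return False, "Destination is in use by a replication product"
    if destination["storage_group"] is not None:
        return False, "Destination is already in a Storage Group"
    return True, "OK"    # RAID type, drive type, and LUN vs MetaLUN do not matter

source = {"block_count": 2097152}
destination = {"block_count": 4194304, "replication_use": None, "storage_group": None}
print(valid_destination(source, destination))       # (True, 'OK')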
Posted by san guy at 11:07 AM 18 comments
MetaLUNs
MetaLUNs
The purpose of a MetaLUN is to let a Clariion grow the size of a LUN on the
fly. Let's say that a host is running out of space on a LUN. From
Navisphere, we can “Expand” a LUN by adding more LUNs to the LUN that
the host has access to. To the host, we are not adding more LUNs. All the
host is going to see is that the LUN has grown in size. We will explain later
how to make space available to the host.
There are two types of MetaLUNs, Concatenated and Striped. Each has its
advantages and disadvantages, but the end result, whichever you use, is that
you are growing, or “expanding,” a LUN.
A Concatenated MetaLUN is advantageous because it allows a LUN to be
“grown” quickly and the space made available to the host rather quickly as
well. The other advantage is that the Component LUNs that are added to the
LUN assigned to the Host can be of a different RAID type and of a different
size.
The host writes to Cache on the Storage Processor, and the Storage Processor
then flushes out to the disk. With a Concatenated MetaLUN, the Clariion only
writes to one LUN at a time. The Clariion is going to write to LUN 6 first. Once
the Clariion fills LUN 6 with data, it then begins writing to the next LUN in the
MetaLUN, which is LUN 23. The Clariion will continue writing to LUN 23 until it
is full, then write to LUN 73. Because of this writing process, there is no
performance gain. The Clariion is still only writing to one LUN at a time.
A Striped MetaLUN is advantageous because, if set up properly, it can
enhance performance as well as protection. Let's look first at how the
MetaLUN is set up and written to, and how performance can be gained. With
the Striped MetaLUN, the Clariion writes to all LUNs that make up the
MetaLUN, not just one at a time. The advantage of this is more
spindles/disks. The Clariion will stripe the data across all of the LUNs in the
MetaLUN, and if the LUNs are on different Raid Groups, on different Buses,
this will allow the application to be striped across fifteen (15) disks, and in
the example above, three back-end buses of the Clariion. The workload of
the application is being spread out across the back-end of the Clariion,
thereby possibly increasing speed. As illustrated above, the first Data Stripe
(Data Stripe 1) that the Clariion writes out to disk will go across the five disks
on Raid Group 5 where LUN 6 lives. The next stripe of data (Data Stripe 2), is
striped across the five disks that make up RAID Group 10 where LUN23 lives.
And finally, the third stripe of data (Data Stripe 3) is striped across the five
disks that make up Raid Group 20 where LUN 73 lives. And then the Clariion
starts the process all over again with LUN6, then LUN 23, then LUN 73. This
gives the application 15 disks to be spread across, and three buses.
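The difference in write order is easy to picture with a few lines of Python
(illustrative only; the LUN numbers come from the example above, and the
three-stripes-per-LUN figure is made up):

component_luns = [6, 23, 73]
STRIPES_PER_LUN = 3     # pretend each component LUN holds three data stripes

def concatenated_order(luns, per_lun):
    # fill one component LUN completely, then move on to the next
    return [lun for lun in luns for _ in range(per_lun)]

def striped_order(luns, per_lun):
    # round-robin each data stripe across every component LUN
    return [lun for _ in range(per_lun) for lun in luns]

print(concatenated_order(component_luns, STRIPES_PER_LUN))  # [6, 6, 6, 23, 23, 23, 73, 73, 73]
print(striped_order(component_luns, STRIPES_PER_LUN))       # [6, 23, 73, 6, 23, 73, 6, 23, 73]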
As for data protection, this would be similar to building a 15 disk raid group.
The problem with a 15 disk raid group is that if one disk were to fail, it
would take a considerable amount of time to rebuild the failed disk from the
other 14 disks. Also, if two disks were to fail in this raid group, and it
was RAID 5, data would be lost. In the drawing above, each of the LUNs is on
a different RAID group. That would mean that we could lose a disk in RAID
Group 5, RAID Group 10, and RAID Group 20 at the same time, and still have
access to the data. The other advantage of this configuration is that the
rebuilds are occurring within each individual RAID Group. Rebuilding from
four disks is going to be much faster than the 14 disks in a fifteen disk RAID
Group.
The disadvantage of using a Striped MetaLUN is that it takes time to create.
When a component LUN is added to the MetaLUN, the Clariion must restripe
the data across the existing LUN(s) and the new LUN. This takes time and
resources of the Clariion. There may be a performance impact while a
Striped MetaLUN is re-striping the data. Also, the space is not available to
the host until the MetaLUN has completed re-striping the data.
Posted by san guy at 11:04 AM 11 comments
Access Logix
Access Logix
Access Logix, often referred to as ‘LUN Masking’, is the Clariion term for:
1. Assigning LUNs to a particular Host
2. Making sure that hosts cannot see every LUN in the Clariion
Let’s talk about making sure that every host cannot see every LUN in the
Clariion first.
Access Logix is an enabler on the Clariion that allows hosts to connect to the
Clariion, but not have the ability to just go out and take ownership of every
LUN. Think of this situation. You have ten Window’s Hosts attached to the
Clariion, five Solaris Hosts, eight HP Hosts, etc… If all of the hosts were
attached to the Clariion (zoning), and there was no such thing as Access
Logix, every host could potentially see every LUN after a rescan or 17
reboots by Window’s. Probably not a good thing to have more than one host
writing to a LUN at a time, let alone different Operating Systems writing to
the same LUNs.
Now, in order for a host to see a LUN, a few things must be done first in
Navisphere.
1. For a Host, a Storage Group must be created. In the illustration above, the
‘Storage Group’ is like a bucket.
2. We have to connect the host to the Storage Group.
3. Finally, we have to add the LUNs we want the host to see to the Host's
Storage Group.
From the illustration above, let’s start with the Windows Host on the far left
side. We created a Storage Group for the Windows Host. You can name the
Storage Group whatever you want in Navisphere. It would make sense to
name the Storage Group the same as the Host name. Second, we connected
the host to the Storage Group. Finally, we added LUNs to the Storage Group.
Now, the host has the ability to see the LUNs, after a rescan, or a reboot.
However, in the Storage Group, when the LUNs are added to the Storage
Group, there is a column on the bottom right-side of the Storage Group
window that is labeled Host ID. You will notice that as the LUNs are placed
into the Storage Group, Navisphere gives each LUN a Host ID number. The
host ID number starts at 0, and continues to 255. We can place up to 256
LUNs into a Storage Group. The reason for this, is that the Host has no idea
that the LUN is on a Clariion. The host believes that the LUN is a Local Disk.
For the host, this is fine. In Windows, the host is going to rescan, and pick up
the LUNs as the next available disk. In the example above, the Windows Host
picks up LUNs 6 and 23, but to the host, after a rescan/reboot, the host is
going to see the LUNs as Disk 4 and Disk 5, which we can now initialize, add
a drive letter, format, create the partition, and make the LUN visible through
the host.
In the case of the Solaris Host's Storage Group, when we added the LUNs to
the Storage Group, we changed LUN 9's host ID to 9, and LUN 15's host ID to
15. This allows the Solaris host to see the Clariion LUN 9 as c_t_d 9, and LUN
15 as c_t_d 15. If we hadn't changed the Host ID number for the LUNs,
however, Navisphere would have assigned LUN 9 the Host ID of 0, and
LUN 15 the Host ID of 1. Then the host would see LUN 9 as c_t_d 0 and
LUN 15 as c_t_d 1.
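To illustrate the Host ID behaviour (a conceptual sketch; the
assign_host_ids() helper is invented and is not how Navisphere is
implemented), default numbering starts at 0, while explicitly set Host IDs
carry through to the c_t_d numbers the Solaris host sees:

def assign_host_ids(clariion_luns, explicit_ids=None):
    """Map Clariion LUN numbers to Host IDs: explicit if given, else 0, 1, 2, ..."""
    explicit_ids = explicit_ids or {}
    host_ids = {}
    next_id = 0
    for lun in clariion_luns:
        if lun in explicit_ids:
            host_ids[lun] = explicit_ids[lun]
        else:
            while next_id in host_ids.values():
                next_id += 1
            host_ids[lun] = next_id
            next_id += 1
    return host_ids

print(assign_host_ids([9, 15]))                    # {9: 0, 15: 1} -> c_t_d 0 and c_t_d 1
print(assign_host_ids([9, 15], {9: 9, 15: 15}))    # {9: 9, 15: 15} -> c_t_d 9 and c_t_d 15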
The last drawing is an example of a Clustered environment. The blue server
is the Active Node of the cluster, and the orange server is the
Standby/Passive Node of the cluster. In this example, we created a Storage
Group in Navisphere for each host in the Cluster. Into the Active Node
Storage Group, we place LUN 8. LUN 8 also went into the Passive Node
Storage Group. A LUN can belong to multiple storage groups. The reason for
this, is if we only placed LUN 8 in to the Active Node Storage Group, not into
the Passive Node Storage Group, and the Cluster failed over to the Passive
Node for some reason, there would be no LUN to see. A host can only see
what is in its storage group. That is why LUN 8 is in both Storage Groups.
Now, if this is not a Clustered Environment, this brings up another problem.
The Clariion does not limit who has access, or read/write privileges to a LUN.
When a LUN is assigned to a Storage Group, the LUN belongs to the host. If
we assign a LUN out to two hosts, with no Cluster setup, we are giving
simultaneous access of a LUN to two different servers. This means that each
server would assume ownership of the LUN, and constantly be overwriting
each other’s data.
We also added LUN 73 to the Active Node Storage Group, and LUN 74 to the
Passive Node Storage Group. This allows each server to see LUN 8 for
failover purposes, while LUN 73 belongs only to the Active Node Host and
LUN 74 belongs only to the Passive Node Host. If the cluster fails over to
the Passive Node, the Passive Node will see LUN 8 and LUN 74, but not LUN
73, because it is not in its Storage Group.
Notice that LUN 28 is in the Clariion, but not assigned to anyone at the time.
No host has the ability to access LUN 28.