METHOD FOR THROTTLING BACKUP AGENT’S SAVE PROCESS

Ajith Gopinath, Senior Software Quality Engineer, Dell EMC

Gururaj Kulkarni, Consultant Software Quality Engineer, Dell EMC

Sathyamoorthy Viswanathan, Staff Engineer, Gigamon


2016 EMC Proven Professional Knowledge Sharing 2

Table of Contents

Introduction and problem statement

Proposed solution

Queue Processing

Architecture process flow

Architecture Diagram

Process flow

Disclaimer: The views, processes or methodologies published in this article are those of the authors.

They do not necessarily reflect Dell EMC’s views, processes or methodologies.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.


Introduction and problem statement

Currently, in a data protection scenario, the backup server accepts every backup session from the data center clients configured on it. Once the server parallelism is exhausted, further sessions are kept in a queue for processing. The JOBDB keeps track of all active and queued sessions.

When the current sessions complete, queued sessions are activated as running sessions and their backups proceed. Where a large number of backup clients are configured and many data sets are to be backed up, a backup session can spend a long time in the queue. A queued session may wait indefinitely and eventually time out with an error.

Sessions may get dropped due to network latency.

If a timeout variable is set on the backup server or the backup client agent, all queued sessions will be aborted once it expires, and the backups of those clients will fail.

There are scenarios where the backup server is exhausted by overload (a large number of queued sessions). It is then unable to handle the save processes, which leads to backup failure.

There is no intelligence built between the backup server and the backup clients to throttle incoming save processes based on the current backup server load.

For example, NetWorker® is an enterprise backup application with a maximum server parallelism of 1024. If the server receives more than 1024 sessions, it puts the excess sessions in a queue. If the running backup sessions take longer than the timeout value to finish, the backups of the queued sessions are aborted. When a feature such as parallel save streams is enabled, a single save set spawns multiple save sessions, which eventually exhausts the server parallelism if many such backup clients are configured.
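As a rough illustration of how parallel save streams can exhaust the server parallelism, the sketch below multiplies out the session count. The client count, save sets per client, and streams per save set are hypothetical values chosen for the example; only the 1024 limit comes from the text above.

```python
# Only the 1024 limit is taken from the text; the workload is assumed.
MAX_SERVER_PARALLELISM = 1024


def requested_sessions(clients: int, save_sets_per_client: int,
                       streams_per_save_set: int) -> int:
    """Total save sessions requested if every client starts at once."""
    return clients * save_sets_per_client * streams_per_save_set


def queued_sessions(requested: int,
                    parallelism: int = MAX_SERVER_PARALLELISM) -> int:
    """Sessions that cannot run immediately and must be queued."""
    return max(0, requested - parallelism)


# Assumed workload: 200 clients, 2 save sets each, 4 streams per save set.
total = requested_sessions(200, 2, 4)
print(total, queued_sessions(total))  # prints: 1600 576
```

Under these assumed numbers, more than a third of the requested sessions start in the queue, where they are exposed to the timeout described above.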


Proposed solution

The proposed solution is to throttle the client backup agent’s save processes based on the actual server load, which helps avoid overloading the server process without compromising client QoS.

The job process should be built with an intelligent mechanism to throttle the backup agent’s save process. This can be achieved with three types of queues, priority checking, and a polling mechanism.

The queues are:

QUEUE_ACTIVE – Tracks the currently active backup sessions on the backup server. The queue size is equal to the maximum server parallelism.

QUEUE_WAIT – Tracks the sessions waiting to become active sessions on the backup server. The queue size is always less than or equal to that of the active queue.

QUEUE_SLEEP – Tracks the sessions arriving at the backup server when the server is fully loaded and cannot take another session. The queue size should be greater than or equal to the server parallelism value.
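The size rules for the three queues can be expressed as a small configuration check. This is a minimal sketch: the `QueueLimits` class and its field names are hypothetical and are not part of any NetWorker interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueLimits:
    """Capacities of the three queues (hypothetical container)."""
    server_parallelism: int
    wait_size: int
    sleep_size: int

    def __post_init__(self) -> None:
        if self.server_parallelism <= 0:
            raise ValueError("server parallelism must be positive")
        # QUEUE_WAIT is never larger than QUEUE_ACTIVE.
        if not 0 <= self.wait_size <= self.active_size:
            raise ValueError("QUEUE_WAIT must be <= QUEUE_ACTIVE")
        # QUEUE_SLEEP is at least as large as the server parallelism.
        if self.sleep_size < self.server_parallelism:
            raise ValueError("QUEUE_SLEEP must be >= server parallelism")

    @property
    def active_size(self) -> int:
        # QUEUE_ACTIVE is sized to the maximum server parallelism.
        return self.server_parallelism


limits = QueueLimits(server_parallelism=1024, wait_size=512, sleep_size=2048)
```

Validating the sizes once at configuration time means the rest of the queue logic can rely on the three constraints holding.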

When a scheduled backup job is initiated from the backup server, the backup client should contact the jobdb daemon process on the backup server to determine whether the server can process the client’s save process. If the jobdb daemon finds that the server is fully engaged, all new save sessions from the clients will be queued in QUEUE_SLEEP. If the jobdb daemon tells the client that QUEUE_SLEEP is also full, the client should not send any more save sessions to the server. The client should then hold its jobs and poll the backup server’s jobdb daemon process at a fixed interval (60 seconds) to check whether QUEUE_SLEEP has room for more sessions. Implementing a timeout period ensures that held sessions do not wait forever.
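The client-side hold-and-poll behavior might look like the following sketch. The function name, the `sleep_queue_has_room` callback (standing in for the query to the jobdb daemon), and the one-hour default timeout are assumptions made for illustration; only the 60-second poll interval comes from the text. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_sleep_slot(
    sleep_queue_has_room: Callable[[], bool],
    poll_interval: float = 60.0,
    timeout: float = 3600.0,  # assumed value, not from the text
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hold a save session on the client until QUEUE_SLEEP has room.

    Returns True when the session may be sent, or False if the
    timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        if sleep_queue_has_room():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

A caller would send the held save session when the function returns True and report a timeout failure when it returns False.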

At the same time, client backup priority should be considered based on the SLA. If a client’s backup policy is configured with a higher priority, the server should process the save sessions from that client ahead of those from a client with a lower priority value. The throttling mechanism should honor client priority without compromise.
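One way to honor priority while keeping arrival order among equals is a heap keyed on priority and arrival sequence. This is a sketch under two assumptions: a larger number means a higher priority, and ties are served first come, first served. The `PriorityWaitQueue` class is hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


class PriorityWaitQueue:
    """Orders queued sessions by client priority, then arrival order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def push(self, session_id: str, priority: int) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), session_id))

    def pop(self) -> str:
        """Return the next session to activate."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityWaitQueue()
queue.push("clientA", priority=1)
queue.push("clientB", priority=5)
queue.push("clientC", priority=5)
order = [queue.pop() for _ in range(3)]  # clientB, clientC, clientA
```

The arrival counter keeps equal-priority clients in the order they arrived, so a burst of high-priority sessions cannot reorder sessions of the same priority.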


Queue Processing

These three queues, when implemented, will reduce the overhead of the server process (nsrd) as well as the jobdb process.

All currently running processes will be in QUEUE_ACTIVE as long as the number of incoming sessions is less than the server parallelism.

The jobdb process will keep track of incoming sessions and will keep them in QUEUE_WAIT once the current active sessions reach the server parallelism value.

QUEUE_WAIT will act as a buffer between QUEUE_ACTIVE and QUEUE_SLEEP.

When QUEUE_ACTIVE frees some sessions, QUEUE_WAIT sessions will be moved to QUEUE_ACTIVE, and QUEUE_SLEEP will receive sessions from the clients that are polling from the client end.
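The upward movement between queues when an active session finishes can be sketched as follows. The `JobQueues` class is a hypothetical in-memory model, not the jobdb implementation, and first-in, first-out ordering within each queue is an assumption.

```python
from collections import deque
from typing import Deque, Optional, Set


class JobQueues:
    """Hypothetical model of QUEUE_ACTIVE, QUEUE_WAIT and QUEUE_SLEEP."""

    def __init__(self, parallelism: int, wait_size: int) -> None:
        self.parallelism = parallelism
        self.wait_size = wait_size
        self.active: Set[str] = set()
        self.wait: Deque[str] = deque()
        self.sleep: Deque[str] = deque()

    def complete(self, session_id: str) -> Optional[str]:
        """Finish an active session and shift queued sessions upward.

        Returns the session promoted to QUEUE_ACTIVE, if any.
        """
        self.active.remove(session_id)
        promoted = None
        # QUEUE_WAIT -> QUEUE_ACTIVE
        if self.wait and len(self.active) < self.parallelism:
            promoted = self.wait.popleft()
            self.active.add(promoted)
        # QUEUE_SLEEP -> QUEUE_WAIT, which frees room for polling clients
        if self.sleep and len(self.wait) < self.wait_size:
            self.wait.append(self.sleep.popleft())
        return promoted


queues = JobQueues(parallelism=2, wait_size=1)
queues.active = {"s1", "s2"}
queues.wait.append("s3")
queues.sleep.extend(["s4", "s5"])
promoted = queues.complete("s1")  # s3 becomes active, s4 moves to QUEUE_WAIT
```

Each completion frees exactly one slot at every level, so QUEUE_SLEEP gains room for one held session that a polling client can then send.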

Advantages

Throttling the client backup agent’s save process will reduce network congestion in cases where the server is busy and cannot take more sessions. The client process is notified of this, so the client does not have to send the save session and have it wait indefinitely on the server side.

Helps improve server performance, as the new queue mechanism offloads session-tracking work from the jobdb daemon process, helping both the nsrd and jobdb processes.

Avoids situations where the server is exhausted, because the load is managed intelligently.

The backup success ratio will increase, since client sessions won’t be dropped unnecessarily.


Architecture process flow

JOBDB will be implemented with 3 different queues

1. QUEUE_ACTIVE

2. QUEUE_WAIT

3. QUEUE_SLEEP

The length of QUEUE_ACTIVE will be equal to the server parallelism value.

The length of QUEUE_WAIT will be less than or equal to that of QUEUE_ACTIVE.

QUEUE_SLEEP will be defined with a fixed size based on historical data, but it should be greater than or equal to the server parallelism value.

When the scheduled backup is triggered from the backup server, the client’s backup process daemon will contact the JOBDB daemon process on the backup server. The backup will proceed only when the JOBDB daemon process gives its approval to the backup process.

The JOBDB daemon will maintain the status of the server’s current load.

If the server is fully engaged, the client sessions will be queued in QUEUE_SLEEP.

If QUEUE_SLEEP is also full, the client backup agent should hold the session, which helps prevent network congestion. These held processes will be notified when the backup server is relatively free of load and open to taking more sessions.

The client backup agent polls the JOBDB process to check whether QUEUE_SLEEP has room to queue more sessions.

The JOBDB process should have the intelligence to honor client QoS and the backup policy SLA.

If a client is configured with a higher priority, the JOBDB daemon should take this into account when it processes the save sessions from different clients.

When QUEUE_ACTIVE frees some sessions, the queued sessions will be moved upward, and the polling clients will be told that QUEUE_SLEEP can accept more of the sessions that were on hold.
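The decision the JOBDB daemon returns to a contacting client could be modeled as below. This is only a sketch: the `Reply` values and the `admission_reply` function are hypothetical, and treating "fully engaged" as QUEUE_ACTIVE and QUEUE_WAIT both being full (so that overflow fills QUEUE_WAIT before QUEUE_SLEEP) is an interpretation of the flow, not something the text states outright.

```python
from enum import Enum


class Reply(Enum):
    ACTIVE = "run now"
    WAIT = "queued in QUEUE_WAIT"
    SLEEP = "queued in QUEUE_SLEEP"
    HOLD = "hold on client and poll later"


def admission_reply(active: int, waiting: int, sleeping: int,
                    parallelism: int, wait_size: int,
                    sleep_size: int) -> Reply:
    """Decide what the daemon tells a client about a new save session."""
    # Run immediately only if no earlier session is still queued.
    if active < parallelism and waiting == 0 and sleeping == 0:
        return Reply.ACTIVE
    # Assumed step: overflow fills QUEUE_WAIT before QUEUE_SLEEP.
    if waiting < wait_size and sleeping == 0:
        return Reply.WAIT
    if sleeping < sleep_size:
        return Reply.SLEEP
    # Every queue is full: the client keeps the session and polls.
    return Reply.HOLD


reply = admission_reply(active=4, waiting=2, sleeping=4,
                        parallelism=4, wait_size=2, sleep_size=4)
```

With all three queues full, as in the usage line, the client receives `Reply.HOLD` and nothing is sent over the network until a later poll succeeds.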


Architecture Diagram


Process flow


Dell EMC believes the information in this publication is accurate as of its publication date. The

information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an

applicable software license.
