Caching in Distributed Environment

Based on the article "Caching in the Distributed Environment" published in the Microsoft Architecture Journal, Issue 17. Available online at http://www.msarchitecturejournal.com/pdf/Journal17.pdf

Abhijit Gadkari


Page 3: Caching in Distributed Environment

Agenda

Background info and basics

Different types of cache: temporal, spatial, primed, and demand

Some Examples

Caching in the ORM world!

Transactional cache and Shared cache

Managing the interaction

Size of a cache and its impact on application performance

Five-minute introduction to “Velocity”, Microsoft's distributed caching platform

Open forum

Page 4: Caching in Distributed Environment

Basics

Data is stored in fast memory (L1, L2, L3, and so on), known as cache. This concept is used extensively in the von Neumann architecture.

Memory performance is measured in access time: given an address, the memory presents the data at some later time.

Memory Access Time = Latency + Transfer Size / Transfer Rate [2]
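The formula above can be worked through with a few lines of Python. This is a minimal sketch: the function name and the latency and transfer-rate figures are hypothetical, chosen only to make the arithmetic easy to follow, and are not measurements from the article.

```python
def access_time_seconds(latency_s: float,
                        transfer_size_bytes: float,
                        transfer_rate_bytes_per_s: float) -> float:
    """Memory access time = latency + transfer size / transfer rate."""
    if transfer_rate_bytes_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return latency_s + transfer_size_bytes / transfer_rate_bytes_per_s


# Hypothetical figures for reading one 4 KB block (assumptions, not benchmarks).
ram_time = access_time_seconds(latency_s=100e-9,
                               transfer_size_bytes=4096,
                               transfer_rate_bytes_per_s=10e9)
disk_time = access_time_seconds(latency_s=10e-3,
                                transfer_size_bytes=4096,
                                transfer_rate_bytes_per_s=100e6)

print(f"RAM:  {ram_time:.9f} s")
print(f"Disk: {disk_time:.9f} s")
```

With these assumed numbers the latency term dominates the disk access, which is the usual argument for keeping hot data in a faster tier.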

[Figure: storage hierarchy of on-board memory, RAM, hard disk, and cloud, compared by cost per byte, storage size, latency, and persistence]

Page 5: Caching in Distributed Environment

Types of Data

[Figure: data classified as reference data, activity data, and resource data]

Understanding the different types of data and their semantics helps in understanding the caching needs that come with each type. [1]

Page 6: Caching in Distributed Environment

Data type [1] and caching strategy [1]

Reference data: Practically immutable, non-volatile, and long-lasting; an ideal candidate for caching. Can be shared across processes and applications. Examples: zip codes, state list, department list.

Activity data: Generated by the currently executing activity as part of a business transaction. Short-lived; only good for the life of the transaction. Example: a shopping cart on an e-commerce web site.

Resource data: Highly dependent on domain logic and volatile in nature. Cache only when absolutely required. Commonly associated keywords: concurrency, locking, ACID, dirty read, corrupt cache, business logic. Example: quantity information in an inventory application.

Unknown: Do not cache. [ME]

“Keep a data item in electronic memory if its access frequency is five minutes or higher, otherwise keep it in magnetic memory” [2]

Wikipedia defines cache as “a temporary area where frequently accessed data can be stored for rapid access” [3]

Why cache? For performance and availability.
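The classification above can be expressed as a small lookup. The sketch below is illustrative only: the enum, the strategy labels, and the helper names are hypothetical, and the five-minute check reflects one reading of the quoted rule (keep an item in memory when it is accessed at least once every five minutes).

```python
from enum import Enum


class DataType(Enum):
    REFERENCE = "reference"
    ACTIVITY = "activity"
    RESOURCE = "resource"
    UNKNOWN = "unknown"


# Hypothetical labels that summarize the strategy listed for each data type.
STRATEGY = {
    DataType.REFERENCE: "cache and share across processes",
    DataType.ACTIVITY: "cache for the life of the transaction",
    DataType.RESOURCE: "cache only when absolutely required",
    DataType.UNKNOWN: "do not cache",
}


def caching_strategy(data_type: DataType) -> str:
    """Look up the caching strategy for a data type."""
    return STRATEGY[data_type]


FIVE_MINUTES_S = 300


def keep_in_memory(seconds_between_accesses: float) -> bool:
    """Assumed reading of the five-minute rule: keep frequently accessed items in memory."""
    return seconds_between_accesses <= FIVE_MINUTES_S
```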

Page 7: Caching in Distributed Environment

Principle of Locality

Based on work done in 1959 on the Atlas system's virtual memory [4]

Temporal cache: Good for frequently accessed, relatively nonvolatile data. For example, a drop-down list on a web page.

Spatial cache: Data adjacent to recently referenced data will be requested in the near future. For example, GridView paging.

Page 8: Caching in Distributed Environment

Temporal Cache

using System.Web.Caching;

public sealed class Cache : IEnumerable
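To illustrate the temporal cache idea independently of ASP.NET, here is a minimal sketch in Python. It is not the System.Web.Caching API: the TemporalCache class, its time-to-live parameter, and the injectable clock are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TemporalCache:
    """Keeps recently loaded values for a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can be tested without sleeping
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve from the cache
        value = loader()  # missing or expired: reload from the source
        self._items[key] = (now + self._ttl, value)
        return value
```

A drop-down list of states, for instance, could be fetched with `cache.get("states", load_states)` and would be reloaded only after the TTL has elapsed.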

Page 9: Caching in Distributed Environment

Spatial Cache

In .NET, the cache can be kept synchronized with the database using SqlCacheDependency.
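Spatial caching for a paged grid can be sketched as follows. This is an illustrative Python example, not GridView or SqlCacheDependency code: PagedCache and the fetch_page callback are hypothetical names, and prefetching exactly one following page is an assumption.

```python
from typing import Callable, Dict, List


class PagedCache:
    """Caches pages of rows and prefetches the following page on each request."""

    def __init__(self, fetch_page: Callable[[int], List[str]]) -> None:
        self._fetch_page = fetch_page
        self._pages: Dict[int, List[str]] = {}

    def get_page(self, page: int) -> List[str]:
        if page not in self._pages:
            self._pages[page] = self._fetch_page(page)
        # Spatial locality: the adjacent page is likely to be requested next.
        next_page = page + 1
        if next_page not in self._pages:
            self._pages[next_page] = self._fetch_page(next_page)
        return self._pages[page]
```

When the user moves from page 0 to page 1, the rows are already in memory and only page 2 is fetched from the data source.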

Page 10: Caching in Distributed Environment

Primed and Demand Cache [5,6]

Primed and demand caches are based on the future use of the data. Predicting the future is not easy and should be based on sound engineering principles.

The primed cache pattern is applicable when the cache, or part of the cache, can be predicted in advance. For example, a web browser cache.

The demand cache pattern is useful when the cache cannot be predicted in advance. For example, a cached copy of user credentials.

The primed cache is populated at the beginning of the application, whereas the demand cache is populated during the execution of the application.
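The difference between the two patterns comes down to when the loader runs. The following Python sketch assumes a hypothetical load function and hypothetical names (build_primed_cache, DemandCache); it only illustrates the timing, not any particular framework.

```python
from typing import Callable, Dict, Iterable


def build_primed_cache(keys: Iterable[str],
                       load: Callable[[str], str]) -> Dict[str, str]:
    """Primed: load every predictable key once, when the application starts."""
    return {key: load(key) for key in keys}


class DemandCache:
    """Demand: load a key the first time it is requested."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load
        self._items: Dict[str, str] = {}
        self.loads = 0  # counts trips to the underlying source

    def get(self, key: str) -> str:
        if key not in self._items:
            self._items[key] = self._load(key)
            self.loads += 1
        return self._items[key]
```

The primed cache has a fixed set of keys after start-up, while the demand cache grows as new keys are requested.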

Page 11: Caching in Distributed Environment

Primed Cache

In .NET, the ICachedReport interface can be used to store pre-populated reports. The primed cache results in a cache structure of almost constant size.

Page 12: Caching in Distributed Environment

Demand Cache

One user can have many roles; one role can have many permissions.

Managing a demand cache:

Minimize memory leaks

Maximize the hit ratio

Use an effective eviction policy

In a dynamic environment, adaptive caching strategies can be very effective.
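One common way to bound a demand cache and track its hit ratio is least-recently-used (LRU) eviction. The slide does not name a specific policy, so LRU is an assumption here, and the LruCache class and its loader callback are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable


class LruCache:
    """Demand cache with a fixed capacity and least-recently-used eviction."""

    def __init__(self, capacity: int, load: Callable[[str], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load = load
        self._items: "OrderedDict[str, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

The fixed capacity keeps memory use bounded, and the hit ratio gives a simple signal for tuning the capacity or the eviction policy.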

Page 13: Caching in Distributed Environment

Caching in the ORM World!

[Figure: impedance mismatch between the in-memory object graph (Customer; Gold, Silver, Bronze) and RDBMS persistent storage, shown with the sample table below]

cust_id   type     credit_allowed
3456      gold     1
7890      bronze   0

ORM examples: Microsoft Entity Framework / LINQ, JDO, TopLink, Hibernate, NHibernate

The ORM manager loads data held in persistent storage, such as a database, into an object graph. An object graph is a good caching candidate.


Page 15: Caching in Distributed Environment

Layered Cache Architecture

The layering principle is based on the explicit separation of responsibilities.

Cache layering is prevalent in many ORM solutions. For example, Velocity and Hibernate.

The first layer represents the transactional cache, and the second layer is the shared cache, designed as a process or clustered cache.

Page 16: Caching in Distributed Environment

Transactional Cache

Objects formed in a valid state and participating in a transaction can be stored in the transactional cache.

Strictly bounded by the ACID rules

The transactional cache is small and short-lived.

Thrashing, cache corruption, and caching conflicts should be strictly avoided.

Many caching frameworks offer an out-of-the-box, prepackaged transactional cache solution.
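The relationship between the transactional layer and the shared layer can be sketched in a few lines of Python. This is a deliberately simplified illustration with hypothetical names; it shows only that pending changes stay private until commit, and it does not implement locking, isolation levels, or durability.

```python
from typing import Any, Dict, Optional


class TransactionalCache:
    """Buffers writes for one transaction; they reach the shared store only on commit."""

    def __init__(self, shared_store: Dict[str, Any]) -> None:
        self._shared = shared_store
        self._pending: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        # A transaction sees its own uncommitted writes first.
        if key in self._pending:
            return self._pending[key]
        return self._shared.get(key)

    def put(self, key: str, value: Any) -> None:
        self._pending[key] = value

    def commit(self) -> None:
        self._shared.update(self._pending)
        self._pending.clear()

    def rollback(self) -> None:
        # Discard pending changes so the shared store is never corrupted.
        self._pending.clear()
```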

Page 17: Caching in Distributed Environment

Shared Cache

Can be implemented as a process cache or clustered cache. The clustered cache introduces resource replication overhead

Shared cache is a read-only cache

Distributed caching solutions typically implement a shared cache solution.

Can be implemented as an identity map. For example, caching read-only, static reports using ICachedReport
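An identity map guarantees that each key resolves to a single shared object. The sketch below is a Python illustration with hypothetical names; wrapping each loaded record in a read-only view is an assumption made to mirror the read-only nature of the shared cache, and it is unrelated to the ICachedReport API.

```python
from types import MappingProxyType
from typing import Any, Callable, Dict, Mapping


class IdentityMap:
    """Shared, read-only cache that returns the same object for the same key."""

    def __init__(self, load: Callable[[str], Dict[str, Any]]) -> None:
        self._load = load
        self._objects: Dict[str, Mapping[str, Any]] = {}

    def get(self, key: str) -> Mapping[str, Any]:
        if key not in self._objects:
            # Copy the record and expose it through a read-only view.
            self._objects[key] = MappingProxyType(dict(self._load(key)))
        return self._objects[key]
```

Callers that try to assign to a field of a cached record get a TypeError, so shared entries cannot be modified by accident.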


Page 19: Caching in Distributed Environment

Chasing the Right Size Cache

Remember the 80-20 rule, a.k.a. the Pareto principle, and the bell-shaped graph.
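The 80-20 intuition can be explored with a small calculation. The sketch below assumes a Zipf-like access pattern in which the item of popularity rank r is requested with weight 1/r, and a cache that always holds the most popular items. Both are modeling assumptions for illustration, not results from the article.

```python
from typing import Dict, Iterable


def hit_ratio_by_size(num_items: int, cache_sizes: Iterable[int]) -> Dict[int, float]:
    """Expected hit ratio when the cache holds the `size` most popular items."""
    weights = [1.0 / rank for rank in range(1, num_items + 1)]
    total = sum(weights)
    return {size: sum(weights[:size]) / total for size in cache_sizes}


ratios = hit_ratio_by_size(1000, [10, 100, 200, 1000])
for size, ratio in ratios.items():
    print(f"cache size {size:4d}: hit ratio {ratio:.2f}")
```

Under these assumptions, caching the hottest 20 percent of the items serves a little under 80 percent of the requests, and each additional slot beyond that point adds less than the one before it.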

Page 20: Caching in Distributed Environment

Microsoft project code-named “Velocity” [1]

http://msdn.microsoft.com/fi-fi/library/cc645013(en-us).aspx

Distributed in-memory application cache platform

Can store any serializable CLR object

Allows clustering and provides an ASP.NET session provider object, so that ASP.NET session objects can be stored in the distributed cache without having to write to a database

Page 21: Caching in Distributed Environment

[Figure: conventional stack (applications, web/app servers, database) compared with a stack that adds a distributed cache (applications, web/app servers, distributed cache, database)]

[Figure: physical implementation versus one logical view: applications access Velocity, which exposes named caches, each divided into regions]

Page 22: Caching in Distributed Environment


Features [1]

Machine -> Cache Host -> Named Cache -> Regions -> Cache Items -> objects

Cache operations:

Get [select]: Returns the object or the entire cache item

Add [insert]: Creates a new entry; throws an exception if the entry exists

Put [update]: Replaces an existing entry or creates a new one

Remove [delete]: Removes an existing entry

Expiration and Eviction Policy is based on time-to-live [TTL] logic

Concurrency model supports optimistic version based updates and pessimistic locking

“Velocity” can be deployed as a service or embedded within the application. For example, the host application can be an ASP.NET or other .NET application.
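The operation semantics and the optimistic, version-based update model listed above can be sketched as follows. This is a Python illustration and not the Velocity API: VersionedCache, VersionConflict, and the expected_version parameter are hypothetical names, and the integer version counter is an assumption about how versions might be tracked.

```python
from typing import Any, Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when an optimistic update is based on a stale version."""


class VersionedCache:
    def __init__(self) -> None:
        self._items: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)

    def get(self, key: str) -> Optional[Tuple[int, Any]]:
        """Return (version, value), or None when the key is absent."""
        return self._items.get(key)

    def add(self, key: str, value: Any) -> int:
        """Create a new entry; fail if the key already exists."""
        if key in self._items:
            raise KeyError(f"{key!r} already exists")
        self._items[key] = (1, value)
        return 1

    def put(self, key: str, value: Any, expected_version: Optional[int] = None) -> int:
        """Replace or create an entry; optionally check the version first."""
        current = self._items.get(key)
        if expected_version is not None and (
                current is None or current[0] != expected_version):
            raise VersionConflict(key)
        version = 1 if current is None else current[0] + 1
        self._items[key] = (version, value)
        return version

    def remove(self, key: str) -> bool:
        """Remove an entry; return True if something was removed."""
        return self._items.pop(key, None) is not None
```

A client that read version 1 and tries to update after another client has already written version 2 receives a VersionConflict and must re-read before retrying, which is the essence of optimistic concurrency.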

Page 23: Caching in Distributed Environment

Example [1]

// Create an instance of CacheFactory (reads the app config)
CacheFactory fac = new CacheFactory();

// Get a named cache from the factory
Cache catalog = fac.GetCache("catalogcache");

// Simple Get/Put
catalog.Put("toy-101", new Toy("thomas", .,.));

// From the same or a different client
Toy toyObj = (Toy)catalog.Get("toy-101");

// Region-based Get/Put
catalog.CreateRegion("toyRegion");

// Both the toy and the toy parts are put in the same region
catalog.Put("toyRegion", "toy-101", new Toy( .,.));
catalog.Put("toyRegion", "toypart-100", new ToyParts(…));

// Reuse the variable declared above instead of redeclaring it
toyObj = (Toy)catalog.Get("toyRegion", "toy-101");

Page 24: Caching in Distributed Environment

Resources

Based on the paper “Caching in the Distributed Environment” published in the Microsoft Architecture Journal, Issue 17

1. “Microsoft Project Code Named ‘Velocity’”, by N. Sampathkumar, M. Krishnaprasad, and A. Nori

2. Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter [ISBN: 1558601902]

3. http://en.wikipedia.org/wiki/Cache

4. “The Locality Principle”, by Peter J. Denning, Communications of the ACM, July 2005, Vol. 48, No. 7

5. “Caching Patterns and Implementation”, by Octavian Paul Rotaru, Leonardo Journal of Sciences (LJS) 5:8, January-June 2006

6. Data Access Patterns: Database Interactions in Object-Oriented Applications, by Clifton Nock, Addison-Wesley

Page 25: Caching in Distributed Environment

Open forum

Abhijit Gadkari

Blog: http://soaas.blogspot.com/