Storage and Alfresco


Transcript of Storage and Alfresco

Page 1: Storage and Alfresco

Storage Foundation and Alfresco

Toni de la Fuente, Principal Solutions Engineer, Americas. Blog: blyx.com – Twitter: @ToniBlyx

Page 2: Storage and Alfresco

Agenda
•  Intro to Storage Concepts
•  Hardware
•  Alfresco Storage Related Solutions
  –  Alfresco S3
    •  Caching contentstore
  –  Alfresco XAM
  –  Content Store Selector
  –  Replication / Geo-clusters / Redundancy
•  Partners Solutions
  –  Alf2CAS, Star Storage
•  Storage Best Practices with Alfresco
•  Backup and Recovery

Page 3: Storage and Alfresco

Intro to Storage Concepts: stack
File Protocol: NFS, CIFS, SMB
File System: Ext3, Ext4, ReiserFS, XFS, GFS, NTFS, FAT32, GlusterFS, OCFS, ZFS
Block Management: MDM, LVM (Logical Volume Management)
Block Protocol: SCSI, SATA, FC
RAID (HW or SW): Mirrors, Stripes
Hardware: Disks, connectors, racks, FC switches

Page 4: Storage and Alfresco

Intro to Storage Concepts
•  Hard drive types and interfaces
  –  PATA: Parallel Advanced Technology Attachment
    •  AKA IDE or EIDE; older, 40-pin connector, less efficient; used to spin at 4K – 5K rpm.
  –  SATA: Serial ATA
    •  Similar to PATA, different connector, more energy efficient; between 5K and 10K rpm.
  –  SCSI: Small Computer System Interface
    •  Spins at 10K and 15K rpm; needs a controller.
  –  SSD: Solid State Drives
    •  No mechanical parts, semiconductor based; much faster than mechanical drives and less likely to break down.

Page 5: Storage and Alfresco

Intro to Storage Concepts
•  Hard drive types and interfaces
  –  FC: Fibre Channel
    •  Successor to parallel SCSI, broader usage than mere disk interfaces, used for SANs.
  –  SAS: Serial Attached SCSI
    •  Similar to SCSI but serial rather than parallel.
  –  Other interfaces, end user oriented:
    •  USB
    •  Firewire
    •  Thunderbolt
•  CAS: Content-addressable storage is a mechanism for storing information that can be retrieved based on its content, not its storage location. (EMC Centera / Caringo)
•  XAM: standard interface for archiving in CAS.
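To make the CAS bullet concrete, here is a minimal sketch in Python of content addressing: the address of a blob is derived from its bytes, so identical content always maps to the same address. The class name and the choice of SHA-256 are assumptions for illustration only; this is not how Centera or Caringo implement it.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS (hypothetical): address = SHA-256 of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address depends only on the bytes, never on a path or location.
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read detects silent corruption of the stored blob.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentAddressableStore()
first = store.put(b"invoice contents")
second = store.put(b"invoice contents")  # same bytes, same address
```

Storing the same bytes twice yields a single stored blob, which is why CAS suits archiving of fixed content.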

Page 6: Storage and Alfresco

Intro to Storage Concepts
•  RAID types (SW or HW)
[Figure: RAID types diagram, annotated "← Faster with parity"]

Page 7: Storage and Alfresco

Intro to Storage Concepts
Main differences between SAN and NAS

A SAN is a shared "network" of storage
•  Block access to LUNs
•  Online and offline storage
•  SAN device = storage array
•  Zoning: data integrity and security
•  Dedicated fiber network
Protocols:
•  SCSI over Fibre Channel
•  SCSI over IP/Ethernet (iSCSI) and FC, Infiniband

NAS is a file system shared over a network
•  File access to data
•  Online storage only
•  NAS device = File server or "filer", already formatted
Protocols:
•  NFS, CIFS over IP over Ethernet

Page 8: Storage and Alfresco

Intro to Storage Concepts
Who needs a SAN?
•  Database servers and ECM: Oracle, SQL Server, DB2 and other database servers.
•  File servers: Using SAN-based storage for file servers lets you expand file server resources quickly, makes them run better, and enables you to manage your file-based NAS storage through the SAN.
•  Backup servers: SAN-based backup is dramatically faster than LAN-based backup.
•  Voice/video servers: Manage large amounts of data very quickly.
•  High-performance application servers: Applications such as document management, customer relationship management, billing, data warehouses, and other high-performance and critical applications all benefit from what a SAN can provide.

Page 9: Storage and Alfresco

Intro to Storage Concepts
•  Evolution
  –  Internal Storage
  –  Direct-Attached Storage (DAS)
  –  Network-Attached Storage (NAS)

Page 10: Storage and Alfresco

Hardware
•  HBA card
•  Tape Library
•  Fibre Cables
•  Storage Arrays

Page 11: Storage and Alfresco

Alfresco Storage Related Solutions
Alfresco S3 Connector
•  An alternative contentstore implementation that uses S3 directly (S3 APIs)
•  Somewhat equivalent to XAM, but not identical
  –  Unlike XAM, S3 doesn’t offer retention policies
•  Enterprise only
  –  USD10K for Alfresco Standard
  –  USD13.4K for Alfresco Enterprise
•  Shipped as a single repo-side AMP
•  Can only be installed into a new Alfresco instance (no migration!)
•  Configuration must be done before first start.
•  Can also configure a caching content store (default cache size: 50GB)
•  Only supported if Alfresco is running on Amazon EC2
•  Amazon EBS still required for database files, indexes, etc.
•  Does not support S3 Encryption yet.
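The idea behind a caching content store can be sketched as a size-bounded read-through cache in front of a slow backing store such as S3. The Python below is an illustration under stated assumptions: the class name, the least-recently-used eviction policy and the byte-based limit are hypothetical and are not taken from the Alfresco implementation.

```python
from collections import OrderedDict


class CachingStore:
    """Hypothetical read-through cache with a byte budget and LRU eviction."""

    def __init__(self, backing_read, max_bytes):
        self._backing_read = backing_read  # callable: content URL -> bytes
        self._max_bytes = max_bytes
        self._cache = OrderedDict()        # content URL -> bytes, oldest first
        self._used = 0
        self.backing_reads = 0             # how often the slow store was hit

    def read(self, url):
        if url in self._cache:
            self._cache.move_to_end(url)   # mark as most recently used
            return self._cache[url]
        data = self._backing_read(url)
        self.backing_reads += 1
        self._remember(url, data)
        return data

    def _remember(self, url, data):
        if len(data) > self._max_bytes:
            return                         # too large to cache at all
        self._cache[url] = data
        self._used += len(data)
        while self._used > self._max_bytes:
            _, evicted = self._cache.popitem(last=False)
            self._used -= len(evicted)


remote = {"a.bin": b"x" * 40, "b.bin": b"y" * 40, "c.bin": b"z" * 40}
store = CachingStore(remote.__getitem__, max_bytes=100)
store.read("a.bin")
store.read("a.bin")  # served from the cache
store.read("b.bin")
store.read("c.bin")  # exceeds the budget, so the oldest entry is evicted
store.read("a.bin")  # no longer cached, fetched from the backing store again
```

A cache that is too small for the working set keeps going back to the remote store, which is worth keeping in mind when sizing it.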

Page 12: Storage and Alfresco

Alfresco Storage Related Solutions
Alfresco XAM Connector (deprecated)
•  Gives Alfresco access to XAM-enabled storage devices.
•  New XAM connector available
•  Only EMC Centera supported
•  Released with 3.4, Jan 2011.
•  Enterprise only
•  Still supported for existing customers
  –  until November 30th 2014 or until their current subscription runs out, whichever comes first.

Page 13: Storage and Alfresco

Alfresco Storage Related Solutions
Content Store Selector
•  Storage policies based on business rules
•  Since Alfresco 3.2
•  Examples
  o  By type: Large video files on fast, expensive drives. Office documents on slower, more cost-effective drives.
  o  By business unit, by age, by usage, by ...
•  Leverage Rules and Actions to drive
[Diagram: policy rules route content to SSD ($$$), FC drives ($$) or SATA drives ($). SSD = Solid State Drives, FC = Fibre Channel]
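The tiering policy in the diagram can be sketched as an ordered list of rules, where the first matching rule picks the store. This Python sketch only illustrates the decision logic; in Alfresco the selection is driven by rules and actions, and the store names and the 500 MB threshold used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    mimetype: str
    size_bytes: int


# Hypothetical store names, one per tier in the diagram.
DEFAULT_STORE = "sataStore"
RULES = [
    # Video goes to the fast, expensive tier.
    (lambda d: d.mimetype.startswith("video/"), "ssdStore"),
    # Assumed threshold: other very large files go to the middle tier.
    (lambda d: d.size_bytes > 500 * 1024 * 1024, "fcStore"),
]


def select_store(doc: Document) -> str:
    """Return the store of the first matching rule, else the default."""
    for matches, store_name in RULES:
        if matches(doc):
            return store_name
    return DEFAULT_STORE


video = Document("keynote.mp4", "video/mp4", 2 * 1024**3)
report = Document("report.docx", "application/msword", 300 * 1024)
video_store = select_store(video)
report_store = select_store(report)
```

Rule order matters here: a large video matches both rules and lands on the first one.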

Page 14: Storage and Alfresco

Alfresco Storage Related Solutions
Content Replication (Alfresco on-premise to Alfresco on-premise)
•  Distributed repository replication
  –  Selective replication of spaces and content
  –  Support for full, incremental and delete
  –  One source – multiple destinations
  –  Replicas are read-only (update at source only; redirect if needed)
•  Benefits
  –  Support geographically dispersed companies
  –  Provide fast local access
  –  Remove single point of failure
  –  Reduce wide area network traffic

Page 15: Storage and Alfresco

Alfresco Storage Related Solutions
Content Replication / Geo-clusters / Redundancy
•  Alfresco Cloud Sync: On-premise ↔ Cloud
  –  Content oriented, not for storage replication
•  Synchronization feature between Alfresco on-premise instances (not available yet).
•  Alfresco Desktop Sync: from Windows or Mac desktop to Alfresco on-premise (not available yet)

Page 16: Storage and Alfresco

Alfresco Storage Related Solutions
Geo-clusters and Redundancy
•  Geo-clusters can be done by replicating the DB and the content store. Supported?
  –  Low-level replication/sync
  –  Some customers have this.
  –  Some customers use NetApp NAS storage and GoldenGate for DB replication
  –  Other replication tools: EMC CLARiiON, EMC Symmetrix or IBM TotalStorage.

Page 17: Storage and Alfresco

Partners Solutions
•  Xenit Alf2Cas
  –  Caringo CAStor integration
  –  Deprecated?
•  Star Storage – Hitachi Content Platform (HCP)
  –  Content archiving, additional storage and faster content backup
  –  Alfresco Enterprise: 3.4.x, 4.0.x
  –  Hitachi Content Platform (HCP): 4.x, 5.x, 6.x

Page 18: Storage and Alfresco

Third Party – Community Solutions
•  StorNext
  –  It is not a connector; it is a solution for data life cycle management in the background
  –  Alfresco sees it as a mount point and is not aware of it
  –  Runs over FC
•  EMC Atmos
  –  XAM connector for Alfresco
•  Alfresco Cloud Store
  –  Amazon S3
  –  https://code.google.com/p/alfresco-cloud-store/
•  Amazon S3 for on-premise
  –  https://issues.alfresco.com/jira/browse/AMZNSSS-26
•  Walrus? The S3 alternative for Eucalyptus

Page 19: Storage and Alfresco

Storage Best Practices
•  Content Store
  –  Use Content Store Selector to manage content of different sizes.
  –  The default content store should be the fastest for writing, to avoid bottlenecks (content arrives in the default store and is then copied to the other content stores)
  –  WORM disks as non-default content store (cleaner - Jefferies)
  –  SAN if possible
  –  If NAS, use a dedicated LAN if possible
  –  LVM if possible (scalability, snapshot)
  –  Clean trash bin often
  –  Delete “contentstore.deleted” often
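One way to act on the last bullet is a scheduled job that removes old files from contentstore.deleted. The sketch below is a hedged example in Python: the function name and the 30-day retention are assumptions, and the retention should be checked against your backup policy before anything is purged, since those files may be needed to restore older backups.

```python
import os
import tempfile
import time
from pathlib import Path


def purge_older_than(root, max_age_days, now=None):
    """Delete files under root last modified more than max_age_days ago.

    Returns the list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo on a throwaway directory standing in for ~/alf_data/contentstore.deleted.
with tempfile.TemporaryDirectory() as tmp:
    deleted_dir = Path(tmp) / "contentstore.deleted"
    deleted_dir.mkdir()
    old_file = deleted_dir / "old.bin"
    new_file = deleted_dir / "new.bin"
    old_file.write_bytes(b"old")
    new_file.write_bytes(b"new")
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old_file, (forty_days_ago, forty_days_ago))  # backdate the file

    removed_names = [p.name for p in purge_older_than(deleted_dir, max_age_days=30)]
    survivors = sorted(p.name for p in deleted_dir.iterdir())
```

In a real deployment the function would be pointed at the actual contentstore.deleted directory and run from cron or a similar scheduler.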

Page 20: Storage and Alfresco

Storage Best Practices
•  Indexes (SOLR or Lucene)
  –  Dedicated disk, local or SAN.
  –  Avoid NAS.
  –  Have at least 50-75% of space free (backup and merge)
  –  Consider using a different file system for Lucene backup and Solr backup.
•  Logs
  –  Put your logs directory on a different file system from the Content Store and Indexes.
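Both recommendations above can be checked with a few lines of Python: how much of the index volume is free, and whether two directories share a file system. This is a sketch with hypothetical helper names; the temporary directory only stands in for the real index, log and content store paths.

```python
import os
import shutil
import tempfile


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the volume holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def same_filesystem(path_a, path_b):
    """True when both paths live on the same device (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Stand-in path: replace with the real index directory.
index_dir = tempfile.gettempdir()
fraction = free_fraction(index_dir)
needs_attention = fraction < 0.5  # below the 50% lower bound from the slide
```

Calling same_filesystem on the logs directory and the content store directory should return False if the logs recommendation is followed.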

Page 21: Storage and Alfresco

Backup and Recovery
•  Recovery Time Objective (RTO): The amount of time that it takes to get your systems back online.
•  Recovery Point Objective (RPO): The last consistent data transaction prior to the disaster. If you had a disaster, how much data would be lost?
•  The Disaster Recovery plan (DR) focuses on getting your business back up and running after a major outage.
•  The Business Continuance plan (BCP) focuses on keeping your business running DURING the disaster.

Page 22: Storage and Alfresco

Backup and Recovery
•  Alfresco Backup and Recovery Tool is available:
  –  http://blyx.com/open-source-contributions/alfresco-bart/
•  Alfresco Backup and Recovery White Paper:
  –  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper

Page 23: Storage and Alfresco

Common Questions to SE?
•  Best practices for storage.
  –  You got it
•  NAS or SAN?
  –  SAN if possible! NAS backed by a SAN is common as well. NAS is not bad, but now you know why it is different.
•  Required space for DB, Indexes, Content Store?
  –  It depends on the case, but the DB and the Indexes each tend to take about 20% of the Content Store space.
•  Do you have an Archiving solution?
  –  Alfresco can be integrated with Archiving solutions like those mentioned above, implemented with Content Store Selector.
•  Do you have a backup/recovery solution?
  –  http://www.slideshare.net/toniblyx/alfresco-backup-and-disaster-recovery-white-paper
•  Do you have a data encryption solution?
  –  Yes, Alfresco Encryption at Rest: http://docs.alfresco.com/5.0/concepts/encrypted-overview.html
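The sizing rule of thumb from the third question works out as simple arithmetic. The Python sketch below applies the 20% ratio to a hypothetical 500 GB content store; the function name and the sample size are assumptions, and the result is a starting estimate, not a guarantee.

```python
def estimate_storage_gb(content_store_gb, ratio=0.20):
    """Rough sizing: DB and indexes each take `ratio` of the content store."""
    database = content_store_gb * ratio
    indexes = content_store_gb * ratio
    return {
        "content_store": content_store_gb,
        "database": database,
        "indexes": indexes,
        "total": content_store_gb + database + indexes,
    }


# Hypothetical 500 GB content store: about 100 GB each for DB and indexes.
estimate = estimate_storage_gb(500)
```

Under this rule the total footprint is about 1.4 times the content store, before counting backups and free-space headroom.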

Page 24: Storage and Alfresco

What kind of storage can I use with Alfresco?
•  Any mountable volumes that can be made to appear as standard local filesystems (local disks, NAS, SAN, etc.)
•  Amazon S3 (for Alfresco installations in AWS)
•  Centera (through the now open source connector)
•  EMC Atmos (through a partner-created integration)
•  CAStor (through a dated partner-created integration)

Page 25: Storage and Alfresco

Appendix 1: Deleting content

Page 26: Storage and Alfresco

Deleting Content
•  A complex process
•  You need to know this because it impacts
  –  Disk space management
  –  Backup and recovery procedures (and their integrity)
  –  Security and auditing
•  You have a wide degree of control over what happens and when
•  You need to do some work
•  More info on page 24 of http://www.slideshare.net/toniblyx/alfresco-security-best-practices-guide

Page 27: Storage and Alfresco

Node deletion
[Diagram, shown before and after the step "User deletes document". Database side: workspace://SpacesStore and archive://SpacesStore, with tables alf_node, alf_content_data, alf_content_url, alf_node_properties and others. Filesystem side: ~/alf_data with contentstore and contentstore.deleted, and the content file 2e3839d2d345.bin.]

Page 28: Storage and Alfresco

Node deletion
[Diagram, shown before and after the step "Wastebasket empties". Database side: workspace://SpacesStore and archive://SpacesStore, with tables alf_node, alf_content_data, alf_content_url, alf_node_properties and others; after the step, alf_content_url carries orphan_time = 'now'. Filesystem side: ~/alf_data with contentstore and contentstore.deleted, and the content file 2e3839d2d345.bin.]

Page 29: Storage and Alfresco

Node deletion
[Diagram, shown before and after the step "contentStoreCleaner runs". Database side: workspace://SpacesStore and archive://SpacesStore, with tables alf_node, alf_content_data, alf_content_url and alf_node_properties; before the step, alf_content_url carries orphan_time = 'now'. Filesystem side: ~/alf_data with contentstore and contentstore.deleted, and the content file 2e3839d2d345.bin.]
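The three steps on these slides (user deletes, wastebasket empties, contentStoreCleaner runs) can be modelled as a small state machine. The Python below is a simplified sketch, not Alfresco code: the class, its fields and the 14-day protection window are assumptions for illustration, and it assumes the cleaner moves orphaned binaries into contentstore.deleted rather than erasing them.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Simplified, hypothetical model of the node deletion lifecycle."""
    orphan_protect_days: int = 14                        # assumed protection window
    workspace: dict = field(default_factory=dict)        # node id -> content URL
    archive: dict = field(default_factory=dict)          # deleted nodes (wastebasket)
    orphan_time: dict = field(default_factory=dict)      # content URL -> day orphaned
    contentstore: set = field(default_factory=set)
    contentstore_deleted: set = field(default_factory=set)

    def add(self, node_id, content_url):
        self.workspace[node_id] = content_url
        self.contentstore.add(content_url)

    def user_deletes(self, node_id):
        # Step 1: the node moves to the archive store; the binary is untouched.
        self.archive[node_id] = self.workspace.pop(node_id)

    def empty_wastebasket(self, today):
        # Step 2: nodes are removed and their content URLs are marked orphaned.
        for content_url in self.archive.values():
            self.orphan_time[content_url] = today
        self.archive.clear()

    def run_content_store_cleaner(self, today):
        # Step 3: binaries orphaned for long enough leave the live store.
        for content_url, orphaned_on in list(self.orphan_time.items()):
            if today - orphaned_on >= self.orphan_protect_days:
                self.contentstore.discard(content_url)
                self.contentstore_deleted.add(content_url)
                del self.orphan_time[content_url]


repo = Repository()
repo.add("node-1", "2e3839d2d345.bin")
repo.user_deletes("node-1")
on_disk_after_delete = "2e3839d2d345.bin" in repo.contentstore
repo.empty_wastebasket(today=0)
repo.run_content_store_cleaner(today=1)
still_protected = "2e3839d2d345.bin" in repo.contentstore
repo.run_content_store_cleaner(today=14)
moved_to_deleted = "2e3839d2d345.bin" in repo.contentstore_deleted
```

The point of the model is that disk space is not reclaimed at any of the three steps: the binary only disappears once contentstore.deleted itself is purged.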

Page 30: Storage and Alfresco

Questions?

Page 31: Storage and Alfresco