Centralized logging with Flume

Log Ingestion on Big Data Platform with Flume


Page 1: Centralized logging with Flume

Log Ingestion on Big Data Platform with Flume

Page 2: Centralized logging with Flume

Agenda

• Why Centralized Logging on Hadoop

• Flume Introduction

• Simple Flume Logging

• Centralized and Scalable Flume Logging

• Leveraging log data

• Example


Page 3: Centralized logging with Flume

Use Case: Centralized Logging Requirements

• Applications generate a large volume of logs.

• These logs are stored on local disks of individual nodes.

• Log records need to be archived in near real time to create value from them.

• Enable analytics on logs for diagnosing issues on the Hadoop platform.

Page 4: Centralized logging with Flume

Centralized Log Management & Analytics: Goals

• Have a central repository to store large volumes of machine-generated data from all sources and tiers of applications and infrastructure.

• Feed log data from multiple sources to the common repository in a non-intrusive way and in near real time.

• Enable analytics on log data using standard analytical solutions.

• Provide the capability to search and correlate information across different sources for quick problem isolation and resolution.

• Improve operational intelligence.

• Be centralized, without redundant agents on every host for log collection.


Page 5: Centralized logging with Flume

Solution Components for centralized logging

Flume

• Flume is a streaming service, distributed as part of the Apache Hadoop ecosystem, and primarily a reliable way of getting stream and log data into HDFS. Its pluggable architecture supports any consumer. A correctly configured Flume pipeline is guaranteed not to lose data, provided durable channels are used.

• Each Flume agent consists of three major components: sources, channels, and sinks.
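A minimal sketch of how the three components fit together in an agent's properties file (the agent and component names here are illustrative, not from the deck):

# Name the components of this agent
agent1.sources = netcat-src
agent1.channels = mem-ch
agent1.sinks = log-sink

# Source: listens for newline-terminated text on a TCP port
agent1.sources.netcat-src.type = netcat
agent1.sources.netcat-src.bind = 0.0.0.0
agent1.sources.netcat-src.port = 9999
agent1.sources.netcat-src.channels = mem-ch

# Channel: volatile in-memory buffer between source and sink
agent1.channels.mem-ch.type = memory

# Sink: logs each event, useful for testing a new pipeline
agent1.sinks.log-sink.type = logger
agent1.sinks.log-sink.channel = mem-ch

Such an agent would be started with something like flume-ng agent -n agent1 -f <config file>.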

Sources

An active component that receives events from a specialized location or mechanism and places them on one or more Channels.

Different Source types: specialized sources for integrating with well-known systems (for example, Syslog and Netcat):

• AvroSource
• NetcatSource
• SpoolDirectorySource
• ExecSource
• JMSSource
• SyslogTcpSource
• SyslogUDPSource
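As one concrete example, a SpoolDirectorySource that ingests completed log files dropped into a directory might be declared like this (a sketch; the directory and names are assumptions):

agent1.sources = spool-src
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /var/log/app/spool
# Record the originating file name as an event header
agent1.sources.spool-src.fileHeader = true
agent1.sources.spool-src.channels = mem-ch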

5

Page 6: Centralized logging with Flume

Channels

A passive component that buffers incoming events until they are drained by Sinks.

Different Channels offer different levels of persistence:

• Memory Channel: volatile. Data is lost if the JVM or machine restarts.

• File Channel: backed by a WAL implementation. Data is not lost unless the disk dies; when the agent comes back, the data can still be accessed.

Channels are fully transactional:

• Provide weak ordering guarantees (in case of failures/rollbacks)

• Can work with any number of Sources and Sinks

• Handle upstream bursts, acting as upstream or downstream buffers
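For example, a durable File Channel is configured with a checkpoint directory and one or more data directories for its write-ahead log (the paths and names below are illustrative):

agent1.channels = file-ch
agent1.channels.file-ch.type = file
agent1.channels.file-ch.checkpointDir = /var/flume/checkpoint
agent1.channels.file-ch.dataDirs = /var/flume/data
# Maximum number of events the channel will buffer
agent1.channels.file-ch.capacity = 1000000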

Page 7: Centralized logging with Flume

Sinks

An active component that removes events from a Channel and transmits them to their next-hop destination.

Different types of Sinks:

• Terminal sinks that deposit events at their final destination. For example: HDFS, HBase, Kite-Solr, Elastic Search.

• Sinks support serialization to the user's preferred formats.

• The HDFS sink supports time-based and arbitrary bucketing of data while writing to HDFS.

• IPC sink for Agent-to-Agent communication: Avro.

• Sinks require exactly one channel to function.
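A sketch of an HDFS sink using time-based bucketing (the %Y%m%d/%H escapes are standard Flume escape sequences; the path and names are assumptions):

agent1.sinks = hdfs-sink
agent1.sinks.hdfs-sink.type = hdfs
# Bucket events by date and hour; requires a timestamp header
# (e.g. added by a timestamp interceptor)
agent1.sinks.hdfs-sink.hdfs.path = /data/logs/%Y%m%d/%H
# Write plain text rather than SequenceFiles
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Roll the current file every 300 seconds
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink.channel = file-ch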

Page 8: Centralized logging with Flume

Flume Multi Tier Setup

[Client]+ → Agent → [Agent]* → Destination

(one or more Clients feed an Agent, which may relay through any number of intermediate Agents before reaching the Destination)
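The Agent-to-Agent hop in such a chain is typically an Avro sink on one tier pointing at an Avro source on the next; a sketch with placeholder hostnames and ports:

# Tier 1: forward events to the next agent over Avro IPC
tier1.sinks.avro-fwd.type = avro
tier1.sinks.avro-fwd.hostname = tier2host
tier1.sinks.avro-fwd.port = 10000
tier1.sinks.avro-fwd.channel = file-ch

# Tier 2: accept events from upstream agents
tier2.sources.avro-in.type = avro
tier2.sources.avro-in.bind = 0.0.0.0
tier2.sources.avro-in.port = 10000
tier2.sources.avro-in.channels = file-ch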

Page 9: Centralized logging with Flume
Page 10: Centralized logging with Flume

Interceptors

Interceptor

Flume has the capability to modify or drop events in flight. This is done with the help of interceptors. An interceptor can modify or even drop events based on any criteria chosen by the developer of the interceptor.

• Built-in interceptors allow adding headers such as timestamps, hostname, static markers, etc.

• Custom interceptors can introspect the event payload to create specific headers where necessary.
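A sketch of attaching the built-in timestamp, host, and static interceptors to a source (the interceptor and source names are illustrative):

agent1.sources.app-src.interceptors = ts hst env
# Adds a timestamp header with the event's processing time
agent1.sources.app-src.interceptors.ts.type = timestamp
# Adds a host header with this agent's hostname or IP
agent1.sources.app-src.interceptors.hst.type = host
# Adds a fixed marker header to every event
agent1.sources.app-src.interceptors.env.type = static
agent1.sources.app-src.interceptors.env.key = environment
agent1.sources.app-src.interceptors.env.value = production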

Page 11: Centralized logging with Flume

Configuration Example: Flume Agents

● Hierarchical: every property key is nested under the agent name (see the sketch below)
● Flow of components
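The slide's own example is not reproduced in this transcript, but the two points can be illustrated with a sketch (all names hypothetical): every key follows the hierarchy <agent>.<component type>.<component name>.<property>, and the flow is wired by pointing sources and sinks at channels:

# <agent>.<component type> lists the component names
agent_foo.sources = src-1
agent_foo.channels = ch-1
agent_foo.sinks = sink-1

# <agent>.<type>.<name>.<property> configures each component
agent_foo.sources.src-1.type = avro
agent_foo.sources.src-1.bind = 0.0.0.0
agent_foo.sources.src-1.port = 4141
agent_foo.channels.ch-1.type = file
agent_foo.sinks.sink-1.type = hdfs
agent_foo.sinks.sink-1.hdfs.path = /data/logs/%Y%m%d

# Flow: source -> channel -> sink
agent_foo.sources.src-1.channels = ch-1
agent_foo.sinks.sink-1.channel = ch-1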


Page 12: Centralized logging with Flume

Contextual Routing with Interceptors

• Achieved using Interceptors and Channel Selectors.

• Terminal Sinks can directly use headers to make destination selections.

• The HDFS Sink can use header values to create a dynamic path for the files that events will be added to.

# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
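Tying the two ideas together, the HDFS sink's dynamic path mentioned above could interpolate the same State header (the sink name and path are assumptions):

agent_foo.sinks.hdfs-sink1.type = hdfs
# %{State} expands to the event's State header value
agent_foo.sinks.hdfs-sink1.hdfs.path = /data/logs/%{State}/%y%m%d
# Use local time for the %y%m%d escapes if no timestamp header is set
agent_foo.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true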

Page 13: Centralized logging with Flume

Flume Client

An entity that generates events and sends them to one or more Agents.

• Examples:
  • Flume/Syslog log4j Appender
  • Custom client using the Client SDK (org.apache.flume.api)
  • Embedded Agent: an agent embedded within your application

• Decouples Flume from the system that event data is consumed from
• Not needed in all cases

Page 14: Centralized logging with Flume

Client Applications

Configuration Example: Log4j

log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
log4j.appender.syslog.Facility=LOCAL3
log4j.appender.syslog.FacilityPrinting=false
log4j.appender.syslog.Header=true
log4j.appender.syslog.SyslogHost=FlumedestinationHost:4444
log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
log4j.appender.syslog.layout.ConversionPattern=TYPE: DUMMY %p: (%F:%L) %x %m %n


The log4j configuration snippet above enables Java applications to send events to the Flume syslog source.

Page 15: Centralized logging with Flume

For Non-log4j Applications: Rsyslog

• Rsyslog is an open-source software utility used on UNIX and Unix-like systems for forwarding log messages over an IP network. It implements the basic syslog protocol, extends it with content-based filtering and rich filtering capabilities, offers flexible configuration options, and adds features such as TCP transport.

● Used in most Linux distros as the standard logger
● Has multiple facilities for application use: local0–local7 (avoid local7)
● Can poll any file on the system and send new events over the network to syslog destinations
● Apply configuration changes with: service rsyslog restart

$ModLoad imfile
$InputFileName /var/log/NEWAPP/NEWAPP.log
$InputFileTag TYPE:_NEWAPP
$WorkDirectory /var/spool/rsyslog/NEWAPP
$InputFileStateFile NEWAPP-log
$InputFileFacility local7
$InputFilePersistStateInterval 10
$InputFileSeverity info
$RepeatedMsgReduction off
$InputRunFileMonitor
local7.* @@flumehost:4444
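A sketch of the matching Flume source that would receive these rsyslog messages on port 4444 (the agent and component names are hypothetical):

agent1.sources = syslog-src
agent1.sources.syslog-src.type = syslogtcp
agent1.sources.syslog-src.host = 0.0.0.0
agent1.sources.syslog-src.port = 4444
agent1.sources.syslog-src.channels = file-ch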

Page 16: Centralized logging with Flume

Solution: Near Real Time Log Archive to Hadoop Platform


Event Flow: Simple Flume Logging

Page 17: Centralized logging with Flume

Solution: Near Real Time Log Archive to Hadoop Platform


• Less centralized, avoiding a single point of failure.

• If a collector fails, events are still not lost.

• Scope for further scalability, with minimal configuration changes.

Page 18: Centralized logging with Flume

Configuration Example: Flume Multi-tier Config

● Flume Listener Agents
  ■ This tier gathers events from multiple applications.
  ■ It can also perform event inspection using interceptors in this tier.
  ■ Each event is analyzed and sent forward with appropriate header updates (only headers change) so that the next agent can make sense of it.
  ■ A file channel, or any other durable channel, can be used here.
  ■ Events are aggregated for the next tier.

● Flume Writer Tier
  ■ Keeps the number of connections to HDFS to a minimum.
  ■ This tier receives events from the aggregator and reads their headers.
  ■ According to the headers, events are sent to the relevant location on HDFS (see the sketch below).
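A minimal two-tier sketch along these lines, assuming the listener tier's interceptors have set type and host headers (hosts, ports, and paths are assumptions; the deck's exact configuration is not reproduced in this transcript):

# Listener tier: gathers app events, buffers durably, forwards over Avro
listener.sources = app-src
listener.channels = durable-ch
listener.sinks = fwd-sink
listener.sources.app-src.type = syslogtcp
listener.sources.app-src.port = 4444
listener.sources.app-src.channels = durable-ch
listener.channels.durable-ch.type = file
listener.sinks.fwd-sink.type = avro
listener.sinks.fwd-sink.hostname = writerhost
listener.sinks.fwd-sink.port = 10000
listener.sinks.fwd-sink.channel = durable-ch

# Writer tier: receives from listeners, writes to HDFS based on headers
writer.sources = avro-src
writer.channels = durable-ch
writer.sinks = hdfs-sink
writer.sources.avro-src.type = avro
writer.sources.avro-src.bind = 0.0.0.0
writer.sources.avro-src.port = 10000
writer.sources.avro-src.channels = durable-ch
writer.channels.durable-ch.type = file
writer.sinks.hdfs-sink.type = hdfs
# Header escapes route each event to its location on HDFS
writer.sinks.hdfs-sink.hdfs.path = /data/logmgmt/%{type}/%{host}/%y%m%d
writer.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
writer.sinks.hdfs-sink.channel = durable-ch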


Page 19: Centralized logging with Flume

DDL for creating a Hive table over the log data:

CREATE TABLE logData_H2 (
  Ltype STRING,
  event_time STRING,
  porder STRING,
  SEVERITY STRING,
  SCLASS STRING,
  PHO STRING,
  MESG STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/logmgmt/_DUMMY/raz-XPS14/150703/';

Page 20: Centralized logging with Flume

Thank you