Regex in Your SPL - SplunkConf...SED is a stream editor. It can be used to create substitutions in...

Post on 02-Dec-2020

3 views 0 download

Transcript of Regex in Your SPL - SplunkConf...SED is a stream editor. It can be used to create substitutions in...

Regex in Your SPLAn Easy Introduction

Michael Simko | Sr. Engineer, Instructor

September 2017 | Washington, DC

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.

The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.

Forward-Looking Statements

THIS SLIDE IS REQUIRED FOR ALL 3 PARTY PRESENTATIONS.

Basics of Regular Expressions

What is this Regex thing all about?

© 2017 SPLUNK INC.

1. Filtering. Eliminate unwanted data in your searches

2. Matching. Advanced pattern matching to find the results you need

3. Field Extraction on-the-fly

What’s in it for me?

Regex in Splunk SPL

“A regular expression is an object that describes a pattern of characters. Regular expressions are used to perform pattern-matching and ‘search-and-replace’functions on text.”– w3schools.com

“Regular expressions are an extremely powerful tool for manipulating text and data…If you don't use regular expressions yet, you will...”– Mastering Regular Expressions,

O’Rielly, Jeffery E.F. Friedl

“A regular expression is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”– Regexbuddy.com (and others –

Original source unknown)

What Is Regex?What People Say

Regex Basics

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter(letter,#,or_)\WNotaWordCharacter

Operators:*ZeroorMore+OneorMore?ZeroorOne

Theseelementsworktogethertospecifyapattern

The Main Elements

Regex BasicsThe Main Elements

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Operators:*ZeroorMore+OneorMore?ZeroorOne

Sample Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+

^ RegexisAnchoredtothebeginningoftheline\d+ isoneormoredigits\w+ isoneormorewordcharacters\s withouta+or*isasinglespace: is the literal character colon

Regex BasicsThe Main Elements

Operators:*ZeroorMore+OneorMore?ZeroorOne

Sample Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+

MatchingString:22Aug2017 18:45:20 Onthisdate,MichaelmadeBBQreferences

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsTo Protect and Give Options

SpecialCharacters:Togivemultipleoptions:|Thepipecharacter(alsocalled“or”)

ProtectingCharacters:Toescapeorprotectspecialcharacters:\TheBacklashorback-whack

Protectperiods,[],(),{},etcwhenyouwanttousetheliteralcharacter

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsTo Protect and Give Options

Regex: Indiana|Purdue Regex:\d+\.\d+\.\d+\.\d+

Purdue 8w3l.72719w5l.792Indiana 5w4l.50015w8l.652

LoginFailureFrom192.168.12.145LoginSuccessFrom10.35.36.37

(we’lldotheaboveadifferentwaylater)

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsOnly Some May Pass

IncludeCharacters:[…]

ExcludeCharacters:[^…]

Exampleusage:[a-zA-Z0-9] Exampleusage:[^]

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsOnly Some May Pass

Regex:server:[a-z0-9]+ Regex:server:[^]

server:253fsf2,host=23423server:253fsf2,host=23423server:253f sf2,host=23423

Keepgoingsolongasyouhit

charactersthatarelowercasea-Zor0-9

Gountilyouhitaspace

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsSay What Again

RepetitionisusedtodefinetheexactnumberofcharactersOranupperandlowerboundaryofacceptablecharacters

(ortheexactnumberofrepetitionsofapattern)

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsSay What Again

Regex:IP: \d{3}\.\d{3}\.\d{3}\.\d{3} Regex:IP: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}IP:172.106.190.100IP:10.24.255.2IP:224.252.2.52

IP:172.16.19.1IP:10.24.255.2IP:224.252.2.52

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Regex BasicsTo Protect and Give Options

Usetospecifyrepetitionforadjacentelementsinordertoformpatterns

Laterwe’llusetheseas“capturegroups”

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

LogicalGroupings:()WrapsetsoftheRegex

Regex BasicsTo Protect and Give Options

AlternateRegex:IP:(\d{1,3}\.){3}\d{1,3}IP:172.16.19.1IP:10.24.255.2IP:224.252.2.52

Repeats\d{1,3}\.threetimesThentacksonthelast\d{1,3}

RevisitingtheIPMatchingfromacoupleofslidesago

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

LogicalGroupings:()WrapsetsoftheRegex

Regex Basics

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

LogicalGroupings:()WrapsetsoftheRegex

NamedCaptureGroups:(?<CaptureGroupName>stuff)

Thisnamesthecapturegroup(e.g.,logicalgrouping).Nowwhenyoureturnthecapture,ithasanameandnotjust“CaptureGroup1”

The Last (Not so Basic) Element

Regex BasicsThe Last (Not so Basic) Element

Regex:user:\s(?<username>[^@]+)

Log1:blahblahuser: msimko@splunk.comLog2:moreblahuser: michael@kinneygroup.com

Gountilwehitan@CaptureasfieldusernameAnchoroffuser:\s

Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions

InclusionCharacters:[]Include[^]Exclude

ProtectionCharacters:\ Thenextcharacterisaliteral

SpecialCharacters:|Alternative/“or”

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

LogicalGroupings:()WrapsetsoftheRegex

NamedCaptureGroups:(?<CaptureGroupName>stuff)

Regex in SPLUsing Regular Expressions to improve your SPL

▶ Field Extractions• erex• rex• Interactive Field Extractor• Props – Extract• Transforms - Report

▶ Evaluation• Regex• match• replace

Regex in Your SPLSearch Time Regex

Fields are fundamental to Splunk Search

Regex provides granularity when evaluating data

© 2017 SPLUNK INC.

Field ExtractionsOn the fly (No need to work ahead)

erex CommandField Extractions Using Examples

UseSplunktogenerateregularexpressionsbyprovidingalistofvaluesfromthedata.

▶ Scenario: Extract the first word of each sample phrase from | windbag• Step 1, find the samples• Step 2, extract the field

erex CommandField Extractions Using Examples

|windbag |erexfirstwords examples="Unë,یؤلمن,Կրնամ"

Eastereggthatcreatessampledata

ExamplesfromthedataNewFieldtocreate

Erex Command: …| erex <newFieldName> examples=“example1,example2”

erex CommandField Extractions Using Examples

|windbag |erexfirstwordsexamples="Unë,یؤلمن,Կրնամ"

Thevalueserexgeneratedbasedonthesamples

NewFieldcreated

erex CommandField Extractions Using Examples

▶ Erex is a great introduction to using regular expressions for field extraction.• Erex provides the rex that it generated• Going forward, use the rex in your saved

searches and dashboards.• Rex is more efficient

rex CommandExtract Fields Using Regular Expressions at Search Time

…|rexfield={what_field}“FrontAnchor(?<extraction>{characters}+)BackAnchor”

Creates a Field Extraction

rex CommandExtract Fields Using Regular Expressions at Search Time

| windbag | rex field=sample "^(?<FirstWord>[\S+]*)"

Specifythefieldtorexfrom

FrontAnchorNamedFieldExtraction Grabanynon-spacecharacter

rex CommandExtract Fields Using Regular Expressions at Search Time

| windbag | rex field=sample"^(?<FirstWord>[\S+]*)"

NamedFieldExtraction Grabanynon-spacecharacter

rex CommandUse Rex to Perform SED Style Substitutions

SED is a stream editor. It can be used to create substitutions in data.

Splunk uses the rex command to perform Search-Time substitutions.

rex CommandUse Rex to Perform SED Style Substitutions

Setthemode

s forsubstituteg forglobal(morethanonce)

Substitutethestuffbetweenthefirst/andsecond/ withthestuffbetweensecond/ andthird/

()tocreateacapturegroup\1topastecapturegroup

| windbag | search lang="*Norse" | rex mode=sed "s/Old (Norse)/Not-so-old \1/g"

rex CommandUse Rex to Perform SED Style Substitutions

Setthemode

s forsubstitute

()tocreateacapturegroup\1topastecapturegroup

…| rex mode=sed "s/Old (Norse)/Not-so-old \1/g"

Result:

© 2017 SPLUNK INC.

EvaluationUsing Regular Expressions for Pattern Matching

Regex CommandFilter Using Regular Expressions

sourcetype=fs_notification |regexchgs="^modtime"

Fieldtoevaluate Regex

Match FunctionFilter Using Regular Expressions

…|evaln=if(match(field,”^MyRegex”,1,2)

match(SUBJECT),”REGEX”

sourcetype=access_combined_wcookie|eval com=if(match(referer,"http:.*\.com"),"True","False")

Match.Returns1foritmatches,0fornot.

Fieldtoevaluate TheRegex

Replace CommandSwitch Data at Search Time

Replacefieldvalueswiththevaluesyouspecify…|replace“<whoever>” WITH“<whomever>” IN<target_field>

Replace CommandSwitch Data at Search Time

Replacefieldvalueswiththevaluesyouspecify…|replace“<whoever>” WITH“<whomever>” IN<target_field>

| windbag | replace "Euro" with "Euro: How is a currency a language" in lang

Stringtobereplaced

Stringtoreplacewith

Fieldinwhichtomakethereplacement

operator operator

PersistenceRegular Expressions That Exist Outside Your Search

Untilthispoint,everyoneofourextractionshaveonlyexistedinthesearch.But,whatifwewantthemtopersist?Ortosharethem?

1.InteractiveFieldExtractor

2.ExtractionsinProps/Transforms

– Walk-through UI

– You may want to rewrite the generated Regex

– Does not require admin rights

– Straight editing in props.conf

– Requires Admin Rights (or an admin to put in place)

– Edit directly in transforms.conf

– Invoked by props.conf

– Requires Admin Rights (or an admin to put in place)

Persistent Field ExtractionsComparing The Persistent Field Extractions

InteractiveFieldExtractor ExtractinProps ReportinTransforms

Q&AMichael Simko | Sr. Engineer/Instructor

© 2017 SPLUNK INC.

1. Use Regex to create powerful filters in your SPL

2. Use Regex to create field extractions

3. Regex doesn’t have to be hard. You can do this!

Regex in your SPL

Key Takeaways

© 2017 SPLUNK INC.

Don't forget to rate this session in the .conf2017 mobile app

Thank You

Appendix ACaveats

rex Command – CaveatUse Rex to Perform SED Style Substitutions

The substitution from rex comes after the lang field is extracted. So even though the event data is showing us the substitution, the field lang is showing the original value.

Caveat:| windbag | search lang="*Norse" | rex mode=sed "s/Old (Norse)/Not-so-old \1/g"

Appendix BExercises to Practice With

Regex BasicsThe Main Elements

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Operators:*ZeroorMore+OneorMore?ZeroorOne

Scenario Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+LearnbyFire:WhichofthesewillthesampleRegexmatch?

A. 002421Februari10831:242525:22352B. 07Feb1712:53:36AMC. Feb13201718:46:56D. 14February201707:45:47Z

(answersonnextslide)

Regex BasicsThe Main Elements

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Operators:*ZeroorMore+OneorMore?ZeroorOne

A. 002421 Februari 1083 1:242525:22352B. 07 Feb 17 12:53:36AMC. Feb 13 2017 18:46:56D. 14 February 2017 07:45:47:46

Scenario Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+LearnbyFire:WhichofthesewillthesampleRegexmatch?

Regex BasicsThe Main Elements

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Operators:*ZeroorMore+OneorMore?ZeroorOne

Practice:Createa Regexthatdescribesallthreeofthefollowingstrings

06February2017192.168.1.205Apr201410.2.1.15031July202019..15.63

Regex BasicsThe Main Elements

ControlCharacters:^StartofaLine$EndofaLine

CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters

Operators:*ZeroorMore+OneorMore?ZeroorOne

Scenario:Createa Regexthatdescribesthefollowingstrings

06 February 2017 192.168.1.205 Apr 2014 10.2.1.15031 July 2020 19..15.63

Asolution:\d+\s\w+\s\d+\s\d*\.\d*\.\d*\.\d*

Regex Basics

1.OpenupyourSplunk2.|windbag|head20|table_raw3.Copythe_rawdata4.PastethedatainRegex101.com

Goals:Extractthefollowingfieldsforeachevent:langsampleTheDatewithoutTimeTheTime

Performtheseas“named” extractions

The Main Elements

Replace CommandSwitch Data at Search Time

Sillyversiontotryonyourown|windbag|head20|replace"1"WITH"Uno"inodd

Tryit,thenclickthedownchevrontoseetheresults