Regex in Your SPL - SplunkConf...SED is a stream editor. It can be used to create substitutions in...
Transcript of Regex in Your SPL - SplunkConf...SED is a stream editor. It can be used to create substitutions in...
Regex in Your SPLAn Easy Introduction
Michael Simko | Sr. Engineer, Instructor
September 2017 | Washington, DC
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Forward-Looking Statements
THIS SLIDE IS REQUIRED FOR ALL 3 PARTY PRESENTATIONS.
Basics of Regular Expressions
What is this Regex thing all about?
© 2017 SPLUNK INC.
1. Filtering. Eliminate unwanted data in your searches
2. Matching. Advanced pattern matching to find the results you need
3. Field Extraction on-the-fly
What’s in it for me?
Regex in Splunk SPL
“A regular expression is an object that describes a pattern of characters. Regular expressions are used to perform pattern-matching and ‘search-and-replace’functions on text.”– w3schools.com
“Regular expressions are an extremely powerful tool for manipulating text and data…If you don't use regular expressions yet, you will...”– Mastering Regular Expressions,
O’Rielly, Jeffery E.F. Friedl
“A regular expression is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids.”– Regexbuddy.com (and others –
Original source unknown)
What Is Regex?What People Say
Regex Basics
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter(letter,#,or_)\WNotaWordCharacter
Operators:*ZeroorMore+OneorMore?ZeroorOne
Theseelementsworktogethertospecifyapattern
The Main Elements
Regex BasicsThe Main Elements
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Operators:*ZeroorMore+OneorMore?ZeroorOne
Sample Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+
^ RegexisAnchoredtothebeginningoftheline\d+ isoneormoredigits\w+ isoneormorewordcharacters\s withouta+or*isasinglespace: is the literal character colon
Regex BasicsThe Main Elements
Operators:*ZeroorMore+OneorMore?ZeroorOne
Sample Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+
MatchingString:22Aug2017 18:45:20 Onthisdate,MichaelmadeBBQreferences
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsTo Protect and Give Options
SpecialCharacters:Togivemultipleoptions:|Thepipecharacter(alsocalled“or”)
ProtectingCharacters:Toescapeorprotectspecialcharacters:\TheBacklashorback-whack
Protectperiods,[],(),{},etcwhenyouwanttousetheliteralcharacter
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsTo Protect and Give Options
Regex: Indiana|Purdue Regex:\d+\.\d+\.\d+\.\d+
Purdue 8w3l.72719w5l.792Indiana 5w4l.50015w8l.652
LoginFailureFrom192.168.12.145LoginSuccessFrom10.35.36.37
(we’lldotheaboveadifferentwaylater)
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsOnly Some May Pass
IncludeCharacters:[…]
ExcludeCharacters:[^…]
Exampleusage:[a-zA-Z0-9] Exampleusage:[^]
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsOnly Some May Pass
Regex:server:[a-z0-9]+ Regex:server:[^]
server:253fsf2,host=23423server:253fsf2,host=23423server:253f sf2,host=23423
Keepgoingsolongasyouhit
charactersthatarelowercasea-Zor0-9
Gountilyouhitaspace
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsSay What Again
RepetitionisusedtodefinetheexactnumberofcharactersOranupperandlowerboundaryofacceptablecharacters
(ortheexactnumberofrepetitionsofapattern)
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsSay What Again
Regex:IP: \d{3}\.\d{3}\.\d{3}\.\d{3} Regex:IP: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}IP:172.106.190.100IP:10.24.255.2IP:224.252.2.52
IP:172.16.19.1IP:10.24.255.2IP:224.252.2.52
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Regex BasicsTo Protect and Give Options
Usetospecifyrepetitionforadjacentelementsinordertoformpatterns
Laterwe’llusetheseas“capturegroups”
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
LogicalGroupings:()WrapsetsoftheRegex
Regex BasicsTo Protect and Give Options
AlternateRegex:IP:(\d{1,3}\.){3}\d{1,3}IP:172.16.19.1IP:10.24.255.2IP:224.252.2.52
Repeats\d{1,3}\.threetimesThentacksonthelast\d{1,3}
RevisitingtheIPMatchingfromacoupleofslidesago
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
LogicalGroupings:()WrapsetsoftheRegex
Regex Basics
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
LogicalGroupings:()WrapsetsoftheRegex
NamedCaptureGroups:(?<CaptureGroupName>stuff)
Thisnamesthecapturegroup(e.g.,logicalgrouping).Nowwhenyoureturnthecapture,ithasanameandnotjust“CaptureGroup1”
The Last (Not so Basic) Element
Regex BasicsThe Last (Not so Basic) Element
Regex:user:\s(?<username>[^@]+)
Log1:blahblahuser: [email protected]:moreblahuser: [email protected]
Gountilwehitan@CaptureasfieldusernameAnchoroffuser:\s
Repetition:{#}NumberofRepetitions{#,#}RangeofRepetitions
InclusionCharacters:[]Include[^]Exclude
ProtectionCharacters:\ Thenextcharacterisaliteral
SpecialCharacters:|Alternative/“or”
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
LogicalGroupings:()WrapsetsoftheRegex
NamedCaptureGroups:(?<CaptureGroupName>stuff)
Regex in SPLUsing Regular Expressions to improve your SPL
▶ Field Extractions• erex• rex• Interactive Field Extractor• Props – Extract• Transforms - Report
▶ Evaluation• Regex• match• replace
Regex in Your SPLSearch Time Regex
Fields are fundamental to Splunk Search
Regex provides granularity when evaluating data
© 2017 SPLUNK INC.
Field ExtractionsOn the fly (No need to work ahead)
erex CommandField Extractions Using Examples
UseSplunktogenerateregularexpressionsbyprovidingalistofvaluesfromthedata.
▶ Scenario: Extract the first word of each sample phrase from | windbag• Step 1, find the samples• Step 2, extract the field
erex CommandField Extractions Using Examples
|windbag |erexfirstwords examples="Unë,یؤلمن,Կրնամ"
Eastereggthatcreatessampledata
ExamplesfromthedataNewFieldtocreate
Erex Command: …| erex <newFieldName> examples=“example1,example2”
erex CommandField Extractions Using Examples
|windbag |erexfirstwordsexamples="Unë,یؤلمن,Կրնամ"
Thevalueserexgeneratedbasedonthesamples
NewFieldcreated
erex CommandField Extractions Using Examples
▶ Erex is a great introduction to using regular expressions for field extraction.• Erex provides the rex that it generated• Going forward, use the rex in your saved
searches and dashboards.• Rex is more efficient
rex CommandExtract Fields Using Regular Expressions at Search Time
…|rexfield={what_field}“FrontAnchor(?<extraction>{characters}+)BackAnchor”
Creates a Field Extraction
rex CommandExtract Fields Using Regular Expressions at Search Time
| windbag | rex field=sample "^(?<FirstWord>[\S+]*)"
Specifythefieldtorexfrom
FrontAnchorNamedFieldExtraction Grabanynon-spacecharacter
rex CommandExtract Fields Using Regular Expressions at Search Time
| windbag | rex field=sample"^(?<FirstWord>[\S+]*)"
NamedFieldExtraction Grabanynon-spacecharacter
rex CommandUse Rex to Perform SED Style Substitutions
SED is a stream editor. It can be used to create substitutions in data.
Splunk uses the rex command to perform Search-Time substitutions.
rex CommandUse Rex to Perform SED Style Substitutions
Setthemode
s forsubstituteg forglobal(morethanonce)
Substitutethestuffbetweenthefirst/andsecond/ withthestuffbetweensecond/ andthird/
()tocreateacapturegroup\1topastecapturegroup
| windbag | search lang="*Norse" | rex mode=sed "s/Old (Norse)/Not-so-old \1/g"
rex CommandUse Rex to Perform SED Style Substitutions
Setthemode
s forsubstitute
()tocreateacapturegroup\1topastecapturegroup
…| rex mode=sed "s/Old (Norse)/Not-so-old \1/g"
Result:
© 2017 SPLUNK INC.
EvaluationUsing Regular Expressions for Pattern Matching
Regex CommandFilter Using Regular Expressions
sourcetype=fs_notification |regexchgs="^modtime"
Fieldtoevaluate Regex
Match FunctionFilter Using Regular Expressions
…|evaln=if(match(field,”^MyRegex”,1,2)
match(SUBJECT),”REGEX”
sourcetype=access_combined_wcookie|eval com=if(match(referer,"http:.*\.com"),"True","False")
Match.Returns1foritmatches,0fornot.
Fieldtoevaluate TheRegex
Replace CommandSwitch Data at Search Time
Replacefieldvalueswiththevaluesyouspecify…|replace“<whoever>” WITH“<whomever>” IN<target_field>
Replace CommandSwitch Data at Search Time
Replacefieldvalueswiththevaluesyouspecify…|replace“<whoever>” WITH“<whomever>” IN<target_field>
| windbag | replace "Euro" with "Euro: How is a currency a language" in lang
Stringtobereplaced
Stringtoreplacewith
Fieldinwhichtomakethereplacement
operator operator
PersistenceRegular Expressions That Exist Outside Your Search
Untilthispoint,everyoneofourextractionshaveonlyexistedinthesearch.But,whatifwewantthemtopersist?Ortosharethem?
1.InteractiveFieldExtractor
2.ExtractionsinProps/Transforms
– Walk-through UI
– You may want to rewrite the generated Regex
– Does not require admin rights
– Straight editing in props.conf
– Requires Admin Rights (or an admin to put in place)
– Edit directly in transforms.conf
– Invoked by props.conf
– Requires Admin Rights (or an admin to put in place)
Persistent Field ExtractionsComparing The Persistent Field Extractions
InteractiveFieldExtractor ExtractinProps ReportinTransforms
Q&AMichael Simko | Sr. Engineer/Instructor
© 2017 SPLUNK INC.
1. Use Regex to create powerful filters in your SPL
2. Use Regex to create field extractions
3. Regex doesn’t have to be hard. You can do this!
Regex in your SPL
Key Takeaways
© 2017 SPLUNK INC.
Don't forget to rate this session in the .conf2017 mobile app
Thank You
Appendix ACaveats
rex Command – CaveatUse Rex to Perform SED Style Substitutions
The substitution from rex comes after the lang field is extracted. So even though the event data is showing us the substitution, the field lang is showing the original value.
Caveat:| windbag | search lang="*Norse" | rex mode=sed "s/Old (Norse)/Not-so-old \1/g"
Appendix BExercises to Practice With
Regex BasicsThe Main Elements
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Operators:*ZeroorMore+OneorMore?ZeroorOne
Scenario Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+LearnbyFire:WhichofthesewillthesampleRegexmatch?
A. 002421Februari10831:242525:22352B. 07Feb1712:53:36AMC. Feb13201718:46:56D. 14February201707:45:47Z
(answersonnextslide)
Regex BasicsThe Main Elements
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Operators:*ZeroorMore+OneorMore?ZeroorOne
A. 002421 Februari 1083 1:242525:22352B. 07 Feb 17 12:53:36AMC. Feb 13 2017 18:46:56D. 14 February 2017 07:45:47:46
Scenario Regex: ^\d+\s\w+\d+\s\d+:\d+:\d+LearnbyFire:WhichofthesewillthesampleRegexmatch?
Regex BasicsThe Main Elements
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Operators:*ZeroorMore+OneorMore?ZeroorOne
Practice:Createa Regexthatdescribesallthreeofthefollowingstrings
06February2017192.168.1.205Apr201410.2.1.15031July202019..15.63
Regex BasicsThe Main Elements
ControlCharacters:^StartofaLine$EndofaLine
CharacterTypes:\sWhiteSpace\SNotwhitespace\dDigit\DNotDigit\wWordCharacter\WNotWordCharacters
Operators:*ZeroorMore+OneorMore?ZeroorOne
Scenario:Createa Regexthatdescribesthefollowingstrings
06 February 2017 192.168.1.205 Apr 2014 10.2.1.15031 July 2020 19..15.63
Asolution:\d+\s\w+\s\d+\s\d*\.\d*\.\d*\.\d*
Regex Basics
1.OpenupyourSplunk2.|windbag|head20|table_raw3.Copythe_rawdata4.PastethedatainRegex101.com
Goals:Extractthefollowingfieldsforeachevent:langsampleTheDatewithoutTimeTheTime
Performtheseas“named” extractions
The Main Elements
Replace CommandSwitch Data at Search Time
Sillyversiontotryonyourown|windbag|head20|replace"1"WITH"Uno"inodd
Tryit,thenclickthedownchevrontoseetheresults