Regular Expressions: The Proper Care and Feeding

Zain Naboulsi, MSDN Developer Evangelist, Microsoft

Posted on 03-Jan-2016



Introduction to Regular Expressions

What Are Regular Expressions?

Why Would I Want To Use Them?

Common Misconceptions

Anatomy of a Regular Expression

Disclaimer

All opinions in this session are provided "AS IS" with no warranties, and confer no rights.

All opinions are my own and don't necessarily reflect the opinion of Microsoft.

What Are Regular Expressions?

Regular Expressions

“Regular expressions provide a powerful, flexible, and efficient method for processing text.

[They allow] you to quickly parse large amounts of text to find specific character patterns; to extract, edit, replace, or delete text substrings; or to add the extracted strings to a collection in order to generate a report.”

http://msdn2.microsoft.com/en-us/library/hs600312.aspx
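As a rough illustration of the tasks the quote lists, here is a minimal Python sketch using only the standard re and collections modules. The log text and its format are invented for this example; the sketch finds ERROR and WARN lines, collects their levels into a counter for a small report, replaces device paths, and deletes timestamps.

```python
import re
from collections import Counter

# Hypothetical log text, invented for this example.
LOG = """\
2016-01-03 10:15:02 ERROR disk full on /dev/sda1
2016-01-03 10:16:40 WARN  slow response from db01
2016-01-03 10:17:05 ERROR timeout talking to db01
2016-01-03 10:18:11 INFO  backup finished
"""

# Find: lines whose level is ERROR or WARN, capturing the level and the message.
# re.MULTILINE makes ^ and $ apply to each line instead of the whole string.
LINE = re.compile(r"^\S+ \S+ (ERROR|WARN)\s+(.*)$", re.MULTILINE)

# Extract: add the captured levels to a collection so a report can be generated.
levels = Counter(m.group(1) for m in LINE.finditer(LOG))

# Replace: mask device paths such as /dev/sda1.
masked = re.sub(r"/dev/\w+", "/dev/<device>", LOG)

# Delete: strip the leading date and time from every line.
no_stamps = re.sub(r"^\S+ \S+ ", "", LOG, flags=re.MULTILINE)

# Report: one line per level, most frequent first.
for level, count in levels.most_common():
    print(f"{level}: {count}")
```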

Do What?

Simply put, regular expressions help you find text patterns and then do pretty much whatever you want with the matching text.

It sounds simple, but regular expressions are one of the most difficult and least understood constructs in programming.

Warning

Regular expressions are part art and part science. There is a steep learning curve, but the rewards are significant.

The Possibilities

Okay, So What Is A Pattern?

“a regular or repetitive form, order, or arrangement”

http://encarta.msn.com/dictionary_1861724272/pattern.html

PATTERNS ARE EVERYWHERE

Checkerboard

Fibonacci Sequence

Text

The IP Address for the server is 192.169.1.3 but it should be 192.168.1.5, and I am not sure how we managed to get into the 192.169.1 subnet but we need to remove ourselves from it immediately unless we are moving to it then I want the new IP to be 192.169.1.3 I suppose.
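To make the idea concrete, a short Python sketch (standard re module only) can pull every address-shaped string out of the sample text. The pattern is an assumption chosen for illustration: it checks only the shape of an IPv4 address, not whether each part is at most 255.

```python
import re

# The sample text from the slide, unchanged.
TEXT = (
    "The IP Address for the server is 192.169.1.3 but it should be 192.168.1.5, "
    "and I am not sure how we managed to get into the 192.169.1 subnet but we need "
    "to remove ourselves from it immediately unless we are moving to it then I want "
    "the new IP to be 192.169.1.3 I suppose."
)

# Four groups of one to three digits separated by literal (escaped) dots.
# \b stops a match from starting or ending inside a longer run of digits.
# This checks the shape only: 999.999.999.999 would also be accepted.
IPV4_SHAPE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

addresses = IPV4_SHAPE.findall(TEXT)
print(addresses)               # every dotted quad, in order of appearance
print(sorted(set(addresses)))  # the distinct addresses
```

The partial address 192.169.1 is skipped because it has only three dot-separated parts.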

YOU HAVE USED PATTERNS BEFORE

Wildcard Searches For Files

Wildcards = VERY simple pattern-matching constructs and are NOT regular expressions

Examples:

*.txt

b*b*

?un.txt
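The difference between wildcards and regular expressions can be shown with Python's standard fnmatch module, which implements shell-style wildcards, next to hand-written regex equivalents. The file names and the regex translations are illustrative assumptions; they agree on these sample names but are not a general wildcard-to-regex converter.

```python
import fnmatch
import re

# Hand-written regex equivalents of the wildcard examples.
# In a regex, "." matches any single character and ".*" any run of characters,
# so "*" becomes ".*", "?" becomes ".", and a literal dot is escaped as "\.".
WILDCARD_TO_REGEX = {
    "*.txt": r"^.*\.txt$",
    "b*b*": r"^b.*b.*$",
    "?un.txt": r"^.un\.txt$",
}

names = ["notes.txt", "fun.txt", "run.txt", "bob.doc", "bat.txt", "sun.txt.bak"]

for wildcard, regex in WILDCARD_TO_REGEX.items():
    # fnmatchcase is used so the result does not depend on the OS's case rules.
    by_wildcard = [n for n in names if fnmatch.fnmatchcase(n, wildcard)]
    by_regex = [n for n in names if re.search(regex, n)]
    assert by_wildcard == by_regex
    print(f"{wildcard:8} {by_wildcard}")

# A wildcard is not a valid regex: a leading "*" has nothing to repeat.
try:
    re.compile("*.txt")
except re.error as exc:
    print("not a regex:", exc)
```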

Why Use Regular Expressions?

Major Uses of Regular Expressions

Matching = find any text anywhere, regardless of complexity

Substitution = once found, you can replace text
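A minimal Python sketch of the two uses, built around a hypothetical configuration line that mixes the two subnets from the IP address slide: re.search answers whether a match exists, and re.sub replaces every match, reusing a captured group.

```python
import re

# Hypothetical configuration line, invented for this example.
line = "server=192.169.1.3 backup=192.169.1.7 gateway=192.168.1.1"

# Matching: is there any address in the 192.169.1.x subnet?
# The dots are escaped so they match only a literal ".".
WRONG_SUBNET = re.compile(r"\b192\.169\.1\.(\d{1,3})\b")
print(bool(WRONG_SUBNET.search(line)))

# Substitution: rewrite each such address into 192.168.1.x,
# keeping the last part of the address through the captured group \1.
fixed = WRONG_SUBNET.sub(r"192.168.1.\1", line)
print(fixed)
```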

Features

Can literally turn 10 lines of code into 1

Extremely efficient pattern-matching mechanism

Once learned, becomes one of the most indispensable techniques you can have
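As a sketch of the "10 lines into 1" claim, consider a hypothetical check for a US ZIP code (five digits, optionally followed by a hyphen and four more digits). Both versions are illustrative and are not taken from the original talk; the loop at the end confirms they agree on a few sample inputs.

```python
import re

def is_zip_by_hand(s: str) -> bool:
    """Accept 12345 or 12345-6789 using ordinary string handling."""
    if len(s) not in (5, 10):
        return False
    if not all(c in "0123456789" for c in s[:5]):
        return False
    if len(s) == 10:
        if s[5] != "-":
            return False
        if not all(c in "0123456789" for c in s[6:]):
            return False
    return True

# The same rule as one pattern: five digits, optionally a hyphen and four more.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
ZIP = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

def is_zip_by_regex(s: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return ZIP.fullmatch(s) is not None

for candidate in ["98052", "98052-6399", "9805", "98052-63", "98052 6399"]:
    assert is_zip_by_hand(candidate) == is_zip_by_regex(candidate)
    print(candidate, is_zip_by_regex(candidate))
```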

Languages That Support Regular Expressions

All .NET languages

JScript

XML: XPath & XQuery

T-SQL

Perl

Java

[insert language here]

ASP.NET validation control (RegularExpressionValidator)

Common Misconceptions

Misconceptions

Regular expressions can do complex programming logic

Regular expressions can do math

Regular expressions will give me winning lottery numbers

Anatomy of a Regular Expression

A Sample Expression

^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
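Trying the sample expression against a few addresses in Python (standard re module) shows what it accepts and where it stops. The addresses are made up, and the results describe this teaching pattern only; it is not a complete email validator.

```python
import re

# The sample expression from the slide, unchanged.
EMAIL = re.compile(r"^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$")

candidates = [
    "zain@example.com",       # simple name, one-part domain, 3-letter ending
    "jane_doe@contoso.net",   # \w also matches "_"
    "john.doe@example.com",   # "." is not a word character, so \w+ stops
    "info@mail.example.org",  # the domain part cannot contain a dot
    "info@example.museum",    # endings longer than 3 letters are rejected
]

results = {address: bool(EMAIL.match(address)) for address in candidates}
for address, ok in results.items():
    print(f"{address:24} {ok}")
```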

Anatomy

Characters

Metacharacters

Subexpressions

Characters

A literal character stands for one specific character: any valid value defined by the current encoding.

For example, the literal character “@” is represented as the decimal value 64 in the ASCII encoding system.

^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
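A short Python check of these character facts, using only the built-in ord function and the re module. The sample strings are illustrative.

```python
import re

# A literal character in a pattern matches exactly that character.
print(ord("@"))  # code point of "@"; the same value as in ASCII
print(re.findall("@", "a@b.com, c@d.org"))

# Some characters, such as ".", are metacharacters. To match them literally,
# escape them with a backslash; re.escape adds the backslashes for you.
print(re.findall(r"\.", "a@b.com"))  # only the real dot
print(re.findall(".", "a@b"))        # an unescaped "." matches any character
print(re.escape("192.168.1.5"))
```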

Metacharacters

Unlike literal characters, metacharacters are used as “placeholders” for characters.

For example, the metacharacter “\t” represents the tab character in regular expressions, whereas “\d” matches any digit from 0 through 9.

^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
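A minimal Python sketch of metacharacters at work, with the metacharacters of the sample expression summarized in comments. The tab-separated row is an invented example.

```python
import re

# \d matches a digit and \t matches a tab character.
# Note: for str patterns, Python's \d also matches other Unicode decimal
# digits; the re.ASCII flag restricts it to 0-9.
row = "id\t42\tqty\t7"
print(re.findall(r"\d+", row))  # runs of digits
print(re.split(r"\t", row))     # fields separated by tabs

# The metacharacters in the sample expression:
#   ^ and $    anchor the match to the start and end of the text
#   \w         a "word" character: letter, digit, or underscore
#   + and +?   one or more (greedy) / one or more, as few as possible (lazy)
#   [a-zA-Z_]  a character class: any one of the listed characters
#   {2,3}      two or three repetitions of the preceding item
#   \.         an escaped dot, i.e. a literal "."
print(re.fullmatch(r"\d{2,3}", "123") is not None)
print(re.fullmatch(r"\d{2,3}", "1234") is not None)
```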

Subexpressions

These are simply smaller expressions nested inside larger ones.

For example, the following expression has a subexpression inside it:

(john|jane)doe
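In Python's re module, the subexpression can be captured and read back as a numbered group. The test strings are made up for illustration.

```python
import re

# The parentheses create a subexpression (a group); "|" means "or" inside it.
NAME = re.compile(r"(john|jane)doe")

for text in ["johndoe", "janedoe", "jimdoe"]:
    m = NAME.fullmatch(text)
    # group(0) is the whole match; group(1) is the part matched by the subexpression.
    print(text, "->", m.group(1) if m else None)

# Without the parentheses, "|" splits the whole pattern in two:
# "john|janedoe" means "john" OR "janedoe", so "johndoe" no longer matches.
print(re.fullmatch(r"john|janedoe", "johndoe"))
```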

Must-Have Resources

Tools

http://www.RegExLib.com

http://www.ultrapico.com/Expresso.htm

Book


Summary

Regular expressions can be used to manipulate and change text

While there is a steep learning curve, regular expressions are invaluable as a programming tool

Regular expressions are supported by virtually all major programming languages

Next Steps

Check out some of the patterns on the RegExLib site

Do a live search on regular expressions and see what others have to say about them

Prepare yourself mentally for a rewarding journey into the world of regular expressions

Have Fun!!!