Characters, String and Regular expressions

Characters

The char data type is used to represent a single character.

Characters are stored in computer memory using some form of encoding. Java uses Unicode, which includes ASCII, for representing char constants.

ASCII Encoding

For example, the character 'O' has code 79 (row value 70 + column value 9 = 79 in the ASCII table).

Unicode Encoding

The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages.

Java uses the Unicode standard for representing char constants.

char ch1 = 'X';

System.out.println(ch1);
System.out.println((int) ch1);

Output:

X
88

Character Processing

Declaration and initialization

char ch1, ch2 = 'X';

Type conversion between int and char

System.out.print("ASCII code of character X is " + (int) 'X');

System.out.print("Character with ASCII code 88 is " + (char) 88);

'A' < 'c'

This comparison returns true because the ASCII value of 'A' is 65 while that of 'c' is 99.
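The conversions and the comparison above can be tried in one small program. This is a minimal sketch; the class name CharCodeDemo and the helper methods codeOf and charOf are illustrative names, not part of the slides.

```java
class CharCodeDemo {
    // Returns the numeric code of a character (a widening cast from char to int).
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character that has the given code (a narrowing cast from int to char).
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        char ch1, ch2 = 'X';   // only ch2 is initialized here
        ch1 = 'O';

        System.out.println(ch1 + " has code " + codeOf(ch1));   // O has code 79
        System.out.println(ch2 + " has code " + codeOf(ch2));   // X has code 88
        System.out.println("Code 88 is " + charOf(88));         // Code 88 is X
        System.out.println('A' < 'c');                          // true, since 65 < 99
    }
}
```

Note that in the declaration char ch1, ch2 = 'X'; only ch2 receives a value, so ch1 must be assigned before it is read.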

Strings

A string is a sequence of characters that is treated as a single value.

Instances of the String class are used to represent strings in Java.

String declaration

• Create a String object

String <variable name>;

<variable name> = new String("<value of a string>");

• Create a String literal

String <variable name>;

<variable name> = "<value of a string>";

Example

String word1;

word1 = new String("Java");

OR

String word1;

word1 = "Java";

Examples

We can do this because String objects are immutable.

String constructor

• No-argument constructor
• One-argument constructor: a String object
• One-argument constructor: a char array
• Three-argument constructor: a char array, an integer that specifies the starting position, and an integer that specifies the number of characters to access
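The four constructors listed above might be exercised as follows. This is a sketch only; the class name StringConstructorDemo, the helper fromChars, and the sample char array are assumptions made for illustration.

```java
class StringConstructorDemo {
    // Builds a String from part of a char array:
    // 'offset' is the starting position, 'count' the number of characters to access.
    static String fromChars(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        char[] letters = {'b', 'i', 'r', 't', 'h', 'd', 'a', 'y'};

        String empty = new String();              // no-argument: an empty string
        String copy = new String("hello");        // one-argument: a String object
        String whole = new String(letters);       // one-argument: a char array
        String part = fromChars(letters, 5, 3);   // three-argument: chars 5, 6, 7

        System.out.println(empty.length());   // 0
        System.out.println(copy);             // hello
        System.out.println(whole);            // birthday
        System.out.println(part);             // day
    }
}
```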

Review

To compute how many characters the string myString contains, we use:

a. myString.size

b. myString.size()

c. myString.length

d. myString.length()

Review

Java uses this to represent characters of diverse languages

a. ASCII

b. UNICODE

c. EBCDIC

d. BINARY

Accessing Individual Elements

Individual characters in a String are accessed with the charAt method.

String name = "Sumatra";

Index:      0 1 2 3 4 5 6
Character:  S u m a t r a

name
This variable refers to the whole string.

name.charAt( 3 )
The method returns the character at position 3, which is 'a'.
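Because positions start at 0, a loop over a string usually runs from 0 to length() - 1. The sketch below counts occurrences of a character with charAt; the class name CharAtDemo and the helper countChar are hypothetical names used only for this example.

```java
class CharAtDemo {
    // Counts how many times 'target' occurs in 'text' by visiting each index.
    static int countChar(String text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Sumatra";
        System.out.println(name.charAt(3));         // a
        System.out.println(countChar(name, 'a'));   // 2 (positions 3 and 6)
    }
}
```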

Other Useful String Operators

Method       Meaning                                                          Example
compareTo    Compares the two strings.                                        str1.compareTo( str2 )
substring    Extracts a substring from a string.                              str1.substring( 1, 4 )
trim         Removes the leading and trailing spaces.                         str1.trim( )
valueOf      Converts a given primitive data value to a string.               String.valueOf( 123.4565 )
startsWith   Returns true if a string starts with a specified prefix string.  str1.startsWith( str2 )
endsWith     Returns true if a string ends with a specified suffix string.    str1.endsWith( str2 )
length       Returns the length of the given string.                          str1.length()
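A short program can show what each method in the table returns. This is an illustrative sketch; the class name StringMethodsDemo, the helper trimmedStartsWith, and the sample text are assumptions, not taken from the slides.

```java
class StringMethodsDemo {
    // Trims the input and reports whether the result starts with the given prefix.
    static boolean trimmedStartsWith(String text, String prefix) {
        return text.trim().startsWith(prefix);
    }

    public static void main(String[] args) {
        String str1 = "  Hello World  ";
        String trimmed = str1.trim();   // leading and trailing spaces removed

        System.out.println("[" + trimmed + "]");                // [Hello World]
        System.out.println(trimmed.substring(1, 4));            // ell
        System.out.println(trimmed.startsWith("Hello"));        // true
        System.out.println(trimmed.endsWith("World"));          // true
        System.out.println(String.valueOf(123.4565));           // 123.4565
        System.out.println(trimmed.compareTo("Hello World"));   // 0
        System.out.println(trimmed.length());                   // 11
    }
}
```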

Compute Length of a string

Method: length()

Returns the length of a string.

Example:

String strVar;

strVar = new String("Java");

int len = strVar.length();

Substring

Method: substring

Extracts a substring from a given string by specifying the beginning and ending positions. The character at the ending position is not included.

Example:

String strVar, strSubStr;

strVar = new String("Exam after Easter");

strSubStr = strVar.substring(0, 4); // strSubStr is "Exam"

Index position of a substring within another string

Method: indexOf

Finds the index position of a substring within another string.

Example:

String strVar1 = "Google it";
String strVar2 = "Google";
int index;

index = strVar1.indexOf(strVar2); // index is 0

String concatenation

Method: concat

Creates a new string from two strings by concatenating them.

Example:

String strVar1 = "Google";
String strVar2 = " Search Engine";
String sumStr;

sumStr = strVar1.concat(strVar2); // sumStr is "Google Search Engine"
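The length, substring, indexOf, and concat methods are often combined. The sketch below reuses the strings from the preceding examples; the class name StringOpsDemo and the helper after are hypothetical names introduced for illustration.

```java
class StringOpsDemo {
    // Returns the text that follows the first occurrence of 'marker',
    // or an empty string when 'marker' does not occur in 'text'.
    static String after(String text, String marker) {
        int index = text.indexOf(marker);   // -1 when marker is absent
        if (index < 0) {
            return "";
        }
        return text.substring(index + marker.length());
    }

    public static void main(String[] args) {
        String strVar = "Exam after Easter";
        System.out.println(strVar.length());                    // 17
        System.out.println(strVar.substring(0, 4));             // Exam
        System.out.println("Google it".indexOf("Google"));      // 0
        System.out.println("Google".concat(" Search Engine"));  // Google Search Engine
        System.out.println(after("Google it", "Google "));      // it
    }
}
```

The one-argument form substring(beginIndex) used in the helper returns everything from beginIndex to the end of the string.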

Review

What method is used to refer to an individual character in a String?

a. getBytes

b. indexOf

c. getChars

d. charAt

Review

To compare two strings in Java, we use

a. ==

b. equals method

c. !=

d. <>

String comparison

Methods: equals, equalsIgnoreCase, compareTo, compareToIgnoreCase

String comparison

equals and equalsIgnoreCase

Example:

String string1 = "COMPSCI";

String string2 = "compsci";

boolean isEqual;

isEqual = string1.equals(string2); // false; string1.equalsIgnoreCase(string2) would return true

Common error

Comparing references with == can lead to logic errors, because == compares the references to determine whether they refer to the same object, not whether two objects have the same contents. When two identical (but separate) objects are compared with ==, the result will be false. When comparing objects to determine whether they have the same contents, use method equals.
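The difference between == and equals can be seen with two separately created String objects that hold the same text. This is a minimal sketch; the class name EqualityDemo and the helpers sameObject and sameContents are illustrative names.

```java
class EqualityDemo {
    // True only when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True when the two strings contain the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String s1 = new String("hello");   // two separate objects...
        String s2 = new String("hello");   // ...with identical contents

        System.out.println(sameObject(s1, s2));     // false
        System.out.println(sameContents(s1, s2));   // true
        System.out.println(sameObject(s1, s1));     // true
    }
}
```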

String comparison

compareTo

Example:

String string1 = "Adam";
String string2 = "AdamA";
int compareResult;

compareResult = string1.compareTo(string2);

String comparison

string1.compareTo(string2) compares two strings lexicographically:

• returns 0 if the two strings are equal
• returns a negative value if string1 is less than string2
• returns a positive value if string1 is greater than string2

The comparison is based on the Unicode value of each character in the strings.

String comparison

Let k be the smallest index at which the two strings differ. compareTo returns the difference of the two character values at position k, that is: (character at position k of string1) - (character at position k of string2). If there is no such index because one string is a prefix of the other, compareTo returns the difference of the lengths: string1.length() - string2.length().
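A few concrete calls make the return value easier to see. The sketch below uses the "Adam" and "AdamA" strings from the earlier example plus some sample words chosen for illustration; the class name CompareToDemo and the helper describe are hypothetical.

```java
class CompareToDemo {
    // Turns a compareTo result into a readable label.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return "less";
        } else if (result > 0) {
            return "greater";
        }
        return "equal";
    }

    public static void main(String[] args) {
        // One string is a prefix of the other: the result is the length difference, 4 - 5.
        System.out.println("Adam".compareTo("AdamA"));      // -1

        // First difference is at index 2: 'p' (112) - 'r' (114).
        System.out.println("apple".compareTo("apricot"));   // -2

        System.out.println("Java".compareTo("Java"));       // 0
        System.out.println(describe("b", "a"));             // greater
    }
}
```

Only the sign of the result should normally be relied on; the exact magnitude is rarely useful in application code.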

regionMatches

regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)

A substring of this String object is compared to a substring of the argument other.

The result is true if these substrings represent character sequences that are the same, ignoring case if and only if ignoreCase is true.
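As one possible use of regionMatches, the sketch below checks whether a string ends with a suffix regardless of case. The class name RegionMatchesDemo, the helper endsWithIgnoreCase, and the sample strings are assumptions made for illustration.

```java
class RegionMatchesDemo {
    // Compares the last suffix.length() characters of 'text' with 'suffix', ignoring case.
    // If 'suffix' is longer than 'text', the offset is negative and regionMatches returns false.
    static boolean endsWithIgnoreCase(String text, String suffix) {
        int toffset = text.length() - suffix.length();
        return text.regionMatches(true, toffset, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        String text = "Hello World";

        // Compare "World" (starting at offset 6) with "WORLD" (starting at offset 0), 5 characters.
        System.out.println(text.regionMatches(true, 6, "WORLD", 0, 5));    // true
        System.out.println(text.regionMatches(false, 6, "WORLD", 0, 5));   // false

        System.out.println(endsWithIgnoreCase("report.PDF", ".pdf"));      // true
    }
}
```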

The String Class is Immutable

In Java, a String object is immutable. This means that once a String object is created, it cannot be changed, for example by replacing a character with another character or by removing a character.

The String methods we have used so far do not change the original string. They create a new string from the original. For example, substring creates a new string from a given string.
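Immutability can be observed by calling a method and then printing the original variable again. This is a small sketch; the class name ImmutabilityDemo and the helper withFirstChar are hypothetical, and the helper assumes a non-empty string.

```java
class ImmutabilityDemo {
    // Returns a NEW string whose first character is 'c'; the argument 's' is not modified.
    // Assumes 's' is not empty.
    static String withFirstChar(String s, char c) {
        return c + s.substring(1);
    }

    public static void main(String[] args) {
        String word = "Java";
        String changed = withFirstChar(word, 'L');

        System.out.println(changed);   // Lava
        System.out.println(word);      // Java (unchanged)

        word.toUpperCase();            // the returned new string is discarded
        System.out.println(word);      // Java (still unchanged)
    }
}
```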

Review

If x.equals(y) is true, then x==y is always true

a. True

b. False

The StringBuffer Class

In many string processing applications, we would like to change the contents of a string. In other words, we want it to be mutable.

Manipulating the content of a string, such as replacing a character, appending a string with another string, deleting a portion of a string, and so on, may be accomplished by using the StringBuffer class.

StringBuffer Example

Changing the string "Java" to "Diva":

StringBuffer word = new StringBuffer("Java");
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');

Before: word refers to a StringBuffer object containing "Java".

After: word refers to the same StringBuffer object, which now contains "Diva".

Delete a substring from a StringBuffer object

StringBuffer word = new StringBuffer("CCourse");

word.delete(0, 1);

Append a string

StringBuffer word = new StringBuffer("CS ");

word.append("Course");

Insert a string

StringBuffer word = new StringBuffer("MCS Course");

word.insert(4, "220");

Convert from StringBuffer to String

StringBuffer word = new StringBuffer("Java");

word.setCharAt(0, 'D');

word.setCharAt(1, 'i');

System.out.println(word.toString());
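The StringBuffer operations from the preceding slides (setCharAt, delete, append, insert, toString) can be collected into one runnable sketch. The class name StringBufferDemo and the helper javaToDiva are illustrative names.

```java
class StringBufferDemo {
    // Changes "Java" to "Diva" in place and returns the result as a String.
    static String javaToDiva() {
        StringBuffer word = new StringBuffer("Java");
        word.setCharAt(0, 'D');
        word.setCharAt(1, 'i');
        return word.toString();
    }

    public static void main(String[] args) {
        System.out.println(javaToDiva());             // Diva

        StringBuffer deleted = new StringBuffer("CCourse");
        deleted.delete(0, 1);                         // removes the character at index 0
        System.out.println(deleted);                  // Course

        StringBuffer appended = new StringBuffer("CS ");
        appended.append("Course");
        System.out.println(appended);                 // CS Course

        StringBuffer inserted = new StringBuffer("MCS Course");
        inserted.insert(4, "220");                    // inserted before index 4
        System.out.println(inserted);                 // MCS 220Course
    }
}
```

Unlike the String methods, each of these calls modifies the same StringBuffer object rather than creating a new one.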

Review

Both the String and StringBuffer classes include the charAt and setCharAt methods

a. True

b. False

Review

What will be the value of str after the following statements are executed:

String str;
StringBuffer strBuf;
str = "Decaffeinated";
strBuf = new StringBuffer(str.substring(2, 7));
strBuf.setCharAt(1, 'o');
strBuf.append('e');
str = strBuf.toString();

StringBuffer methods

• Method length: returns the StringBuffer length
• Method capacity: returns the StringBuffer capacity
• Method setLength: increases or decreases the StringBuffer length
• Method ensureCapacity: sets the StringBuffer capacity; guarantees that the StringBuffer has a minimum capacity
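The difference between length and capacity can be shown with a short sketch. The class name BufferCapacityDemo and the helper truncate are hypothetical names; the initial capacity comment relies on the documented rule that a StringBuffer created from a string starts with capacity 16 plus the string's length.

```java
class BufferCapacityDemo {
    // Shortens the buffer to at most 'max' characters and returns its contents.
    static String truncate(StringBuffer buffer, int max) {
        if (buffer.length() > max) {
            buffer.setLength(max);   // characters beyond 'max' are discarded
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        StringBuffer word = new StringBuffer("Java");
        System.out.println(word.length());           // 4
        System.out.println(word.capacity());         // 20: 16 plus the initial length

        word.ensureCapacity(50);                     // capacity is now at least 50
        System.out.println(word.capacity() >= 50);   // true

        System.out.println(truncate(word, 2));       // Ja
        System.out.println(word.length());           // 2
    }
}
```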

Class StringTokenizer

Tokenizer

• Partitions a String into individual substrings (tokens)
• Uses a delimiter, typically whitespace characters (space, tab, newline, etc.)

Java offers java.util.StringTokenizer.

// Fig. 29.18: TokenTest.java
// StringTokenizer class.
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenTest
{
   // execute application
   public static void main( String args[] )
   {
      // get sentence
      Scanner scanner = new Scanner( System.in );
      System.out.println( "Enter a sentence and press Enter" );
      String sentence = scanner.nextLine();

      // process user sentence
      StringTokenizer tokens = new StringTokenizer( sentence );
      System.out.printf( "Number of elements: %d\nThe tokens are:\n",
         tokens.countTokens() );

      while ( tokens.hasMoreTokens() )
         System.out.println( tokens.nextToken() );
   } // end main
} // end class TokenTest

Sample output:

Enter a sentence and press Enter
This is a sentence with seven tokens
Number of elements: 7
The tokens are:
This
is
a
sentence
with
seven
tokens

Pattern Example

Suppose students are assigned a three-digit code:

The first digit represents the major (5 indicates computer science);

The second digit represents either in-state (1), out-of-state (2), or international (3);

The third digit indicates campus housing: On-campus dorms are numbered 1-7. Students living off-campus are represented by the digit 8.

The 3-digit pattern to represent computer science majors living on-campus is

5[123][1-7]

• The first character is 5.
• The second character is 1, 2, or 3.
• The third character is any digit between 1 and 7.
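The student-code pattern can be tested with the matches method of String, which succeeds only when the whole string fits the pattern. This is a minimal sketch; the class name StudentCodeDemo and the helper isCsOnCampus are hypothetical names.

```java
class StudentCodeDemo {
    // True when 'code' is a computer science major (5), any residency (1-3),
    // living in an on-campus dorm (1-7).
    static boolean isCsOnCampus(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isCsOnCampus("517"));    // true
        System.out.println(isCsOnCampus("538"));    // false: 8 means off-campus
        System.out.println(isCsOnCampus("417"));    // false: major is not 5
        System.out.println(isCsOnCampus("5171"));   // false: the whole string must match
    }
}
```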

Regular Expressions, Class Pattern and Class Matcher

Regular expression

• Sequence of characters and symbols
• Useful for validating input and ensuring data format
• Facilitates the construction of a compiler

Regular-expression operations in String

• Method matches
  - Matches the contents of a String to a regular expression
  - Returns a boolean indicating whether the match succeeded

Regular Expressions, Class Pattern and Class Matcher

Predefined character classes

• Escape sequence that represents a group of characters
• Digit: numeric character
• Word character: any letter, digit, underscore
• Whitespace character: space, tab, carriage return, newline, form feed

Predefined character classes.

Character Matches Character Matches

\d any digit \D any non-digit

\w any word character \W any non-word character

\s any whitespace \S any non-whitespace
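When these classes are written inside a Java string literal, each backslash must be doubled, so \d becomes "\\d". The sketch below tries each predefined class; the class name CharClassDemo and the helper isWord are illustrative names.

```java
class CharClassDemo {
    // True when 's' consists of one or more word characters (letters, digits, underscore).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println("7".matches("\\d"));    // true: a digit
        System.out.println("x".matches("\\D"));    // true: a non-digit
        System.out.println("\t".matches("\\s"));   // true: tab is whitespace
        System.out.println(" ".matches("\\S"));    // false: space is whitespace
        System.out.println("_".matches("\\w"));    // true: underscore is a word character
        System.out.println("-".matches("\\W"));    // true: hyphen is a non-word character

        System.out.println(isWord("hello_1"));       // true
        System.out.println(isWord("hello world"));   // false: contains a space
    }
}
```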

Regular Expressions

Other patterns

• Square brackets ([]): match characters that do not have a predefined character class. E.g., [aeiou] matches a single character that is a vowel.
• Dash (-): ranges of characters. E.g., [A-Z] matches a single uppercase letter.
• Caret (^): excludes the indicated characters. E.g., [^Z] matches any character other than Z.

Regular expression

Quantifiers

• Plus (+): matches one or more occurrences. E.g., A+ matches AAA but not the empty string.
• Asterisk (*): matches zero or more occurrences. E.g., A* matches both AAA and the empty string.
• Others in Fig. 29.22

Quantifiers used in regular expressions.

Quantifier Matches

* Matches zero or more occurrences of the pattern.

+ Matches one or more occurrences of the pattern.

? Matches zero or one occurrences of the pattern.

{n} Matches exactly n occurrences.

{n,} Matches at least n occurrences.

{n,m} Matches between n and m (inclusive) occurrences.
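Each quantifier in the table can be checked with a one-line call to matches. This is a sketch with sample strings chosen for illustration; the class name QuantifierDemo and the helper isThreeDigits are hypothetical.

```java
class QuantifierDemo {
    // True when 's' is exactly three digits.
    static boolean isThreeDigits(String s) {
        return s.matches("[0-9]{3}");
    }

    public static void main(String[] args) {
        System.out.println("AAA".matches("A+"));            // true
        System.out.println("".matches("A+"));               // false: needs at least one A
        System.out.println("".matches("A*"));               // true: zero occurrences allowed
        System.out.println("color".matches("colou?r"));     // true: the u is optional
        System.out.println("colour".matches("colou?r"));    // true
        System.out.println("12".matches("[0-9]{3,}"));      // false: fewer than 3 digits
        System.out.println("1234".matches("[0-9]{2,4}"));   // true: between 2 and 4 digits
        System.out.println(isThreeDigits("042"));           // true
    }
}
```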

Regular Expression Examples

Expression               Description
[013]                    A single digit 0, 1, or 3.
[0-9][0-9]               Any two-digit number from 00 to 99.
[0-9&&[^4567]]           A single digit that is 0, 1, 2, 3, 8, or 9.
[a-z0-9]                 A single character that is either a lowercase letter or a digit.
[a-zA-Z][a-zA-Z0-9_$]*   A valid Java identifier consisting of alphanumeric characters, underscores, and dollar signs, with the first character being a letter.
[wb](ad|eed)             Matches wad, weed, bad, and beed.

Regular expression

Replacing substrings and splitting strings

• String method replaceAll: replaces text in a string with new text
• String method replaceFirst: replaces the first occurrence of a pattern match
• String method split: divides a string into several substrings
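All three methods take a regular expression as their first argument. The sketch below shows one possible use of each; the class name ReplaceSplitDemo, the helpers maskDigits and splitList, and the sample strings are assumptions made for illustration.

```java
import java.util.Arrays;

class ReplaceSplitDemo {
    // Replaces every digit with an asterisk.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "*");
    }

    // Splits a comma-separated list, allowing optional whitespace after each comma.
    static String[] splitList(String csv) {
        return csv.split(",\\s*");
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        System.out.println(text.replaceAll("\\d+", "#"));     // a#b#c#
        System.out.println(text.replaceFirst("\\d+", "#"));   // a#b22c333
        System.out.println(maskDigits("Card 1234"));          // Card ****

        String[] colors = splitList("red, green,blue");
        System.out.println(Arrays.toString(colors));          // [red, green, blue]
    }
}
```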

Regular expression

Class Pattern

• Represents a regular expression

Class Matcher

• Contains a regular-expression pattern and a CharSequence

Interface CharSequence

• Allows read access to a sequence of characters
• String and StringBuffer implement CharSequence
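Unlike String.matches, a Matcher can search for every occurrence of a pattern inside a longer text. The following sketch reuses the student-code pattern 5[123][1-7] as sample data; the class name PatternMatcherDemo and the helper findCodes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compiled once and reused; Pattern objects are immutable.
    private static final Pattern CODE = Pattern.compile("5[123][1-7]");

    // Collects every substring of 'text' that matches the pattern.
    // Accepts any CharSequence, so both String and StringBuffer work.
    static List<String> findCodes(CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = CODE.matcher(text);
        while (matcher.find()) {          // advances to the next match, if any
            found.add(matcher.group());   // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findCodes("Codes: 517, 428, 531"));   // [517, 531]
        System.out.println(findCodes(new StringBuffer("512")));  // [512]
    }
}
```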

Matching

Searches for a two-character pattern whose first character may be any uppercase letter between A and G, and whose second character may be any digit except 4. (Answer: B)

Searches for a single-character pattern that may be any letter except p, q, r, s, or t. (Answer: A)

A. str.matches("[a-zA-Z&&[^pqrst]]");
B. str.matches("[A-G][0-9&&[^4]]");
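In Java, the && operator forms an intersection only when it appears inside a character class, as in [0-9&&[^4]]. The sketch below checks both patterns from this exercise under that syntax; the class name IntersectionDemo and the helper names are illustrative.

```java
class IntersectionDemo {
    // An uppercase letter A-G followed by any digit except 4.
    static boolean isLetterThenDigitNot4(String s) {
        return s.matches("[A-G][0-9&&[^4]]");
    }

    // A single letter other than p, q, r, s, or t.
    static boolean isLetterExceptPToT(String s) {
        return s.matches("[a-zA-Z&&[^pqrst]]");
    }

    public static void main(String[] args) {
        System.out.println(isLetterThenDigitNot4("B7"));   // true
        System.out.println(isLetterThenDigitNot4("B4"));   // false: 4 is excluded
        System.out.println(isLetterThenDigitNot4("H7"));   // false: H is outside A-G

        System.out.println(isLetterExceptPToT("m"));       // true
        System.out.println(isLetterExceptPToT("q"));       // false: q is excluded
        System.out.println(isLetterExceptPToT("Q"));       // true: only lowercase is excluded
    }
}
```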

Review

Which character sequence would be used to designate a character pattern of a fixed length of three digits

A. 3[0-9]

B. [0-9][0-9][0-9]

C. [0-9]3

Review

Choose the correct argument for the following code that searches for any number between 100 and 999 in a given string.

str.matches(" ");

A. [0-9][0-9][1-9]

B. [0-9][0-9][0-9]

C. [1-9][0-9][0-9]
