Characters, String and Regular expressions. Characters char data type is used to represent a single...

Post on 27-Dec-2015

218 views 1 download

Transcript of Characters, String and Regular expressions. Characters char data type is used to represent a single...

Characters, String and Regular expressions

Characters

char data type is used to represent a single character.

Characters are stored in a computer memory using some form of encoding.Java uses Unicode, which includes ASCII,

for representing char constants.

ASCII Encoding

For example, character 'O' is 79 (row value 70 + col value 9 = 79).

For example, character 'O' is 79 (row value 70 + col value 9 = 79).

O

9

70

Unicode Encoding

The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages.

Java uses the Unicode standard for representing char constants.char ch1 = 'X';

System.out.println(ch1);System.out.println( (int) ch1);

X88

Character Processing

Declaration and initialization

Declaration and initialization

char ch1, ch2 = ‘X’;

Type conversion between int and char.

Type conversion between int and char.

System.out.print("ASCII code of character X is " + (int) 'X' );

System.out.print("Character with ASCII code 88 is " + (char)88 );

This comparison returns true because ASCII value of 'A' is 65 while that of 'c' is 99.

This comparison returns true because ASCII value of 'A' is 65 while that of 'c' is 99.

‘A’ < ‘c’

Strings

A string is a sequence of characters that is treated as a single value.

Instances of the String class are used to represent strings in Java.

String declaration

Create a String objectString <variable name>;

<variable name>=new String(“<value of a string>”);

• Create a String literalString <variable name>;

<variable name> = “<value of a string”;

Example

String word1;

word1 = new String(“Java”);

OR

String word1;

word1 = “Java”;

Examples

We can do thisbecause Stringobjects areimmutable.

String constructor

No-argument constructor One-argument constructor

A String object

One-argument constructor A char array

Three-argument constructor A char array An integer specifies the starting position An integer specifies the number of characters to access

Review

To compute how many characters the string myString contains, we use:

a. myString.size

b. myString.size()

c. myString.length

d. myString.length()

Review

To compute how many characters the string myString contains, we use:

a. myString.size

b. myString.size()

c. myString.length

d. myString.length()

Review

Java uses this to represent characters of diverse languages

a. ASCII

b. UNICODE

c. EDBIC

d. BINARY

Review

Java uses this to represent characters of diverse languages

a. ASCII

b. UNICODE

c. EDBIC

d. BINARY

Accessing Individual Elements

Individual characters in a String accessed with the charAt method.

0 1 2 3 4 5 6

S u m a t r a

String name = "Sumatra";

nameThis variable refers to the whole string.

This variable refers to the whole string.

name.charAt( 3 )The method returns the character at position # 3.

The method returns the character at position # 3.

Other Useful String Operators

Method Meaning

compareTo Compares the two strings.str1.compareTo( str2 )

substring Extracts the a substring from a string.str1.substring( 1, 4 )

trim Removes the leading and trailing spaces.str1.trim( )

valueOf Converts a given primitive data value to a string.String.valueOf( 123.4565 )

startsWith Returns true if a string starts with a specified prefix string.str1.startsWith( str2 )

endsWith Returns true if a string ends with a specified suffix string.str1.endsWith( str2 )

length Return the length of the given string

str1.length()

Compute Length of a string

Method: length()

Returns the length of a stringExample:

String strVar;

strVar = new String(“Java”);

int len = strVar.length();

Substring

Method:

Extract a substring from a given string by specifying the beginning and ending positions

Example:

String strVar, strSubStr;

strVar = new String(“Exam after Easter”);

strSubStr = strVar.substring(0,4);

Index position of a substring within another string

Method: Find an index position of a substring within another string.

Example:String strVar1 = “Google it”;String strVar2 = “Google”;int index;

index = strVar1.indexOf(strVar2);

String concatenation

Method: Create a new string from two strings by concatenating the two strings.

Example:String strVar1 = “Google”;String strVar2 = “ Search Engine”;String sumStr;

sumStr = strVar1.concat(strVar2);

Review

What method is used to refer to individual character in a String

a. getBytes

b. indexOf

c. getChars

d. charAt

Review

What method is used to refer to individual character in a String

a. getBytes

b. indexOf

c. getChars

d. charAt

Review

To compare two strings in Java, we use

a. ==

b. equals method

c. !=

d. <>

Review

To compare two strings in Java, we use

a. ==

b. equals method

c. !=

d. <>

String comparison

Methods:equalsequalsIgnoreCasecompareTocompareToIgnoreCase

String comparison

equals, and equalsIgnoreCase

Example:

String string1 =“COMPSCI”;

String string2 = “compsci”

boolean isEqual;

isEqual = string1.equals(string2);

Common error

Comparing references with == can lead to logic errors, because == compares the references to determine whether they refer to the same object, not whether two objects have the same contents. When two identical (but separate) objects are compared with ==, the result will be false. When comparing objects to determine whether they have the same contents, use method equals.

String comparison

compareTo

Example:

String string1 =“Adam”;String string2 = “AdamA”;int compareResult;compareResult =

string1.compareTo(string2);

String comparison

- string1.compareTo(string2)Compares two strings lexicographically

will return 0 if two strings are equalwill return negative value if string1 is less than

string 2will return positive value if string1 is greater

than string 2

The comparison is based on the Unicode value of each character in the strings

String comparison

The comparison is based on the Unicode value of each character in the strings

let k be the smallest index valid for both strings; compareTo returns the difference of the two character values at position k in the two string -- that is, the value: character at the position k of string 1 – character at the position k of string 2

regionMatches

regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)

A substring of this String object is compared to a substring of the argument other.

The result is true if these substrings represent character sequences that are the same, ignoring case if and only if ignoreCase is true.

The String Class is Immutable

In Java a String object is immutableThis means once a String object is created, it

cannot be changed, such as replacing a character with another character or removing a character

The String methods we have used so far do not change the original string. They created a new string from the original. For example, substring creates a new string from a given string.

Review

If x.equals(y) is true, then x==y is always true

a. True

b. False

Review

If x.equals(y), then x==y is always true

a. True

b. False

The StringBuffer Class

In many string processing applications, we would like to change the contents of a string. In other words, we want it to be mutable.

Manipulating the content of a string, such as replacing a character, appending a string with another string, deleting a portion of a string, and so on, may be accomplished by using the StringBuffer class.

StringBuffer Example

StringBuffer word = new StringBuffer("Java");word.setCharAt(0, 'D');word.setCharAt(1, 'i');

Changing a string Java to Diva

word

: StringBuffer

Java

Before

word

: StringBuffer

Diva

After

Delete a substring from a StringBuffer object

StringBuffer word = new StringBuffer(“CCourse”);

word.delete(0,1);

Append a string

StringBuffer word = new StringBuffer(“CS ”);

word.append(“Course”);

Insert a string

StringBuffer word = new StringBuffer(“MCS Course”);

word.insert(4,“220”);

Convert from StringBuffer to String

StringBuffer word = new StringBuffer(“Java”);

word.setCharAt(0,’D’);

word.setCharAt(1,’i’);

System.out.println(word.toString());

Review

Both the String and StringBuffer classes include the charAt and setCharAt methods

a. Trueb. False

Review

Both the String and StringBuffer classes include the charAt and setCharAt methods

a. Trueb. False

Review

What will be the value of str after the following statements are executed:

String str;StringBuffer strBuf;str ="Decaffeinated";strBuf = new StringBuffer(str.substring(2,7));strBuf.setCharAt(1,'o');strBuf.append('e');str = strBuf.toString();

StringBuffer methods

Method length Return StringBuffer length

Method capacity Return StringBuffer capacity

Method setLength Increase or decrease StringBuffer length

Method ensureCapacity Set StringBuffer capacity Guarantee that StringBuffer has minimum

capacity

Class StringTokenizer

TokenizerPartition String into individual substringsUse delimiter

Typically whitespace characters (space, tab, newline, etc)

Java offers java.util.StringTokenizer

Outline 1 // Fig. 29.18: TokenTest.java

2 // StringTokenizer class.

3 import java.util.Scanner;

4 import java.util.StringTokenizer;

5

6 public class TokenTest

7 {

8 // execute application

9 public static void main( String args[] )

10 {

11 // get sentence

12 Scanner scanner = new Scanner( System.in );

13 System.out.println( "Enter a sentence and press Enter" );

14 String sentence = scanner.nextLine();

15

16 // process user sentence

17 StringTokenizer tokens = new StringTokenizer( sentence );

18 System.out.printf( "Number of elements: %d\nThe tokens are:\n",

19 tokens.countTokens() );

20

21 while ( tokens.hasMoreTokens() )

22 System.out.println( tokens.nextToken() );

23 } // end main

24 } // end class TokenTest Enter a sentence and press Enter This is a sentence with seven tokens Number of elements: 7 The tokens are: This is a sentence with seven tokens

Pattern Example Suppose students are assigned a three-digit code:

The first digit represents the major (5 indicates computer science);

The second digit represents either in-state (1), out-of-state (2), or international (3);

The third digit indicates campus housing: On-campus dorms are numbered 1-7. Students living off-campus are represented by the digit 8.

The 3-digit pattern to represent computer science majors living on-campus is

5[123][1-7]

firstcharacter

is 5second

characteris 1, 2, or 3

thirdcharacter

is any digit between 1 and 7

Regular Expressions, Class Pattern and Class MatcherRegular expression

Sequence of characters and symbolsUseful for validating input and ensuring data formatFacilitate the construction of a compiler

Regular-expression operations in StringMethod matches

Matches the contents of a String to regular expressionReturns a boolean indicating whether the match

succeeded

Regular Expressions, Class Pattern and Class Matcher

Predefine character classesEscape sequence that represents a group

of characterDigit

Numeric characterWord character

Any letter, digit, underscoreWhitespace character

Space, tab, carriage return, newline, form feed

Predefined character classes.

Character Matches Character Matches

\d any digit \D any non-digit

\w any word character \W any non-word character

\s any whitespace \S any non-whitespace

Regular ExpressionsOther patterns

Square brackets ([])Match characters that do not have a predefined character

classE.g., [aeiou] matches a single character that is a vowel

Dash (-)Ranges of charactersE.g., [A-Z] matches a single uppercase letter

^Not include the indicated charactersE.g., [^Z] matches any character other than Z

Regular expression

QuantifiersPlus (+)

Match one or more occurrencesE.g., A+

Matches AAA but not empty string

Asterisk (*)Match zero or more occurrencesE.g., A*

Matches both AAA and empty string

Others in Fig. 29.22

Quantifiers used in regular expressions.

Quantifier Matches

* Matches zero or more occurrences of the pattern.

+ Matches one or more occurrences of the pattern.

? Matches zero or one occurrences of the pattern.

{n} Matches exactly n occurrences.

{n,} Matches at least n occurrences.

{n,m} Matches between n and m (inclusive) occurrences.

Regular Expression Examples

Expression Description[013] A single digit 0, 1, or 3.

[0-9][0-9] Any two-digit number from 00 to 99.

[0-9&&[^4567]] A single digit that is 0, 1, 2, 3, 8, or 9.

[a-z0-9] A single character that is either a lowercase letter or a digit.

[a-zA-Z][a-zA-Z0-9_$]*

A valid Java identifier consisting of alphanumeric characters, underscores, and dollar signs, with the first character being an alphabet.

[wb](ad|eed) Matches wad, weed, bad, and beed.

Regular expression

Replacing substrings and splitting stringsString method replaceAll

Replace text in a string with new textString method replaceFirst

Replace the first occurrence of a pattern matchString method split

Divides string into several substrings

Regular expression

Class PatternRepresents a regular expression

Class MatchContains a regular-expression pattern and a CharSequence

Interface CharSequenceAllows read access to a sequence of charactersString and StringBuffer implement CharSequence

Regular Expression Examples

Expression Description[013] A single digit 0, 1, or 3.

[0-9][0-9] Any two-digit number from 00 to 99.

[0-9&&[^4567]] A single digit that is 0, 1, 2, 3, 8, or 9.

[a-z0-9] A single character that is either a lowercase letter or a digit.

[a-zA-z][a-zA-Z0-9_$]*

A valid Java identifier consisting of alphanumeric characters, underscores, and dollar signs, with the first character being an alphabet.

[wb](ad|eed) Matches wad, weed, bad, and beed.

Matching

Searches for 2 character pattern whose first character may be any uppercase letter between A and G, and whose second character may be any number except 4

Searches for a character pattern that may be any alphabet except p,q,r,s, or t

A. str.matches(“[a-zA-Z]&&[^pqrst]”);B. str.matches(“[A-G][0-9 && [^4]]”);

Matching

Searches for 2 character pattern whose first character may be any uppercase letter between A and G, and whose second character may be any number except 4

Searches for a character pattern that may be any alphabet except p,q,r,s, or t

A. str.matches(“[a-zA-Z]&&[^pqrst]”);B. str.matches(“[A-G][0-9]&&[^4]”);

B

A

Review

Which character sequence would be used to designate a character pattern of a fixed length of three digits

A. 3[0-9]

B. [0-9][0-9][0-9]

C. [0-9]3

Review

Which character sequence would be used to designate a character pattern of a fixed length of three digits

A. 3[0-9]

B. [0-9][0-9][0-9]

C. [0-9]3

Review

Choose the correct argument for the following code that searches for any number between 100 and 999 in a given string.

str.matches(“ “);

A. [0-9][0-9][1-9]B. [0-9][0-9][0-9]C. [1-9][0-9][0-9]

Review

Choose the correct argument for the following code that searches for any number between 100 and 999 in a given string.

str.matches(“ “);

A. [0-9][0-9][1-9]B. [0-9][0-9][0-9]C. [1-9][0-9][0-9]