Characters, String and Regular expressions
Characters
The char data type is used to represent a single character.
Characters are stored in computer memory using some form of encoding. Java uses Unicode, which includes ASCII, for representing char constants.
ASCII Encoding
For example, character 'O' is 79 (row value 70 + col value 9 = 79).
Unicode Encoding
The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages.
Java uses the Unicode standard for representing char constants.
char ch1 = 'X';
System.out.println(ch1);
System.out.println( (int) ch1);
Output:
X
88
Character Processing
Declaration and initialization
char ch1, ch2 = 'X';
Type conversion between int and char.
System.out.print("ASCII code of character X is " + (int) 'X' );
System.out.print("Character with ASCII code 88 is " + (char)88 );
'A' < 'c'
This comparison returns true because the ASCII value of 'A' is 65 while that of 'c' is 99.
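The conversions and the comparison above can be collected into one small program. This is only a sketch: the class name CharDemo and the helper methods codeOf and charOf are illustrative names, not part of the original slides.

```java
// Illustrative sketch; CharDemo, codeOf and charOf are hypothetical names.
class CharDemo {
    // Widening a char to int yields its Unicode code (same as ASCII for basic Latin).
    static int codeOf(char ch) {
        return ch;
    }

    // Narrowing an int to char needs an explicit cast.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('X'));        // prints 88
        System.out.println(charOf(88));         // prints X
        System.out.println('A' < 'c');          // prints true, since 65 < 99
        System.out.println(charOf('A' + 1));    // prints B
    }
}
```

Because char values are compared by their numeric codes, every uppercase letter sorts before every lowercase letter.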
Strings
A string is a sequence of characters that is treated as a single value.
Instances of the String class are used to represent strings in Java.
String declaration
Create a String object:
String <variable name>;
<variable name> = new String("<value of a string>");
Create a String literal:
String <variable name>;
<variable name> = "<value of a string>";
Example
String word1;
word1 = new String("Java");
OR
String word1;
word1 = "Java";
Examples
We can do this because String objects are immutable.
String constructor
No-argument constructor
One-argument constructor: a String object
One-argument constructor: a char array
Three-argument constructor: a char array, an integer that specifies the starting position, and an integer that specifies the number of characters to access
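A short sketch of the four constructors listed above. The class name StringConstructorDemo, the helper fromPart, and the sample data are assumptions made for illustration.

```java
// Illustrative sketch; StringConstructorDemo and fromPart are hypothetical names.
class StringConstructorDemo {
    // Builds a String from count characters of source, starting at index offset.
    static String fromPart(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    public static void main(String[] args) {
        char[] letters = {'b', 'i', 'r', 't', 'h', 'd', 'a', 'y'};

        String empty = new String();            // no-argument: the empty string
        String copy = new String("hello");      // one-argument: copies a String
        String whole = new String(letters);     // one-argument: copies a char array
        String part = fromPart(letters, 5, 3);  // three-argument: 3 chars from index 5

        System.out.println(empty.length());  // prints 0
        System.out.println(copy);            // prints hello
        System.out.println(whole);           // prints birthday
        System.out.println(part);            // prints day
    }
}
```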
Review
To compute how many characters the string myString contains, we use:
a. myString.size
b. myString.size()
c. myString.length
d. myString.length()
Review
Java uses this to represent characters of diverse languages
a. ASCII
b. UNICODE
c. EBCDIC
d. BINARY
Accessing Individual Elements
Individual characters in a String are accessed with the charAt method.
0 1 2 3 4 5 6
S u m a t r a
String name = "Sumatra";
name
This variable refers to the whole string.
name.charAt( 3 )
The method returns the character at position 3, which is 'a'.
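Since positions start at 0 and end at length() - 1, charAt is typically used inside a loop. The following sketch counts how often a character occurs; the class name CharAtDemo and the method countChar are hypothetical.

```java
// Illustrative sketch; CharAtDemo and countChar are hypothetical names.
class CharAtDemo {
    // Counts how many times target occurs in text by visiting each index once.
    static int countChar(String text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Sumatra";
        System.out.println(name.charAt(0));         // prints S
        System.out.println(name.charAt(3));         // prints a
        System.out.println(countChar(name, 'a'));   // prints 2
    }
}
```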
Other Useful String Operators
Method       Meaning                                                           Example
compareTo    Compares the two strings.                                         str1.compareTo( str2 )
substring    Extracts a substring from a string.                               str1.substring( 1, 4 )
trim         Removes the leading and trailing spaces.                          str1.trim( )
valueOf      Converts a given primitive data value to a string.                String.valueOf( 123.4565 )
startsWith   Returns true if a string starts with a specified prefix string.   str1.startsWith( str2 )
endsWith     Returns true if a string ends with a specified suffix string.     str1.endsWith( str2 )
length       Returns the length of the given string.                           str1.length()
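Several of the methods in the table can be combined in a few lines. The sketch below is an assumption-laden illustration: the class StringMethodsDemo, the helper isJavaFile, and the sample file name are all made up for this example.

```java
// Illustrative sketch; StringMethodsDemo and isJavaFile are hypothetical names.
class StringMethodsDemo {
    // Returns true if fileName, ignoring surrounding spaces, ends with ".java".
    static boolean isJavaFile(String fileName) {
        return fileName.trim().endsWith(".java");
    }

    public static void main(String[] args) {
        String raw = "  TokenTest.java  ";
        String trimmed = raw.trim();

        System.out.println(trimmed);                       // prints TokenTest.java
        System.out.println(trimmed.startsWith("Token"));   // prints true
        System.out.println(isJavaFile(raw));               // prints true
        System.out.println(trimmed.substring(0, 5));       // prints Token
        System.out.println(String.valueOf(42).length());   // prints 2
    }
}
```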
Compute Length of a string
Method: length()
Returns the length of a string.
Example:
String strVar;
strVar = new String("Java");
int len = strVar.length();   // len is 4
Substring
Method: substring
Extracts a substring from a given string by specifying the beginning position (inclusive) and the ending position (exclusive).
Example:
String strVar, strSubStr;
strVar = new String("Exam after Easter");
strSubStr = strVar.substring(0, 4);   // strSubStr is "Exam"
Index position of a substring within another string
Method: indexOf
Finds the index position of a substring within another string.
Example:
String strVar1 = "Google it";
String strVar2 = "Google";
int index;
index = strVar1.indexOf(strVar2);   // index is 0
String concatenation
Method: concat
Creates a new string from two strings by concatenating them.
Example:
String strVar1 = "Google";
String strVar2 = " Search Engine";
String sumStr;
sumStr = strVar1.concat(strVar2);   // sumStr is "Google Search Engine"
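The methods indexOf, substring, and concat are often used together, for instance to take a string apart and reassemble it. A minimal sketch follows, assuming a made-up "key=value" input format; KeyValueDemo, keyPart, and valuePart are hypothetical names. Note that indexOf returns -1 when the substring is not found.

```java
// Illustrative sketch; KeyValueDemo, keyPart and valuePart are hypothetical names.
class KeyValueDemo {
    // Text before the first '=', or the whole string if there is no '='.
    static String keyPart(String pair) {
        int eq = pair.indexOf("=");
        return eq < 0 ? pair : pair.substring(0, eq);
    }

    // Text after the first '=', or the empty string if there is no '='.
    static String valuePart(String pair) {
        int eq = pair.indexOf("=");
        return eq < 0 ? "" : pair.substring(eq + 1, pair.length());
    }

    public static void main(String[] args) {
        String pair = "language=Java";
        System.out.println(pair.indexOf("="));   // prints 8
        System.out.println(pair.indexOf("#"));   // prints -1 (not found)
        System.out.println(keyPart(pair));       // prints language
        System.out.println(valuePart(pair));     // prints Java
        System.out.println(valuePart(pair).concat(" ").concat(keyPart(pair)));  // prints Java language
    }
}
```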
Review
What method is used to refer to an individual character in a String?
a. getBytes
b. indexOf
c. getChars
d. charAt
Review
To compare two strings in Java, we use
a. ==
b. equals method
c. !=
d. <>
String comparison
Methods: equals, equalsIgnoreCase, compareTo, compareToIgnoreCase
equals and equalsIgnoreCase
Example:
String string1 = "COMPSCI";
String string2 = "compsci";
boolean isEqual;
isEqual = string1.equals(string2);             // false: the case differs
isEqual = string1.equalsIgnoreCase(string2);   // true
Common error
Comparing references with == can lead to logic errors, because == compares the references to determine whether they refer to the same object, not whether two objects have the same contents. When two identical (but separate) objects are compared with ==, the result will be false. When comparing objects to determine whether they have the same contents, use method equals.
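The difference between == and equals can be observed directly. In this sketch the class EqualsDemo and its two helper methods are hypothetical names; the strings are created with new so that they are guaranteed to be separate objects.

```java
// Illustrative sketch; EqualsDemo, sameObject and sameContents are hypothetical names.
class EqualsDemo {
    // Compares references: true only if a and b refer to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Compares contents: true if both strings hold the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String first = new String("Java");
        String second = new String("Java");

        System.out.println(sameObject(first, second));     // prints false
        System.out.println(sameContents(first, second));   // prints true
        System.out.println(sameObject(first, first));      // prints true
    }
}
```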
compareTo
Example:
String string1 = "Adam";
String string2 = "AdamA";
int compareResult;
compareResult = string1.compareTo(string2);
string1.compareTo(string2) compares two strings lexicographically:
- returns 0 if the two strings are equal
- returns a negative value if string1 is less than string2
- returns a positive value if string1 is greater than string2
The comparison is based on the Unicode value of each character in the strings.
Let k be the smallest index at which the two strings differ; compareTo returns the difference of the two character values at position k, that is, the value: (character at position k of string1) - (character at position k of string2). If the strings do not differ at any index valid for both, compareTo returns the difference of their lengths: string1.length() - string2.length().
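A few concrete calls make the return values of compareTo easier to follow. This is a sketch with hypothetical names (CompareToDemo, describe); the sample strings are chosen for illustration.

```java
// Illustrative sketch; CompareToDemo and describe are hypothetical names.
class CompareToDemo {
    // Turns the sign of compareTo into a readable sentence.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return a + " comes before " + b;
        }
        if (result > 0) {
            return a + " comes after " + b;
        }
        return a + " equals " + b;
    }

    public static void main(String[] args) {
        System.out.println("Adam".compareTo("AdamA"));      // prints -1 (length difference 4 - 5)
        System.out.println("apple".compareTo("apricot"));   // prints -2 ('p' is 112, 'r' is 114)
        System.out.println("Adam".compareTo("Adam"));       // prints 0
        System.out.println(describe("Zoe", "adam"));        // prints Zoe comes before adam
    }
}
```

The last line shows a common surprise: because 'Z' (90) has a smaller code than 'a' (97), an uppercase word sorts before a lowercase one. compareToIgnoreCase avoids this.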
regionMatches
regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)
A substring of this String object is compared to a substring of the argument other.
The result is true if these substrings represent character sequences that are the same, ignoring case if and only if ignoreCase is true.
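The five parameters of regionMatches are easier to read in a concrete call. The sketch below uses made-up strings, and the helper endsWithIgnoreCase is a hypothetical method built on top of regionMatches, not a method of the String class.

```java
// Illustrative sketch; RegionMatchesDemo and endsWithIgnoreCase are hypothetical names.
class RegionMatchesDemo {
    // True if text ends with suffix, ignoring upper/lower case.
    static boolean endsWithIgnoreCase(String text, String suffix) {
        int start = text.length() - suffix.length();
        return start >= 0 && text.regionMatches(true, start, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        String text = "Hello World";

        // Compare 5 characters of text from index 6 ("World")
        // with 5 characters of the argument from index 0 ("WORLD").
        System.out.println(text.regionMatches(false, 6, "WORLDWIDE", 0, 5));  // prints false
        System.out.println(text.regionMatches(true, 6, "WORLDWIDE", 0, 5));   // prints true

        System.out.println(endsWithIgnoreCase("Report.PDF", ".pdf"));         // prints true
    }
}
```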
The String Class is Immutable
In Java a String object is immutable. This means that once a String object is created, it cannot be changed, such as by replacing a character with another character or removing a character.
The String methods we have used so far do not change the original string. They create a new string from the original. For example, substring creates a new string from a given string.
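Immutability can be checked by calling a few methods and printing the original afterwards. A minimal sketch, with the hypothetical names ImmutableDemo and withFirstLetter:

```java
// Illustrative sketch; ImmutableDemo and withFirstLetter are hypothetical names.
class ImmutableDemo {
    // Returns a NEW string whose first character is replaced; word itself is untouched.
    static String withFirstLetter(String word, char first) {
        return first + word.substring(1);
    }

    public static void main(String[] args) {
        String original = "Java";
        String replaced = withFirstLetter(original, 'L');
        String upper = original.toUpperCase();

        System.out.println(replaced);   // prints Lava
        System.out.println(upper);      // prints JAVA
        System.out.println(original);   // prints Java (unchanged)
    }
}
```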
Review
If x.equals(y) is true, then x==y is always true
a. True
b. False
The StringBuffer Class
In many string processing applications, we would like to change the contents of a string. In other words, we want it to be mutable.
Manipulating the content of a string, such as replacing a character, appending a string with another string, deleting a portion of a string, and so on, may be accomplished by using the StringBuffer class.
StringBuffer Example
StringBuffer word = new StringBuffer("Java");
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');
Changing the string "Java" to "Diva":
Before: word refers to a StringBuffer containing "Java".
After: word refers to the same StringBuffer, now containing "Diva".
Delete a substring from a StringBuffer object
StringBuffer word = new StringBuffer("CCourse");
word.delete(0, 1);   // word now contains "Course"
Append a string
StringBuffer word = new StringBuffer("CS ");
word.append("Course");   // word now contains "CS Course"
Insert a string
StringBuffer word = new StringBuffer("MCS Course");
word.insert(4, "220");   // word now contains "MCS 220Course"
Convert from StringBuffer to String
StringBuffer word = new StringBuffer("Java");
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');
System.out.println(word.toString());
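The StringBuffer operations above can be combined into a typical edit-in-place routine. The following is only a sketch: the class StringBufferDemo, the method maskDigits, and the masking task itself are assumptions chosen to show charAt, setCharAt, and toString working together.

```java
// Illustrative sketch; StringBufferDemo and maskDigits are hypothetical names.
class StringBufferDemo {
    // Replaces every digit in text with '*' by editing a StringBuffer in place.
    static String maskDigits(String text) {
        StringBuffer buffer = new StringBuffer(text);
        for (int i = 0; i < buffer.length(); i++) {
            if (Character.isDigit(buffer.charAt(i))) {
                buffer.setCharAt(i, '*');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 220"));   // prints Room ***

        StringBuffer word = new StringBuffer("CCourse");
        word.delete(0, 1);        // removes the character at index 0
        word.insert(0, "CS ");    // inserts at the front
        word.append(" 220");      // adds at the end
        System.out.println(word); // prints CS Course 220
    }
}
```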
Review
Both the String and StringBuffer classes include the charAt and setCharAt methods
a. True
b. False
Review
What will be the value of str after the following statements are executed:
String str;
StringBuffer strBuf;
str = "Decaffeinated";
strBuf = new StringBuffer(str.substring(2, 7));
strBuf.setCharAt(1, 'o');
strBuf.append('e');
str = strBuf.toString();
StringBuffer methods
Method           Description
length           Returns the StringBuffer length
capacity         Returns the StringBuffer capacity
setLength        Increases or decreases the StringBuffer length
ensureCapacity   Guarantees that the StringBuffer has a minimum capacity
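Length and capacity are easy to confuse: length is the number of characters currently stored, while capacity is how many can be stored before the buffer has to grow. A small sketch, where CapacityDemo and truncate are hypothetical names:

```java
// Illustrative sketch; CapacityDemo and truncate are hypothetical names.
class CapacityDemo {
    // Cuts text down to at most maxLength characters using setLength.
    static String truncate(String text, int maxLength) {
        StringBuffer buffer = new StringBuffer(text);
        if (buffer.length() > maxLength) {
            buffer.setLength(maxLength);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        StringBuffer word = new StringBuffer("Java");
        System.out.println(word.length());     // prints 4
        System.out.println(word.capacity());   // prints 20 (16 plus the initial length)

        word.ensureCapacity(100);
        System.out.println(word.capacity() >= 100);   // prints true
        System.out.println(word.length());            // prints 4 (contents unchanged)

        System.out.println(truncate("Decaffeinated", 5));   // prints Decaf
    }
}
```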
Class StringTokenizer
Tokenizer
- Partitions a String into individual substrings (tokens)
- Uses a delimiter, typically whitespace characters (space, tab, newline, etc.)
Java offers java.util.StringTokenizer
// Fig. 29.18: TokenTest.java
// StringTokenizer class.
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenTest
{
   // execute application
   public static void main( String args[] )
   {
      // get sentence
      Scanner scanner = new Scanner( System.in );
      System.out.println( "Enter a sentence and press Enter" );
      String sentence = scanner.nextLine();

      // process user sentence
      StringTokenizer tokens = new StringTokenizer( sentence );
      System.out.printf( "Number of elements: %d\nThe tokens are:\n",
         tokens.countTokens() );

      while ( tokens.hasMoreTokens() )
         System.out.println( tokens.nextToken() );
   } // end main
} // end class TokenTest

Sample output:
Enter a sentence and press Enter
This is a sentence with seven tokens
Number of elements: 7
The tokens are:
This
is
a
sentence
with
seven
tokens
Pattern Example
Suppose students are assigned a three-digit code:
The first digit represents the major (5 indicates computer science);
The second digit represents either in-state (1), out-of-state (2), or international (3);
The third digit indicates campus housing: On-campus dorms are numbered 1-7. Students living off-campus are represented by the digit 8.
The 3-digit pattern to represent computer science majors living on-campus is
5[123][1-7]
- first character is 5
- second character is 1, 2, or 3
- third character is any digit between 1 and 7
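The student-code pattern can be tried out with the matches method of String, which succeeds only when the whole string fits the pattern. In this sketch the class StudentCodeDemo and the method isOnCampusCsMajor are hypothetical names.

```java
// Illustrative sketch; StudentCodeDemo and isOnCampusCsMajor are hypothetical names.
class StudentCodeDemo {
    // True for computer science majors (5), any residency (1-3), on-campus dorm (1-7).
    static boolean isOnCampusCsMajor(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isOnCampusCsMajor("513"));    // prints true
        System.out.println(isOnCampusCsMajor("528"));    // prints false (8 means off-campus)
        System.out.println(isOnCampusCsMajor("437"));    // prints false (major is not 5)
        System.out.println(isOnCampusCsMajor("5137"));   // prints false (four characters)
    }
}
```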
Regular Expressions, Class Pattern and Class Matcher
Regular expression
- Sequence of characters and symbols
- Useful for validating input and ensuring data format
- Facilitates the construction of a compiler
Regular-expression operations in String
Method matches
- Matches the contents of a String to a regular expression
- Returns a boolean indicating whether the match succeeded
Predefined character classes
- An escape sequence that represents a group of characters
- Digit: any numeric character
- Word character: any letter, digit, or underscore
- Whitespace character: space, tab, carriage return, newline, or form feed
Character Matches Character Matches
\d any digit \D any non-digit
\w any word character \W any non-word character
\s any whitespace \S any non-whitespace
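One practical detail when using these classes in Java source code: inside a string literal the backslash itself must be escaped, so \d is written "\\d". The sketch below shows this; PredefinedClassDemo, isAllDigits, and isWord are hypothetical names.

```java
// Illustrative sketch; PredefinedClassDemo, isAllDigits and isWord are hypothetical names.
class PredefinedClassDemo {
    // True if s consists of one or more digits and nothing else.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // True if s consists only of letters, digits, and underscores.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));        // prints true
        System.out.println(isAllDigits("20a4"));        // prints false
        System.out.println(isWord("max_value1"));       // prints true
        System.out.println(isWord("max value"));        // prints false (space is not a word character)
        System.out.println(" \t\n".matches("\\s+"));    // prints true
        System.out.println("abc".matches("\\D+"));      // prints true (no digits at all)
    }
}
```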
Regular Expressions
Other patterns
- Square brackets ([]): match characters that do not have a predefined character class. E.g., [aeiou] matches a single character that is a vowel
- Dash (-): ranges of characters. E.g., [A-Z] matches a single uppercase letter
- Caret (^): excludes the indicated characters. E.g., [^Z] matches any character other than Z
Regular expression
Quantifiers
- Plus (+): matches one or more occurrences. E.g., A+ matches AAA but not the empty string
- Asterisk (*): matches zero or more occurrences. E.g., A* matches both AAA and the empty string
- Others in Fig. 29.22
Quantifiers used in regular expressions.
Quantifier Matches
* Matches zero or more occurrences of the pattern.
+ Matches one or more occurrences of the pattern.
? Matches zero or one occurrences of the pattern.
{n} Matches exactly n occurrences.
{n,} Matches at least n occurrences.
{n,m} Matches between n and m (inclusive) occurrences.
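Each quantifier in the table can be tried with String's matches method. This is a sketch under stated assumptions: QuantifierDemo and isPin are hypothetical names, and the rule that a PIN has 4 to 6 digits is made up for the example.

```java
// Illustrative sketch; QuantifierDemo and isPin are hypothetical names.
class QuantifierDemo {
    // Assumed rule for this example only: a PIN is 4 to 6 digits.
    static boolean isPin(String s) {
        return s.matches("[0-9]{4,6}");
    }

    public static void main(String[] args) {
        System.out.println("AAA".matches("A+"));          // prints true
        System.out.println("".matches("A+"));             // prints false
        System.out.println("".matches("A*"));             // prints true
        System.out.println("color".matches("colou?r"));   // prints true
        System.out.println("colour".matches("colou?r"));  // prints true
        System.out.println("123".matches("[0-9]{3}"));    // prints true
        System.out.println("12345".matches("[0-9]{3,}")); // prints true
        System.out.println(isPin("12345"));               // prints true
        System.out.println(isPin("123"));                 // prints false
    }
}
```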
Regular Expression Examples
Expression               Description
[013]                    A single digit 0, 1, or 3.
[0-9][0-9]               Any two-digit number from 00 to 99.
[0-9&&[^4567]]           A single digit that is 0, 1, 2, 3, 8, or 9.
[a-z0-9]                 A single character that is either a lowercase letter or a digit.
[a-zA-Z][a-zA-Z0-9_$]*   A valid Java identifier consisting of alphanumeric characters, underscores, and dollar signs, with the first character being a letter.
[wb](ad|eed)             Matches wad, weed, bad, and beed.
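The expressions in the table can be checked against sample strings. The sketch below uses the hypothetical names ExpressionDemo and isIdentifier; it follows the table's simplified identifier rule, whereas the Java language itself also lets an identifier begin with an underscore or a dollar sign.

```java
// Illustrative sketch; ExpressionDemo and isIdentifier are hypothetical names.
class ExpressionDemo {
    // Simplified identifier rule from the table: a letter, then letters, digits, _ or $.
    static boolean isIdentifier(String s) {
        return s.matches("[a-zA-Z][a-zA-Z0-9_$]*");
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("total_1"));            // prints true
        System.out.println(isIdentifier("1total"));             // prints false
        System.out.println("8".matches("[0-9&&[^4567]]"));      // prints true
        System.out.println("5".matches("[0-9&&[^4567]]"));      // prints false
        System.out.println("weed".matches("[wb](ad|eed)"));     // prints true
        System.out.println("wed".matches("[wb](ad|eed)"));      // prints false
    }
}
```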
Regular expression
Replacing substrings and splitting strings
- String method replaceAll: replaces text in a string with new text
- String method replaceFirst: replaces the first occurrence of a pattern match
- String method split: divides a string into several substrings
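All three methods take a regular expression as their first argument. A brief sketch follows; ReplaceSplitDemo and collapseSpaces are hypothetical names, and the sample strings are made up.

```java
import java.util.Arrays;

// Illustrative sketch; ReplaceSplitDemo and collapseSpaces are hypothetical names.
class ReplaceSplitDemo {
    // Trims the text and replaces every run of whitespace with a single space.
    static String collapseSpaces(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String s = "a1b22c333";
        System.out.println(s.replaceAll("[0-9]+", "#"));     // prints a#b#c#
        System.out.println(s.replaceFirst("[0-9]+", "#"));   // prints a#b22c333

        String[] parts = "one, two,three".split(",\\s*");
        System.out.println(Arrays.toString(parts));          // prints [one, two, three]

        System.out.println(collapseSpaces("  too   many    spaces "));   // prints too many spaces
    }
}
```

Like the other String methods, none of these changes the original string; each returns a new String or array.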
Regular expression
- Class Pattern: represents a regular expression
- Class Matcher: contains a regular-expression pattern and a CharSequence
- Interface CharSequence: allows read access to a sequence of characters; String and StringBuffer implement CharSequence
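Unlike String's matches method, which tests the whole string, a Matcher can search for every occurrence of a pattern inside a longer text. The sketch below reuses the student-code pattern 5[123][1-7]; the class PatternMatcherDemo, the method findCodes, and the sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; PatternMatcherDemo and findCodes are hypothetical names.
class PatternMatcherDemo {
    // Compiling once and reusing the Pattern avoids recompiling on every call.
    private static final Pattern CODE = Pattern.compile("5[123][1-7]");

    // Collects every substring of text that matches the pattern, left to right.
    static List<String> findCodes(CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = CODE.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findCodes("Codes: 513, 428, 537, 518"));        // prints [513, 537]
        System.out.println(findCodes(new StringBuffer("no codes here")));  // prints []
    }
}
```

Because findCodes accepts a CharSequence, it works with both a String and a StringBuffer.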
Matching
Match each description with the statement that implements it.
Searches for a two-character pattern whose first character may be any uppercase letter between A and G, and whose second character may be any digit except 4
Searches for a single-character pattern that may be any letter except p, q, r, s, or t
A. str.matches("[a-zA-Z&&[^pqrst]]");
B. str.matches("[A-G][0-9&&[^4]]");
Answers: the first description corresponds to B and the second to A.
Review
Which character sequence would be used to designate a character pattern of a fixed length of three digits?
A. 3[0-9]
B. [0-9][0-9][0-9]
C. [0-9]3
Review
Choose the correct argument for the following code that searches for any number between 100 and 999 in a given string.
str.matches(" ");
A. [0-9][0-9][1-9]
B. [0-9][0-9][0-9]
C. [1-9][0-9][0-9]