Google speech recognition api Java example

google speech recognition api javascript and google speech recognition api documentation
JohenCorner,France,Professional
Published Date:02-08-2017
Working with Text

If you’ve been reading this book sequentially, you’ve read all about the core Java language constructs, including the object-oriented aspects of the language and the use of threads. Now it’s time to shift gears and start talking about the Java Application Programming Interface (API), the collection of classes that compose the standard Java packages and come with every Java implementation. Java’s core packages are one of its most distinguishing features. Many other object-oriented languages have similar features, but none has as extensive a set of standardized APIs and tools as Java does. This is both a reflection of and a reason for Java’s success.

Table 10-1 lists some of the important packages in the API and their corresponding chapters in this book.

Table 10-1. Java API packages

Package                                      Contents                                             Chapter
java.lang                                    Basic language classes                               4–9
java.lang.reflect                            Reflection                                           7
java.util.concurrent                         Thread utilities                                     9
java.text, java.util.regex                   International text classes and regular expressions   10
java.util                                    Utilities and collections classes                    10–12
java.io                                      Input and output                                     12
java.nio                                     Input and output                                     12
java.net                                     Networking and Remote Method Invocation classes      13–14
java.rmi                                     Remote Method Invocation classes                     13
javax.servlet                                Web applications                                     15
javax.swing, java.awt                        Swing GUI and 2D graphics                            16–20
java.awt.image, javax.imageio, javax.media   Images, sound, and video                             21
java.beans                                   JavaBeans API                                        22
java.applet                                  The Applet API                                       23
javax.xml                                    The XML API                                          24

As you can see in Table 10-1, we have examined some classes in java.lang in earlier chapters while looking at the core language constructs. Starting with this chapter, we throw open the Java toolbox and begin examining the rest of the API classes, starting with text-related utilities, because they are fundamental to all kinds of applications.
Text-Related APIs

In this chapter, we cover most of the special-purpose, text-related APIs in Java, from simple classes for parsing words and numbers to advanced text formatting, internationalization, and regular expressions. But because so much of what we do with computers is oriented around text, classifying APIs as strictly text-related can be somewhat arbitrary. Some of the text-related packages we cover in the next chapter include the Java Calendar API, the Properties and User Preferences APIs, and the Logging API. But some of the most important tools in the text arena are those for working with the Extensible Markup Language, XML. In Chapter 24, we cover XML in detail, along with the XSL/XSLT stylesheet language. Together they provide a powerful framework for rendering documents.

Strings

We’ll start by taking a closer look at the Java String class (or, more specifically, java.lang.String). Because working with Strings is so fundamental, it’s important to understand how they are implemented and what you can do with them. A String object encapsulates a sequence of Unicode characters. Internally, these characters are stored in a regular Java array, but the String object guards this array jealously and gives you access to it only through its own API. This is to support the idea that Strings are immutable; once you create a String object, you can’t change its value. Lots of operations on a String object appear to change the characters or length of a string, but what they really do is return a new String object that copies or internally references the needed characters of the original. Java implementations make an effort to consolidate identical strings used in the same class into a shared-string pool and to share parts of Strings where possible.

The original motivation for all of this was performance. Immutable Strings can save memory and be optimized for speed by the Java VM.
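To make the immutability point concrete, here is a minimal sketch (the class and method names are ours, purely for illustration). It calls a method that appears to change a string and then shows that the original is untouched:

```java
public class ImmutableStringDemo {

    // Returns the original string and the result of "modifying" it.
    // toUpperCase() does not alter the original; it returns a new String.
    static String[] upperCaseCopy(String original) {
        String shouted = original.toUpperCase();
        return new String[] { original, shouted };
    }

    public static void main(String[] args) {
        String quote = "To be or not to be";
        String[] pair = upperCaseCopy(quote);
        System.out.println(pair[0]); // To be or not to be
        System.out.println(pair[1]); // TO BE OR NOT TO BE
    }
}
```

The same holds for every other String method that seems to edit the text: the result is always a separate object.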
The flip side is that a programmer should have a basic understanding of the String class in order to avoid creating an excessive number of String objects in places where performance is an issue. That was especially true in the past, when VMs were slow and handled memory poorly. Nowadays, string usage is not usually an issue in the overall performance of a real application. [1]

[1] When in doubt, measure it. If your String-manipulating code is clean and easy to understand, don’t rewrite it until someone proves to you that it is too slow. Chances are that they will be wrong. And don’t be fooled by relative comparisons. A millisecond is 1,000 times slower than a microsecond, but it still may be negligible to your application’s overall performance.

Constructing Strings

Literal strings, defined in your source code, are declared with double quotes and can be assigned to a String variable:

    String quote = "To be or not to be";

Java automatically converts the literal string into a String object and assigns it to the variable. Strings keep track of their own length, so String objects in Java don’t require special terminators. You can get the length of a String with the length() method. You can also test for a zero-length string by using isEmpty():

    int length = quote.length();
    boolean empty = quote.isEmpty();

Strings can take advantage of the only overloaded operator in Java, the + operator, for string concatenation. The following two statements produce equivalent strings:

    String name = "John " + "Smith";
    String name = "John ".concat("Smith");

Literal strings can’t span lines in Java source files, but we can concatenate lines to produce the same effect:

    String poem =
        "'Twas brillig, and the slithy toves\n" +
        "   Did gyre and gimble in the wabe:\n" +
        "All mimsy were the borogoves,\n" +
        "   And the mome raths outgrabe.\n";

Embedding lengthy text in source code is not normally something you want to do. In this and the following chapter, we’ll talk about ways to load Strings from files, special packages called resource bundles, and URLs. Technologies like Java Server Pages and template engines also provide a way to factor out large amounts of text from your code. For example, in Chapter 14, we’ll see how to load our poem from a web server by opening a URL like this:

    InputStream poem = new URL(
        "http://myserver/dodgson/jabberwocky.txt").openStream();

In addition to making strings from literal expressions, you can construct a String directly from an array of characters:

    char [] data = new char [] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' };
    String lemming = new String( data );

You can also construct a String from an array of bytes:

    byte [] data = new byte [] { (byte)97, (byte)98, (byte)99 };
    String abc = new String(data, "ISO8859_1");

In this case, the second argument to the String constructor is the name of a character-encoding scheme. The String constructor uses it to convert the raw bytes in the specified encoding to the internally used standard 2-byte Unicode characters. If you don’t specify a character encoding, the default encoding scheme on your system is used. [2] We’ll discuss character encodings more when we talk about the Charset class, IO, in Chapter 12.

Conversely, the charAt() method of the String class lets you access the characters of a String in an array-like fashion:

    String s = "Newton";
    for ( int i = 0; i < s.length(); i++ )
        System.out.println( s.charAt( i ) );

This code prints the characters of the string one at a time. Alternately, we can get the characters all at once with toCharArray(). Here’s a way to save typing a bunch of single quotes and get an array holding the alphabet:

    char [] abcs = "abcdefghijklmnopqrstuvwxyz".toCharArray();

The notion that a String is a sequence of characters is also codified by the String class implementing the interface java.lang.CharSequence, which prescribes the methods length() and charAt() as well as a way to get a subset of the characters.
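The constructors and accessors described above can be pulled together in one short sketch. The class and helper names here are hypothetical, and we assume Java 7 or later so that we can use the StandardCharsets constants instead of an encoding name such as "ISO8859_1":

```java
import java.nio.charset.StandardCharsets;

public class StringConstructionDemo {

    // Build a String from an array of characters.
    static String fromChars(char[] data) {
        return new String(data);
    }

    // Build a String from raw bytes, naming the encoding explicitly
    // rather than relying on the platform default.
    static String fromLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Works on any CharSequence (String, StringBuilder, ...), using only
    // the length() and charAt() methods the interface prescribes.
    static int countChar(CharSequence seq, char target) {
        int count = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String lemming = fromChars(new char[] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' });
        String abc = fromLatin1(new byte[] { 97, 98, 99 });
        System.out.println(lemming);                 // Lemming
        System.out.println(abc);                     // abc
        System.out.println(countChar(lemming, 'm')); // 2
    }
}
```

Passing a Charset object also avoids the checked UnsupportedEncodingException that the name-based constructor declares.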
[2] On Mac OS X, the default encoding is MacRoman. In Windows, it is CP1252. On some Unix platforms it is ISO8859_1.

Strings from Things

Objects and primitive types in Java can be turned into a default textual representation as a String. For primitive types like numbers, the string should be fairly obvious; for object types, it is under the control of the object itself. We can get the string representation of an item with the static String.valueOf() method. Various overloaded versions of this method accept each of the primitive types:

    String one = String.valueOf( 1 );          // integer, "1"
    String two = String.valueOf( 2.384f );     // float, "2.384"
    String notTrue = String.valueOf( false );  // boolean, "false"

All objects in Java have a toString() method that is inherited from the Object class. For many objects, this method returns a useful result that displays the contents of the object. For example, a java.util.Date object’s toString() method returns the date it represents formatted as a string. For objects that do not provide a representation, the string result is just a unique identifier that can be used for debugging. The String.valueOf() method, when called for an object, invokes the object’s toString() method and returns the result.
The only real difference in using this method is that if you pass it a null object reference, it returns the String “null” for you, instead of producing a NullPointerException:

    Date date = new Date();
    // Equivalent, e.g., "Fri Dec 19 05:45:34 CST 1969"
    String d1 = String.valueOf( date );
    String d2 = date.toString();

    date = null;
    d1 = String.valueOf( date );  // "null"
    d2 = date.toString();         // NullPointerException

String concatenation uses the valueOf() method internally, so if you “add” an object or primitive using the plus operator (+), you get a String:

    String today = "Today's date is :" + date;

You’ll sometimes see people use the empty string and the plus operator (+) as shorthand to get the string value of an object. For example:

    String two = "" + 2.384f;
    String today = "" + new Date();

Comparing Strings

The standard equals() method can compare strings for equality; two strings are equal if they contain exactly the same characters in the same order. You can use a different method, equalsIgnoreCase(), to check the equivalence of strings in a case-insensitive way:

    String one = "FOO";
    String two = "foo";
    one.equals( two );            // false
    one.equalsIgnoreCase( two );  // true

A common mistake for novice programmers in Java is to compare strings with the == operator when they intend to use the equals() method. Remember that strings are objects in Java, and == tests for object identity; that is, whether the two arguments being tested are the same object. In Java, it’s easy to make two strings that have the same characters but are not the same string object. For example:

    String foo1 = "foo";
    String foo2 = String.valueOf( new char [] { 'f', 'o', 'o' } );
    foo1 == foo2         // false
    foo1.equals( foo2 )  // true

This mistake is particularly dangerous because it often works for the common case in which you are comparing literal strings (strings declared with double quotes right in the code). The reason for this is that Java tries to manage strings efficiently by combining them.
At compile time, Java finds all the identical strings within a given class and makes only one object for them. This is safe because strings are immutable and cannot change. You can coalesce strings yourself in this way at runtime using the String intern() method. Interning a string returns an equivalent string reference that is unique across the VM.

The compareTo() method compares the lexical value of the String to another String, determining whether it sorts alphabetically earlier than, the same as, or later than the target string. It returns an integer that is less than, equal to, or greater than zero:

    String abc = "abc";
    String def = "def";
    String num = "123";
    if ( abc.compareTo( def ) < 0 )   // true
    if ( abc.compareTo( abc ) == 0 )  // true
    if ( abc.compareTo( num ) > 0 )   // true

The compareTo() method compares strings strictly by their characters’ positions in the Unicode specification. This works for simple text but does not handle all language variations well. The Collator class, discussed next, can be used for more sophisticated comparisons.

The Collator class

The java.text package provides a sophisticated set of classes for comparing strings in specific languages. German, for example, has vowels with umlauts and another character that resembles the Greek letter beta and represents a double “s.” How should we sort these? Although the rules for sorting such characters are precisely defined, you can’t assume that the lexical comparison we used earlier has the correct meaning for languages other than English. Fortunately, the Collator class takes care of these complex sorting problems.

In the following example, we use a Collator designed to compare German strings. You can obtain a default Collator by calling the Collator.getInstance() method with no arguments. Once you have an appropriate Collator instance, you can use its compare() method, which returns values just like String’s compareTo() method.
The following code creates two strings for the German translations of “fun” and “later,” using Unicode constants for these two special characters. It then compares them, using a Collator for the German locale. (Locales help you deal with issues relevant to particular languages and cultures; we’ll talk about them in detail later in this chapter.) The result in this case is that “fun” (Spaß) sorts before “later” (später):

    String fun = "Spa\u00df";
    String later = "sp\u00e4ter";
    Collator german = Collator.getInstance(Locale.GERMAN);
    if (german.compare(fun, later) < 0)  // true

Using collators is essential if you’re working with languages other than English. In Spanish, for example, “ll” and “ch” are treated as unique characters and alphabetized separately. A collator handles cases like these automatically.

Searching

The String class provides several simple methods for finding fixed substrings within a string. The startsWith() and endsWith() methods compare an argument string with the beginning and end of the String, respectively:

    String url = "http://foo.bar.com/";
    if ( url.startsWith("http:") )  // true

The indexOf() method searches for the first occurrence of a character or substring and returns the starting character position, or -1 if the substring is not found:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    int i = abcs.indexOf( 'p' );     // 15
    int j = abcs.indexOf( "def" );   // 3
    int k = abcs.indexOf( "Fang" );  // -1

Similarly, lastIndexOf() searches backward through the string for the last occurrence of a character or substring.

The contains() method handles the very common task of checking to see whether a given substring is contained in the target string:

    String log = "There is an emergency in sector 7";
    if ( log.contains("emergency") ) pageSomeone();
    // equivalent to
    if ( log.indexOf("emergency") != -1 ) ...

For more complex searching, you can use the Regular Expression API, which allows you to look for and parse complex patterns.
We’ll talk about regular expressions later in this chapter.

Editing

A number of methods operate on the String and return a new String as a result. While this is useful, you should be aware that creating lots of strings in this manner can affect performance. If you need to modify a string often or build a complex string from components, you should use the StringBuilder class, as we’ll discuss shortly.

trim() is a useful method that removes leading and trailing whitespace (e.g., spaces, carriage returns, newlines, and tabs) from the String:

    String str = "   abc   ";
    str = str.trim();  // "abc"

In this example, we threw away the original String (with excess whitespace), and it will be garbage-collected.

The toUpperCase() and toLowerCase() methods return a new String of the appropriate case:

    String down = "FOO".toLowerCase();  // "foo"
    String up = down.toUpperCase();     // "FOO"

substring() returns a specified range of characters. The starting index is inclusive; the ending is exclusive:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    String cde = abcs.substring( 2, 5 );  // "cde"

The replace() method provides simple, literal string substitution. One or more occurrences of the target string are replaced with the replacement string, moving from beginning to end. For example:

    String message = "Hello NAME, how are you?".replace( "NAME", "Penny" );
    // "Hello Penny, how are you?"
    String xy = "xxooxxxoo".replace( "xx", "X" );
    // "XooXxoo"

The String class also has two methods that allow you to do more complex pattern substitution: replaceAll() and replaceFirst(). Unlike the simple replace() method, these methods use regular expressions (a special syntax) to describe the replacement pattern, which we’ll cover later in this chapter.

String Method Summary

Table 10-2 summarizes the methods provided by the String class.

Table 10-2. String methods

Method              Functionality
charAt()            Gets a particular character in the string
compareTo()         Compares the string with another string
concat()            Concatenates the string with another string
contains()          Checks whether the string contains another string
copyValueOf()       Returns a string equivalent to the specified character array
endsWith()          Checks whether the string ends with a specified suffix
equals()            Compares the string with another string
equalsIgnoreCase()  Compares the string with another string, ignoring case
getBytes()          Copies characters from the string into a byte array
getChars()          Copies characters from the string into a character array
hashCode()          Returns a hashcode for the string
indexOf()           Searches for the first occurrence of a character or substring in the string
intern()            Fetches a unique instance of the string from a global shared-string pool
isEmpty()           Returns true if the string is zero length
lastIndexOf()       Searches for the last occurrence of a character or substring in a string
length()            Returns the length of the string
matches()           Determines if the whole string matches a regular expression pattern
regionMatches()     Checks whether a region of the string matches the specified region of another string
replace()           Replaces all occurrences of a character in the string with another character
replaceAll()        Replaces all occurrences of a regular expression pattern with a pattern
replaceFirst()      Replaces the first occurrence of a regular expression pattern with a pattern
split()             Splits the string into an array of strings using a regular expression pattern as a delimiter
startsWith()        Checks whether the string starts with a specified prefix
substring()         Returns a substring from the string
toCharArray()       Returns the array of characters from the string
toLowerCase()       Converts the string to lowercase
toString()          Returns the string value of an object
toUpperCase()       Converts the string to uppercase
trim()              Removes leading and trailing whitespace from the string
valueOf()           Returns a string representation of a value

StringBuilder and StringBuffer

In contrast to the immutable string, the java.lang.StringBuilder class is a modifiable and expandable buffer for characters. You can use it to create a big string efficiently. StringBuilder and StringBuffer are twins; they have exactly the same API. StringBuilder was added in Java 5.0 as a drop-in, unsynchronized replacement for StringBuffer. We’ll come back to that in a bit.

First, let’s look at some examples of String construction:

    // Could be better
    String ball = "Hello";
    ball = ball + " there.";
    ball = ball + " How are you?";

This example creates an unnecessary String object each time we use the concatenation operator (+). Whether this is significant depends on how often this code is run and how big the string actually gets. Here’s a more extreme example:

    // Bad use of + ...
    while( (line = readLine()) != EOF )
        text += line;

This example repeatedly produces new String objects. The character array must be copied over and over, which can adversely affect performance. The solution is to use a StringBuilder object and its append() method:

    StringBuilder sb = new StringBuilder("Hello");
    sb.append(" there.");
    sb.append(" How are you?");

    StringBuilder text = new StringBuilder();
    while( (line = readLine()) != EOF )
        text.append( line );

Here, the StringBuilder efficiently handles expanding the array as necessary. We can get a String back from the StringBuilder with its toString() method:

    String message = sb.toString();

You can also retrieve part of a StringBuilder as a String by using one of the substring() methods.
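As a rough illustration of the loop above, the following sketch builds one string from a list of lines. The JoinLinesDemo class and its joinLines() helper are made up for this example, and a List stands in for the readLine() calls:

```java
import java.util.Arrays;
import java.util.List;

public class JoinLinesDemo {

    // Appends every line (plus a newline) to a single StringBuilder,
    // so only one growing buffer is used instead of a new String per "+=".
    static String joinLines(List<String> lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello", "there.", "How are you?");
        System.out.print(joinLines(lines));
    }
}
```

Because append() returns the same StringBuilder, calls can be chained as shown inside the loop.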
You might be interested to know that when you write a long expression using string concatenation, the compiler generates code that uses a StringBuilder behind the scenes:

    String foo = "To " + "be " + "or";

It is really equivalent to:

    String foo = new
        StringBuilder().append("To ").append("be ").append("or").toString();

In this case, the compiler knows what you are trying to do and takes care of it for you. (Strictly speaking, a concatenation made up entirely of literals, like this one, is folded into a single constant at compile time; the StringBuilder form is what you get when variables are involved, and the exact translation is left to the compiler.)

The StringBuilder class provides a number of overloaded append() methods for adding any type of data to the buffer. StringBuilder also provides a number of overloaded insert() methods for inserting various types of data at a particular location in the string buffer. Furthermore, you can remove a single character or a range of characters with the deleteCharAt() and delete() methods. Finally, you can replace part of the StringBuilder with the contents of a String using the replace() method. The String and StringBuilder classes cooperate so that, in some cases, no copy of the data has to be made; the string data is shared between the objects.

You should use a StringBuilder instead of a String any time you need to keep adding characters to a string; it’s designed to handle such modifications efficiently. You can convert the StringBuilder to a String when you need it, or simply concatenate or print it anywhere you’d use a String.

As we said earlier, StringBuilder was added in Java 5.0 as a replacement for StringBuffer. The only real difference between the two is that the methods of StringBuffer are synchronized and the methods of StringBuilder are not. This means that if you wish to use StringBuilder from multiple threads concurrently, you must synchronize the access yourself (which is easily accomplished). The reason for the change is that most simple usage does not require any synchronization and shouldn’t have to pay the associated penalty (slight as it is).
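The editing methods mentioned above are easiest to follow step by step. This is only a sketch; the class name and sample text are invented for illustration, and each comment shows the buffer’s contents after that call:

```java
public class StringBuilderEditDemo {

    // Applies insert(), replace(), append(), deleteCharAt(), and delete()
    // to one buffer and returns the final contents.
    static String edit() {
        StringBuilder sb = new StringBuilder("Hello world");
        sb.insert(5, ",");                 // "Hello, world"
        sb.replace(7, 12, "Penny");        // "Hello, Penny"
        sb.append("!!");                   // "Hello, Penny!!"
        sb.deleteCharAt(sb.length() - 1);  // "Hello, Penny!"
        sb.delete(0, 7);                   // "Penny!"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // Penny!
    }
}
```

As with substring(), the start index of replace() and delete() is inclusive and the end index is exclusive.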
Internationalization

The Java VM lets us write code that executes in the same way on any Java platform. But in a global marketplace, that is only half the battle. A big question remains: will the application content and data be understandable to end users worldwide? Must users know English to use your application? The answer is that Java provides thorough support for localizing the text of your application for most modern languages and dialects. In this section, we’ll talk about the concepts of internationalization (often abbreviated “I18N”) and the classes that support them.

The java.util.Locale Class

Internationalization programming revolves around the Locale class. The class itself is very simple; it encapsulates a country code, a language code, and a rarely used variant code. Commonly used languages and countries are defined as constants in the Locale class. (Maybe it’s ironic that these names are all in English.) You can retrieve the codes or readable names, as follows. Note that we use Locale.ITALY here; the language-only constant Locale.ITALIAN carries no country code, so its getCountry() method returns an empty string:

    Locale l = Locale.ITALY;
    System.out.println(l.getCountry());          // IT
    System.out.println(l.getDisplayCountry());   // Italy
    System.out.println(l.getLanguage());         // it
    System.out.println(l.getDisplayLanguage());  // Italian

The country codes comply with ISO 3166. You will find a complete list of country codes at the RIPE Network Coordination Centre. The language codes comply with ISO 639. A complete list of language codes is online at the US government website. There is no official set of variant codes; they are designated as vendor-specific or platform-specific. You can get an array of all supported Locales with the static getAvailableLocales() method (which you might use to let your users choose). Or you can retrieve the default Locale for the location where your code is running with the static Locale.getDefault() method and let the system decide for you.

Many classes throughout the Java API use a Locale to decide how to represent text.
We ran into one earlier when talking about sorting text with the Collator class. We’ll see more later in this chapter used to format numbers and currency strings, and again in the next chapter with the DateFormat class, which uses Locales to determine how to format and parse dates and times. Without getting into the details yet, here is a quick example:

    System.out.printf( Locale.ITALIAN, "%f\n", 3.14 );  // "3,140000"

The preceding statement uses the Italian Locale to indicate that the decimal number 3.14 should be formatted as it would in Italian, using a comma instead of a decimal point. (The %f conversion prints six decimal places by default.) We’ll talk more about formatting text later in this chapter.

Resource Bundles

Before we move on to the details of formatting messages and values, we might take a step back and ask a bigger question: what about the messages themselves? How can we write and manage applications that are truly multilingual in their user interfaces and in all the messages they display to the user? We can discover our locale, but how do we manage all of the application text in our code? The ResourceBundle class offers a clean, flexible solution for factoring out the text and resources of your application into language-specific classes or text files.

A ResourceBundle is a collection of objects that your application can access by name. It acts much like the Hashtable or Map collections we’ll discuss in Chapter 11, looking up objects based on Strings that serve as keys. A ResourceBundle of a given name may be defined for many different Locales. To get a particular ResourceBundle, call the factory method ResourceBundle.getBundle(), which accepts the name of the ResourceBundle and a Locale.
The following example gets the ResourceBundle named “Message” for two Locales; from each bundle, it retrieves the message whose key is “HelloMessage” and prints the message:

    import java.util.*;

    public class Hello {
        public static void main(String [] args) {
            ResourceBundle bun;
            bun = ResourceBundle.getBundle("Message", Locale.ITALY);
            System.out.println(bun.getString("HelloMessage"));
            bun = ResourceBundle.getBundle("Message", Locale.US);
            System.out.println(bun.getString("HelloMessage"));
        }
    }

The getBundle() method throws the runtime exception MissingResourceException if an appropriate ResourceBundle cannot be located.

You can provide ResourceBundles in two ways: either as compiled Java classes (hardcoded Java) or as simple property files. Resource bundles implemented as classes are either subclasses of ListResourceBundle or direct implementations of ResourceBundle. Resource bundles backed by a property file are represented at runtime by a PropertyResourceBundle object. ResourceBundle.getBundle() returns either a matching class or an instance of PropertyResourceBundle corresponding to a matching property file. The algorithm used by getBundle() is based on appending the country and language codes of the requested Locale to the name of the resource.
Specifically, it searches for resources in this order:

    name_language_country_variant
    name_language_country
    name_language
    name
    name_default-language_default-country_default-variant
    name_default-language_default-country
    name_default-language

In this example, when we try to get the ResourceBundle named Message, specific to Locale.ITALY, it searches for the following names (no variant codes are in the Locales we are using):

    Message_it_IT
    Message_it
    Message
    Message_en_US
    Message_en

Let’s define the Message_it_IT ResourceBundle as a hardcoded class, a subclass of ListResourceBundle:

    import java.util.*;

    public class Message_it_IT extends ListResourceBundle {
        public Object [][] getContents() {
            return contents;
        }

        static final Object [][] contents = {
            {"HelloMessage", "Buon giorno, world"},
            {"OtherMessage", "Ciao."},
        };
    }

ListResourceBundle makes it easy to define a ResourceBundle class; all we have to do is override the getContents() method. This method simply returns a two-dimensional array containing the names and values of its resources. In this example, contents[1][0] is the second key (OtherMessage), and contents[1][1] is the corresponding message (Ciao.).

Let’s define a ResourceBundle for Locale.US. This time, we’ll take the easy way and make a property file. Save the following data in a file called Message_en_US.properties:

    HelloMessage=Hello, world
    OtherMessage=Bye.

So what happens if somebody runs your program in Locale.FRANCE and no ResourceBundle is defined for that Locale? To avoid a runtime MissingResourceException, it’s a good idea to define a default ResourceBundle. In our example, you can change the name of the property file to Message.properties. That way, if a language- or country-specific ResourceBundle cannot be found, your application can still run (by falling back to this English representation).

Parsing and Formatting Text

Parsing and formatting text is a large, open-ended topic.
So far in this chapter, we’ve looked at only primitive operations on strings: creation, basic editing, searching, and turning simple values into strings. Now we’d like to move on to more structured forms of text. Java has a rich set of APIs for parsing and printing formatted strings, including numbers, dates, times, and currency values. We’ll cover most of these topics in this chapter, but we’ll wait to discuss date and time formatting until Chapter 11.

We’ll start with parsing: reading primitive numbers and values as strings and chopping long strings into tokens. Then we’ll go the other way and look at formatting strings and the java.text package. We’ll revisit the topic of internationalization to see how Java can localize parsing and formatting of text, numbers, and dates for particular locales. Finally, we’ll take a detailed look at regular expressions, the most powerful text-parsing tool Java offers. Regular expressions let you define your own patterns of arbitrary complexity, search for them, and parse them from text.

We should mention that you’re going to see a great deal of overlap between the new formatting and parsing APIs (printf and Scanner) introduced in Java 5.0 and the older APIs of the java.text package. The new APIs effectively replace much of the old ones and in some ways are easier to use. Nonetheless, it’s good to know about both because so much existing code uses the older APIs.

Parsing Primitive Numbers

In Java, numbers and Booleans are primitive types, not objects. But for each primitive type, Java also defines a primitive wrapper class. Specifically, the java.lang package includes the following classes: Byte, Short, Integer, Long, Float, Double, and Boolean. We talked about these in Chapter 1, but we bring them up now because these
Each of these primitive wrapper classes has a static “parse” method that reads a String and returns the corresponding primitive type. For example:

    byte b = Byte.parseByte("16");
    int n = Integer.parseInt( "42" );
    long l = Long.parseLong( "99999999999" );
    float f = Float.parseFloat( "4.2" );
    double d = Double.parseDouble( "99.99999999" );
    boolean flag = Boolean.parseBoolean("true");
    // Prior to Java 5.0 use:
    boolean flag = new Boolean("true").booleanValue();

Alternately, the java.util.Scanner provides a single API for not only parsing individual primitive types from strings, but reading them from a stream of tokens. This example shows how to use it in place of the preceding wrapper classes:

    byte b = new Scanner("16").nextByte();
    int n = new Scanner("42").nextInt();
    long l = new Scanner("99999999999").nextLong();
    float f = new Scanner("4.2").nextFloat();
    double d = new Scanner("99.99999999").nextDouble();
    boolean flag = new Scanner("true").nextBoolean();

We’ll see Scanner used to parse multiple values from a String or stream when we discuss tokenizing text later in this chapter.

Working with alternate bases

It’s easy to parse integer type numbers (byte, short, int, long) in alternate numeric bases.
You can use the parse methods of the primitive wrapper classes by simply specifying the base as a second parameter:

    long l = Long.parseLong( "CAFEBABE", 16 ); // l = 3405691582
    byte b = Byte.parseByte( "12", 8 );        // b = 10

The integer-reading methods of the Java 5.0 Scanner class described earlier (nextByte(), nextShort(), nextInt(), and nextLong()) also accept a base as an optional argument:

    long l = new Scanner( "CAFEBABE" ).nextLong( 16 ); // l = 3405691582
    byte b = new Scanner( "12" ).nextByte( 8 );        // b = 10

You can go the other way and convert a long or integer value to a string value in a specified base using special static toString() methods of the Integer and Long classes:

    String s = Long.toString( 3405691582L, 16 ); // s = "cafebabe"

For convenience, each class also has a static toHexString() method for working with base 16:

    String s = Integer.toHexString( 255 ).toUpperCase(); // s = "FF"

Number formats

The preceding wrapper class parser methods handle the case of numbers formatted using only the simplest English conventions with no frills. If these parse methods do not understand the string, either because it’s simply not a valid number or because the number is formatted in the convention of another language, they throw a NumberFormatException:

    // Italian formatting
    double d = Double.parseDouble( "1.234,56" ); // NumberFormatException

The Scanner API is smarter and can use Locales to parse numbers in specific languages with more elaborate conventions. For example, the Scanner can handle comma-formatted numbers:

    int n = new Scanner( "99,999,999" ).nextInt();

You can specify a Locale other than the default with the useLocale() method.
Let’s parse that value in Italian now:

    double d = new Scanner( "1.234,56" ).useLocale( Locale.ITALIAN ).nextDouble();

If the Scanner cannot parse a string, it throws a runtime InputMismatchException:

    double d = new Scanner( "garbage" ).nextDouble(); // InputMismatchException

Prior to Java 5.0, this kind of parsing was accomplished using the java.text package with the NumberFormat class. The classes of the java.text package also allow you to parse additional types, such as dates, times, and localized currency values, that aren’t handled by the Scanner. We’ll look at these later in this chapter.

Tokenizing Text

A common programming task involves parsing a string of text into words or “tokens” that are separated by some set of delimiter characters, such as spaces or commas. The first example contains words separated by single spaces. The second, more realistic problem involves comma-delimited fields.

    Now is the time for all good men (and women)...

    Check Number, Description, Amount
    4231, Java Programming, 1000.00

Java has several (unfortunately overlapping) APIs for handling situations like this. The most powerful and useful are the String split() and Scanner APIs. Both utilize regular expressions to allow you to break the string on arbitrary patterns. We haven’t talked about regular expressions yet, but in order to show you how this works we’ll just give you the necessary magic and explain in detail later in this chapter. We’ll also mention a legacy utility, java.util.StringTokenizer, which uses simple character sets to split a string. StringTokenizer is not as powerful, but doesn’t require an understanding of regular expressions.

The String split() method accepts a regular expression that describes a delimiter and uses it to chop the string into an array of Strings:

    String text = "Now is the time for all good men";
    String[] words = text.split( "\\s" );
    // words = "Now", "is", "the", "time", ...
    String text = "4231, Java Programming, 1000.00";
    String[] fields = text.split( "\\s*,\\s*" );
    // fields = "4231", "Java Programming", "1000.00"

In the first example, we used the regular expression \\s, which matches a single whitespace character (such as a space, tab, or carriage return). The split() method returned an array of eight strings. In the second example, we used a more complicated regular expression, \\s*,\\s*, which matches a comma surrounded by any number of contiguous whitespace characters (possibly zero). This reduced our text to three nice, tidy fields.

With the new Scanner API, we could go a step further and parse the numbers of our second example as we extract them:

    String text = "4231, Java Programming, 1000.00";
    Scanner scanner = new Scanner( text ).useDelimiter( "\\s*,\\s*" );
    int checkNumber = scanner.nextInt();  // 4231
    String description = scanner.next();  // "Java Programming"
    float amount = scanner.nextFloat();   // 1000.00

Here, we’ve told the Scanner to use our regular expression as the delimiter and then called it repeatedly to parse each field as its corresponding type. The Scanner is convenient because it can read not only from Strings but directly from stream sources, such as InputStreams, Files, and Channels:

    // The File constructor can throw FileNotFoundException
    Scanner fileScanner = new Scanner( new File( "spreadsheet.csv" ) );
    fileScanner.useDelimiter( "\\s*,\\s*" );
    // ...

Another thing that you can do with the Scanner is to look ahead with the “hasNext” methods to see if another item is coming:

    while ( scanner.hasNextInt() ) {
        int n = scanner.nextInt();
        ...
    }

StringTokenizer

Even though the StringTokenizer class that we mentioned is now a legacy item, it’s good to know that it’s there because it’s been around since the beginning of Java and is used in a lot of code. StringTokenizer allows you to specify a delimiter as a set of characters and matches any number or combination of those characters as a delimiter between tokens.
The following snippet reads the words of our first example:

    String text = "Now is the time for all good men (and women)...";
    StringTokenizer st = new StringTokenizer( text );
    while ( st.hasMoreTokens() ) {
        String word = st.nextToken();
        ...
    }

We invoke the hasMoreTokens() and nextToken() methods to loop over the words of the text. By default, the StringTokenizer class uses the standard whitespace characters (space, tab, newline, carriage return, and form feed) as delimiters. You can also specify your own set of delimiter characters in the StringTokenizer constructor. Any contiguous combination of the specified characters that appears in the target string is skipped between tokens:

    String text = "4231, Java Programming, 1000.00";
    StringTokenizer st = new StringTokenizer( text, "," );
    while ( st.hasMoreTokens() ) {
        String word = st.nextToken();
        // word = "4231", " Java Programming", " 1000.00"
    }

This isn’t as clean as our regular expression example. Here we used a comma as the delimiter, so we get extra leading whitespace in our description and amount fields. If we had added space to our delimiter string, the StringTokenizer would have broken our description into two words, “Java” and “Programming,” which is not what we wanted. A solution here would be to use trim() to remove the leading and trailing space on each element.

Printf-Style Formatting

A standard feature that Java adopted from the C language is printf-style string formatting. printf-style formatting utilizes special format strings embedded into text to tell the formatting engine where to place arguments and give detailed specification about conversions, layout, and alignment. The printf formatting methods also make use of variable-length argument lists, which makes working with them much easier. Here is a quick example of printf-formatted output:

    System.out.printf( "My name is %s and I am %d years old\n", name, age );

The printf formatting draws its name from the C language printf() function, so if you’ve done any C programming, this will look familiar.
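As a rough, runnable version of the quick example above, the following sketch fills in the pieces the snippet leaves out. The class name PrintfQuickExample, the helper greeting(), and the sample values are our own placeholders, not part of the original text. The helper passes Locale.ROOT only so that the digits come out the same on every machine; it relies on String.format(), which takes the same format strings as printf().

```java
import java.util.Locale;

final class PrintfQuickExample {

    // Hypothetical helper: builds the greeting from the same format string
    // used in the text. %s takes any object; %d requires an integer type.
    static String greeting(String name, int age) {
        return String.format(Locale.ROOT,
                "My name is %s and I am %d years old", name, age);
    }

    public static void main(String[] args) {
        String name = "Pat"; // sample value, assumed for illustration
        int age = 30;        // sample value, assumed for illustration

        // Direct printf call, as in the text; %n emits the platform line separator.
        System.out.printf("My name is %s and I am %d years old%n", name, age);

        // Same result via the helper, useful when the string is needed as a value.
        System.out.println(greeting(name, age));
    }
}
```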
Java has extended the concept, adding some additional type safety and convenience features. Although Java has had some text formatting capabilities in the past (we’ll discuss the java.text package and MessageFormat later), printf formatting was not really feasible until variable-length argument lists and autoboxing of primitive types were added in Java 5.0. (We mention this to explain why these similar APIs both exist in Java.)

Formatter

The primary new tool in our text formatting arsenal is the java.util.Formatter class and its format() method. Several convenience methods can hide the Formatter object from you, and you may not need to create a Formatter directly. First, the static String.format() method can be used to format a String with arguments (like the C language’s sprintf() function):

    String message = String.format( "My name is %s and I am %d years old.", name, age );

Next, the java.io.PrintStream and java.io.PrintWriter classes, which are used for writing text to streams, have their own format() method. We discuss streams in Chapter 12, but this simply means that you can use this same printf-style formatting for writing strings to any kind of stream, whether it be to System.out standard console output, to a file, or to a network connection.

In addition to the format() method, PrintStream and PrintWriter also have a version of the format method that is actually called printf(). The printf() method is identical to and, in fact, simply delegates to the format() method. It’s there solely as a shout-out to the C programmers and ex-C programmers in the audience.

The Format String

The syntax of the format string is compact and a bit cryptic at first, but not bad once you get used to it. The simplest format string is just a percent sign (%) followed by a conversion character. For example, the following text has two embedded format strings:

    "My name is %s and I am %d years old."
The first conversion character is s, the most general format, which represents a string value; and the second is d, which represents an integer value. There are about a dozen basic conversion characters corresponding to different types and primitives, and there are a couple of dozen more that are specifically used for formatting dates and times. We cover the basics here and return to date and time formatting in Chapter 11.

At first glance, some of the conversion characters may not seem to do much. For example, the %s general string conversion in our previous example would actually have handled the job of displaying the numeric age argument just as well as %d. However, these specialized conversion characters accomplish three things. First, they add a level of type safety. By specifying %d, we ensure that only an integer type is formatted at that location. If we make a mistake in the arguments, we get a runtime IllegalFormatConversionException instead of garbage in our string (and your IDE may flag it as well). Second, the format method is Locale-sensitive and capable of displaying numbers, percentages, dates, and times in many different languages just by specifying a Locale as an argument. By telling the Formatter the type of argument with type-specific conversion characters, printf can take into account language-specific localizations. Third, additional flags and fields can be used to govern layout with different meanings for different types of arguments. For example, with floating-point numbers, you can specify a precision in the format string.

The general layout of the embedded format string is as follows:

    %[argument_index$][flags][width][.precision]conversion_type

Following the literal % are a number of optional items before the conversion type character. We’ll discuss these as they come up, but here’s the rundown.
The argument index can be used to reorder or reuse individual arguments in the variable-length argument list by referring to them by number. The flags field holds one or more special flag characters governing the format. The width and precision fields control the size of the output for text and the number of digits displayed for floating-point numbers.

String Conversions

The conversion character s represents the general string conversion type. Ultimately, all of the conversion types produce a String. What we mean is that the general string conversion takes the easy route to turning its argument into a string. Normally, this simply means calling toString() on the object. Since all of the arguments in the variable argument list are autoboxed, they are all Objects. Any primitives are represented by the results of calling toString() on their wrapper classes, which generally return the value as you’d expect. If the argument is null, the result is the String “null.”

More interesting are objects that implement the java.util.Formattable interface. For these, the argument’s formatTo() method is invoked, passing it the flags, width, and precision information and allowing it to produce the string to be used. In this way, objects can control their own printf string representation, just as an object can do so using toString().

Width, precision, and justification

For simple text arguments, you can think of the width and precision as a minimum and maximum number of characters to be output. As we’ll see later, for floating-point numeric types, the precision changes meaning slightly and controls the number of digits displayed after the decimal point. We can see the effect on a simple string here:

    System.out.printf( "String is '%5s'\n", "A" );
    // String is '    A'
    System.out.printf( "String is '%.5s'\n", "Happy Birthday" );
    // String is 'Happy'
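To make the optional fields concrete, here is a small sketch that exercises width, precision, the argument index, and the type check performed by %d. The class FormatFieldsDemo and its helpers fmt() and rejectsArgument() are hypothetical names introduced only for this illustration. The '-' flag (left-justify) is a standard java.util.Formatter flag that the discussion above has not yet introduced, and Locale.ROOT is passed only to keep the decimal separator predictable.

```java
import java.util.IllegalFormatConversionException;
import java.util.Locale;

final class FormatFieldsDemo {

    // Hypothetical helper: formats with a fixed locale so the output is
    // the same regardless of the machine's default locale.
    static String fmt(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    // Hypothetical helper: reports whether the conversion in the pattern
    // refuses the given argument type at runtime.
    static boolean rejectsArgument(String pattern, Object arg) {
        try {
            fmt(pattern, arg);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Width: minimum of 5 characters, right-justified by default.
        System.out.println(fmt("'%5s'", "A"));               // '    A'
        // The '-' flag left-justifies within the same width.
        System.out.println(fmt("'%-5s'", "A"));              // 'A    '
        // Precision on a string: at most 5 characters are kept.
        System.out.println(fmt("'%.5s'", "Happy Birthday")); // 'Happy'
        // Precision on a float: digits after the decimal point.
        System.out.println(fmt("'%8.3f'", 3.14159));         // '   3.142'
        // Argument index: refer to arguments by position (1-based).
        System.out.println(fmt("%2$s %1$s", "world", "hello")); // hello world
        // Type safety: %d does not accept a String argument.
        System.out.println(rejectsArgument("%d", "forty-two")); // true
    }
}
```

Keeping the locale explicit is a choice made for this sketch; in application code you would normally let the default Locale apply, or pass the user’s Locale, so that numbers are rendered in the expected conventions.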