

August 4, 2011

I recently encountered a little piece of code which reminded me why it is important for a developer to RTFM.

I came across the following utility method in a code base:

public static String allowOnlyNumbers(String input) {
   StringBuilder sb = new StringBuilder(input.length());
   for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      // Keep the character only if Character.isDigit accepts it
      if (Character.isDigit(c)) {
         sb.append(c);
      }
   }
   return sb.toString();
}

This utility method was used in the following context:

PhoneDetails phone = new PhoneDetails();

So the method was supposed to allow only characters in the range ‘0’-‘9’. Looks OK, right?
I ran a quick test:
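
A test along these lines exercises the method with ASCII digits mixed with punctuation. This is only a sketch: the input string is a made-up example rather than the one from the original test, and the class name QuickTest is hypothetical.

```java
class QuickTest {
    // Same logic as the utility method: keep only what Character.isDigit accepts.
    static String allowOnlyNumbers(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input, chosen only for illustration.
        System.out.println(allowOnlyNumbers("(555) 123-4567")); // prints 5551234567
    }
}
```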


The result was as expected:


I always look at the Java core sources to learn, so I looked at isDigit(char) as well to see how it was implemented. To my initial surprise, the code was NOT:

public static boolean isDigit(char c) {
   return c >= '0' && c <= '9';
}

It was far more complex and seemed to support Unicode as well. So I ran another test on the code:

System.out.println(allowOnlyNumbers("\u06F1\u06F2\u06F3")); // Extended Arabic-Indic digits 1, 2 and 3

The result was again as expected (but not as intended):
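
To make that behaviour concrete, here is a small self-contained sketch of the second test (the class name UnicodeDigitTest is hypothetical). It prints code points rather than glyphs, so the output does not depend on the console encoding. All three characters pass through the filter, because Character.isDigit accepts them.

```java
class UnicodeDigitTest {
    // Same logic as the utility method: keep only what Character.isDigit accepts.
    static String allowOnlyNumbers(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String result = allowOnlyNumbers("\u06F1\u06F2\u06F3");
        for (int i = 0; i < result.length(); i++) {
            char c = result.charAt(i);
            // Character.digit returns the numeric value of a Unicode decimal digit.
            System.out.printf("U+%04X numeric value %d%n", (int) c, Character.digit(c, 10));
        }
        // prints:
        // U+06F1 numeric value 1
        // U+06F2 numeric value 2
        // U+06F3 numeric value 3
    }
}
```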


Needless to say, the Javadoc specifies very clearly how this method behaves:

Determines if the specified character is a digit.

A character is a digit if its general category type, provided by Character.getType(ch), is DECIMAL_DIGIT_NUMBER.

Some Unicode character ranges that contain digits:

‘\u0030’ through ‘\u0039’, ISO-LATIN-1 digits (‘0’ through ‘9’)
‘\u0660’ through ‘\u0669’, Arabic-Indic digits
‘\u06F0’ through ‘\u06F9’, Extended Arabic-Indic digits
‘\u0966’ through ‘\u096F’, Devanagari digits
‘\uFF10’ through ‘\uFF19’, Fullwidth digits

Many other character ranges contain digits as well.
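
If only ‘0’ through ‘9’ should get through, as the phone number use case suggests, one option is to test the range explicitly instead of calling Character.isDigit. The sketch below is one possible fix, not necessarily what the original code base ended up with; the names AsciiDigitFilter and allowOnlyAsciiDigits are hypothetical. It also checks the first digit of each range quoted above against Character.getType.

```java
class AsciiDigitFilter {
    // Keeps only the ASCII digits '0' through '9'.
    static String allowOnlyAsciiDigits(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First character of each range listed in the Javadoc excerpt.
        char[] zeros = {'\u0030', '\u0660', '\u06F0', '\u0966', '\uFF10'};
        for (char z : zeros) {
            boolean decimalDigit = Character.getType(z) == Character.DECIMAL_DIGIT_NUMBER;
            System.out.printf("U+%04X isDigit=%b decimalDigit=%b%n",
                    (int) z, Character.isDigit(z), decimalDigit);
        }
        // Mixed input: only the ASCII digits survive.
        System.out.println(allowOnlyAsciiDigits("12\u06F1\u06F2\u06F3 34")); // prints 1234
    }
}
```

The explicit range check trades generality for predictability: it rejects every non-ASCII digit, which is usually what a field stored and processed as plain ‘0’-‘9’ needs.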

So the message here is (as Baz Luhrmann said):

Read the directions even if you don’t follow them.

You can see the code running here
