Encoding, Unicode and everything in between

March 14, 2011

I have worked with many developers over the past several years, and I have noticed something peculiar: developers find it easier to understand the flow of a large-scale multi-threaded application than to understand the concept of Unicode.

After reading a very good article about Unicode, I decided to share it, because I think it covers something we should all be aware of when writing international software.

I recommend reading this article; it will fill in gaps you might not know you have.

Important note for Java developers: Java represents strings in memory as sequences of 16-bit UTF-16 code units, so the characters of even a simple text like 'Hello' take 10 bytes rather than 5 (not counting the object overhead of the string itself).
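To make the arithmetic concrete, here is a minimal sketch that counts the bytes a text needs in UTF-8 and in UTF-16. The class name `EncodingSizes` and its helper methods are made up for illustration; only `String.getBytes` and `StandardCharsets` come from the JDK. It measures encoded sizes rather than the JVM's actual heap layout, which can differ (newer JVMs, from Java 9 on, may store Latin-1-only strings more compactly internally).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingSizes {

    // Bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode the text as UTF-16.
    // UTF_16BE is used so that no byte-order mark is added to the count.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String hello = "Hello";
        System.out.println("chars:  " + hello.length());     // 5 UTF-16 code units
        System.out.println("UTF-8:  " + utf8Bytes(hello));   // 5 bytes
        System.out.println("UTF-16: " + utf16Bytes(hello));  // 10 bytes

        // A non-ASCII character (e with acute accent, U+00E9) needs two bytes in both encodings.
        String eAcute = "\u00e9";
        System.out.println("UTF-8:  " + utf8Bytes(eAcute));  // 2 bytes
        System.out.println("UTF-16: " + utf16Bytes(eAcute)); // 2 bytes
    }
}
```

The sketch uses `UTF_16BE` instead of `UTF_16` because the latter prepends a two-byte byte-order mark when encoding, which would make 'Hello' appear to take 12 bytes.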

