SMS Character Set & Sending Special Characters
As they say, necessity is the mother of invention. Today I was forced to understand the GSM SMS standards. I was trying to send a range of special characters in a message, and after hours of grueling debugging I realized that only a handful of special characters are allowed.
So I dug up the SMS standard, GSM 03.38. It defines a 7-bit default alphabet, most of whose characters map onto the ISO character set ISO 8859-1 (Latin-1), which in turn is extremely similar to Microsoft's Windows-1252 character set.
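To see where the two 8-bit sets part ways, here is a small Python sketch (Python is my choice for illustration, not something the setup requires). The helper name encode_or_none is made up for this example. It relies on Python's built-in "latin-1" and "cp1252" codecs: the accented letter encodes identically in both, while the euro sign exists only in Windows-1252.

```python
def encode_or_none(text, charset):
    """Return text encoded in charset, or None if some character is not in it."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

# Same byte in both character sets.
print(encode_or_none("é", "latin-1"), encode_or_none("é", "cp1252"))
# The euro sign is defined in Windows-1252 but not in ISO 8859-1.
print(encode_or_none("€", "latin-1"), encode_or_none("€", "cp1252"))
```

The differences sit in the byte range 0x80 to 0x9F, which Windows-1252 uses for printable characters such as the euro sign and curly quotes.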
Sending a message through Kannel
I was using Kannel, the open source SMS gateway, to send out the messages. The version I was running accepts message text posted over HTTP only in the Windows-1252 encoding.
So if you're using ASP.NET, you must URL-encode your text using the Windows-1252 encoding before making the HTTP request to Kannel. Otherwise, the message received on the device at the other end will look like gibberish.
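The same idea, sketched in Python rather than ASP.NET. The host, port, and the function name build_sendsms_url are assumptions for illustration; the path and the username, password, to and text parameters follow Kannel's sendsms HTTP interface as I understand it, so check them against your own configuration. The point is only the encoding="cp1252" argument, which makes the query string carry Windows-1252 bytes instead of UTF-8.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; adjust to your own Kannel configuration.
KANNEL_SENDSMS_URL = "http://localhost:13013/cgi-bin/sendsms"

def build_sendsms_url(username, password, to, text):
    """Return a sendsms URL whose query string is percent-encoded as Windows-1252."""
    params = {"username": username, "password": password, "to": to, "text": text}
    # encoding="cp1252" converts each value to Windows-1252 bytes before
    # percent-encoding; a character outside that set raises UnicodeEncodeError.
    return KANNEL_SENDSMS_URL + "?" + urlencode(params, encoding="cp1252")

print(build_sendsms_url("user", "secret", "5551234", "café €5"))
```

Because unsupported characters raise an error here, a message that cannot be represented in Windows-1252 fails before the request is made instead of arriving garbled.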
Receiving a message through Kannel (Kannel Post)
When Kannel receives a message, it first checks whether the text matches the ISO 8859-1 character encoding. If decoding the message with that 8-bit character set fails, it tries 16-bit Unicode, big endian (UTF-16BE).
If it is configured to post the message to a designated URL, it first URL-encodes the received text using the encoding it determined, and then supplies the character set in the URL as a query string parameter.
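On the receiving side, the handler has to undo those two steps in order: percent-decode to raw bytes, then decode the bytes with the character set named in the query string. Below is a minimal Python sketch. The parameter names text and charset are hypothetical (use whatever names your post URL is configured with), and decode_kannel_post is a made-up helper name.

```python
from urllib.parse import unquote_to_bytes

def decode_kannel_post(query):
    """Decode the message text from a raw query string using its declared charset."""
    params = {}
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        # Keep the value as raw bytes; the charset is not known yet.
        params[name] = unquote_to_bytes(value.replace("+", " "))
    # Hypothetical parameter names; fall back to ISO 8859-1 if no charset is given.
    charset = params.get("charset", b"ISO-8859-1").decode("ascii")
    return params["text"].decode(charset)

print(decode_kannel_post("text=caf%E9&charset=ISO-8859-1"))
print(decode_kannel_post("text=%04%1F%04%40&charset=UTF-16BE"))
```

Working from the raw query string matters here: most web frameworks decode query parameters as UTF-8 before your code sees them, which would already have damaged the text.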
If you want to receive your messages in ISO 8859-1, it is important to stick to the characters defined in that set. Otherwise, Kannel will call your post URL with Unicode-encoded text.
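A quick way to spot the characters that would push a message out of ISO 8859-1 is to look for code points above 0xFF, since ISO 8859-1 covers exactly the first 256 Unicode code points. A small Python sketch, with chars_outside_latin1 as a made-up helper name:

```python
def chars_outside_latin1(text):
    """Return the distinct characters in text that ISO 8859-1 cannot represent."""
    # ISO 8859-1 maps one-to-one onto Unicode code points 0x00 through 0xFF.
    return sorted({ch for ch in text if ord(ch) > 0xFF})

print(chars_outside_latin1("café"))
print(chars_outside_latin1("price: 5€ “ok”"))
```

Curly quotes pasted from a word processor are a common culprit: they look harmless but lie outside the 8-bit set.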
Resources / References:
- Kannel: Open Source WAP and SMS gateway
- GSM 03.38 Character Set: Ref 1, Ref 2, ISO 8859-1 Mapping
- ISO-8859-1 Encoding, Windows 1252 Encoding

