Oracle – NLS


Oracle – NLS : Multilingual support: inserting and viewing strange Character Sets.

Introduction:

This is a manual for multilingual databases and how to insert, select, and update fields in different languages/character sets.

I wrote this manual because I spent way too much time figuring out how this works.

Another big problem is that when working on windows machines, you also have to set the NLS settings in your registry AND you’ll have to change the regional settings for your PC.

This results in numerous possible combinations. This manual will try to explain where you have to pay attention.

 

1. Setting up the database.

Your database should be setup in UTF-8 if you wish to implement ALL languages. The trick is that your database character set should include all wanted languages/character sets. Meaning your character set of your client MUST be a subset of the database character set.
EG: when you want to implement only Western European languages, you can use WE8ISO8859P1 because it supports the following Western European languages:

Table 3-12 WE8ISO8859P1 Example

Catalan

Finnish

Italian

Swedish

Danish

French

Norwegian

Dutch

German

Portuguese

English

Icelandic

Spanish

The reason WE8ISO8859P1 supports the languages above is because they are all based on a similar writing script. This situation is often called restricted multilingual support.

Western European Languages don’t cause that many problems. However if u want to implement also Russian, Polish and Japanese for example this character set will not be adequate. In that case you’ll need to use UTF-8 for your database character set. This situation is often called unrestricted multilingual support.

Please note that each client accesses only data it can process. If Japanese data were retrieved, modified, and stored back by the German client, all Japanese characters would be lost during the character set conversion.

2. The client settings:

Clients are programs you use to connect, insert or update your fields in your database. Meaning this can be numerous different programs. The first one that comes into mind is SQL+. SQL+ is not capable of using 8 bit character sets (like UTF-8) SQL+ (and TOAD as far as I know) are ONLY capable of using 7 bit character sets (like WE8ISO8859-1, which supports Western European Languages or the latin-based scripts.)
There is however a way of inserting strange character sets into an oracle UTF-8 database.
a.) Windows 98,2000, XP
à open regedit à change the ALL NLS_LANG entries to the character set you want to insert/update. The NLS_LANG keys can be found here:
- My Computer/KHEY_LOCAL_MACHINE/SOFTWARE/ORACLE
NLS_LANG

- My Computer/KHEY_LOCAL_MACHINE/SOFTWARE/ORACLE/ALL_HOMES/ID0/
NLS_LANG

- My Computer/KHEY_LOCAL_MACHINE/SOFTWARE/ORACLE/HOME0/

NLS_LANG

Change ALL these 3 registry keys into the character set you want to use.

The syntax for NLS_LANG is:
NLS_LANG=<language_territory>.<charset>
Eample: Dutch
à DUTCH_THE NETHERLANDS.WE8ISO8859P1 or if you want to use Arabic à ARABIC_UNITED ARAB EMIRATES.AR8ISO8859P6

Here is a link for most used NLS_LANG settings:
http://otn.oracle.com/products/reports/htdocs/getstart/docs/A92102_01/pbr_nls.htm#1005625

b) Windows 98,2000, XP à Start à settings à Control Panel à Regional Options:
Set both “Your Locale (location)” AND the default Language settings on the language you want to type. For example if you want to type Arabic: Remember to change both settings to the desired language (locale). Note that reboot is required after this change. Please also note that you will probably need the windows CD to install the new locale (unless you had installed them already).

The combination of these 2 settings (Regional Options + Registry NLS settings) will allow you to use strange character sets.

Please note that your client can vary: you can use SQL+ (or TOAD) but these programs doesn’t allow you to insert 8 bit character sets, it only is capable of handling 7 bit character sets: meaning you’ll have to switch between your input locales (Alt-Shift). For example: if you have set your NLS_LANG settings in your registry AND you have chosen the right system locale in your regional options (let’s take the above example of Arabic) you can set your input locale on English, and just start typing into SQL+ or TOAD: you just start in English: SELECT * FROM "USER" WHERE USERFIRSTNAME=
Now switch between your input locale (Alt-Shift) to Arabic: and continue typing:
ضضصقثقثشصض

This results in something like this: SELECT * FROM "USER" WHERE USERFIRSTNAME='ضغفغغفغض'

As you can see you now are able to use 2 character sets together. Use the same technique to insert strange languages in your database:

INSERT INTO "USER" Values ('00000000.0000.0001','ضصصثضصث','صضثضصق','1000','26-08-2003')

Please note that this way of working is only needed for programs that don’t allow 8 bit character sets. For example: I have a production database in oracle inUTF-8 for my online application, but my offline application is in Filmaker. I have this plugin (PDM SQL Plugin) which allows me to directly connect to SQL / oracle databases using ODBC. Filemaker IS capable of using 8 bit characters, so here you’ll have to change your NLS_LANG Settings in your registry in a different way:

NLS_LANG=POLISH_POLAND.UTF8 

Please note that the program you use to insert / update fields should also be set to 8 bit characters: For filemaker this means using the Unicode font (I just used the standard Arial Unicode MS font). I had to test all different combinations before I found out what the problem was. Here an overview of what went wrong with different settings (in this example it’s polish text with registered and trade mark signs):

After checking many possibilities I found one that works perfectly. First of all, the client from where you are inserting the polish text must be set to polish (I work on a Windows 2000) : --> Control Panel --> Regional Options --> Your Locale (in the general tab) must be set to Polish. Also the " Set Default" must be set to Polish (!!! Restart is required afterwards!!!).
Here are all the combinations + what goes wrong with each one:

1) NLS_LANG=POLISH_POLAND.EE8ISO8859P2 (in the registry) + Arial CE (In Filemaker) --> Polish chars are ok, but ™ (trademark signs) are being replaced by '?' in Oracle
2) NLS_LANG=POLISH_POLAND.UTF8 (in the registry) + Arial CE (in Filemaker) --> l,a,e (polish chars) are being replaced by ² ³ ? in Oracle
3) NLS_LANG=POLISH_POLAND.EE8ISO8859P2 (in the registry) + Arial Unicode MS (in Filemaker) --> l,a,e (polish chars) are being replaced by ² ³ ? in Oracle
4) NLS_LANG=POLISH_POLAND.UTF8 (in the registry) + Arial Unicode MS (In Filemaker) --> l,a,e (polish chars) + ™ (trademarks) are all stored perfectly in Oracle.

Please Note that I wrote this manual based on a Windows 2000 PC and an Oracle 9i database with the UTF-8 character set.


0 Response to "Oracle – NLS"

Post a Comment

Powered by Blogger