
Java Servlet Programming

Appendix E. Charsets

Table E-1 lists the suggested charset(s) for a number of languages. Servlets that generate multilingual output use charsets to determine which character encoding the servlet's PrintWriter uses. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, which is appropriate for most Western European languages. To specify an alternate charset, pass the charset value to the setContentType() method before the servlet retrieves its PrintWriter. For example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();  // Writes Shift_JIS Japanese

Note that not every web browser supports every charset or has the fonts needed to display every character, although all clients support at least ISO-8859-1. The UTF-8 charset can represent every Unicode character, so it may be considered a viable alternative for any language.
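To make the effect of the charset concrete, the following minimal sketch encodes the same text through a PrintWriter bound to different charsets and compares the resulting bytes. It writes to an in-memory buffer instead of a servlet response so that it runs on its own; the class name CharsetDemo and the helper encode() are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the Servlet API.
final class CharsetDemo {

    // Encodes text the way a PrintWriter bound to the named charset would.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(buffer, Charset.forName(charsetName)));
        out.print(text);
        out.flush();  // push buffered characters through the encoder
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String french = "caf\u00e9";  // "cafe" with an acute accent on the e
        // Latin-1 uses one byte per character; UTF-8 needs two for the accented e.
        System.out.println(encode(french, "ISO-8859-1").length);  // prints 4
        System.out.println(encode(french, "UTF-8").length);       // prints 5

        String kanji = "\u65e5";  // a Japanese character outside Latin-1
        // Latin-1 cannot represent it, so the encoder substitutes '?'.
        System.out.println((char) encode(kanji, "ISO-8859-1")[0]);  // prints ?
        System.out.println(encode(kanji, "UTF-8").length);          // prints 3
    }
}
```

The substitution of '?' for unmappable characters is why choosing a charset that covers the target language matters before any output is written.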

Table E-1. Suggested Charsets

| Language | Language Code | Suggested Charsets |
|---|---|---|
| Albanian | sq | ISO-8859-2 |
| Arabic | ar | ISO-8859-6 |
| Bulgarian | bg | ISO-8859-5 |
| Byelorussian | be | ISO-8859-5 |
| Catalan (Spanish) | ca | ISO-8859-1 |
| Chinese (Simplified/Mainland) | zh | GB2312 |
| Chinese (Traditional/Taiwan) | zh (country TW) | Big5 |
| Croatian | hr | ISO-8859-2 |
| Czech | cs | ISO-8859-2 |
| Danish | da | ISO-8859-1 |
| Dutch | nl | ISO-8859-1 |
| English | en | ISO-8859-1 |
| Estonian | et | ISO-8859-1 |
| Finnish | fi | ISO-8859-1 |
| French | fr | ISO-8859-1 |
| German | de | ISO-8859-1 |
| Greek | el | ISO-8859-7 |
| Hebrew | he (formerly iw) | ISO-8859-8 |
| Hungarian | hu | ISO-8859-2 |
| Icelandic | is | ISO-8859-1 |
| Italian | it | ISO-8859-1 |
| Japanese | ja | Shift_JIS, ISO-2022-JP, EUC-JP[1] |
| Korean | ko | EUC-KR[2] |
| Latvian, Lettish | lv | ISO-8859-2 |
| Lithuanian | lt | ISO-8859-2 |
| Macedonian | mk | ISO-8859-5 |
| Norwegian | no | ISO-8859-1 |
| Polish | pl | ISO-8859-2 |
| Portuguese | pt | ISO-8859-1 |
| Romanian | ro | ISO-8859-2 |
| Russian | ru | ISO-8859-5, KOI8-R |
| Serbian | sr | ISO-8859-5, KOI8-R |
| Serbo-Croatian | sh | ISO-8859-5, ISO-8859-2, KOI8-R |
| Slovak | sk | ISO-8859-2 |
| Slovenian | sl | ISO-8859-2 |
| Spanish | es | ISO-8859-1 |
| Swedish | sv | ISO-8859-1 |
| Turkish | tr | ISO-8859-9 |
| Ukrainian | uk | ISO-8859-5, KOI8-R |
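One way a servlet might use Table E-1 is to look up a charset from the client's locale and fall back to UTF-8 when the language is not listed or the running JVM does not support the suggested charset. The sketch below is an illustration under those assumptions: the class SuggestedCharsets and the method charsetFor() are hypothetical names, only a few rows of the table are included, and where the table lists several charsets the first one is used.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper holding an excerpt of Table E-1.
final class SuggestedCharsets {

    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        BY_LANGUAGE.put("en", "ISO-8859-1");
        BY_LANGUAGE.put("cs", "ISO-8859-2");
        BY_LANGUAGE.put("ru", "ISO-8859-5");
        BY_LANGUAGE.put("el", "ISO-8859-7");
        BY_LANGUAGE.put("tr", "ISO-8859-9");
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ko", "EUC-KR");
    }

    // Returns the suggested charset name for the locale, or UTF-8 as a fallback.
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        String name;
        if ("zh".equals(language)) {
            // Chinese is the one entry in the table that depends on the country.
            name = "TW".equals(locale.getCountry()) ? "Big5" : "GB2312";
        } else {
            name = BY_LANGUAGE.get(language);
        }
        // Fall back when the language is unlisted or the JVM lacks the charset.
        return (name != null && Charset.isSupported(name)) ? name : "UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(charsetFor(Locale.ENGLISH));
        System.out.println(charsetFor(Locale.TAIWAN));
        System.out.println(charsetFor(Locale.forLanguageTag("xx")));
    }
}
```

In a servlet, the returned name could be appended to the content type, for example "text/html; charset=" + SuggestedCharsets.charsetFor(locale), before the PrintWriter is retrieved.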

[1] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.

[2] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
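The portability workaround described in these footnotes can be sketched as a helper that tries the standard charset name first and falls back to the legacy name only if the JVM rejects the standard one. The class LegacyCharsetWriter and its open() method are hypothetical names for illustration; the helper takes a plain OutputStream so that it runs outside a servlet container.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of the Servlet API.
final class LegacyCharsetWriter {

    // Builds a PrintWriter using standardName, or legacyName if the JVM
    // does not recognize the standard name.
    static PrintWriter open(OutputStream out, String standardName, String legacyName) {
        try {
            return new PrintWriter(new OutputStreamWriter(out, standardName));
        } catch (UnsupportedEncodingException standardUnknown) {
            try {
                return new PrintWriter(new OutputStreamWriter(out, legacyName));
            } catch (UnsupportedEncodingException legacyUnknown) {
                throw new IllegalStateException(
                    "Neither " + standardName + " nor " + legacyName + " is supported",
                    legacyUnknown);
            }
        }
    }
}
```

Inside a servlet, assuming res is the HttpServletResponse, the helper might be used like this:

    res.setContentType("text/html; charset=EUC-JP");
    PrintWriter out = LegacyCharsetWriter.open(res.getOutputStream(), "EUC-JP", "EUCJIS");

Because the writer is built on res.getOutputStream(), the servlet should not also call res.getWriter() for the same response.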




Copyright © 2001 O'Reilly & Associates. All rights reserved.