To make sure that the recipient sees the (X)HTML web page content as intended, it is necessary to indicate which character set was used when the page was created. This is especially true if the page contains characters outside the ISO 8859-1 character set.
The safest way to achieve this is to configure the web server to send the character set information automatically along with the other HTTP headers. This article shows how to do this with the Apache web server's .htaccess file mechanism and with the PHP scripting language.
Unfortunately, it is not always possible to change the server's settings or to use scripting languages like PHP. This is why most browsers and other user agents honor character set information written into the head section of an (X)HTML document. This meta tag based solution should not be used as the primary means if the server based solution is available. Occasionally, though, it is quite handy to be able to use both methods at the same time.
In this example we have two XHTML files stored on a web server: chinese.gb.html, containing GB2312 encoded Simplified Chinese text, and chinese.b5.html, containing Big5 encoded Traditional Chinese text. Our goal is to configure things so that the character set used in these documents is indicated correctly, and to make the documents as user friendly as possible for offline use as well.
We can use the Lynx browser or HTTP Header Viewer to check which HTTP headers the web server sends along with these files:
% lynx -head -dump http://www.example.com/chinese.gb.html
HTTP/1.0 200 OK
Date: Thu, 28 Jun 2001 12:09:04 GMT
Server: Apache
Last-Modified: Wed, 27 Jun 2001 12:20:16 GMT
ETag: "800103-1597-3b39cf80"
Accept-Ranges: bytes
Content-Length: 1305234
Content-Type: text/html
Age: 0
Proxy-Connection: close
We can see that the Content-Type: header lacks character set information. To add it, we must create a file called .htaccess in the directory where the files are stored and make it readable by all. The file content can look something like this:
<FilesMatch "\.gb\.html?$">
AddType "text/html; charset=GB2312" .html .htm
</FilesMatch>
<FilesMatch "\.b5\.html?$">
AddType "text/html; charset=Big5" .html .htm
</FilesMatch>
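To see which file names the two FilesMatch patterns would catch, the same regular expressions can be tried out in a few lines of Python. This is only an illustrative sketch: the function names charset_for and content_type_for are hypothetical, the fallback of plain text/html for non-matching files is an assumption taken from the header dump above, and Python's re module is not the regular expression engine Apache uses, although for patterns this simple the two should agree.

```python
import re

# Same patterns as in the .htaccess example, paired with the charset
# that each FilesMatch block assigns.
RULES = [
    (re.compile(r"\.gb\.html?$"), "GB2312"),
    (re.compile(r"\.b5\.html?$"), "Big5"),
]


def charset_for(filename):
    """Return the charset the rules would assign, or None if no rule matches."""
    for pattern, charset in RULES:
        if pattern.search(filename):
            return charset
    return None


def content_type_for(filename):
    """Build the Content-Type value a file would be served with.

    Assumption: files that match no rule keep the plain "text/html"
    type seen in the original header dump.
    """
    charset = charset_for(filename)
    if charset is None:
        return "text/html"
    return "text/html; charset=" + charset
```

For example, charset_for("chinese.gb.html") gives "GB2312", while a file such as index.html matches neither pattern and is left without a charset.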
With this .htaccess file in place, all files ending with .gb.html or .gb.htm should be served with a Content-Type: header containing charset data, like this:
% lynx -head -dump http://www.example.com/chinese.gb.html
...
Content-Type: text/html; charset=gb2312
...
and all files ending with .b5.html or .b5.htm should be served with a Content-Type: header containing big5 as the charset:
% lynx -head -dump http://www.example.com/chinese.b5.html
...
Content-Type: text/html; charset=big5
...
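The same check can be scripted when there are many files to verify. The sketch below pulls the charset parameter out of a Content-Type header value using Python's standard email.message module; the function names charset_from_content_type and charset_from_headers are hypothetical, and the input is assumed to be a header dump in the format Lynx prints above.

```python
from email.message import Message


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def charset_from_headers(dump):
    """Find the Content-Type line in a raw header dump and return its charset."""
    for line in dump.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return charset_from_content_type(value.strip())
    return None
```

Fed the first dump in this article, charset_from_headers returns None, because the header is plain text/html. After the .htaccess change it should return the charset name in lower case.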
In PHP, an HTML-embedded scripting language, we can write:
<?php header ('Content-Type: text/html; charset=GB2312'); ?>
<?php echo ('<?xml version="1.0" encoding="GB2312"?>' . "\n"); ?>
...
or
<?php header ('Content-Type: text/html; charset=Big5'); ?>
<?php echo ('<?xml version="1.0" encoding="Big5"?>' . "\n"); ?>
...
at the very top of every XHTML file to mark it as containing GB2312 or Big5 encoded data. The header() call has to run before any output is sent, so nothing, not even a blank line, may precede the first <?php tag.
Usually there is no need to do anything else! However, if the server cannot be configured, or if the web page author wants to make sure that the pages also work correctly in offline use, it is necessary to make some modifications to the HTML files themselves. In the case of chinese.gb.html, the head section should look like this:
<?xml version="1.0" encoding="GB2312"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh">
<head>
<title>Simplified Chinese</title>
<meta http-equiv="Content-Language" content="zh" />
<meta http-equiv="Content-Type" content="text/html; charset=GB2312" />
</head>
...
and in the case of chinese.b5.html like this:
<?xml version="1.0" encoding="Big5"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh">
<head>
<title>Traditional Chinese</title>
<meta http-equiv="Content-Language" content="zh" />
<meta http-equiv="Content-Type" content="text/html; charset=Big5" />
</head>
...
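Because a file used offline has no HTTP header to fall back on, it is worth checking that the XML declaration and the meta element agree and that the file's bytes really decode under the declared character set. The following is a minimal sketch of such a check, not a full HTML parser: the function names declared_charsets and check_document are hypothetical, and it assumes both declarations appear within the first 2048 bytes, as they do in the examples above.

```python
import re

# Finds charset=... as written in the meta http-equiv element.
META_CHARSET = re.compile(rb"charset=([A-Za-z0-9_-]+)", re.IGNORECASE)
# Finds encoding="..." in the XML declaration.
XML_ENCODING = re.compile(rb"<\?xml[^>]*encoding=[\"']([A-Za-z0-9_-]+)[\"']")


def declared_charsets(data):
    """Return (xml_encoding, meta_charset) found in the raw bytes.

    Either item is None when the corresponding declaration is missing.
    """
    head = data[:2048]  # assumption: declarations sit near the top
    xml = XML_ENCODING.search(head)
    meta = META_CHARSET.search(head)
    return (
        xml.group(1).decode("ascii") if xml else None,
        meta.group(1).decode("ascii") if meta else None,
    )


def check_document(data):
    """Return True if both declarations agree and the bytes decode under them."""
    xml, meta = declared_charsets(data)
    if xml is None or meta is None or xml.lower() != meta.lower():
        return False
    try:
        data.decode(meta)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that are invalid in that charset.
        return False
    return True
```

It could be run over a file with check_document(open("chinese.gb.html", "rb").read()). A False result means a declaration is missing, the two declarations disagree, or the content was saved in a different encoding than the one declared.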