

Arabic
Chinese
CJKV
Cyrillic
Greek
Hangul
Hebrew
Indic
Japanese
Korean
Latin
Native American
Unicode
Vietnamese


A character encoding consists of a code that pairs a set of characters (representations of graphemes or grapheme-like units, such as might appear in an alphabet or syllabary for the communication of a natural language) with a set of something else, such as numbers or electrical pulses, in order to facilitate the storage of text in computers and the transmission of text across telecommunication networks. Common examples include Morse code, which encodes letters of the Latin alphabet as series of long and short depressions of a telegraph key; and ASCII, which encodes letters, numerals, and other symbols, both as integers and as 7-bit binary versions of those integers.
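As a minimal sketch of the ASCII half of this definition, the Python snippet below pairs each character with its integer code and with the 7-bit binary form of that integer. The helper name `ascii_codes` is an illustrative assumption, not part of any standard.

```python
def ascii_codes(text):
    """Return (character, integer code, 7-bit binary string) for each ASCII character."""
    rows = []
    for ch in text:
        code = ord(ch)  # integer code assigned to the character
        if code > 127:
            raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
        rows.append((ch, code, format(code, "07b")))  # zero-padded to 7 bits
    return rows


for ch, code, bits in ascii_codes("Hi!"):
    print(ch, code, bits)
# H 72 1001000
# i 105 1101001
# ! 33 0100001
```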

Character repertoire

In some contexts, especially storage and communication, it makes sense to distinguish a character repertoire (the full set of abstract characters that a system supports) from a coded character set or character encoding (which specifies how to represent characters from that set using a number of integer codes).

In the early days of computing, the introduction of character repertoires such as ASCII (1963) and EBCDIC (1964) began the process of standardization. The limitations of such sets soon became apparent, and a number of ad hoc methods developed to extend them. The need to support multiple writing systems, including the CJK family of East Asian scripts, required support for a far larger number of characters and demanded a systematic approach to character encoding rather than the previous ad hoc approaches.

For example, the full repertoire of Unicode encompasses over 100,000 characters. Each character has a unique integer code in the range 0 to hexadecimal 10FFFF (a little over 1.1 million, although not all integers in that range represent coded characters). Other common repertoires include ASCII and ISO 8859-1, which are identical to the first 128 and 256 coded characters of Unicode respectively.
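The figures above can be checked with a short Python sketch. It assumes Python's built-in "ascii" and "latin-1" codecs as stand-ins for ASCII and ISO 8859-1 (the "latin-1" codec also maps the control-code byte values, which is an assumption of this sketch rather than a statement about the standard itself).

```python
import sys

# Highest Unicode code point: hexadecimal 10FFFF.
MAX_CODE_POINT = 0x10FFFF
assert sys.maxunicode == MAX_CODE_POINT

# Number of integers in the range 0..10FFFF (a little over 1.1 million).
code_space_size = MAX_CODE_POINT + 1

# Decoding each byte value 0..127 as ASCII gives the Unicode
# character with the same integer code.
ascii_text = bytes(range(128)).decode("ascii")
ascii_matches = all(ord(ch) == i for i, ch in enumerate(ascii_text))

# The same holds for byte values 0..255 under the latin-1 codec.
latin1_text = bytes(range(256)).decode("latin-1")
latin1_matches = all(ord(ch) == i for i, ch in enumerate(latin1_text))

print(code_space_size, ascii_matches, latin1_matches)
# 1114112 True True
```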

Encoding forms and encoding schemes

Computer scientists sometimes overload the term character encoding to mean also how a specific sequence of bits represents characters. This involves an encoding form, which specifies the conversion of the integer code into a series of integer code values that facilitate storage in a system that uses fixed bit widths. For example, integers greater than 65535 (hex FFFF) will not fit in 16 bits, so the UTF-16 encoding form mandates representation of these integers as a surrogate pair of integers, each less than 65536 and not assigned to characters (e.g., hex 10000 becomes the pair D800 DC00). An encoding scheme then converts code values to bit sequences, with attention given to platform-dependent issues such as byte order (for example, D800 DC00 might become 00 D8 00 DC on an Intel x86 architecture). A character set, character map, or code page shortcuts this process by directly mapping abstract characters to specific bit patterns. Unicode Technical Report #17 (http://www.unicode.org/reports/tr17/) explains this terminology in depth and provides more examples.
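The two steps described above can be sketched in Python: first the UTF-16 encoding form (integer code to surrogate pair), then the encoding scheme (code values to bytes in a given byte order). The function name `to_surrogate_pair` is a hypothetical helper written for this illustration; the byte-level step relies on Python's built-in "utf-16-le" and "utf-16-be" codecs.

```python
def to_surrogate_pair(code_point):
    """Split an integer code above FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogate pairs only encode 10000..10FFFF")
    offset = code_point - 0x10000     # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def hex_bytes(data):
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Encoding form: hex 10000 becomes the pair D800 DC00.
high, low = to_surrogate_pair(0x10000)
print(f"{high:04X} {low:04X}")  # D800 DC00

# Encoding scheme: the same code values serialized in each byte order.
little_endian = chr(0x10000).encode("utf-16-le")
big_endian = chr(0x10000).encode("utf-16-be")
print(hex_bytes(little_endian))  # 00 D8 00 DC
print(hex_bytes(big_endian))     # D8 00 DC 00
```

The little-endian output matches the x86 example in the text, since that architecture stores the low-order byte of each 16-bit value first.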

Since most applications use only a small subset of Unicode, encoding schemes (such as UTF-8 and UTF-16) and character maps (such as ASCII) provide efficient ways to represent Unicode characters in computer storage or communications using short binary words. Some simple encodings apply data compression techniques to represent a large repertoire with a small number of codes.
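To illustrate the point about short binary words, the following Python sketch compares how many bytes UTF-8 and UTF-16 need for characters from different parts of the Unicode range. The sample characters and the helper name `encoded_sizes` are choices made for this example only.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


samples = {
    "ASCII letter A (U+0041)": "A",
    "Latin-1 letter e-acute (U+00E9)": "\u00e9",
    "CJK ideograph (U+4E2D)": "\u4e2d",
    "first supplementary character (U+10000)": chr(0x10000),
}

for label, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
# ASCII letter A (U+0041): UTF-8 1 bytes, UTF-16 2 bytes
# Latin-1 letter e-acute (U+00E9): UTF-8 2 bytes, UTF-16 2 bytes
# CJK ideograph (U+4E2D): UTF-8 3 bytes, UTF-16 2 bytes
# first supplementary character (U+10000): UTF-8 4 bytes, UTF-16 4 bytes
```

Text drawn mostly from the ASCII range is therefore shortest in UTF-8, while text dominated by CJK characters is typically shorter in UTF-16.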

Popular character encodings
ISO 646
ASCII
EBCDIC
ISO 8859: ISO 8859-1, ISO 8859-2, ISO 8859-3, ISO 8859-4, ISO 8859-5, ISO 8859-6, ISO 8859-7, ISO 8859-8, ISO 8859-9, ISO 8859-10, ISO 8859-11, ISO 8859-13, ISO 8859-14, ISO 8859-15, ISO 8859-16
DOS character sets: CP437, CP737, CP850, CP852, CP855, CP857, CP858, CP860, CP861, CP863, CP865, CP866, CP869
Windows character sets: Windows-1250, Windows-1251 (for Cyrillic alphabets), Windows-1252, Windows-1253, Windows-1254, Windows-1255 (for Hebrew), Windows-1256 (for Arabic), Windows-1257, Windows-1258 (for Vietnamese)
KOI8-R, KOI8-U, KOI7
ISCII
VISCII
Big5, HKSCS
Guobiao: GB2312, GB18030
ISO 2022, Shift-JIS, EUC
Unicode (and subsets thereof, such as the 16-bit 'Basic Multilingual Plane'). See UTF-8

HTML Validation: Using Character Encodings
How to validate HTML documents in various character encodings.

HTML Document Representation
Chapter covering document character sets and encodings in HTML from the World Wide Web Consortium's HTML 4.0 Specification.

World Wide Web Consortium
Covers code tables, Unicode, HTML and XML and links to other resources and discusses internationalization and localization issues relating to character sets.

EKI Letter Database
Query character sets, encoding, codepages and Unicode information in an easy-to-use web form. Held at the Institute of the Estonian Language.

Character Set Issues beyond HTML3.2
Internationalization issues beyond HTML3.2 and ISO-8859-1. Includes information on Baltic encodings.

Dan's Web Tips: Characters and Fonts
Hints and tips about character sets and fonts in web development. Includes links to related resources.

Xceed Binary Encoding Library
A library for Windows developers that allows applications to encode binary data and files into text and vice-versa.

Tutorial: Shady Characters
A tutorial that explains HTML character sets, character encodings and character references from Webreference.com.

Characters and Encodings
A tutorial on character code issues in digital processing and transfer of text data, on the Internet or otherwise. Includes tables and a detailed listing of control codes. In English and Finnish.

An Early History of Character Set Standardization
Covers the beginnings of the ASCII standards from ASCII-1963 onwards and information on Cyrillic, Japanese, Korean, Thai and Vietnamese encoding systems, including various localized versions of EBCDIC. With tables and links to other resources.


Computers: Data Formats
Computers: Data Formats: Document: Text: ASCII
Computers: Data Formats: Markup Languages: XML: Encoding
Computers: Software: Globalization: Fonts





© 2005 GeneralAnswers.org