Valid character sets for supported scripts

Discover the valid character sets for supported non-Latin scripts in Connexion client.

Arabic, CJK, Cyrillic, Greek, and Hebrew

Character sets for these scripts are listed in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Code Tables. These MARC-8 character sets are subsets of Unicode characters that are approved for use in MARC 21 cataloging.

Scripts defined by MARC-8 character sets are supported for bibliographic records and for variant name headings in authority records.

The following list defines the scope of valid characters in the Connexion client for Arabic (including Persian), CJK, Cyrillic, Greek, and Hebrew scripts:

Armenian, Bengali, Devanagari, Ethiopic, Syriac, Tamil, and Thai

These scripts are supported for bibliographic records only.

There are no defined MARC-8 character sets for Armenian, Bengali, Devanagari, Ethiopic, Syriac, Tamil, or Thai. Connexion client also supports Cyrillic characters outside the MARC-8 character set. For these scripts, OCLC implemented script identification codes (Armn, Beng, Cyrl, Deva, Ethi, Syrc, Taml, and Thai) based on the ISO 15924 code lists.

The following list shows the ranges of UTF-8 Unicode characters that define valid characters for these scripts in the Connexion client:

 Note: The client inserts Armn, Beng, Cyrl, Deva, Ethi, Syrc, Taml, or Thai, respectively, in field 066 of a bibliographic record to indicate that the script is used. If multiple scripts are used, the client inserts each code in its own subfield c.
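
The mapping from characters to 066 script codes can be sketched in Python. This is an illustration only, not the client's implementation. It assumes that each script's complete Unicode block stands in for the client's actual list of valid characters, which may be narrower. The names SCRIPT_BLOCKS, script_codes, and field_066 are hypothetical, and the printed 066 layout is only a display convention chosen for the example. Cyrillic is left out because separating MARC-8 Cyrillic from Cyrillic outside MARC-8 requires the MARC-8 code tables.

```python
# Assumption: whole Unicode blocks approximate the valid ranges for each script.
# The client's real lists of valid characters may be subsets of these blocks.
SCRIPT_BLOCKS = {
    "Armn": (0x0530, 0x058F),  # Armenian
    "Beng": (0x0980, 0x09FF),  # Bengali
    "Deva": (0x0900, 0x097F),  # Devanagari
    "Ethi": (0x1200, 0x137F),  # Ethiopic
    "Syrc": (0x0700, 0x074F),  # Syriac
    "Taml": (0x0B80, 0x0BFF),  # Tamil
    "Thai": (0x0E00, 0x0E7F),  # Thai
}


def script_codes(text):
    """Return the sorted ISO 15924 codes of the listed scripts found in text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for code, (low, high) in SCRIPT_BLOCKS.items():
            if low <= cp <= high:
                found.add(code)
                break
    return sorted(found)


def field_066(text):
    """Hypothetical helper: show a 066 field with one subfield c per script.

    Returns None when none of the listed scripts occurs in text.
    """
    codes = script_codes(text)
    if not codes:
        return None
    return "066 " + " ".join("$c " + code for code in codes)
```

For example, calling field_066 on a string that mixes Thai and Tamil characters yields one subfield c for each script, while a Latin-only string yields None.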

Limitations on using Armenian, Bengali, Cyrillic (outside the MARC-8 character set), Devanagari, Ethiopic, Syriac, Tamil, and Thai scripts

Invalid characters in Connexion client

Any character that is not included in the lists of defined characters above, or that cannot be inserted via Edit > Enter Diacritics (also available through the Enter Diacritics button or <Ctrl><E>), is invalid in the client. To include non-Latin characters that you need but that are invalid in Connexion client, you can:

 Note: Z39.50 access to WorldCat records also supports MARC-8 and Unicode UTF-8 character sets. See Z39.50 Cataloging for information on non-Latin script support in Z39.50.

Multiple scripts in a single record are valid

Use as many supported non-Latin scripts as you need anywhere in a record, including within the same field.