Cataloging a world of languages

My university has a mandate to increase our international reach through research collaborations, courses offered, and support for international students.

From the technical services side, this means our catalogers must provide metadata for resources in unfamiliar languages, including some that don’t use the Roman alphabet. A few of the challenges we face include:

Identifying the language of an item (is that Spanish or Catalan?)
Cataloging an item in a language you don’t speak or read (what is this book even about?)
Transliterating from non-Roman scripts (e.g., Cyrillic, Chinese, Thai)
Diacritic codes in copy cataloging that don’t match your system’s encoding scheme

I’d like to share a few free tools that our catalogers have found helpful. I’ve used some of these in other areas of librarianship as well, including acquisitions and reference.

Language identifiers

Sometimes I open a book or article and have no idea where to start, because the language isn’t anything I’ve seen before.

I turn to the Open Xerox Language Identifier, which covers over 80 different languages. Type or paste in some text in the mystery language and give it a try. The more text you provide, the more accurate the result.
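
For a rough sense of why more text helps, here is a toy sketch in Python. It is not how the Open Xerox tool works (its internals aren't described here); it only counts a few common function words that differ between Spanish and Catalan. The word lists and the name guess_language are illustrative assumptions, and a real identifier would use far richer statistics.

```python
import re

# Tiny, hand-picked lists of common function words (illustrative only).
MARKERS = {
    "Spanish": {"y", "con", "los", "las", "del", "para"},
    "Catalan": {"i", "amb", "els", "les", "dels", "per"},
}

def guess_language(text):
    """Return (best_guess, scores) by counting marker words in the text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in markers for word in words)
        for lang, markers in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # With no marker words at all, admit that we cannot tell.
    if scores[best] == 0:
        return None, scores
    return best, scores

print(guess_language("el gat i el gos amb els nens"))
print(guess_language("el gato y el perro con los niños"))
```

A one-word title often contains none of the marker words, so the sketch returns no guess at all; a paragraph from the back cover gives it much more to count.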

Language translators

Web translation tools aren’t perfect, but they’re a great way to get the gist of a piece of writing (don’t use them for sending sensitive emails to bilingual coworkers, however).

Google Translate includes over 75 languages, and also a language identification tool. Enter the title, a few chapter names, or back cover blurb, and you’ll get the general idea of the content.

Transliteration tables

If you catalog in Roman script, and you wind up with a resource in Cyrillic or Chinese, how do you transliterate it so the record is searchable in your ILS? Transliteration tables match up characters between scripts.

The ALA-LC Romanization Tables for non-Roman scripts are approved by the American Library Association and the Library of Congress. They cover over 70 different scripts.
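
To show the idea of a table lookup, here is a minimal Python sketch. The mapping is a small hand-typed subset for Russian only, chosen for illustration; it is not the full ALA-LC table, which differs by language and uses diacritics and ligatures for some letters. The names CYRILLIC_TO_ROMAN and romanize are hypothetical.

```python
# Partial Cyrillic-to-Roman mapping (illustrative subset for Russian,
# NOT the complete ALA-LC table).
CYRILLIC_TO_ROMAN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t",
    "ж": "zh", "ш": "sh",
}

def romanize(text):
    """Replace each mapped Cyrillic letter; leave everything else as-is."""
    out = []
    for ch in text:
        roman = CYRILLIC_TO_ROMAN.get(ch.lower())
        if roman is None:
            out.append(ch)  # unmapped character: keep it untouched
        elif ch.isupper():
            out.append(roman.capitalize())
        else:
            out.append(roman)
    return "".join(out)

print(romanize("Москва"))  # Moskva
```

Letters missing from the subset pass through unchanged, which makes gaps easy to spot when checking the result against the official table.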

Bibliographic dictionaries

We’re fortunate that librarians love to share: there are quite a few sites produced by libraries that look at common bibliographic terms you’d find on title pages: numbers, dates, editions, statements of responsibility, price, etc.

To share two Canadian examples, Memorial University maintains a Glossary of Bibliographic Information by Language and Queen’s University has a page of Foreign Language Equivalents for Bibliographic Terms.

If you’ve ever seen the phrase “bibliographic knowledge of [language]” in a job posting, this is what it refers to: you’ve cataloged enough material in a language to know these terms, but you can’t carry on a conversation about daily life. I have bibliographic knowledge of Spanish, Italian, and German, but don’t ask me to go to a restaurant in Hamburg and order a hamburger.

Subject-specific glossaries

Similar to bibliographic dictionaries, these are for terms common to specific subjects.

My university has significant music and map collections, so I often consult the language tools at Music Cataloging at Yale (…and I once thought music was the universal language) and the European Environment Agency’s Terminology and Discovery Service.

Diacritic charts

In order to ensure that accented characters and special symbols display properly in the catalog, it’s important to have the correct diacritic code.

Our system uses Unicode, and we often rely on the Unicode Character Code Chart or Unicode Character Table. Which interface you use is a matter of personal preference.

It may also be worth coming up with a cheat sheet of the codes you use most frequently – for example, common French accents if you’re cataloging Canadian government documents, which are bilingual.
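
If you're comfortable with a little Python, a cheat sheet like this can be generated rather than typed by hand. The sketch below uses Python's standard unicodedata module; the function name cheat_sheet and the choice of characters are illustrative assumptions, so swap in whatever accents you meet most often.

```python
import unicodedata

def cheat_sheet(chars):
    """Return one line per character: the character, its code point, its Unicode name."""
    lines = []
    # Normalize to NFC first so each accented letter is a single code point.
    for ch in unicodedata.normalize("NFC", chars):
        code = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"{ch}  {code}  {name}")
    return lines

# A few common French accented letters.
for line in cheat_sheet("éèêçàù"):
    print(line)
# First line printed: é  U+00E9  LATIN SMALL LETTER E WITH ACUTE
```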

Many Integrated Library Systems also have diacritic charts built in, where you can select the symbol you need and click it to place it in the record.

Diacritic guessers

Diacritic charts can be long and involved (the Unicode example above is a bit of a nightmare), so if you’re working with a new language, browsing through them searching for a specific code can be time-consuming. You can see the symbol in front of you, but have no idea what it’s called.

This is where Shapecatcher comes in. This utility allows you to draw a character using your mouse or tablet. It identifies possible matches for the symbol and gives you the symbol’s name and Unicode number.

Have you encountered issues handling different languages when cataloguing? Is there a free language tool you’d like to share? Tell us about it in the comments!

Credits: Image of Pieter Bruegel the Elder’s painting The Tower of Babel courtesy of the Google Art Project. Many thanks also to my colleagues Judy Harris and Vivian Zhang for sharing their language challenges and tools.

6 comments

Dale Swensen

October 1, 2014 at 7:13 PM

I’ve found the multilingual keyboards in Lexilogos to be extremely valuable in reproducing cataloging data from nonroman text. Keyboards for most of the alphabetic scripts are laid out following roman equivalents, which makes typing easy. Keyboards for syllabic or pictographic scripts seem fairly logical and intuitive as well. I’ve been able to transcribe data in many nonroman scripts for which I have minimal or no reading ability. The characters are standard Unicode so I can cut and paste text directly from Lexilogos into OCLC. Of course some scripts are more difficult than others and take more time to transcribe, but I’ve been amazed at what I can do.
Lexilogos also includes language dictionaries.
Pingback: Must-Read LITA Blog Post: Cataloging a World of Languages, by Leanne Olson | DipLawMatic Dialogues
Pingback: Cataloging a world of languages | LITA Blog | Veille juridique
Leanne Olson

October 2, 2014 at 4:09 PM

Wow, I’d never seen Lexilogos — this is *fantastic*! I’m going to pass this on to my cataloguers. Thanks for sharing, Dale.
Violet Fox

October 2, 2014 at 10:43 PM

When cataloging any language that uses Latin or Cyrillic scripts, the Google Goggles app is a lifesaver–just take a picture of the text and you get the option to run it through Google Translate automatically, so there’s no need to type it out. It’s great for large chunks of text and getting a sense of what the book is about. I only wish it also worked with Arabic or Asian characters (maybe someday…).
George Apodaca

October 3, 2014 at 3:17 PM

wordreference.com is a really neat forum that I use when translating a document from English to Spanish or vice versa. They have a handful of other languages too, and everyone there is really helpful!

Comments are closed.
