“For a large enterprise to share information across diverse product lines and functions, a common language or taxonomy is required to classify the information. The best way to develop the common taxonomy is to look at the hierarchies currently in use.”

– David Lamar Smith, Halliburton Global Technical Services Chief


Automated Classification

Use of technology to organize content into groups so it can be retrieved when needed. The result of automated classification is either a content collection clustered into groups (possibly a candidate taxonomy), or content categorized according to a pre-existing taxonomy. The best results come from a business process that combines manual and automated processing, so that technology does the bulk of the work and human editorial input is reserved for the cases that need it.
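
As a rough illustration of categorizing content against a pre-existing taxonomy, the sketch below counts keyword hits per category and hands anything with too little evidence to a human editor. The category names, keywords, and the min_hits threshold are hypothetical, and real classification products use far richer methods than keyword matching.

```python
import re

# Hypothetical pre-existing taxonomy: category name -> indicative keywords.
TAXONOMY_KEYWORDS = {
    "Drilling": {"drill", "drilling", "wellbore", "bit"},
    "Finance": {"invoice", "budget", "revenue"},
}


def categorize(text, min_hits=2):
    """Return (category, hits), or (None, hits) when a human should review."""
    words = re.findall(r"[a-z]+", text.lower())
    # Score each category by how many words in the text are among its keywords.
    scores = {
        category: sum(1 for word in words if word in keywords)
        for category, keywords in TAXONOMY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_hits:
        # Too little evidence: route the item to manual (editorial) review.
        return None, scores[best]
    return best, scores[best]
```

Calling categorize("The drilling crew replaced the drill bit.") assigns the text to the hypothetical "Drilling" category, while a text with no matching keywords comes back with None and is left for manual review.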

Dublin Core

A set of 15 metadata elements (the Dublin Core Metadata Element Set) used to describe and catalog content so it can be discovered and retrieved. The Dublin Core is the de facto standard for cataloging web content.

Information Retrieval Technologies

Automated methods to analyze, classify, search for, and retrieve text. The basic principles of information retrieval (IR) are based on research done in the 1940s and 1950s. The key observation was that word frequency provides a useful measure of significance. Many refinements have been made to this simple observation using statistics, linguistics, logic, and clever combinations of these methods.
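
The word-frequency observation can be sketched in a few lines. This is a minimal illustration only: the stopword list is a hypothetical, abbreviated one, and none of the later statistical or linguistic refinements are included.

```python
from collections import Counter
import re

# Hypothetical, abbreviated stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def top_terms(text, n=3):
    """Return the n most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)
```

For a text such as "The taxonomy of the portal and the taxonomy of search", the most frequent remaining term is "taxonomy", which is a reasonable guess at what the text is about.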


Metadata

A common set of attributes that contain the critical information needed to describe and catalog content. The basic concept behind metadata has been used to organize content since the first collections of clay tablets and papyrus scrolls 3,000 years ago. Card and book catalogs and bibliographic databases have long used commonly understood metadata standards to organize large collections.

Dublin Core metadata example:

Dublin Core Elements
Asset metadata (the who, where, and when): Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
Subject metadata (the what and why): Subject, Description, Coverage
Relational metadata (links between assets): Relation
Use metadata (how to monetize assets): Rights
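
A Dublin Core description can be serialized as XML. The sketch below is one possible way to do it with Python's standard library. The "record" wrapper element and the sample values are hypothetical; the namespace URI is the one published for the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
example = dublin_core_record({
    "title": "Taxonomy Glossary",
    "language": "en",
    "subject": "Taxonomy",
})
```

Each key becomes an element with the dc: prefix, for example a dc:title element holding the title text.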



Taxonomy

Overall scheme for organizing content to solve a business problem, such as improving search, supporting browsing on an enterprise-wide portal, enabling business users to syndicate content, or otherwise providing the basis for content reuse. The basic idea behind taxonomy is to provide a controlled vocabulary for metadata attributes, and to specify relationships between terms in the controlled vocabulary. The simplest relationships are broader, narrower, and related, but relationships can be much more specific and complex.

UNSPSC Taxonomy example:

Term: Snack food
Broader term: Prepared and preserved foods
Narrower terms: Corn chips, Popcorn, Potato chips, Pretzels
Related term: Beer
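
The broader, narrower, and related relationships in this example can be held in a simple data structure. The sketch below is one possible representation. The Term class and the add_narrower helper are hypothetical names, not part of UNSPSC or any taxonomy tool.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A controlled-vocabulary term with its three basic relationships."""
    name: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def add_narrower(parent, child):
    """Record the hierarchy in both directions so the two terms stay consistent."""
    parent.narrower.append(child.name)
    child.broader.append(parent.name)


foods = Term("Prepared and preserved foods")
snack = Term("Snack food", related=["Beer"])
add_narrower(foods, snack)
for name in ["Corn chips", "Popcorn", "Potato chips", "Pretzels"]:
    add_narrower(snack, Term(name))
```

Recording each link in both directions means a search for "Snack food" can be broadened to its parent or narrowed to any of its four children without a second lookup table.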


XML Schema

Data models expressed in XML. XML schemas provide a means of defining and implementing a consistent structure (syntax) and semantics for XML documents, which allows machines to carry out rules made by people. A faceted taxonomy provides the names of metadata elements and a consistent set of attribute values, or vocabularies, for filling the elements in an XML schema.
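
To illustrate how a faceted taxonomy supplies the allowed values for metadata elements, the sketch below checks an XML record against a small controlled vocabulary. It is not XML Schema validation itself: an actual schema would typically express the same rule with enumerated values and be checked by a schema-aware validator. The element names, vocabulary values, and invalid_values function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical facets: metadata element name -> allowed values.
VOCABULARY = {
    "type": {"Report", "Presentation", "Dataset"},
    "language": {"en", "es", "fr"},
}


def invalid_values(xml_text):
    """Return (element, value) pairs whose value is outside the vocabulary."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root:
        allowed = VOCABULARY.get(element.tag)
        # Elements without a controlled vocabulary are not checked.
        if allowed is not None and element.text not in allowed:
            problems.append((element.tag, element.text))
    return problems
```

A record whose language element holds "de" would be reported, because "de" is not in the hypothetical language vocabulary, while a type of "Report" would pass.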

[Last updated 2012-02-29]

[Image above: Gaussian Scatter, from Wikipedia, the free encyclopedia (en.wikipedia.org/wiki/File:…)]
