At Datafusion and later Metacode, the information management start-ups where Taxonomy Strategies was born, we were asked to come up with a general-purpose business taxonomy: something that could be applied to any business and delivered out-of-the-box with an enterprise software product. We knew that publishers who index business literature had already developed general-purpose business taxonomies, for example the one behind ProQuest's ABI/INFORM database (http://www.proquest.com/en-US/catalogs/databases/detail/abi_inform.shtml). We struggled with this one for a long time, but never came up with a set of general-purpose business categories that could be used on a general-purpose business intranet.
Since then, Taxonomy Strategies has built a lot of custom taxonomies for all sorts of organizations that have many different kinds of content collections. In the process, we have found that while there may not be a generic business taxonomy available, there are some common characteristics and practices that have emerged.
Facets. Taxonomies work best when they are faceted, so it is a good idea to break a taxonomy up into a few mutually exclusive divisions, which we call facets. The facets actually constitute a small metadata schema, which is a refinement of the Dublin Core, the ISO standard for resource description (ISO 15836, published in the US as ANSI/NISO Z39.85, http://www.niso.org/standards/z39-85-2007/). Facets make it possible to create and maintain simpler taxonomies with fewer terms in each division. Yet they still provide a lot of detail, or granularity, because they work like a matrix in which each facet is an axis. In a faceted taxonomy, a category is defined by any combination of one term from each facet. In the following example from my local taqueria, 3 facets containing only 31 terms (3 types, 13 main ingredients and 15 sides) can be combined to express as many as 3 × 13 × 15 = 585 categories.
| Type | Main Ingredient | Sides |
|---|---|---|
| Tacos | Plain | Sour Cream |
| Quesadillas | Arroz y Frijoles | Cheese |
| Burritos | Veggie | Guacamole |
| | Grilled Veggie | Rice |
| | Marinated Tofu | Beans |
| | Soyrizo | Rice & Beans |
| | Mole | Chips & Salsa |
| | Chile Verde | Salsa |
| | Carne Asada | Veggies |
| | Pollo | Tofu or Soyrizo |
| | Camarones | Pollo |
| | Prawn | Carne Asada |
| | Pescado | Shrimp |
| | | Prawn |
| | | Pescado |
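The matrix idea can be made concrete with a minimal Python sketch. It treats each facet of the taqueria example as an axis and enumerates the categories formed by taking exactly one term from each facet. The facet lists are copied from the table. The one-term-per-facet rule is an assumption of this sketch: if an order could combine several sides, or leave a facet unset, there would be many more categories.

```python
from itertools import product
from math import prod

# Facet terms copied from the taqueria table; each facet is one axis.
FACETS = {
    "Type": ["Tacos", "Quesadillas", "Burritos"],
    "Main Ingredient": [
        "Plain", "Arroz y Frijoles", "Veggie", "Grilled Veggie",
        "Marinated Tofu", "Soyrizo", "Mole", "Chile Verde",
        "Carne Asada", "Pollo", "Camarones", "Prawn", "Pescado",
    ],
    "Sides": [
        "Sour Cream", "Cheese", "Guacamole", "Rice", "Beans",
        "Rice & Beans", "Chips & Salsa", "Salsa", "Veggies",
        "Tofu or Soyrizo", "Pollo", "Carne Asada", "Shrimp",
        "Prawn", "Pescado",
    ],
}


def term_count(facets):
    """Number of terms that have to be maintained across all facets."""
    return sum(len(terms) for terms in facets.values())


def category_count(facets):
    """Size of the matrix: product of the facet sizes (one term per axis)."""
    return prod(len(terms) for terms in facets.values())


def categories(facets):
    """Yield every category as a tuple holding one term from each facet."""
    return product(*facets.values())


if __name__ == "__main__":
    print(term_count(FACETS))      # 31
    print(category_count(FACETS))  # 585
    print(next(categories(FACETS)))  # ('Tacos', 'Plain', 'Sour Cream')
```

The asymmetry is the point: adding one new side costs a single term but adds 3 × 13 = 39 new categories.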
Universals. It turns out that a common set of facets recurs in almost every taxonomy. Unfortunately, most of these vocabularies need to be purpose-built for each project, but there are some reusable parts. These universal facets are:
- Content Types. In well-structured environments, these are the templates for each form of content. But typically, we need to define a rough abstraction into which content can be sorted, one that answers the question: This is a _____ (brochure, fact sheet, memo, presentation, report, whitepaper, etc.)
- Locations. Much content is related to a geo-spatial location, for example where a photograph was taken.
- Organizations/People. The most important description of what content is about or related to is often the names of important organizations or people.
- Products/Services. For most businesses, much content is related to specific products and services.
- Functions. In the best of all worlds, the function or purpose of each content item would be specified when it is created. But typically, we need to define a rough abstraction into which content can be sorted, one that answers the question: The purpose of this content is _____ (financial reporting, human resources management, marketing, sales, etc.)
- Attributes. Products (and sometimes services) have certain characteristics that distinguish one item from another such as brand, size, material, color, etc.
Usually, some categories will be left over when all of the universals have been factored out. This residue covers any other topics, and we usually gather it into a facet called Other Topics.
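One way to picture the universal facets as a small metadata schema is a record type with one field per facet, plus Other Topics for the residue. The Python dataclass below is a hypothetical sketch. The field names, the `missing_facets` helper and the example values (including the product name "Widget Pro") are illustrative assumptions, not a published schema.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class ContentRecord:
    """Hypothetical content item tagged with the universal facets."""
    title: str
    content_type: Optional[str] = None   # "This is a _____"
    locations: List[str] = field(default_factory=list)
    organizations_people: List[str] = field(default_factory=list)
    products_services: List[str] = field(default_factory=list)
    function: Optional[str] = None       # "The purpose of this content is _____"
    attributes: Dict[str, str] = field(default_factory=dict)  # brand, size, color...
    other_topics: List[str] = field(default_factory=list)     # the residue


# Every field except the title is treated as a facet.
FACET_FIELDS = [f.name for f in fields(ContentRecord) if f.name != "title"]


def missing_facets(record):
    """Return the facets that have not been tagged (None or empty)."""
    return [name for name in FACET_FIELDS if not getattr(record, name)]


if __name__ == "__main__":
    brochure = ContentRecord(
        title="Spring product brochure",
        content_type="Brochure",
        products_services=["Widget Pro"],  # hypothetical product name
        function="Marketing",
        attributes={"color": "blue"},
    )
    print(missing_facets(brochure))
    # ['locations', 'organizations_people', 'other_topics']
```

Keeping each facet as its own field, rather than one flat list of tags, is what lets each facet be reused as an independent filter later on.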
Filters. In addition to a common set of tags, there are some universal filters that are natural for people to use to sub-divide and group content. Searching for content should be like searching for shoes, and it is frequently like that on intranets. Some universal search filters are:
- Attributes. When shopping, people are familiar with filtering a category by one or more attributes such as brand, size, material, color, etc.
- Date. When searching for content, people are familiar with sorting search results by date. This is usually the date the content was created, not the date it was last accessed by the web server. Content may also be related to an event that has one or more dates associated with it, such as the Iraq War or a January White Sale.
- Location. If content has been tagged with a location, then a map interface can be generated to browse or navigate the index of those geo-spatial coordinates.
- Content Types. It is natural for people to filter their search results by types of content such as images, videos, news, shopping, etc.
Of course, the availability and quality of filters depend on whether and how well content has been tagged. And if content has been tagged using more of the universal facets, more ways to slice and dice it become available.
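These filters behave like the facet sidebar of a shoe-shopping site. Each selection narrows the result set, and the count beside each option comes from the tags on the items. The sketch below works over plain in-memory dictionaries. The field names, sample items and helper functions are hypothetical, chosen only for illustration. It also shows the tagging dependency: an item with no location tag silently drops out as soon as a location filter is applied.

```python
from collections import Counter
from datetime import date

# Hypothetical tagged items; keys mirror some of the universal filters.
ITEMS = [
    {"title": "Q3 sales deck", "content_type": "Presentation",
     "location": "Boston", "created": date(2009, 10, 2)},
    {"title": "HR policy memo", "content_type": "Memo",
     "location": "Boston", "created": date(2009, 6, 15)},
    {"title": "Product fact sheet", "content_type": "Fact sheet",
     "created": date(2008, 3, 1)},  # never tagged with a location
]


def apply_filters(items, start=None, end=None, **facet_values):
    """Keep items whose tags match every requested facet value and whose
    creation date lies in [start, end]. Untagged items fail a facet filter."""
    result = []
    for item in items:
        if any(item.get(facet) != value for facet, value in facet_values.items()):
            continue
        created = item.get("created")
        if start is not None and (created is None or created < start):
            continue
        if end is not None and (created is None or created > end):
            continue
        result.append(item)
    return result


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


def newest_first(items):
    """Sort by creation date, most recent first; undated items go last."""
    return sorted(items, key=lambda i: i.get("created") or date.min, reverse=True)


if __name__ == "__main__":
    boston = apply_filters(ITEMS, location="Boston")
    print([i["title"] for i in newest_first(boston)])
    # ['Q3 sales deck', 'HR policy memo']
    print(facet_counts(ITEMS, "location"))  # Counter({'Boston': 2})
```

A real search engine would compute these facet counts from its index rather than by scanning a list. The filtering logic is the same, though.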