By Ron Daniel, Jr. (rdaniel@taxonomystrategies.com)
If you are creating a new taxonomy, perhaps as part of an intranet redesign, what should you expect to see when you start using it to categorize content? For example, if you had 500 articles and 100 categories in your taxonomy, should you expect to get roughly 5 articles in each category?
In a word, no.
The simple fact is that some things are more popular than others. This includes categories. We would expect to see more articles on “Dogs” than on “Damselflies” in most collections of content. You should never expect to see the articles evenly divided amongst the categories (what the statisticians call a “uniform distribution”).
If you made a plot of the number of articles assigned to each category, and arranged the most popular categories (the ones with the most articles in them) on the left, you should expect to see a long-tailed curve like the red one in figure 1. (The curve shown is what the statisticians call a “Zipf distribution”, which we will talk about soon.) The blue line below it is a uniform distribution of five articles per category.
Figure 1: Expected vs. Uniform Distribution of Articles to Categories
This curve shows that a few popular categories have many articles assigned to them, and many categories have only a few articles assigned to them. This curve is an illustration of the well-known 80/20 rule. In this case, roughly 80% of the articles will be classified into 20% of the categories. Whether we think this 80/20 behavior is a good thing or not is irrelevant; this kind of curve is the typical behavior you should expect to see. You need to find ways to use it instead of fighting it. (It is worth noting that the curve above is based on a few assumptions: that we are looking at a random sample of the content, and that there are enough categories relative to the content that we don’t have to worry about behavior at the end of the tail. I’ll have more to say on those assumptions later.)
If we know how many articles we have that need to be classified, we can make estimates of what this curve will look like before any tagging is done. (The actual results, once the content is tagged, will be very different from the rough estimate, but they will be much closer to it than to a uniform distribution like the one in the blue line of figure 1.) The advantage of being able to make these estimates in advance is that they can help us answer two important questions:
- How big should my taxonomy be?
- How much effort do I have to put into testing it?
I’ll tackle the first question in a future article; for now let’s look at the simpler issue of estimating the testing effort.
First, I should say that the question is backwards. No matter what the “ideal” level of testing might be, the amount of testing that can be done is almost always set by the budget that is available. So we will actually look at what kind of coverage of the taxonomy we can get for some set budget.
As mentioned above, we expect the distribution of the articles amongst the categories to have a “long tail” and thus to (very roughly) follow the 80/20 rule. This has a huge effect on testing the taxonomy. One way to test is by tagging random samples of the content to see if their subject can be captured with the taxonomy. Assume we have categorized N articles and have just seen a particular category for the first time. How many articles do we expect to have to tag before we see that category for the second time? Another N, which means that if we want to see a category four or five times in testing, we need a budget to tag 4N to 5N articles. That could get expensive.
How many articles do we expect to categorize before we see the next category for the first time? Let M be that number. Recall that an 80/20 rule is in operation, so 80% of the articles we tag as we go from N to M will go into the top 20% of the categories, categories that have already had a lot of sample content assigned to them. In other words, there are diminishing returns when tagging sample content. 20% of the categories would consume 80% of our testing budget if we used solely random sampling.
Can we make a quantitative estimate now that we have a qualitative understanding of what to expect? There are a variety of long-tailed distributions that could be used instead of the simple Zipf distribution. Several of them have multiple parameters that can be adjusted so that one can fit more accurate curves to actual data. I’ll talk a little about those other distributions in my next article. However, since we have not done any tagging yet, we don’t have any data to fit or any particular reason to think one model will be more accurate than another. We are just trying to get a rough idea of how many categories will be seen and how many times we will see them, if we put in a certain level of tagging effort.
To make life simple, we will use the simplest Zipf distribution: N = K/r, where N is the number of items in category r, K is the number of items in the most popular category, and r is the popularity rank of the category. The most popular category is r=1, next most popular is r=2, etc. This means that the most popular category will have K items in it, the second most popular will have K/2, the third most popular will have K/3, etc.
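To make the N = K/r model concrete, here is a minimal Python sketch. The function names `zipf_counts` and `top_share` are my own, chosen for illustration, and the 100-category example is an assumed size rather than a figure from any real collection.

```python
def zipf_counts(k, num_categories):
    """Expected items per category under N = K/r, for ranks r = 1..num_categories."""
    return [k / r for r in range(1, num_categories + 1)]


def top_share(counts, fraction=0.2):
    """Fraction of all items that falls into the top `fraction` of categories."""
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Five categories with 60 items in the most popular one.
print(zipf_counts(k=60, num_categories=5))  # [60.0, 30.0, 20.0, 15.0, 12.0]

# Share of items in the top 20% of a hypothetical 100-category taxonomy.
print(round(top_share(zipf_counts(k=100, num_categories=100)), 2))
```

For 100 categories the top 20% hold roughly 70% of the items under this simplest model, which is in the neighborhood of the 80/20 rule but not exactly on it. That is consistent with treating 80/20 as a very rough guide.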
So, if we want to know how many articles we must tag to see the first occurrence of the category of popularity rank r, we can set N = 1. Rearranging the equation shows us that K = r. In other words, we have to tag enough articles so that the most popular category has r articles in it before we expect to see the r’th most popular category assigned the first time. So how many articles is that? Let A denote that number of articles. Adding up the expected counts K/1, K/2, ..., K/r for the categories seen so far, with K = r, gives:
A = r + r/2 + r/3 + ... + r/r
A = r × (1 + 1/2 + 1/3 + ... + 1/r)
That second formula is well-known in mathematics as the harmonic series. Here’s a graph showing how many articles to tag to reach a particular category (or, conversely, how many categories we expect to see at least once if we tag some number of test samples). This figure also shows how many of those articles end up assigned to the top category, and the top 6 categories. Notice that a significant fraction of the effort goes into tagging many samples for the top categories.
So, if you have a budget that will let you tag 200 random articles to test your taxonomy, you could expect (very roughly) to see about 45 different categories, to tag about 45 articles with the most popular category, and to tag about 110 articles with the top 6 categories. Take a look at that again: about half of your testing budget would be spent on the first 6 categories. This is what I mean by diminishing returns. We also expect, if the taxonomy were to have 100 categories in it, that tagging 200 randomly-selected items will test less than half of those categories.
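The 200-article estimate can be checked with a short Python sketch. It assumes the simplest Zipf model described above, where category r is first expected once the top category holds K = r articles, so the total tagged is r × (1 + 1/2 + ... + 1/r). The function names are hypothetical, introduced only for this illustration.

```python
def harmonic(n):
    """n-th harmonic number: 1 + 1/2 + ... + 1/n."""
    return sum(1 / i for i in range(1, n + 1))


def articles_to_reach(rank):
    """Articles to tag before the category of this rank is expected once."""
    return rank * harmonic(rank)


def categories_seen(budget):
    """Largest rank whose expected first appearance fits within the budget."""
    rank = 0
    while articles_to_reach(rank + 1) <= budget:
        rank += 1
    return rank


budget = 200
seen = categories_seen(budget)   # also the expected count in the top category, since K = r
top_six = seen * harmonic(6)     # expected articles in the six most popular categories
print(seen, round(top_six))      # 45 110
```

Trying other budgets shows the diminishing returns directly: each additional category costs more tagged articles than the one before it.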
If that is the bad news, what is the good news? Because of this expected behavior, getting the most common tags “right” will mean that it is relatively easy to get the tagging right for 70-80% of the content. Not surprisingly, if we have more content we will need to get more categories “right” to stay at the 80/20 place. Here’s a table of examples:
Articles in Collection | Articles in First Category | Expected # Categories for 80% of Articles | Expected # Categories for 100% of Articles |
25 | 9 | 5 | 10 |
100 | 28 | 11 | 29 |
500 | 104 | 30 | 105 |
1000 | 185 | 48 | 186 |
5000 | 775 | 105 | 776 |
10,000 | 1507 | 118 | 1508 |
(Once again, this table assumes the number of categories in the taxonomy is “large”. The results would differ for a fixed-size vocabulary, such as 50 categories. I’ll talk about those wraparound effects in my next note.)
Note the last two columns. To get 80% of the content tagged “right” requires far fewer categories to be dealt with than the amount of content or even the total number of categories.
So what does this all mean?
- You cannot assume that content will distribute uniformly over the categories. Long experience at libraries and other places has shown that you must expect to see a long tail. Without any better information at the start, you can use the simple Zipf distribution to roughly estimate the long-tailed behavior.
- If you have a large collection, you should expect a large number of items in the most popular categories. You could try to split up the most commonly-used categories. That may shift you along the curve, but does not get away from the expected long tail. We believe the most effective way to subdivide those large categories is to divide the taxonomy into multiple independent “facets”. A facet is a branch of the taxonomy with its own data field. For example, instead of having categories on “Travel Guides for Belize”, “Travel Guides for Brazil”, …, “Visa requirements for Belize”, “Visa requirements for Brazil”, … you could split that up into two separate lists and store the tagging of an article in two separate fields: one field for locations (Belize, Brazil, …) and the other for content types (travel guides, visa requirements, photographs, weather history, ...). The tagging in both facets is expected to follow the 80/20 rule, but since the facets are independent, the second facet will divide the results from the first into smaller pieces. If you have 10,000 items in the collection and 1500 in the top category, adding an additional facet can be expected to cut the number in the largest group down to about 200 items. A third facet would cut that down to about 45 items in the largest group. (To be more accurate, this kind of reduction assumes that each of the three facets is “large” and that they are all independent. Neither of those is totally true in real life and I’ll take that up in my next article. Nevertheless, facets remain a favored technique even if we relax those assumptions.)
- Testing a taxonomy by tagging samples of content is valuable and you must do it. However, you must account for the diminishing returns. There are a few things to do about that so that testing will be more cost-effective:
  - Only tag a modest number (20-100) of randomly-selected samples.
  - Use non-random sampling to bring in a deliberate variety of content in areas that are known to be important.
  - Use additional testing methods (review by Subject Matter Experts, Card Sorts, Navigation Usability Scenarios, etc.)
Next time I'll write about how to more closely estimate the size of a taxonomy that should be used when we remove some of the assumptions, such as the taxonomy being “large”. We will also look at more accurate models than the simplest Zipf distribution.



