Glossary - C

captcha
Usually a picture showing a group of letters and numbers that a human visitor must enter as a code. This method is used to stop automated submissions, since it is quite difficult for non-humans to interpret (possibly distorted) pictures and distil the correct information. In short: captchas are pictures showing passwords, used to combat automated blog and comment spamming.

citation analysis
A tool initially developed in information science to identify the core (most cited) set of documents for a given topic. In the context of web searching and search engine algorithms, the term "link analysis" is more commonly used.

citation count
The number of times a document is referenced by other documents in the same collection. Citation count (or link count) differs from link popularity in that only the number of citations (links), and not the quality of the links, is considered.
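As a rough illustration, a citation count can be computed from a table of who links to whom. The sketch below is only one possible reading of the definition: the file names are hypothetical, and it assumes that each citing document is counted once per target and that self-citations and links leaving the collection are ignored.

```python
from collections import Counter

# Hypothetical collection: each document maps to the documents it cites (links to).
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "c.html"],  # the same citation repeated within one document
}

def citation_counts(collection):
    """Count how many other documents in the collection cite each document."""
    counts = Counter()
    for source, targets in collection.items():
        # set() counts each citing document once per target (an assumption);
        # self-citations and targets outside the collection are skipped.
        for target in set(targets):
            if target != source and target in collection:
                counts[target] += 1
    return counts

counts = citation_counts(links)
print(counts.most_common())
```

Note that every link carries the same weight here, which is exactly what separates a citation count from link popularity.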

click tracking
Search engines can track user clicks in order to "learn" from users which pages are most relevant to a query. The best-known example is "Direct Hit", a discontinued search engine that not only tracked clicks but also logged how long users spent on the returned pages in order to improve relevance.

client
A computer, program or process requesting information from a server. Email programs are sometimes called email clients; they request email messages from POP3 servers. Spiders (like Googlebot) and browsers (like Internet Explorer and Netscape) are also clients.

click through (click-through; clickthrough)
Referring to the action of clicking through from, for example, a search engine's results page to a web site. Click through rates are especially useful in Internet advertising, where they are an important factor in determining the success of an advertisement.

cloaking
The practice of delivering different content depending on the IP address of the client, for example showing search engine spiders one page and human visitors another. The practice is sometimes defended as a way of protecting code from theft. Note that cloaking can get your site banned from the search engines. For a detailed discussion on cloaking and links to cloaking resources, please refer to the Search Engine Yearbook.

closed loop
Used to describe a linking structure where a group of web pages interlink heavily while there are few or no links to or from pages outside the group. The general consensus is that search engines can detect closed loops and penalize the pages in them. It is currently unclear exactly where the cut-off point is: is it only a closed loop if there are no links to or from pages outside the group, or also if there are just too few such links? It is generally advisable to have links to outside pages that in turn also link to many outside pages.

cluster
Search results grouped together (to save space on the SERP), usually based on a shared top-level domain.

clustering
A technique the search engines use to group different pages from the same domain in their search results pages. Without clustering, the top spots for certain search terms are often completely dominated by one site. Clusters usually consist of one or two pages from one domain with a link that says something like "More results from pandecta.com". The term differs from terms like classification, taxonomy building, tagging, etc. in that clustering is fully automated; no further human intervention is needed.
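The effect of clustering can be sketched in a few lines. This is not how any particular search engine does it: the function name, the limit of two pages per domain, and the sample URLs are all assumptions made for illustration. The sketch keeps the ranked order, shows at most two results per domain, and records how many results were held back so that a "More results from ..." link could be displayed.

```python
from urllib.parse import urlparse

def cluster_results(urls, per_domain=2):
    """Keep at most `per_domain` results per domain, preserving rank order.

    Returns the visible results plus a count of hidden results per domain.
    """
    shown = []
    hidden = {}
    seen = {}
    for url in urls:
        domain = urlparse(url).netloc
        seen[domain] = seen.get(domain, 0) + 1
        if seen[domain] <= per_domain:
            shown.append(url)
        else:
            hidden[domain] = hidden.get(domain, 0) + 1
    return shown, hidden

# Hypothetical ranked results for one query.
ranked = [
    "http://pandecta.com/a",
    "http://example.org/x",
    "http://pandecta.com/b",
    "http://pandecta.com/c",
    "http://pandecta.com/d",
]
shown, hidden = cluster_results(ranked)
for domain, extra in hidden.items():
    print(f"More results from {domain} ({extra} hidden)")
```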

code bloat
When a web page or site is so full of code (scripts, font tags, redundant HTML) that it becomes hard to edit, slow to download, and more difficult for search engines to index.

collaborative filtering
Also known as "social filtering". A technique used to improve relevance: it returns documents that other users with similar queries found relevant. This technique is also very effective in cross selling, as seen at Amazon.com ("People who bought 'Mary's Guide to Fast Food' also bought 'Jane's Recipes'").
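A very small version of the "people who bought X also bought Y" idea might look like the sketch below. The purchase data and the function name are made up, and simple co-occurrence counting stands in for whatever a real store uses.

```python
from collections import Counter

# Hypothetical purchase histories, one set of titles per customer.
baskets = [
    {"Mary's Guide to Fast Food", "Jane's Recipes"},
    {"Mary's Guide to Fast Food", "Jane's Recipes", "Cooking for One"},
    {"Mary's Guide to Fast Food", "Jane's Recipes"},
    {"Cooking for One"},
]

def also_bought(item, baskets, top_n=1):
    """Return the titles most often bought together with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            # Count every other title in a basket that contains the item.
            counts.update(basket - {item})
    return [title for title, _ in counts.most_common(top_n)]

print(also_bought("Mary's Guide to Fast Food", baskets))
```

The same counting works for search: replace baskets with the sets of documents that users with similar queries found relevant.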

collection fusion
The practice of combining search results from multiple collections. Meta search engines are faced with the problem of effectively combining and re-ranking results that have already been ranked by different algorithms.

comment
Comment tags (in HTML) allow the site designer to enter comments explaining the code, making it more understandable for human readers. Comments are not displayed by the browser. Comments are enclosed by the comment tag: <!-- like this -->. The comment tag is also used to enclose scripts, ensuring that the raw code is not displayed on non-compliant browsers. Comment tags are sometimes loaded with keywords to artificially inflate a page's ranking. Lose that sparkle in your eye though… most search engines ignore comment tags completely.
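To see why keyword-stuffed comments achieve little, consider how easily an indexer could throw comments away before looking at the text. The sketch below is an assumption about the general approach, not a description of any real search engine; the function name and sample markup are hypothetical, and a regular expression is used only because it is enough for this simple case.

```python
import re

# Matches an HTML comment, including comments that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(html):
    """Return the markup with all HTML comments removed."""
    return COMMENT.sub("", html)

page = "<p>Visible text</p><!-- cheap widgets cheap widgets --><p>More text</p>"
print(strip_comments(page))
```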

content-based filtering
Filtering documents by extracting some or all of the content contained in each document. Modern search engines all use content-based filtering in combination with other filtering mechanisms. The best known of these other mechanisms is Google's PageRank system, which measures inbound links from other documents.

conversion cost
Total cost per sale, calculated by dividing the total cost of an advertising campaign by the number of resulting sales. For example, if $1000 is spent on an advertising campaign and that campaign results in 20 sales, the conversion cost per sale is $50 ($1000 / 20). That means it costs $50 to generate one sale.
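The same calculation as a small helper, using the figures from the example above. The function name is made up for illustration, and the guard against zero sales is an added assumption, since the cost per sale is undefined when nothing was sold.

```python
def conversion_cost(campaign_cost, sales):
    """Total campaign cost divided by the number of resulting sales."""
    if sales <= 0:
        raise ValueError("conversion cost is undefined without at least one sale")
    return campaign_cost / sales

print(conversion_cost(1000, 20))  # 50.0, i.e. $50 per sale
```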

conversion rate (CR)
The percentage of site visitors that deliver the most wanted response (MWR). The CR is an important measure of the effectiveness of the online sales effort. For example, if 4 out of every 100 visitors to a site deliver the MWR, the CR for that site is 4%.
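Expressed as code, with a hypothetical function name and the numbers from the example above:

```python
def conversion_rate(responses, visitors):
    """Percentage of visitors that delivered the most wanted response (MWR)."""
    if visitors <= 0:
        raise ValueError("conversion rate is undefined without visitors")
    return 100 * responses / visitors

print(conversion_rate(4, 100))  # 4.0, i.e. a CR of 4%
```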

counter / page counter
Typically accompanied by something like "You are visitor number ___ since Oct 2001". Counters count page views, not visitors. The difference is that one visitor can generate many page views by opening many pages on the site. Counters offer a relatively inaccurate way to measure site traffic and are generally considered amateurish. Log files offer far more accurate and comprehensive visitor data.
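The gap between page views and visitors is easy to show with a few log records. The records below are invented, and treating one IP address as one visitor is a simplifying assumption (several people can share an address, and one person can use several).

```python
# Hypothetical log records: (client IP address, requested page).
log = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/products.html"),
    ("10.0.0.1", "/contact.html"),
    ("10.0.0.2", "/index.html"),
]

page_views = len(log)                        # what a page counter reports
unique_visitors = len({ip for ip, _ in log}) # rough visitor count, one per IP

print(page_views, unique_visitors)
```

Here a counter would report four "visitors" although only two distinct clients opened pages.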

CPA
Cost per action.

CPC
Cost per click. The total cost of an advertising campaign divided by the resulting number of clicks (each click delivering a visitor to the site). Sometimes also used as a synonym for PPC.

CPL
Cost per lead. The total cost of an advertising campaign divided by the resulting number of new leads.

CPM
Cost per thousand impressions (M = Roman numeral for 1000). A pricing system often used in the banner advertising industry. Typically a fixed price is offered for 1000 impressions of a banner. The price is usually influenced by the topic of the site (how targeted the audience is) rather than the popularity of the site.
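Working out what a CPM deal costs is a matter of scaling the rate by the number of impressions. The function name and the figures below are made up for illustration.

```python
def cpm_campaign_cost(impressions, cpm_rate):
    """Cost of a banner campaign priced per 1000 impressions."""
    return impressions / 1000 * cpm_rate

# Hypothetical deal: 250,000 impressions at a CPM of $12.
print(cpm_campaign_cost(250_000, 12.0))  # 3000.0
```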

CPS
Cost per sale.

crawl
What spiders do. It refers to the action of following links to navigate from page to page and site to site.

cross linking
Referring to links between a family of domains - for example your business site, your personal homepage and your cat's homepage. Cross linking is sometimes used to inflate link popularity. Although not yet proven (to my knowledge), excessive cross linking is widely believed to be penalized by the search engines.