What Little I Know About Spamming
I am not guilty of any of the following. Until I read the Google Help section I had never heard of most of the terms before. Furthermore, I’m an amateur and if you look at my earnings you’ll wonder why I stick around punching keys. It’s because someday I hope to be smart enough and good enough to make some money the fair and legal way, without any of the following tactics some people use.
Google, please don’t close down my exciting new site, http://BestHealthWebSites.blogspot.com. Let me tell you why this site is so important to everyone, all 300 million Americans, and especially to families that have medical conditions or concerns.
With this new site folks can make just one stop for all of their medical, weight-loss, beauty-cream, detoxifying, health and exercise needs. What an innovation – one-stop shopping! It’s called http://BestHealthSites.blogspot.com and the idea for this web site came from a book by Dr. Arlan R. Weinberg, M.D. called Dr. Weinberg’s Guide to the Best Health Resources on the Web.
I hope it’s a huge success, attracting thousands of visitors daily so that I can use AdSense and other affiliate advertising and make a ton of money. That may be overly optimistic because I refuse to build traffic via Gestapo methods – things that look like spam to a crawler.
My first idea was to work with the book’s author, Dr. Weinberg. I emailed him to say I would be happy to work for or with him in placing his site on the web, either as his employee or partner. To his detriment he did not respond, but I’m glad he didn’t. His site is limited to medical sites only, while mine will include the other five disciplines listed above.
So I started organizing my own web site without plagiarizing his book or anyone else’s, avoiding the direct copying of his “resources” or the web sites he used. That doesn’t mean I won’t ever run into some of them, because the best ones are presumably in his book and that’s the quality I want to build into The Best Health Site. What I am doing is using an outline similar to his and jumping on my Google Search Engine to find applicable sites. Like recipes, subject outlines and titles are not copyrightable.
There are a lot of sites under each category. Then came this notice that Google thought I was spamming. I’m not smart enough to spam anyone, but that can’t be my excuse. I got a notice like this before on another site and after the Web Crawler investigated they gave me a clean bill of health. I think they will on this one too.
To create this “filter” web site, I need a lot of web site addresses on my site, and onto each I will add hyperlink code so that when my readers see something they want to study they can just click a link and be there, making navigating easier. I’ve got to have some advantages to justify my site’s existence. If there’s something wrong with this, then I plead guilty.
But I am not guilty of any of the items found under “Help” on Google which define spamming:
According to Google, Spamdexing (also known as search spam or search engine spam) involves a number of methods, such as repeating unrelated phrases, to manipulate the relevancy or prominence of resources indexed by a search engine, in a manner inconsistent with the purpose of the indexing system. Some consider it to be a part of search engine optimization, though there are many search engine optimization methods that improve the quality and appearance of the content of web sites and serve content useful to many users. I’m finding that search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the META keywords tag, others whether the search term appears in the body text or URL of a web page. Many search engines check for instances of spamdexing and will remove suspect pages from their indexes. Also, people working for a search-engine organization can quickly block the results-listing from entire websites that use spamdexing, perhaps alerted by user complaints of false matches. The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful.
Google bombing is another form of search engine result manipulation, which involves placing hyperlinks that directly affect the rank of other sites. Google first algorithmically combated Google bombing on January 25, 2007.
The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses. The process is called “spamdexing,” a combination of spamming — the Internet term for sending users unsolicited information — and “indexing.” 
More Kinds of Spam
1. Keyword stuffing
This involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. This is useful to make a page appear to be relevant for a web crawler in a way that makes it more likely to be found. Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. Older versions of indexing programs simply counted how often a keyword appeared, and used that to determine relevance levels. Most modern search engines have the ability to analyze a page for keyword stuffing and determine whether the frequency is consistent with other sites created specifically to attract search engine traffic. Also, large web pages are truncated, so that massive dictionary lists cannot be indexed on a single webpage.
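To make the idea of keyword counting concrete, here is a minimal Python sketch of the kind of frequency check described above. It is only an illustration: the function names and the 0.3 threshold are my own assumptions, not anything a real search engine has published.

```python
import re
from collections import Counter


def keyword_density(text: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    total = len(words)
    return [(word, count / total) for word, count in Counter(words).most_common(top_n)]


def looks_stuffed(text: str, threshold: float = 0.3) -> bool:
    """Flag text in which one word exceeds the (assumed) density threshold."""
    density = keyword_density(text, top_n=1)
    return bool(density) and density[0][1] > threshold


stuffed = "cheap pills cheap pills buy cheap pills cheap cheap"
normal = "Our clinic offers checkups, vaccinations and advice for families in town"
print(looks_stuffed(stuffed), looks_stuffed(normal))
```

A real crawler would presumably ignore common words like “the” and compare a page against similar pages rather than use one fixed cutoff.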
2. Hidden or invisible unrelated text
Disguising keywords and phrases by making them the same color as the background, using a tiny font size, or hiding them within HTML code such as “noframes” sections, ALT attributes, zero-width/height DIVs, and “noscript” sections. However, hidden text is not always spamdexing: it can also be used to enhance accessibility. People screening websites for a search-engine company might temporarily or permanently block an entire website for having invisible text on some web pages.
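Here is a rough Python sketch of how a checker might look for text hidden with inline styles. The class name and the list of style patterns are my own assumptions. It only looks at inline “style” attributes, assumes tags are properly closed, and cannot tell spam from the legitimate accessibility uses mentioned above.

```python
import re
from html.parser import HTMLParser

# Inline styles that commonly hide text (an assumed, incomplete list).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)
# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collect text found inside elements whose inline style hides them."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside a hidden element
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        if self.depth or HIDDEN_STYLE.search(style):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden_text
```

Calling find_hidden_text on a page’s HTML returns the hidden snippets so that a person can decide whether they look like spam.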
3. Meta tag stuffing
Repeating keywords in the meta tags, and using meta keywords that are unrelated to the site’s content. This tactic has been ineffective since 2005.
4. “Gateway” or doorway pages
Creating low-quality web pages that contain very little content but are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results, but serve no purpose to visitors looking for information. A doorway page will generally have “click here to enter” on the page.
Scraper sites, also known as Made for AdSense sites, are created using various programs designed to ‘scrape’ search-engine results pages or other sources of content and create ‘content’ for a website. The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. These types of websites are generally full of advertising (such as pay-per-click ads), or redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names.
5. Link spam
Davison defines link spam (which he calls “nepotistic links”) as “… links between pages that are present for reasons other than merit.” Link spam takes advantage of link-based ranking algorithms, such as Google’s PageRank algorithm, which gives a higher ranking to a website the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.
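To show why links matter so much to a link-based ranking, here is a small Python sketch of the simplified textbook form of PageRank. It is not Google’s production algorithm; the 0.85 damping factor is the commonly quoted value, and the four-page link graph is made up for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: pages a, b and d all link to c, and c links to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph, page c ends up on top because three pages link to it, and page a beats b and d only because the highly ranked c links to it. That is exactly the effect link spammers try to manufacture.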
6. Link farms
Involves creating tightly-knit communities of pages referencing each other, also known humorously as mutual admiration societies.
7. Hidden links
8. “Sybil attack”
This is the forging of multiple identities for malicious intent, named after the famous multiple personality disorder patient “Sybil” (Shirley Ardell Mason). A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs known as spam blogs.
9. Spam blogs
Spam blogs, also known as splogs, are fake blogs created solely for spamming. They are similar in nature to link farms.
10. Page hijacking
This is achieved by creating a rogue copy of a popular website which shows contents similar to the original to a web crawler but redirects web surfers to unrelated or malicious websites.
11. Buying expired domains
Some link spammers monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their pages. See Domaining. However Google resets the link data on expired domains.
Some of these techniques may be applied to create a Google bomb, that is, to cooperate with other users to boost the ranking of a particular page for a particular query.
12. Cookie stuffing
This involves placing an affiliate tracking cookie on a website visitor’s computer without their knowledge, which will then generate revenue for the person doing the cookie stuffing. This not only generates fraudulent affiliate sales, but also has the potential to overwrite other affiliates’ cookies, essentially stealing their legitimately earned commissions.
13. Using world-writable pages
Web sites that can be edited by users, such as Wikis, blogs that allow comments to be posted, etc. can be used to insert links to spam sites if the appropriate anti-spam measures are not taken.
14. Spam in blogs
This is the placing or solicitation of links randomly on other sites, placing a desired keyword into the hyperlinked text of the inbound link. Guest books, forums, blogs, and any site that accepts visitors’ comments are particular targets and are often victims of drive-by spamming where automated software creates nonsense posts with links that are usually irrelevant and unwanted.
15. Comment spam
Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing such as wikis, blogs, and guestbooks. It can be problematic because agents can be written that automatically select a random user-edited web page, such as a Wikipedia article, and add spamming links.
16. Wiki spam
Using the open editability of wiki systems to place links from the wiki site to the spam site. The subject of the spam site is often unrelated to the wiki page where the link is added. In early 2005, Wikipedia implemented a default ‘nofollow’ value for the ‘rel’ HTML attribute. Links with this attribute are ignored by Google’s PageRank algorithm. Forum and Wiki admins can use these to discourage Wiki spam.
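As an illustration of the ‘nofollow’ idea, here is a minimal Python sketch that adds rel="nofollow" to every link in a piece of user-submitted HTML. The function name is my own, and the regular expressions are a simplification that assumes reasonably tidy markup; a real forum or wiki would more likely rely on its own software or a proper HTML sanitizer.

```python
import re

# Matches an opening <a ...> tag; the attribute text is captured in group 1.
A_TAG = re.compile(r"<a(\s[^>]*)?>", re.IGNORECASE)
# Matches an existing quoted rel="..." attribute (but not data-rel and the like).
REL_ATTR = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Return html with rel="nofollow" present on every <a> tag."""

    def fix(match: re.Match) -> str:
        attrs = match.group(1) or ""
        rel = REL_ATTR.search(attrs)
        if rel is None:
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (value.lower() for value in values):
            return match.group(0)  # already marked, leave untouched
        new_rel = 'rel="' + " ".join(values + ["nofollow"]) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return A_TAG.sub(fix, html)


print(add_nofollow('Visit <a href="http://example.com">my site</a>'))
```

Running every visitor comment through a function like this before saving it means a spammer’s links are still visible to readers but should earn no ranking credit.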
17. Referrer log spam
When someone reaches a web page (the referee) by following a link from another web page (the referrer), the person’s internet browser gives the referee the address of the referrer. Some websites have a referrer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referrer, that message or internet address then appears in the referrer log of those sites that have referrer logs. Since some search engines judge the importance of sites by the number of different sites linking to them, referrer-log spam may be used to increase the search engine rankings of the spammer’s sites, by getting the referrer logs of many sites to link to them.
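To show what a referrer log looks like in practice, here is a small Python sketch that tallies referrer hosts from web server log lines. It assumes the common Apache “combined” log format, where the referrer is the second-to-last quoted field; the sample lines and host names are made up.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Tail of a combined-format line: "request" status size "referrer" "user agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')


def referrer_counts(log_lines) -> Counter:
    """Count how many requests named each host as their referrer."""
    counts = Counter()
    for line in log_lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not a combined-format line
        host = urlsplit(match.group(1)).hostname
        if host:      # skips "-" (no referrer given)
            counts[host] += 1
    return counts


sample = [
    '203.0.113.5 - - [10/Oct/2010:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "http://spam.example/pills" "bot"',
    '203.0.113.5 - - [10/Oct/2010:13:55:37 +0000] "GET /a HTTP/1.1" 200 512 "http://spam.example/pills" "bot"',
    '198.51.100.7 - - [10/Oct/2010:14:02:11 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(referrer_counts(sample).most_common())
```

A site that publishes its referrer list could use a tally like this to spot one host that suddenly claims to be sending a flood of visitors, and leave it off the published page.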
Other types of spamdexing
Hosting of multiple websites all with conceptually similar content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.
Cloaking refers to any of several means to serve a page to the search-engine spider that is different from that seen by human users. It can be an attempt to mislead search engines regarding the content on a particular web site. Cloaking, however, can also be used to ethically increase accessibility of a site to users with disabilities or provide human users with content that search engines aren’t able to process or parse. It is also used to deliver content based on a user’s location; Google itself uses IP delivery, a form of cloaking, to deliver results. Another form of cloaking is code swapping, i.e., optimizing a page for top ranking and then swapping another page in its place once a top ranking is achieved.