
Pushing Bad Data: Google’s Latest Black Eye




Google stopped counting, or at least publicly displaying, the number of pages it indexed in September of 2005, after a schoolyard “measuring contest” with rival Yahoo. That count, remember, topped out around 8 billion pages before it was removed from the homepage. News broke recently through various SEO forums that Google had suddenly added another few billion pages to the index over the past few weeks. This might sound like cause for celebration, but this “accomplishment” would not reflect well on the search engine that achieved it.


What had the SEO community buzzing was the nature of the fresh new few billion pages. They were blatant spam, containing Pay-Per-Click (PPC) ads and scraped content, and in many cases they were showing up well in the search results, driving out far older, more established sites. A Google representative responded to the issue via forums by calling it a “bad data push,” a phrase that met with assorted groans throughout the SEO community.

How did someone dupe Google into indexing so many pages of spam in such a short period? I’ll provide a high-level overview of the process, but don’t get too excited. Just as a diagram of a nuclear explosive isn’t going to teach you how to build the real thing, you won’t be able to run off and do this yourself after reading this article. Yet it makes for an interesting tale, one that illustrates the ugly problems cropping up with ever-increasing frequency in the world’s most popular search engine.

A Dark and Stormy Night

Our tale begins deep in the heart of Moldova, sandwiched scenically between Romania and Ukraine. In between fending off local vampire attacks, an enterprising local had a brilliant idea and ran with it, presumably away from the vampires… His idea was to exploit how Google handled subdomains, and not just a little bit, but in a big way.


The heart of the issue is that, currently, Google treats subdomains much the same way it treats full domains: as unique entities. This means it will add the homepage of a subdomain to the index and return at some later point to do a “deep crawl.” Deep crawls are simply the spider following links from the domain’s homepage deeper into the site until it finds everything, or gives up and comes back later for more.

Briefly, a subdomain is a “third-level domain.” You’ve probably seen them before; they look something like this: subdomain.domain.com. Wikipedia, for instance, uses them for languages; the English version is “en.wikipedia.org” and the Dutch version is “nl.wikipedia.org.” Subdomains are one way to organize large sites, as opposed to multiple directories or even separate domain names altogether.
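To make the “third-level domain” idea concrete, here is a minimal Python sketch that splits a hostname into its subdomain and registered-domain parts. It naively assumes a two-label registered domain like “wikipedia.org”; real code would consult a public-suffix list to handle cases like “.co.uk”.

```python
def split_host(hostname: str) -> tuple[str, str]:
    """Split a hostname into (subdomain, registered domain).

    Naive sketch: assumes the registered domain is always the last
    two labels, which breaks for suffixes like ".co.uk".
    """
    labels = hostname.lower().split(".")
    if len(labels) <= 2:
        return "", hostname.lower()
    return ".".join(labels[:-2]), ".".join(labels[-2:])

print(split_host("en.wikipedia.org"))  # -> ('en', 'wikipedia.org')
print(split_host("nl.wikipedia.org"))  # -> ('nl', 'wikipedia.org')
```

From Google’s perspective at the time, each distinct value on the left of that split was effectively a whole new site to index.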

So, we have a kind of page Google will index, virtually “no questions asked.” It’s a wonder no one exploited this situation sooner. Some commentators believe this “quirk” was introduced after the recent “Big Daddy” update. Our Eastern European friend got together some servers, content scrapers, spambots, PPC accounts, and some all-important, very inspired scripts, and mixed them all together thusly…

Five Billion Served: And Counting…

First, our hero crafted scripts for his servers that would, whenever GoogleBot dropped by, start generating an essentially endless number of subdomains, each with a single page containing keyword-rich scraped content, keyworded links, and PPC ads for those keywords. Next, spambots were sent out to put GoogleBot on the scent via referral and comment spam on tens of thousands of blogs around the world. The spambots provide the broad setup; once that is in place, it doesn’t take much to get the dominos to fall.

GoogleBot finds and follows the spammed links into the network, as is its purpose in life. Once GoogleBot is drawn in, the scripts running on the servers simply keep generating pages: page after page, each with a unique subdomain, all with keywords, scraped content, and PPC ads. These pages get indexed, and suddenly you have a Google index three billion pages heavier in under three weeks. Reports indicate that, at first, the PPC ads on these pages were from AdSense, Google’s own PPC service. The ultimate irony is that Google benefits financially from all the impressions charged to AdSense advertisers as their ads appear across those billions of spam pages. The AdSense revenues from this venture were the point, after all: cram in so many pages that, by sheer force of numbers, people would find and click the ads on those pages, making the spammer a nice profit in a very short time.
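The mechanics described above can be illustrated with a conceptual sketch; this is not the spammer’s actual code, and the page content below is invented. The key trick is that with a wildcard DNS record (*.example.com) pointing at one server, a single catch-all handler can answer for any subdomain in the Host header, so every new hostname the crawler requests yields a “new” site:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def page_for(host: str) -> str:
    """Build a distinct page for whatever Host header arrives."""
    subdomain = host.split(".")[0]  # "widgets" from "widgets.example.com"
    return (f"<html><head><title>{subdomain}</title></head>"
            f"<body><h1>About {subdomain}</h1></body></html>")

class CatchAllHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = page_for(self.headers.get("Host", "unknown"))
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

# To try it locally:
#   HTTPServer(("", 8080), CatchAllHandler).serve_forever()
```

Note that no page exists until it is asked for; the server fabricates one per hostname on demand, which is why the supply of “pages” was effectively infinite.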

Billions or Millions? What is Broken?

Word of this achievement spread like wildfire from the DigitalPoint forums. It spread like wildfire in the SEO community, to be specific. The “general public” is, so far, out of the loop, and will probably remain so. A response by a Google engineer appeared on a Threadwatch thread about the topic, calling it a “bad data push.” Basically, the company line was that they have not, in fact, added 5 billion pages. Later claims include assurances that the issue will be fixed algorithmically. Those following the situation (by tracking the known domains the spammer was using) can see that Google is removing them from the index manually.


The tracking is accomplished using the “site:” command, a command that theoretically displays the total number of indexed pages from the site you specify after the colon. Google has already admitted there are problems with this command, and “5 billion pages,” they seem to be claiming, is merely another symptom of it. These problems extend beyond the site: command to the displayed result counts for many queries, which some feel are highly inaccurate and in some cases fluctuate wildly. Google admits they have indexed some of these spammy subdomains but, so far, hasn’t provided any alternate numbers to dispute the three billion shown initially via the site: command.
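For reference, the observers’ tracking boils down to queries like site:spammydomain.com typed into Google and reading off the reported result count. A tiny sketch of how such a query URL is assembled (the domain here is a placeholder, not one of the actual spam domains):

```python
from urllib.parse import urlencode

def site_query_url(domain: str) -> str:
    """Build the search URL for a Google "site:" operator query."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})

print(site_query_url("example.com"))
# -> https://www.google.com/search?q=site%3Aexample.com
```

The number Google reports at the top of that results page is the figure that was fluctuating, and the figure observers watched shrink as listings were pulled by hand.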

Over the past week, the number of spammy domains and subdomains indexed has steadily dwindled as Google employees remove the listings manually. Unfortunately, there has been no official announcement that the “loophole” is closed. This poses the obvious problem that, since the method has now been shown, copycats will dash to cash in before the algorithm is changed to deal with it.


There are, at minimum, two things broken here: the site: command, and the obscure, tiny bit of the algorithm that allowed billions (or at least millions) of spam subdomains into the index. Google’s current priority should probably be to close the loophole before they’re buried in copycat spammers. The issues surrounding the use, or misuse, of AdSense are just as troubling for those who may be seeing little return on their advertising budget this month.

Do we “keep the faith” in Google in the face of these events? Most likely, yes. It isn’t so much a question of whether they deserve that faith; most people will simply never know this happened. Days after the story broke, there has been minimal mention in the “mainstream” press. Some tech sites have covered it, but this isn’t the kind of story that ends up on the nightly news, mostly because the background knowledge required to understand it goes beyond what the average citizen can muster. Instead, the story will likely end up as an interesting footnote in that most esoteric and neoteric of worlds, “SEO history.”

Mr. Lester has served for five years as the webmaster for ApolloHosting.com and previously worked in the IT industry for an additional five years, acquiring knowledge of hosting, design, and more. Apollo Hosting provides website hosting, e-commerce hosting, VPS hosting, and web design services to a wide range of customers. Established in 1999, Apollo prides itself on the highest levels of customer support.

Jacklyn J. Dyer
