If you have a lot of urls that you don’t want in Google anymore, you can make the pages return a 404 and wait for Googlebot to recrawl/reindex the pages. This is often the best way. You can also block out an entire directory or a whole site in robots.txt and then use our url removal tool to remove the entire directory from Google’s search results.
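For example, if everything you want removed lives under a single directory, a couple of robots.txt lines like these (the directory name here is just a placeholder) block the whole thing before you file the directory-level removal request:

User-agent: *
Disallow: /old-catalog/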
What I would not recommend is sending tons (as in, thousands or even tens of thousands) of individual url removal requests to the url removal tool. And I would definitely not recommend making lots (as in, dozens or even more) of Webmaster Central accounts just to remove your own urls. If we see that happening to a point that we consider excessive or abusive, we reserve the right to look at those requests and respond by e.g. broadening, cancelling, or narrowing the requests.
So if you’re sending huge numbers of requests to our url removal tool, it might be a good idea to take a step back and ask whether you should be removing at the directory level instead.
Hi Matt,
Interesting, since just last week I posted this question at Webmaster Central, where I said that we have removed thousands of webpages using the removal tool. We were bound to do it since the old CMS (osCommerce) had propagated hundreds of new URLs to each category, and Google did not remove them from the index (and is still showing links to pages that do not exist, BTW).
Besides what you wrote here, is there any impact on ranking or the domain authority due to mass url removal?
Thanks!
What about the practice of mass 301 redirects instead of URL removal?
While it leaves the web cluttered with dead URLs, it does soothe people’s paranoid fear of losing pico-PR.
I have a popular classified ads website. People are really happy to have their pages indexed quickly.
I often get the complaint that their page is not removed quickly enough from Google’s index and that they still get unwanted calls (I do not use “noarchive”).
Possible work around:
A: Add a “no archive” header/meta
Would be perfect solutions:
B: Being able to ping a google server with “to be removed urls”
C: Extending the sitemap.xml with “to be removed” urls
What do you think?
NB: the 404 is properly served. The problem is only the latency.
What I would love to see is a way to remove some of the 15,000 new Gmail accounts that are reported to our antispam system each day, many of them very active and confirmed as being so. To date, every email to Google about the amount of blog and forum spam being hosted on Gmail has been ignored.
If you’re serious about spam, why not engage people who can help?
We practice something similar to what Fred proposes with his option A. Even though we have roughly 20-25 million URLs, we keep a very up-to-date sitemap that we ping to Google. I have no real proof of this, but since we started keeping the sitemap well updated, we’ve had no dead-URL calls to support from our customers. By simply no longer pushing removed pages to the XML, the problem seems to resolve itself.
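For anyone unfamiliar with the format, the sitemap entries themselves are nothing special; a minimal file looks something like this (the URL and date are placeholders), and deleted pages simply stop appearing in it:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/ads/12345</loc>
    <lastmod>2011-07-01</lastmod>
  </url>
</urlset>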
The problem is that it’s impossible to remove a file, including all parameters, with a single removal request.
For example, if I want to remove viewtopic.php?t=1, viewtopic.php?t=2, …, viewtopic.php?t=9999, viewtopic.php?t=10000, I have to send 10000 removal requests. Waiting for Googlebot to reindex these pages is unacceptable because Googlebot crawls only ~150 pages a day.
I have never yet had to use the “removal tool”.
A permanent redirect (HTTP 301 response) to a “Sorry this is no longer available” page has always removed the “offending” page from Google’s index and the other SEs in a relatively short space of time.
Adding a robots noindex meta tag to the page should keep it from being listed in site: queries.
While links to a page exist, SE bots will continue to request the URL, and by necessity a 404 response is treated as a temporary problem. If you instead take the positive step of instructing ALL user agents that the page is NOT going to be available ever again, they have no reason to keep requesting it.
SEs help you by sending visitors, why not reciprocate by helping the SEs “get it right” for your pages?
This bit:
“Adding a to the page should”
should be
Adding a “robots meta of noindex” to the page should
I guess code delimiters are not allowed at all :)
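For anyone wondering, the generic form of that tag, placed in the page’s head section, is:

<meta name="robots" content="noindex">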
Hi Matt
What if you want the page to remain visible to users? Is noindex the best way to have it removed from the index?
Thanks
Mark Collier
I have the same question as Andrew Heenan:
“What about the practice of mass 301 redirects instead of URL removal?”
+1 For the question: “What about the practice of mass 301 redirects instead of URL removal?”
The problem with mass 301 redirects is that they don’t necessarily work in all cases.
For an old site with content that needs to be redirected to a new one with similar content, yes, absolutely. Or when old content can be redirected to newer content, then yes, that too. But it also depends on whether or not there is a newer destination.
What happens when you have a case where an entire product category is removed? Or a manufacturer/distributor line where the manufacturer is no longer in business? Where do you redirect that to? The home page? Another category that might not be similar? I think those are the types of cases being referred to here.
So maybe the question isn’t “what about the practice of mass 301 redirects instead of URL removal”, but rather, “in which circumstances would you prefer the URL removal tool be used, and what alternatives would you prefer people to use before using the URL removal tool?”
The Clarity Police will be at Matt’s door in about 4 minutes..
Unless you are building HUGE sites where you plan on having more than 10k URLs, adding directories simply isn’t best practice in a smartphone, mobile, short-URL world; adding directories/folders really doesn’t give you any advantages. So while this may not be a problem Google had a big hand in creating, Google has to accept Facebook, Twitter, and an increasingly mobile online world; Google is forced to deal with the problems companies create that are in the best interests of THEIR users.
While it may not be stated in those words, the spirit was clearly to do what’s best for the users … ALL THE USERS … not just the ones whose interests align with Google’s.
Matt, not sure what the problem is with requesting multiple removals? Robots.txt and noindex can take months to take effect if the page isn’t visited that often by Googlebot. I recently had to manually remove 50+ directories using the removal tool, as robots.txt and noindex were taking too long.
Jon
Added … when I last spoke with some people from Google Analytics, they said they had to “see” the 404 multiple times before a page was actually removed from the index. The logic was that sites go offline for maintenance, unexpected hosting issues, etc., and removing pages at the first 404 would be “over-reactionary”. Those weren’t the exact words they used, but a summation of the concepts.
I have seen many so-called expert SEOs wanting to do URL removals out of fear of the consequences of the Panda update.
Their theory is that they should remove low quality pages because Google is not going to rank them anymore.
They hope that Googlebot will give the so-called crawl budget to other pages that Google should rank better.
Is there any grain of truth in this theory, or is it yet another wild guess from those so-called expert SEOs who do not really know what the Panda update means in practice?
Hi Matt, that’s a great idea. But in my opinion, if we do nothing, Google can also handle it automatically. I mean, if we have pages that no longer exist because we removed them, we can just let time take care of it. After the next crawl, those pages will no longer show up on Google.
Thanks, Matt.
I also +1 “What about the practice of mass 301 redirects instead of URL removal?”
PANDA = GAME OVER IN SEO.
Nothing really works on new domains
I’d also love to know your opinion on the “What about the practice of mass 301 redirects instead of URL removal?” issue :)
“I have seen many so-called expert SEOs wanting to do URL removals out of fear of the consequences of the Panda update. Their theory is that they should remove low quality pages because Google is not going to rank them anymore.”
Ummm… because Google said so. So the “so-called experts” you are trying to make fun of beat you in that department and are paying attention. But I say sell the worst of your pages to the big sites and they will rank just fine; the Huff Posts, eHows and Yahoo Answers will rank #1 even for a single sentence. Panda LOVES big sites.
Matt, how long before we see an improvement, assuming that the ‘bad pages’ caused the problem and that the pages have been removed? Throw us a bone, it’s not a trade secret :) Days, weeks, months, or should we get a McD job? Entire sites have been blacklisted by Panda, with essentially no visitors.
hmmm…I think this post has raised an important question Matt.
I have always been told that overusing the 301 redirect can upset Google … what really is the situation with this?
Should I be worried when my programming guys always have a code-based “answer” for anything I ask them to do? For example, the use of a 301 as an “easy fix” instead of going to the trouble of changing the actual site structure is one that has bothered me for a long time… am I just getting old, or is this a problem for Google?
I had a bunch of WordPress plugin pages being indexed, so I roboted them out, but they did not go away after a few months, so I then used the URL removal tool. It was no more than a couple dozen, though.
Also, for all the other crawl errors that I could not find a good 1:1 301 for, I let the WordPress plugin “Link Juice Keeper” redirect them to my home page. I hope this is acceptable; I figure it is better than a user getting an error page, right!?
April Fools. :)
It’s very nice to remove a particular URL from Google search by using robots.txt. But if I have many URLs in different directories, how do I remove each URL from Google search? Do I have to list each URL in robots.txt, or is there one tool to block all URLs from different directories?
Please reply…
Thanks,
Umesh Kumar
Thanks for the advice Matt. I actually didn’t know about the URL removal tool.
It’s a bit more polite to serve a 410 Gone than a 404 Not Found for content that used to be there. A 404 might mean “you’ve mistyped the URL”, whereas a 410 is a reassurance that you’re looking in the right place, but the content is no longer available.
RFC2616 Section 10
Thanks for this advice Matt
Thanks for the update. As you all know very well, a 301 is for redirecting a page to another link, so I think it has nothing to do with URL removal. But I am not clear on the statement “You can also block out an entire directory or a whole site in robots.txt and then use our url removal tool to remove the entire directory from Google’s search results.” Do we first have to block it in robots.txt and then use the URL removal tool, or can we directly use the removal tool for removing URLs from Google’s search results? Kindly throw some light on this.
I have used the URL removal tool in the past but I believe it removed the URL for a specific duration of time. Is that still the case?
Damn, so a 404 error tells Google not to index the page anymore. My host went down a few weeks ago and was giving my site a 404 error for 2 mins. What if Googlebot crawled my site at that exact moment? Would my site be delisted?
We get crawled nearly every day, but there are months-old caches of pages still in the index that have the noindex property set on them but haven’t been recrawled. Those URLs are not in a directory but do match a pattern. We have a sitemap with all of the URLs we think should be indexed. Is there a way to just say that the sitemap is definitive?
Matt, I think Andrew and Multi-Worded Adam both hit the nail on the head here: essentially we’re looking for guidance on when to post 301 redirects versus a 404 response for a given page.
In my case, I have thousands of products that are unavailable on my web site… should I post 404s for those or 301 redirect them up to their parent category or brand? The SEO side of my brain says to 301 redirect them upwards to related content so as to preserve authority. The human side says to 404 them if they no longer contain a useful experience.
Any thoughts? :)
I find it quite a laborious task if I do have to remove multiple URLs. It would be great to have the option to upload an excel file or something similar. Any chance of something like this appearing soon? Thanks.
We had a very old Christmas forum on the site with posts dating to 1996 and 1997. The Christmas message board was quite popular – 4,500-plus pages where our users and customers traded Christmas collectibles. Instead of 404ing those pages, we served a 410 via .htaccess on the whole directory tree to be removed permanently, in hopes it would automate removing those URLs from all search engine indexes in one swoop as crawl bots encounter the 410. I learned something new today; I was not aware of the remove-the-entire-directory feature in WMT. On the other hand, I think the 410 takes care of Googlebot and all the other bots in one swoop. Is 410 an acceptable approach?
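For reference, that kind of .htaccess rule is only a line or two, assuming Apache’s mod_alias (the directory name here is illustrative, not our actual path):

# Serve 410 Gone for the retired forum directory and everything under it
Redirect gone /christmas-forum

Anything whose path starts with that prefix then answers 410 instead of 404.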
Adrian Drysdale – yes, you will have to rebuild your whole site, everything from scratch :)
Kidding. Google is not so simple/stupid as to let 2 minutes of downtime erase a lot of pages.
Performing individual 301 redirects, regex redirects, placing canonical tag systems and having an up to date robots.txt have all worked like a charm for me. I would take Matt’s directory approach for the robots.txt file or look for a way to use regex.
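As a rough sketch of what I mean by a regex redirect, in Apache .htaccess with mod_alias (the paths here are hypothetical):

# Send anything under the retired /old-catalog/ tree to the parent category page with a 301
RedirectMatch 301 ^/old-catalog/ /catalog/

Whether a blanket rule like that is appropriate depends on whether /catalog/ really is the closest replacement for the removed pages, as discussed above.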
I’ve often wondered what if any damage is done by leaving those dead URLs. Obviously the bot can see that there is nothing there and if it comes back and keeps checking it doesn’t change the user experience at all. We all want cleanly coded sites, but in this situation the issue isn’t the site it’s G-bot not wanting to let go of the past.
@fred, the tag you are looking for to remove your classified ads is called “unavailable_after”.
See the official description on googleblog.blogspot.com
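It takes a date after which the page can drop out of Google’s results; the generic form looks roughly like this (the date shown is just a placeholder):

<meta name="googlebot" content="unavailable_after: 25-Dec-2011 00:00:00 EST">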
Hi Matt, could you tell me whether we have to set noindex for categories, archives and tag archives in a WordPress blog?
This is probably off topic, but I couldn’t find a better place to ask this question. :)
Hi Matt, your link above to “remove the entire directory…” leads to a page which details how to remove single pages using the URL removal tool, as opposed to directories. How do you recommend removing pages at a directory level?
Thanks for giving us the heads up on this.
Hi Matt,
Could you let us know how much time Google actually takes to remove the URLs?
I’ve used both a 404 redirect and submitted a few URLs for removal, but even 4-6 months after the removal request the URLs still show up in the Webmaster Central error report as not found.
404 is File Not Found.
301 is a permanent redirect.
In other words, 404 isn’t a redirect. That might well explain your problem.
I tried robots.txt and meta robots noindex, but my index count has just gone wild; now it has 1,500,000 indexed pages. I feel like a content farmer, but all I want is for it to index just my 1,000 or so original pages.
Damn you, Google Panda; now I have too many visitors.
@Ajay: if you are doing a redirect and also sending a 404, there may be some issue with the status code (3xx or 4xx) seen by the bot.
To solve/check the issue:
1.- go to Google webmaster tools (https://www.google.com/webmasters/tools/)
2.- select the website having the issue
3.- use the tool from the menu “Diagnostics > Fetch as Googlebot”
Do you know how many days Google needs to remove a page when you use the URL removal tool? Is the URL removal tool a faster solution than a 404 when the website is rarely updated?
Thanks!!!
Matt,
The problem with returning a 404, or using robots.txt, or any of those types of solutions, is that they assume you want the entire page to be removed. In the case of a page full of content where one part of it needs to be removed, these solutions don’t work. Sure, we could change the entire URL of the page, but that means that anyone who bookmarked that page can no longer access it and, further, it breaks the URL architecture of the site.
For example, let’s say we had some user-generated content that our moderators removed, but only after Google had crawled it. The user posts, say in response to a news article (or even like this page, which is in response to your blog post), are displayed one after another on a page. I don’t want to have to 404 this page if I update, edit or remove the content or comments.
That doesn’t make sense. And, if the page is not recrawled often, there’s no other way to get the search result containing the old, removed content out of Google’s index. Do you have a suggested solution for that?
Is there nothing specific that can be done to increase the speed of de-indexing pages? I know I’ve simply let the robots recrawl and find the 404s, but that often takes a very long time. Since I have a custom on-site Google search, this can be frustrating when you want to be sure non-existent pages don’t show up for people searching the site.
Is there anything reliable that is faster than a pure 404 crawl?
Matt,
Has Google contemplated a non-directory oriented bulk removal tool based on regular expressions?
I suspect that something like that would benefit a lot of webmasters making good faith efforts to respond to Panda’s implications, while reducing the temptation to use the dubious removal techniques described.
Without such a tool, there isn’t a rapid, reliable means of removing URLs that aren’t in a directory. For example, we had a bunch of URLs on our site that followed the form:
http://www.job-search-engine.com/alternate?k=company%3AAccountemps&alt_id=0000000048fu4t&l=
They served a totally legitimate purpose – ironically, suppressing duplicates in our own search results – and were therefore ‘low quality/thin’, almost by definition. But, because they’re not in a directory, we’ve struggled to get them removed from the index, even before Panda hit. Even with 404s and Robots.txt restrictions in various combinations, we were seeing some very old uncrawled links sitting around the SERPS indefinitely. We eventually changed the whole URL structure, noindexed the new pages, and placed a generic page with a noindex tag at all of the old URLs. The count in the index is dropping, but it has been a month, and there are still millions floating around… which is frustrating and not beneficial to anyone.
Beyond the ‘no directory’ example, a removal tool with a bit more flexibility could be a huge help for sites looking to purge other low quality pages, which are often inadvertently created and comingled with high quality pages in the same directory. In those cases, using directory removal causes massive collateral damage. So, I’m sure I’m not the only one who’d appreciate the opportunity to cut pages with more precision, rather than hacking away with a cleaver and taking off limbs in the process…
I realize there are probably complexities to offering more flexible removal options, but hope you’ll give it consideration.
Thanks.
+1 for posting this one. I didn’t know about the URL removal tool as well.
This reminds me of the good old days of SEO; this is something we did ages ago. :)
The point raised by Andrew Heenan is pretty much what I was wondering about… I have one site with truckloads of dead links somehow… the best I can do is a 301 redirect…
Matt, first of all I have been a long time reader of your blog but this is the first time I have made a comment. Thank you for all the insights and making your blog interesting with personal posts as well.
I would have to agree with Euan that a bulk removal tool would be very beneficial, and maybe I just lack the expertise, but the issue we are running into is that there are hundreds and hundreds of pages on our sites that were indexed by Google (our fault for not having noindex on them), and it looks like we are now being penalized for this. I would have thought Googlebot would have realized that these pages are to be ignored, as we set the parameter setting in Google Webmaster Tools to “Let Google Decide”. Some pages on our site use this parameter and are relevant to be indexed, while the others are not.
On those pages we added noindex, nofollow in the meta data, but from reading your blog and several others, it appears this might take a while to be dropped. Note that the pages mentioned come from pagination parameters, and therefore the only way Google will find the other pages is through the pagination. If we have a nofollow on these pages, will Google never be able to find them and therefore never find the noindex?
I plan to post this in the Webmasters Forum but just thought I would share this with you since I agree a bulk removal tool would be very beneficial for circumstances like this. Also is there a way to remove pages that all have the same page name but different parameter? If I submitted a page to be removed, would that remove all pages with that parameter? Another thought to include in the removal tool.
Thank you for your time in reading my comment and hopefully we can all hear some of your additional thoughts on the matter of bulk removal.
Hi Matt,
Since you recommended not sending massive numbers of URL removal requests, I thought I’d also ask about sending spam reports. I’ve recently found a vast link farm (hundreds, if not thousands, of identical pages on different domains). Some of them Google has clearly caught, as they are not indexed, but some linger on. I have reported some, but I did not want to bog down an already overloaded spam team. Is there a better way to report the issue?
Many thanks,
Abi
Yes Matt,
I was confused about how to remove my URLs. Now I’ve got it clear from your post; I’ll be adding some rules to the robots.txt file and making 404 pages for the URLs that need it.
Euan, the problem with bulk removal by regexp is that there’s no (practical and reliable) way for Google to verify that all URLs matching the regexp have actually been removed. So there is a danger of such a tool being used maliciously. I don’t think this problem is completely insurmountable, but it would be a considerable hassle for something that from Google’s point of view might be of only limited value.
Hey,
I removed some URLs with the Google tool. They return 404 and are disallowed via robots.txt, but STILL they come back as “not found” in my Webmaster Tools.
Can anybody help me?
Greetings from Holland
Hi Matt, can a removed URL get re-indexed again?
+1 For the question: “What about the practice of mass 301 redirects instead of URL removal?”
But I am also curious: since the Panda update, I have noticed that my sites with a lot of 301s seem to have dropped in rankings. So is the new recommendation not to have too many 301s, and if so, what is considered too many?
Matt,
I seem to remember Google recommending the use of HTTP “410 Gone” for this sort of thing. This makes more sense than “404 Not Found”.
This is also explained in http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
(see section 10.4.5).
Nice tip. Will this help to remove all the crawl errors I’m getting on my webmasters dashboard? How easy is this to fix for an SEO newbie?
Thanks
Nick
Matt,
Is it better to use URL removal or a 404? Which one is faster?
How long does it take for the URLs to be removed? Is it instant?
Good day everyone,
We have used a CMS for our website for many years, and now we plan to change it.
What happens with all the indexed pages from the old portal? We’re talking about 5,000 pages.
Is there any way to develop the new website (using a new CMS) without having a huge number of error pages in Google’s SERPs after the old one is closed?
Thank you.
This is why planning your website before the build is important. Making URL changes can be a huge problem for large sites, but sometimes it can be easily done with URL rewrites.
Well, sometimes, unfortunately, I find URLs that were requested to be removed a long time ago, again and again, and they are still in Google’s index.
We are trying everything: removing the URL, making it a 404, and using the URL removal tool,
though we still find these pages.
As an SEO firm, I still think Google has got to improve on this one.
Today I used the URL removal tool at Google. Well, I want to ask you: why are you using a WordPress blog when you could use Blogspot? Do you feel WP is better than Blogspot???
Hi Matt,
What will really happen if I send many individual URL removal requests? Will I get banned? Or will the requests just be narrowed?
Thanks.
I never knew that there was a directory removal option. That will definitely come in handy when I redesign sites and just can’t keep the original structure. Thanks.
Hi
I am having a serious issue with Googlebot crawling pages on my website that have not existed since 2009. Thousands of URLs that I had blocked and removed suddenly reappeared as 404 errors, and the referring URLs are other nonexistent URLs from my domain. I have requested removal and also blocked them via robots.txt, but I am very worried that this will affect my site negatively, as new nonexistent URLs have been popping up daily in Webmaster Tools. I cannot understand where these old URLs are being referred from, as my entire site and navigation were completely updated in 2009 and references to these URLs have not existed since then. I have also crawled my site with Xenu, and my sitemap generator crawl does not show any reference to these URLs, so my CMS is not generating them. Do I keep asking for removal in Webmaster Tools and blocking with robots.txt? Or is there another way to find out where these URLs are being referred from and stop it that way? Help please :) I am in a slight panic over this and have searched exhaustively for answers …
Thanks
Marissa
Hi Matt ,
I bought a website a few months ago that had almost 40-50K pages indexed. It has a nice domain name, and I didn’t care that almost 99% of its pages broke almost every basic quality guideline principle, because I thought I could remove all of them except the index and start over.
The problem was that it had no folders (http://www.example.com/folder/); it had a lot of pages that didn’t have a specific starting pattern (http://www.example.com/([a-z0-9-]+)) and no specific ending (.php, .html, …). Removing all the pages from Google’s index will take months to accomplish using a noindex meta tag. So …
I have a few suggestions for google url removal tool such as :
1. To be able to remove all URLs except the index page.
2. To be able to remove all URLs that have a parameter: /index.php?id=
3. To be able to remove all URLs that have a starting pattern: /something-([a-z0-9-]+)
Thank you.
Instead of that, I would love to use .htaccess and redirect old pages to the home page so that the old pages will not be crawled by search engines.
Thank you for the nice tip, Matt sir. :)
@nindrianto: if your URLs are in different categories or different locations, you can still send the removal requests from a single Webmaster Tools account.
If you have multiple dynamic urls, for instance:
http://www.somesites.com/forums/thread.php?threadid=12345&sort=date
http://www.somesites.com/forums/thread.php?threadid=67890&sort=date
http://www.somesites.com/forums/thread.php?threadid=13579&sort=date
is it possible to include a * to remove them all or do you have to individually post each one?
Hello Matt, due to the “Panda” strike here in German-speaking countries in August, I have removed over 40,000 articles (duplicate or thin content) from my site pressemeldungen.at and can’t see a real improvement in ranking visibility after several weeks. Of course, thousands of 404 errors appear in my Webmaster account, and I have also been using the removal tool for some pages, but with no real effect overall. I am redirecting deleted articles to new posts per topic as well, to make it easier for Google to index.
Search experts told me at an AdSense publisher event in spring in Zurich that there was no duplicate content penalty. But there is obviously a Panda algorithm issue here. How long can it take to clean the index and make my website rank better, since hundreds of self-written unique articles don’t rank like they used to due to “Panda”?
THANKS.
I removed a lot of URLs, but only slowly, and in Webmaster Tools Google still shows more indexed pages than I actually have.