Archive for October, 2005

Update Jagger: Contacting Google

Okay, Brett Tabke decided to call it Update Jagger. Here are the ways that I’d use to contact us if you have feedback on Google search results:

Reporting spam in Google’s index
I especially want to hear about webspam that you see in Google. The best place to do that is to go to http://www.google.com/contact/spamreport.html . In the “Additional details:” section, I would use the keyword “jagger1” (that’s “jagger” and the number one with no spaces in between).

Reporting non-spam issues or problems in Google’s index
Do the search that you’re interested in on google.com, then click the “Dissatisfied? Help us improve” link at the bottom right of the page. Again, fill in details and use the keyword jagger1 so that folks at Google can separate out feedback specifically about this update.

You think your site has been penalized
If your site is not showing up at all, and you recently had something like hidden text or hidden links on your pages, I would recommend doing a reinclusion request. I wrote up my advice on the best way to do a reinclusion request. Note that a reinclusion request won’t make much difference if our algorithms/scoring are what is affecting your site though.

You see a low-quality site that is running AdSense
If you run across a site that you consider spammy and it has AdSense on it, click on the “Ads by Goooooogle” link and click “Send Google your thoughts on the ads you just saw”. Enter the words spamreport and jagger1 in the comments field.

You want to talk about data center IP addresses amongst friends, or “update speculate”
Lots of search-related discussion goes on at WebmasterWorld, but bear in mind that you won’t be able to mention specific urls or searches on WMW. If you want to mention specifics to Google, I’d go with one of the ways above.

Hope that helps to let people know where to send their feedback based on what they see.

Comments (207)

More info on updates

I’ve already talked about index updates some in the past. These days rather than having a large monolithic update, Google tends to have smaller (and more frequent) individual launches. So I think my Sept. 8th, 2005 post on the subject of non-updates was just mentioning that some new backlinks were visible, but not much in the way of algorithms/scoring/metadata had changed. After that, some people might have noticed some changes around Sept. 22, 2005. Or some CJK folks might have noticed some changes around Oct. 5, 2005. Or a few European folks might have noticed some changes around Oct. 7, 2005. Or some people might have noticed some changes this past weekend.

My point is that more than ever, we are constantly working to improve our algorithms and scoring. Some changes are hardly noticed at all. Some changes (e.g. user interface improvements) are more visible. Some changes have nothing to do with spam, such as the changes for Chinese and Europe that I mentioned above. Some changes do try to decrease spam or increase core quality.

Just to give you a heads-up, I think a new set of backlinks (and possibly PageRank) will probably be visible relatively soon; I’m guessing within the next few days. I still expect some flux after that though, just to let you know.

Update: Just to clarify, these days with lots of smaller and larger changes happening at different points in time, it’s a little arbitrary to decide when to call something an update. That decision has usually fallen on Brett Tabke’s shoulders over at WebmasterWorld. (WMW also chooses what name they want to call it when Brett decides enough has changed to call it an update.) Given that there should be new PageRank/backlinks visible in a few days (assuming no issues at our end), I wouldn’t be surprised if Brett slaps a name on it pre-emptively, even though there will still be some flux to come.

Comments (155)

Keep it coming, Gmail

Gmail is starting to grow on me. Its email spam detection is pretty dang good. My favorite new feature is the auto-save of draft emails. Mutt is single-threaded, so you pretty much have to start and finish one email at a time. And if your connection dies, you’re hosed. I’ve been using the GNU screen program to get around that, but auto-save is really nice for long emails, like working on an email interview on and off for a few days.

More people probably care about the free POP access or the new ability to export contacts easily, but I’m sticking with auto-save being my favorite new feature. Now if I could just easily skin things to color-code emails the way I’d like, import procmail filters, and score emails like mutt does. There’s probably other stuff I’d want too. Sigh. One of the most important things to remember when you work at Google is that you are rarely a typical user.

What feature would you like to see in Gmail that currently isn’t there?

Comments (86)

Moving to a new web host

Several people have asked for the recommended way to move to a new webhost or IP address without having problems in Google. I tested this method this past Sunday and everything worked fine for me. I’ll walk you through my example, which is moving mattcutts.com from one IP address to another IP address by changing hosts. This is not an example of moving mattcutts.com to someotherdomain.com. I’ll talk about that a bit at the end.

If you have a static site or can afford a day or so where your site can be in limbo between two IP addresses, life will be easier. If you have a dynamic site with databases and such, it’s trickier, even though the idea is the same.

Step 1: Find a good web host and sign up for an account.

Step 2: Make a back-up of your site at the new webhost.

Step 3: Change DNS to point to your new web host.

Step 4: Wait for the DNS change to propagate through the net.

Step 5: Once you are sure people or Googlebots are fetching from the new webhost/IP address, you’re done. You can shut down the old site.

Let’s talk through these in a little more detail.

Step 1: Find a good web host and sign up for an account.

Research + references should help you find a good host. I liked my current webhost (csoft.net) quite a bit and I did a lot of research, but the site readership was growing faster than I expected. I asked a (non-SEO) friend who runs a heavily-trafficked site what he uses, and he uses pair.com. In this example I’ll refer to things by IP addresses, and we’ll be moving from csoft with an IP address of 63.x.x.x to pair.com with an IP address of 65.x.x.x. Just as a reminder, DNS is the system that maps pretty names like www.google.com to an actual Internet Protocol (IP) address that a machine can use, such as 66.102.7.147.

Step 2: Make a back-up of your site at the new webhost.

If you have a static website, this isn’t that bad; just copy the entire file structure over to the new webhost and you’re done. Harder is something like a blog, which usually has a MySQL or other database for storing posts. Harder still is some e-commerce site that has to have its database kept in a sync’ed state. In that case, you might have to set up database replication between the old location and the new location while you are doing the transition.

But let’s take the example of a WordPress blog with a MySQL database that can be down for a few hours without too much trouble. Assume that you’ve already used tar or FTP to copy the static files from one webhost to another. First, you want to create a new MySQL database at the new web host. Ideally, you can make it have the same database name and user name. If not, you’ll want to tweak the WordPress wp-config.php at the new location to update the database/username/password/etc.

Now that you’ve got the MySQL database ready to copy over, dump the old MySQL database, copy it to the new webhost, and load your database at the new location. Those three commands would look like this:

mysqldump --add-drop-table -uoldusername -poldpassword olddatabase > mysqlbackup.20051009.sql

scp mysqlbackup.20051009.sql user@newhost:~/

mysql -unewusername -pnewpassword -hnewdatabasehost newdatabase < ~/mysqlbackup.20051009.sql

Bear in mind that you have a username/password to login to the old and new webhosts, but you also have separate username/password for the databases at each location as well. You might even have the MySQL database stored on a different host, which is why I showed the -h (host) option when restoring the database. Again, if the new host has different options for your database, you’ll need to edit your wp-config.php file or WordPress won’t be able to access your database at the new webhost.
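The three commands above can be wrapped into a small script so that the dump file name stays the same at every step. This is only a sketch: dump_name and migrate_db are made-up helper names, the usernames, passwords, and hosts are the same placeholders used above, and running the load step over ssh is an assumption of this sketch (you could just as well log in to the new webhost and run the mysql command there).

```shell
#!/bin/sh
# Sketch only: dump_name and migrate_db are hypothetical helper names,
# and every username, password, and host below is a placeholder.

# Build the dump file name from a date stamp such as 20051009.
dump_name() {
    printf 'mysqlbackup.%s.sql\n' "$1"
}

# Dump the old database, copy the dump over, and load it at the new host.
# Meant to be run on the old webhost; stops at the first step that fails.
migrate_db() {
    dump=$(dump_name "$(date +%Y%m%d)")
    mysqldump --add-drop-table -uoldusername -poldpassword olddatabase > "$dump" || return 1
    scp "$dump" user@newhost:~/ || return 1
    # Assumption: the load is run remotely over ssh instead of by hand.
    ssh user@newhost "mysql -unewusername -pnewpassword -hnewdatabasehost newdatabase < ~/$dump"
}
```

Nothing runs until you call migrate_db yourself, so you can read through the steps and adjust the placeholders first.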

Now you have identical copies of your site at two different locations. If you’re just running a blog with a comment or two a day, it’s not a big problem if someone posts a comment or otherwise changes your database while you’re doing the transition to a new web host. If you run a big, industrial-strength forum or e-commerce site, you’ll need to do extra work to keep the two databases and/or file systems synchronized.

Step 3: Change DNS to point to your new web host.

This is the actual crux of the matter. First, some DNS background. When Googlebot(s) or anyone else tries to reach or crawl your site, they look up the IP address, so mattcutts.com would map to an IP like 63.111.26.154. Googlebot tries to do reasonable things like re-check the IP address every 500 fetches or so, or re-check if more than N hours have passed. Regular people who use DNS in their browser are affected by a setting called TTL, or Time To Live. TTL is measured in seconds and it says “this IP address that you fetched will be safe for this many seconds; you can cache this IP address and not bother to look it up again for that many seconds.” After all, if you looked up the IP address for each site with every single webpage, image, JavaScript, or style sheet that you loaded, your browser would trundle along like a very slow turtle.

You can actually see the TTL for various sites by using the “dig” command in Linux/Unix:

% dig mattcutts.com

; <<>> DiG 9.2.1 <<>> mattcutts.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37526
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1

;; QUESTION SECTION:
;mattcutts.com. IN A

;; ANSWER SECTION:
mattcutts.com. 3572 IN A 65.181.152.150
...

In this case, the Time-To-Live for the IP address for mattcutts.com is 3572 seconds (a little less than an hour).
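If you only want the TTL, dig can print just the answer section with its +noall +answer options, and one line of awk can pull out the number. A small sketch, where answer_ttl is a made-up helper name and the input is assumed to look like the ANSWER SECTION line above:

```shell
# Sketch: answer_ttl is a hypothetical helper. It reads dig answer lines on
# stdin and prints the TTL (second column) of the first A record it sees.
answer_ttl() {
    awk '$3 == "IN" && $4 == "A" { print $2; exit }'
}

# Example use against a live lookup (needs network access):
#   dig +noall +answer mattcutts.com | answer_ttl
```

Keep in mind that a caching nameserver reports the time remaining on its cached copy, so the number can be lower than the TTL that the site actually configured.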

Time-To-Live is an important factor for a site’s DNS. Some sites like google.com, yahoo.com, and msn.com have really short DNS TTL settings like 300 to 900 seconds. Why? Well, if you have multiple data centers, you might want to take one data center down so that the data center mechanics can sprinkle fresh, magical index data onto the machines. With a short TTL, you could pull a data center’s IP address out of the rotation in just a few minutes.

That also helps explain the “Google Dance” of days gone by. The Google Dance would last for about a week, and people would see both old and new results, depending on which data center they happened to hit. The underlying reason was that each data center was brought down, loaded with new data or algorithmic settings, and then brought back up again. It took several days to switch the data at all data centers. During that time, webmasters used to love to check www2.google.com and www3.google.com because those DNS aliases usually pointed to the newest data centers. These days our production system is better equipped to switch things around quickly instead of over several days.

There, a little easter egg for the people who care about DNS. :) Okay, where were we? Right, switching DNS and Time-To-Live. You should care about TTL because if someone loads your website in their browser just before you update your DNS settings, and your TTL is one day, then that person’s browser will try to use your old IP address all that day.

In fact, it’s even worse. DNS is hierarchical. At the top of DNS there are 13 root servers that point lookups toward the servers responsible for .com (and every other top-level domain), but DNS caches flow all the way down to ISPs like Comcast or Cox. If someone on Comcast looked up your IP address just before you changed your DNS settings, everyone sharing that Comcast DNS cache would keep getting the old IP address until the Time-To-Live expired.

So the upshot is that if you can make your TTL short (like an hour) instead of long (like a day), you’ll be in much better shape. Everyone will move to your new IP address in short order instead of having a mish-mash where some people are using the old IP address for hours.
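To put numbers on that: someone who cached your old IP address just before the change can keep using it for up to one full TTL. The sketch below only turns a TTL in seconds into hours and minutes; fmt_duration is a made-up helper name.

```shell
# Sketch: fmt_duration is a hypothetical helper that prints a TTL given in
# seconds as whole hours and minutes, i.e. the longest a cached answer lasts.
fmt_duration() {
    printf '%dh %dm\n' "$(( $1 / 3600 ))" "$(( ($1 % 3600) / 60 ))"
}

fmt_duration 3600    # one-hour TTL
fmt_duration 86400   # one-day TTL
```

The two calls print 1h 0m and 24h 0m, which is the difference between a short mish-mash period and a day-long one.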

The actual switchover process is pretty easy. Your new webhost will give you a pair of nameservers to use as the primary nameservers. If you have a domain registered with GANDI (the delightful French registrar that I happen to use), you go to account settings and switch from the old webhost’s nameservers to the new webhost’s nameservers. GANDI (and probably other registrars) is smart enough to recognize nameservers that are already present in the DNS system, so it can make the change pretty much immediately. If you’re going with a nameserver that no one has ever heard of before, you might have to wait 24 hours or so for things to percolate into the system.

Step 4: Wait for the DNS change to propagate through the net.

This is mostly a function of TTL and whether you’re switching to nameservers that are already present in DNS. Remember that DNS is hierarchical, and you have to wait for DNS caches to be flushed as Time-To-Live is exceeded. If you are using a smart registrar and a well-known set of new nameservers, the switch at the root level of DNS can be pretty quick. To verify that the root servers have the new nameserver, you can use the “dig +trace domain” command in Linux/Unix. The “+trace” option tells dig to go all the way up to the DNS root servers for the lookup:

dig +trace mattcutts.com

; <<>> DiG 9.2.1 <<>> +trace mattcutts.com
;; global options: printcmd

mattcutts.com. 172800 IN NS ns00.ns0.com.
mattcutts.com. 172800 IN NS ns176.pair.com.
;; Received 111 bytes from 192.5.6.30#53(A.GTLD-SERVERS.NET) in 84 ms

Above you can see that my nameservers have switched to pair.com nameservers. After that, you just have to wait for TTLs to expire for your new nameserver (and thus IP address) to wend its way out to everyone. If you are on a Windows XP system, you can use the command “ipconfig /flushdns” to flush your machine’s DNS cache, but it probably won’t do much good by itself. Remember that DNS is cached at each level, so your ISP probably has cached the previous IP address until the TTL expires.
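The full “dig +trace” output also lists nameservers for the root and for .com, so it can help to filter it down to the NS records for your own domain. A small sketch, where ns_hosts is a made-up helper name and the input is assumed to be in the same format as the output above:

```shell
# Sketch: ns_hosts is a hypothetical helper. It reads dig output on stdin and
# prints the nameserver hosts from the NS records of the given domain only,
# sorted and de-duplicated. Usage: ns_hosts DOMAIN
ns_hosts() {
    awk -v d="$1." '$1 == d && $3 == "IN" && $4 == "NS" { print $5 }' | sort -u
}

# Example use (needs network access):
#   dig +trace mattcutts.com | ns_hosts mattcutts.com
```

If the list shows your new webhost’s nameservers, the switch has been picked up and the rest is waiting for TTLs to expire.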

Step 5: Once you are sure people or Googlebots are fetching from the new webhost/IP address, you’re done. You can shut down the old site.

When you ping your domain and see your new IP address, you know that you’re getting close. Previous visitors might still be using the old IP address from their DNS cache, but new visitors are getting the new IP address. It’s still a good idea to give a day or so in case anyone had a long Time-To-Live set, but most TTLs are a day or a few hours or less. After a day or so, it should be safe to deactivate the hosting at the old location. If you want to be ultra-safe, check your logs. When you see Googlebot fetching from the new webhost and no more visitors in your logs at the old location, it’s okay to turn off your old webhost.
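For the log check, counting Googlebot requests in each webhost’s access log is usually enough. The sketch below assumes a log format that records the user agent on each line (for example Apache’s “combined” format); googlebot_hits is a made-up helper name and the log path is a placeholder.

```shell
# Sketch: googlebot_hits is a hypothetical helper. It reads an access log on
# stdin and prints how many lines mention Googlebot in the user-agent string.
googlebot_hits() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c 'Googlebot' || true
}

# Example use, with a made-up log path:
#   googlebot_hits < /path/to/access.log
```

Run it against today’s log at both locations: a growing count at the new webhost and a count of 0 at the old one is the signal you are looking for.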

Moving to a different domain

Now let’s talk for a minute about moving from mattcutts.com to someotherdomain.com. All other things being equal, I would recommend staying with the original domain if possible. But if you need to move, the recommended way to do it is to put a 301 (permanent) redirect on every page on mattcutts.com to point to the corresponding page on someotherdomain.com. If you can map mattcutts.com/url1.html to someotherdomain.com/url1.html, that’s better than doing a redirect just to the root page (that is, from mattcutts.com/url1.html to someotherdomain.com). In the olden days, Googlebot would immediately follow a 301 redirect as soon as it found it. These days, I believe Googlebot sees the 301 and puts the destination url back in the queue, so it gets crawled a little later. I have heard some reports of people having issues with doing a 301 from olddomain.com to newdomain.com. I’m happy to hear those reports in the comments and I can pass them on to the crawl/indexing team, but we may be due to replace the code that handles that in the next couple months or so. If it’s really easy for you to wait a couple months or so, you may want to do that; it’s always easier to ask crawl/index folks to examine newer code than code that will be turned off in a while.
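Once the redirects are in place, it is worth spot-checking a few pages to confirm that each one returns a 301 to the matching page and not to the root. A sketch, assuming curl is installed; map_url and check_redirect are made-up helper names, and the domains are the example ones used above.

```shell
# Sketch: map_url and check_redirect are hypothetical helpers.

# Print the URL on the new domain that an old URL should redirect to,
# keeping the same path. Usage: map_url OLD_HOST NEW_HOST URL
map_url() {
    printf '%s\n' "$3" | sed "s|://$1|://$2|"
}

# Fetch a URL without following redirects and print the status code and the
# redirect target the server returned (needs curl and network access).
check_redirect() {
    curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# Example use:
#   check_redirect http://mattcutts.com/url1.html
#   map_url mattcutts.com someotherdomain.com http://mattcutts.com/url1.html
```

With a page-by-page redirect, check_redirect should print 301 followed by the same URL that map_url prints for that page.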

Comments (102)

Welcome to the mosh pit!

Looks like Joe Morin has joined the blogosphere with a bang. Check out one of his first posts, titled Danny Sullivan and Brett Tabke embrace spam. You have to check out the post to understand. :)

Update: I was mistaken. It turns out that Danny and Brett were crushing spam between each other. You can actually see the spam buckling from their powerful grip. Sounds like Pubcon London was fun. :)

Comments (7)
