Search Results for: seo analysis

SEO Advice: linkbait and linkbaiting

(just a quick post.)

On a meta-level, I think of “linkbait” as something interesting enough to catch people’s attention, and that doesn’t have to be a bad thing. There are a lot of ways to do that: you can put in sweat-of-the-brow work to generate data or insights, or it can be as simple as being creative. You can also say something controversial to generate discussion (though that one gets tired if you overuse it). Sometimes even a little bit of work can give people a reason to link to you.

Example 1: Danny Sullivan actually sat down and checked the spam filtering accuracy of SpamCop, Yahoo Mail, and Gmail. And not once or twice, but three different times. Personally, counting false positives in your Spam folder would annoy me to death, but putting in that work generates insights on the differences between the competing services. Admittedly, the results will vary by individual, but as the great Fred Brooks would remark, often some data is better than no data at all. Now Danny doesn’t need any more links than he already has, but producing info-laden content like this is what makes a site or blog well-known over time.

Example 2: You can be creative. I’m happy to link to Marc Hil Macalua for a creative app that he wrote in which you can vote on head-to-head battles between SEOs. The Ning service attempts to make it easy for people to write social web apps, so this is a really easy app to create: just drop in your own photos and you’re good to go. Now did Marc do a ton of work? Well, a little bit, but not a ton. But he had that creative insight of something that would grab people’s attention and generate discussion. By the way, it looks like someone is click-spamming on DaveN’s behalf. 😉

Example 3: Saying something controversial. You can be cheeky, like Threadwatch, or you can be incredibly earnest. I give the creator of Google Watch credit for staking out the “anti-Google” territory way before anyone else. Later, Andrew Orlowski probably realized that taking potshots at Google or blogs was a way to generate lots of discussion. By the time it trickles down to sites like FuckedGoogle or whatever, it gets to be “done”–that niche is starting to be tapped out. So how do you take a new approach?

Example 4: Back to something creative: the Google: Evil or Not? site. The site takes RSS feeds that mention Google, lets people vote between Real Good or Real Evil, and adds a graph. It took a little bit of work, but probably not a ton. How much work would it be to extend that to another subject, like graphing the mojo levels at the Yahooplex as they wax and wane?

Linkbaiting sounds like a bad thing, but if the result is interesting information or fun, it doesn’t have to carry negative connotations. I hereby claim that content can be both white-hat and still be wonderful “bait” for links (e.g. Danny’s spam email analysis). And generating information or ideas that people talk about is a surefire way to generate links. Personally, I’d lean toward producing interesting data or having a creative idea rather than spouting really controversial ideas 100% of the time. If everything you ever say is controversial, it can be entertaining, but it’s harder to maintain credibility over the long haul.

SEO advice: url canonicalization

(I got my power back!)

Before I start collecting feedback on the Bigdaddy data center, I want to talk a little bit about canonicalization, www vs. non-www, redirects, duplicate urls, 302 “hijacking,” etc. so that we’re all on the same page.

Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:

  • www.example.com
  • example.com/
  • www.example.com/index.html
  • example.com/home.asp

But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.

Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/ . Instead, pick the url you prefer and always use that format for your internal links.

Q: Is there anything else I can do?
A: Yes. Suppose you want your default url to be http://www.example.com/ . You can configure your webserver so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
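
Just to make that concrete, here’s a minimal sketch of a non-www to www redirect. In practice you’d normally do this in your web server configuration rather than in application code, and the hostnames below are only placeholders:

```python
# Minimal sketch: send requests for the bare hostname to the www version
# with a 301 (permanent) redirect. Hostnames are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

PREFERRED_HOST = "www.example.com"

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "")
        if host != PREFERRED_HOST:
            # A 301 tells crawlers which hostname you prefer to be canonical.
            self.send_response(301)
            self.send_header("Location", f"http://{PREFERRED_HOST}{self.path}")
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Hello from the canonical hostname")

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```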

Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do that. If you use the url removal tool on one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version, file a reinclusion request, mention that you removed your entire domain by accident using the url removal tool, and ask for it to be reincluded.

Q: I noticed that you don’t do a 301 redirect on your site from the non-www to the www version, Matt. Why not? Are you stupid in the head?
A: Actually, it’s on purpose. I noticed that several months ago but decided not to change it on my end or ask anyone at Google to fix it. I may add a 301 eventually, but for now it’s a helpful test case.

Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
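
To give a rough idea of what those normalizations look like, here’s an illustrative sketch. This is my own example, not Google’s actual canonicalization logic, and “sid” is just a hypothetical session-ID parameter name:

```python
# Illustrative url normalization: lowercase the scheme and hostname,
# drop a trailing slash, and strip a session-ID parameter.
# Not Google's actual logic; "sid" is a hypothetical parameter name.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize(url: str, session_params=("sid",)) -> str:
    parts = urlsplit(url)
    host = parts.netloc.lower()                    # hostnames are case-insensitive
    path = parts.path.rstrip("/") or "/"           # treat /dir/ and /dir the same
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in session_params]           # drop the session ID
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))

print(normalize("HTTP://WWW.Example.COM/forum/?sid=abc123&topic=42"))
# -> http://www.example.com/forum?topic=42
```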

Q: Let’s talk about the inurl: operator. Why does everyone think that if inurl:mydomain.com shows results that aren’t from mydomain.com, it must be hijacked?
A: Many months ago, if you saw someresult.com/search2.php?url=mydomain.com, that result would sometimes show content from mydomain.com. That could happen when the someresult.com url was a 302 redirect to mydomain.com and we decided to show a result from someresult.com. Since then, we’ve changed our heuristics to make showing the source url for 302 redirects much more rare. We are moving to a framework for handling redirects in which we will almost always show the destination url. Yahoo handles 302 redirects by usually showing the destination url, and we are in the middle of transitioning to a similar set of heuristics. Note that Yahoo reserves the right to have exceptions on redirect handling, and Google does too. Based on our analysis, we will show the source url for a 302 redirect less than half a percent of the time (basically, when we have strong reason to think the source url is correct).

Q: Okay, how about supplemental results? Do supplemental results cause a penalty in Google?
A: Nope.

Q: I have some pages in the supplemental results that are old now. What should I do?
A: I wouldn’t spend much effort on them. If the pages have moved, I would make sure that there’s a 301 redirect to the pages’ new location. If the pages are truly gone, I’d make sure that you serve a 404 on those pages. After that, I wouldn’t put any more effort in. When Google eventually recrawls those pages, it will pick up the changes, but because it can take longer for us to crawl supplemental results, you might not see that update for a while.
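
If you want to double-check what your old urls are actually returning, a quick script like the sketch below will print the status code and any redirect target. The urls here are placeholders; substitute your own pages:

```python
# Check what status code some old urls return (e.g. 301 or 404).
# http.client doesn't follow redirects, so a 301 and its Location
# header are visible directly. The urls below are placeholders.
import http.client
from urllib.parse import urlsplit

OLD_URLS = [
    "http://www.example.com/moved-page.html",
    "http://www.example.com/really-gone.html",
]

for url in OLD_URLS:
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    print(f"{url} -> {resp.status} {location}")
    conn.close()
```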

That’s about all I can think of for now. I’ll try to talk about some examples of 302’s and inurl: soon, to help make some of this more concrete.

Google incorporating site speed in search rankings

(I’m in the middle of traveling, but I know that a lot of people will be interested in the news that Google is incorporating site speed as one of the over 200 signals that we use in determining search rankings. I wanted to jot down some quick thoughts.)

The main thing I want to get across is: don’t panic. We mentioned site speed as early as last year, and you can watch this video from February where I pointed out that we still put much more weight on factors like relevance, topicality, reputation, value-add, etc. — all the factors that you probably think about all the time. Compared to those signals, site speed will carry much less weight.

In fact, if you read the official blog post, you’ll notice that the current implementation mentions that fewer than 1% of search queries will change as a result of incorporating site speed into our ranking. That means that even fewer individual search results are affected, since the average search query returns 10 or so search results per page. So please don’t worry that the effect of this change will be huge. The official blog post also mentions that “We launched this change a few weeks back after rigorous testing.” The fact that not too many people noticed the change is another reason not to stress out disproportionately over it.

There are lots of tools to help you identify ways to improve the speed of your site. The official blog post gives lots of links, and some of the links lead to even more tools. But just to highlight a few, Google’s webmaster console provides information very close to the information that we’re actually using in our ranking. In addition, various free-to-use tools offer things like in-depth analysis of individual pages. Google also provides an entire speed-related mini-site with tons of resources and videos about speeding up websites.

I want to pre-debunk another misconception, which is that this change will somehow help “big sites” that can afford to pay more for hosting. In my experience, small sites can often react and respond faster than large companies to changes on the web. Often even a little bit of work can make a big difference in site speed. So I think the average smaller web site can really benefit from this change, because a smaller website can often implement the best practices that speed up a site more easily than a larger organization that might move slower or be hindered by bureaucracy.

Also take a step back for a minute and consider the intent of this change: a faster web is great for everyone, but especially for users. Lots of websites have demonstrated that speeding up the user experience results in more usage. So speeding up your website isn’t just something that can affect your search rankings–it’s a fantastic idea for your users.

I know this change will be popular with some people and unpopular with others. Let me reiterate a point to the search engine optimizers (SEOs) out there: SEO is a field that changes over time, and the most successful SEOs embrace change and turn it into an opportunity. SEOs in 1999 didn’t think about social media, but there are clearly a lot of interesting things going on in that space in 2010. I would love it if SEOs dove into improving website speed, because (unlike a few facets of SEO) decreasing the latency of a website is something that is easily measurable and controllable. A #1 ranking might not always be achievable, but most websites can be made noticeably faster, which can improve ROI and conversion rates. In that sense, this change represents an opportunity for SEOs and developers who can help other websites improve their speediness.

I know that there will be a lot of discussion about this change, and some people won’t like it. But I’m glad that Google is making this step, both for the sake of transparency (letting webmasters know more about how to do better in Google) and because I think this change will make the web better. My takeaway messages would be three-fold: first, this is actually a relatively small-impact change, so you don’t need to panic. Second, speeding up your website is a great thing to do in general. Visitors to your site will be happier (and might convert more or use your site more), and a faster web will be better for all. Third, this change highlights that there are very constructive things that can directly improve your website’s user experience. Instead of wasting time on keyword meta tags, you can focus on some very easy, straightforward, small steps that can really improve how users perceive your site.

PageRank sculpting

People think about PageRank in lots of different ways. Some have compared PageRank to a “random surfer” model, in which PageRank is the probability that a random surfer clicking on links lands on a page. Others think of the web as a link matrix in which the value at position (i,j) indicates the presence of links from page i to page j. In that case, PageRank corresponds to the principal eigenvector of that normalized link matrix.
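
If you like the math view, here’s a sketch of the textbook formulation of classic PageRank (not a description of what Google computes today), using the same (i,j) convention as above:

```latex
% Classic PageRank as an eigenvector problem (textbook sketch).
% A_{ij} = 1 if page i links to page j, L(i) = number of outlinks of page i.
M_{ij} = \frac{A_{ij}}{L(i)}, \qquad r = M^{\top} r
% The PageRank vector r is the principal eigenvector (eigenvalue 1)
% of the transpose of the row-normalized link matrix M.
```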

Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy.

Probably the most popular way to envision PageRank is as a flow that happens between documents across outlinks. In a recent talk at WordCamp I showed an image from one of the original PageRank papers:

Flow of PageRank

In the image above, the lower-left document has “nine points of PageRank” and three outgoing links. The resulting PageRank flow along each outgoing link is consequently nine divided by three, or three points of PageRank.

That simplistic model doesn’t work perfectly, however. Imagine if there were a loop:

A closed loop of PageRank flow

No PageRank would ever escape from the loop, and as incoming PageRank continued to flow into the loop, eventually the PageRank in that loop would reach infinity. Infinite PageRank isn’t that helpful 🙂 so Larry and Sergey introduced a decay factor–you could think of it as 10-15% of the PageRank on any given page disappearing before the PageRank flows along the outlinks. In the random surfer model, that decay factor is as if the random surfer got bored and decided to head for a completely different page. You can do some neat things with that reset vector, such as personalization, but that’s outside the scope of our discussion.
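
To put the flow model and the decay factor together, here’s a tiny power-iteration sketch of “classic PageRank.” The three-page link graph is made up purely for illustration, and this is a toy, not Google’s implementation:

```python
# Toy "classic PageRank" power iteration with a decay (damping) factor.
# The three-page link graph is invented purely for illustration.
links = {
    "A": ["B", "C"],   # A links to B and C
    "B": ["C"],        # B links to C
    "C": ["A"],        # C links back to A (a loop, which the decay keeps finite)
}

DAMPING = 0.85          # i.e. roughly 15% of PageRank "disappears" each step
pages = list(links)
N = len(pages)
rank = {p: 1.0 / N for p in pages}

for _ in range(50):                      # iterate until (roughly) converged
    new_rank = {p: (1 - DAMPING) / N for p in pages}
    for page, outlinks in links.items():
        share = DAMPING * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

print({p: round(r, 3) for p, r in rank.items()})
```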

Now let’s talk about the rel=nofollow attribute. Nofollow is a method (introduced in 2005 and supported by multiple search engines) of annotating a link to tell search engines “I can’t or don’t want to vouch for this link.” In Google, nofollow links don’t pass PageRank and don’t pass anchortext [*].

So what happens when you have a page with “ten PageRank points” and ten outgoing links, and five of those links are nofollowed? Let’s leave aside the decay factor to focus on the core part of the question. Originally, the five links without nofollow would have flowed two points of PageRank each (in essence, the nofollowed links didn’t count toward the denominator when dividing PageRank by the outdegree of the page). More than a year ago, Google changed how the PageRank flows so that the five links without nofollow would flow one point of PageRank each.

Q: Why did Google change how it counts these links?
A: For one thing, some crawl/indexing/quality folks noticed some sites that attempted to change how PageRank flowed within their sites, but those sites ended up excluding sections of their site that had high-quality information (e.g. user forums).

Q: Does this mean “PageRank sculpting” (trying to change how PageRank flows within your site using e.g. nofollow) is a bad idea?
A: I wouldn’t recommend it, because it isn’t the most effective way to utilize your PageRank. In general, I would let PageRank flow freely within your site. The notion of “PageRank sculpting” has always been a second- or third-order recommendation for us. The first-order things to pay attention to are 1) making great content that will attract links in the first place, and 2) choosing a site architecture that makes your site usable/crawlable for humans and search engines alike.

For example, it makes a much bigger difference to make sure that people (and bots) can reach the pages on your site by clicking links than it ever did to sculpt PageRank. If you run an e-commerce site, another example of good site architecture would be putting products front-and-center on your web site vs. burying them deep within your site so that visitors and search engines have to click on many links to get to your products.

There may be a minuscule number of pages (such as links to a shopping cart or to a login page) that I might add nofollow on, just because those pages are different for every user and they aren’t that helpful to show up in search engines. But in general, I wouldn’t recommend PageRank sculpting.

Q: Why tell us now?
A: For a couple reasons. At first, we figured that site owners or people running tests would notice, but they didn’t. In retrospect, we’ve changed other, larger aspects of how we look at links and people didn’t notice that either, so perhaps that shouldn’t have been such a surprise. So we started to provide other guidance that PageRank sculpting isn’t the best use of time. When we added a help page to our documentation about nofollow, we said “a solid information architecture — intuitive navigation, user- and search-engine-friendly URLs, and so on — is likely to be a far more productive use of resources than focusing on crawl prioritization via nofollowed links.” In a recent webmaster video, I said “a better, more effective form of PageRank sculpting is choosing (for example) which things to link to from your home page.” At Google I/O, during a site review session I said it even more explicitly: “My short answer is no. In general, whenever you’re linking around within your site: don’t use nofollow. Just go ahead and link to whatever stuff.” But at SMX Advanced 2009, someone asked the question directly and it seemed like a good opportunity to clarify this point. Again, it’s not something that most site owners need to know or worry about, but I wanted to let the power-SEOs know.

Q: If I run a blog and add the nofollow attribute to links left by my commenters, doesn’t that mean less PageRank flows within my site?
A: If you think about it, that’s the way that PageRank worked even before the nofollow attribute.

Q: Okay, but doesn’t this encourage me to link out less? Should I turn off comments on my blog?
A: I wouldn’t recommend closing comments in an attempt to “hoard” your PageRank. In the same way that Google trusts sites less when they link to spammy sites or bad neighborhoods, parts of our system encourage links to good sites.

Q: If Google changed its algorithms for counting outlinks from a page once, could it change again? I really like the idea of sculpting my internal PageRank.
A: While we can’t ever say that things will never change in our algorithms, we do not expect this to change again. If it does, I’ll try to let you know.

Q: How do you use nofollow on your own internal links on your personal website?
A: I pretty much let PageRank flow freely throughout my site, and I’d recommend that you do the same. I don’t add nofollow on my category or my archive pages. The only place I deliberately add a nofollow is on the link to my feed, because it’s not super-helpful to have RSS/Atom feeds in web search results. Even that’s not strictly necessary, because Google and other search engines do a good job of distinguishing feeds from regular web pages.

[*] Nofollow links definitely don’t pass PageRank. Over the years, I’ve seen a few corner cases where a nofollow link did pass anchortext, normally due to bugs in indexing that we then fixed. The essential thing you need to know is that nofollow links don’t help sites rank higher in Google’s search results.

My 2008 traffic stats

I published traffic stats for my blog for 2006 and 2007, so it’s time for the 2008 statistics.

2008 Traffic stats

The rough summary is:
2006: 1.7M visits and 2.9M pageviews
2007: 2.3M visits and 4.8M pageviews, plus 31K RSS readers
2008: 3.4M visits and 5.7M pageviews, plus 46K RSS readers, 7986 followers on my Twitter stream, and 1607 subscribers on FriendFeed.

My most popular posts had nothing to do with search engine optimization (SEO). The top traffic-driving posts of 2008 were:
– My Gmail power tips post.
– My “Best Business Card Ever” post.
– The series of blog posts about Chrome that I did in September 2008.
– My two posts about my Halloween costume and Google’s anti-zombie robots.txt on Halloween.

In addition, my “how to hack an iPhone” article was posted in Sept. 2007 but continued to drive especially strong traffic. If visitors were all I wanted, I’d write about nothing but the iPhone. 🙂

Almost as interesting were my traffic sources:

2008 Traffic sources

Google and direct visits were a large fraction of my traffic, but so were sites such as Digg, StumbleUpon, Google Image Search, Techmeme, delicious, and Twitter. It’s a good reminder that social media sites and places like image search can drive quite a bit of traffic.

All of this data is courtesy of FeedBurner and Google Analytics, which make this sort of analysis quite easy. What do your 2008 traffic stats look like?
