Search Results for: nofollow

Beware of fake Matts leaving comments

A lot of the time, I dispel misconceptions by leaving comments on blogs. That works great, except for the rare occasion when someone pretends to be me and leaves a rude, fake, or otherwise untrue blog comment. Over the previous decade, I saw someone impersonate me only 4-5 times. But in the last month, I’ve seen at least three nasty comments written by “fake Matt Cutts” impersonators.

The first fake-Matt comment I remember was at Marketing Pilgrim around November 14th, 2011. When Frank Reed checked out the fake comment, it came from 74.120.13.132, which is an exit router for Tor. That means someone went to some trouble to hide their tracks.

The second not-Matt comment was on November 18th, 2011. The impersonator wrote:

Normally we do not comment on ranking methods but I’ll explain a misconception: input from manual raters is used only in the rarest of cases when a non-brand cracks the top ten for high value money terms.

The tone (and content) of the comment was so far off that Matt McGee questioned whether it was really me, and I was quickly able to clarify that I never wrote that comment.

The third one I’ve seen was just a few days ago on Search Engine Journal, and included gems like

[Google is] very transparent. Some sites do not even have an address listed, yet we have everything, including the credit card numbers for adword advertisers. That is a strong signal for us to list them ahead in organic search as well.

The claim that “Google ranks AdWords advertisers higher in our search results” is simply untrue; it was one of the first myths I debunked when I got online.

The web isn’t built to prevent impersonation. On many places around the web, anyone can leave a comment with someone else’s name. So if you see a comment that claims to be from me, but makes crazy claims (e.g. that we preference AdWords advertisers in our search results), let me know. I’m happy to verify whether I wrote a comment, e.g. with a tweet. Thanks.

An interesting essay on search neutrality

(Just as a reminder: while I am a Google employee, the following post is my personal opinion.)

Recently I read a fascinating essay that I wanted to comment on. I found it via Ars Technica and it discusses “search neutrality” (PDF link, but I promise it’s worth it). It’s written by James Grimmelmann, an associate professor at New York Law School. The New York Times called Grimmelmann “one of the most vocal critics” of the proposed Google Books agreement, so I was curious to read what he had to say about search neutrality.

What I discovered was a clear, cogent essay that calmly dissects the idea of “search neutrality” that was proposed in a New York Times editorial. If you’re at all interested in search policies, how search engines should work, or what “search neutrality” means when people ask search engines for information, advice, and answers–I highly recommend it. Grimmelmann considers eight potential meanings for search neutrality throughout the article. As Grimmelmann says midway through the essay, “Search engines compete to give users relevant results; they exist at all only because they do. Telling a search engine to be more relevant is like telling a boxer to punch harder.” (emphasis mine)

On the notion of building a completely transparent search engine, Grimmelmann says

A fully public algorithm is one that the search engine’s competitors can copy wholesale. Worse, it is one that websites can use to create highly optimized search-engine spam. Writing in 2000, long before the full extent of search-engine spam was as clear as it is today, Introna and Nissenbaum thought that the “impact of these unethical practices would be severely dampened if both seekers and those wishing to be found were aware of the particular biases inherent in any given search engine.” That underestimates the scale of the problem. Imagine instead your inbox without a spam filter. You would doubtless be “aware of the particular biases” of the people trying to sell you fancy watches and penis pills–but that will do you little good if your inbox contains a thousand pieces of spam for every email you want to read. That is what will happen to search results if search algorithms are fully public; the spammers will win.

And Grimmelmann independently hits on the reason that Google is willing to take manual action on webspam:

Search-engine-optimization is an endless game of loopholing. …. Prohibiting local manipulation altogether would keep the search engine from closing loopholes quickly and punishing the loopholers–giving them a substantial leg up in the SEO wars. Search results pages would fill up with spam, and users would be the real losers.

I don’t believe all search engine optimization (SEO) is spam. Plenty of SEOs do a great job making their clients’ websites more accessible, relevant, useful, and fast. Of course, there are some bad apples in the SEO industry too.

Grimmelmann concludes

The web is a place where site owners compete fiercely, sometimes viciously, for viewers and users turn to intermediaries to defend them from the sometimes-abusive tactics of information providers. Taking the search engine out of the equation leaves users vulnerable to precisely the sorts of manipulation search neutrality aims to protect them from.

Really though, you owe it to yourself to read the entire essay. The title is “Some Skepticism About Search Neutrality.”

PageRank sculpting

People think about PageRank in lots of different ways. Some compare it to a “random surfer” model, in which PageRank is the probability that a random surfer clicking on links lands on a given page. Others think of the web as a link matrix in which the value at position (i,j) indicates whether page i links to page j. In that case, PageRank corresponds to the principal eigenvector of that normalized link matrix.
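To make the eigenvector framing concrete, here is a minimal sketch of “classic PageRank” as power iteration over a toy three-page web. The graph, labels, and iteration count are all invented for illustration; as the disclaimer above this post’s framework makes clear, real link analysis at Google is far more sophisticated than this.

```python
import numpy as np

# Toy web of three pages: A links to B and C, B links to C, C links to A.
# L[i][j] = 1 means page i links to page j.
L = np.array([
    [0, 1, 1],   # A -> B, A -> C
    [0, 0, 1],   # B -> C
    [1, 0, 0],   # C -> A
], dtype=float)

# Normalize each row by the page's outdegree, then transpose so that
# column j describes where page j's PageRank flows.
M = (L / L.sum(axis=1, keepdims=True)).T

# Power iteration: repeated multiplication by M converges on the
# principal eigenvector of the normalized link matrix, i.e. PageRank.
r = np.full(3, 1 / 3)
for _ in range(200):
    r = M @ r

# r is now approximately [0.4, 0.2, 0.4] for pages A, B, C,
# and the entries sum to 1.
```

Note that this toy graph is strongly connected, so the iteration converges even without a decay factor; the decay factor discussed below is what makes convergence work on the real web.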

Disclaimer: Even when I joined the company in 2000, Google was doing more sophisticated link computation than you would observe from the classic PageRank papers. If you believe that Google stopped innovating in link analysis, that’s a flawed assumption. Although we still refer to it as PageRank, Google’s ability to compute reputation based on links has advanced considerably over the years. I’ll do the rest of my blog post in the framework of “classic PageRank” but bear in mind that it’s not a perfect analogy.

Probably the most popular way to envision PageRank is as a flow that happens between documents across outlinks. In a recent talk at WordCamp I showed an image from one of the original PageRank papers:

Flow of PageRank

In the image above, the lower-left document has “nine points of PageRank” and three outgoing links. The PageRank flow along each outgoing link is therefore nine divided by three, or three points of PageRank per link.

That simplistic model doesn’t work perfectly, however. Imagine if there were a loop:

A closed loop of PageRank flow

No PageRank would ever escape from the loop, and as incoming PageRank continued to flow into the loop, eventually the PageRank in that loop would reach infinity. Infinite PageRank isn’t that helpful 🙂 so Larry and Sergey introduced a decay factor–you could think of it as 10-15% of the PageRank on any given page disappearing before the PageRank flows along the outlinks. In the random surfer model, that decay factor is as if the random surfer got bored and decided to head for a completely different page. You can do some neat things with that reset vector, such as personalization, but that’s outside the scope of our discussion.
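The decay factor is easy to sketch: on each iteration, every page receives a small “reset” share of PageRank, and only the remaining fraction flows along outlinks, so a closed loop can no longer trap an ever-growing amount. This toy example reuses the loop scenario above; the graph, the 0.85 damping value, and the iteration count are illustrative assumptions, not Google’s actual parameters.

```python
import numpy as np

d = 0.85                          # ~15% of PageRank decays each iteration
N = 3                             # pages: 0 = A, 1 = B, 2 = C
links = {0: [1], 1: [0], 2: [0]}  # A and B form a closed loop; C links into it

r = np.full(N, 1 / N)
for _ in range(200):
    new = np.full(N, (1 - d) / N)         # the reset / "bored surfer" share
    for page, outs in links.items():
        for target in outs:
            new[target] += d * r[page] / len(outs)
    r = new

# The loop pages still end up with most of the PageRank, but the decay
# factor keeps everything finite: r sums to exactly 1 on every iteration,
# and C settles at the bare reset share of (1 - d) / N = 0.05.
```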

Now let’s talk about the rel=nofollow attribute. Nofollow is a method (introduced in 2005 and supported by multiple search engines) to annotate a link to tell search engines “I can’t or don’t want to vouch for this link.” In Google, nofollow links don’t pass PageRank and don’t pass anchortext [*].

So what happens when you have a page with “ten PageRank points” and ten outgoing links, and five of those links are nofollowed? Let’s leave aside the decay factor to focus on the core part of the question. Originally, the five links without nofollow would have flowed two points of PageRank each (in essence, the nofollowed links didn’t count toward the denominator when dividing PageRank by the outdegree of the page). More than a year ago, Google changed how the PageRank flows so that the five links without nofollow would flow one point of PageRank each.
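Using the hypothetical numbers from this paragraph (and still leaving aside the decay factor), the change amounts to a one-line difference in the denominator:

```python
def pagerank_per_followed_link(pr, total_links, nofollowed, old_behavior=False):
    """PageRank flowing along each followed link, ignoring the decay factor.

    Old behavior: nofollowed links were excluded from the denominator.
    Current behavior: all links count in the denominator, and the share
    assigned to nofollowed links simply evaporates.
    """
    followed = total_links - nofollowed
    denominator = followed if old_behavior else total_links
    return pr / denominator

# Ten PageRank points, ten outgoing links, five of them nofollowed:
pagerank_per_followed_link(10, 10, 5, old_behavior=True)   # 2.0 per followed link
pagerank_per_followed_link(10, 10, 5)                      # 1.0 per followed link
```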

Q: Why did Google change how it counts these links?
A: For one thing, some crawl/indexing/quality folks noticed some sites that attempted to change how PageRank flowed within their sites, but those sites ended up excluding sections of their site that had high-quality information (e.g. user forums).

Q: Does this mean “PageRank sculpting” (trying to change how PageRank flows within your site using e.g. nofollow) is a bad idea?
A: I wouldn’t recommend it, because it isn’t the most effective way to utilize your PageRank. In general, I would let PageRank flow freely within your site. The notion of “PageRank sculpting” has always been a second- or third-order recommendation for us. The first-order things to pay attention to are 1) making great content that will attract links in the first place, and 2) choosing a site architecture that makes your site usable/crawlable for humans and search engines alike.

For example, it makes a much bigger difference to make sure that people (and bots) can reach the pages on your site by clicking links than it ever did to sculpt PageRank. If you run an e-commerce site, another example of good site architecture would be putting products front-and-center on your web site vs. burying them deep within your site so that visitors and search engines have to click on many links to get to your products.

There may be a minuscule number of pages (such as links to a shopping cart or to a login page) that I might add nofollow on, just because those pages are different for every user and they aren’t that helpful to show up in search engines. But in general, I wouldn’t recommend PageRank sculpting.

Q: Why tell us now?
A: For a couple of reasons. At first, we figured that site owners or people running tests would notice, but they didn’t. In retrospect, we’ve changed other, larger aspects of how we look at links and people didn’t notice that either, so perhaps that shouldn’t have been such a surprise.

So we started to provide other guidance that PageRank sculpting isn’t the best use of time. When we added a help page to our documentation about nofollow, we said “a solid information architecture — intuitive navigation, user- and search-engine-friendly URLs, and so on — is likely to be a far more productive use of resources than focusing on crawl prioritization via nofollowed links.” In a recent webmaster video, I said “a better, more effective form of PageRank sculpting is choosing (for example) which things to link to from your home page.” At Google I/O, during a site review session I said it even more explicitly: “My short answer is no. In general, whenever you’re linking around within your site: don’t use nofollow. Just go ahead and link to whatever stuff.” But at SMX Advanced 2009, someone asked the question directly and it seemed like a good opportunity to clarify this point. Again, it’s not something that most site owners need to know or worry about, but I wanted to let the power-SEOs know.

Q: If I run a blog and add the nofollow attribute to links left by my commenters, doesn’t that mean less PageRank flows within my site?
A: If you think about it, that’s the way that PageRank worked even before the nofollow attribute.

Q: Okay, but doesn’t this encourage me to link out less? Should I turn off comments on my blog?
A: I wouldn’t recommend closing comments in an attempt to “hoard” your PageRank. In the same way that Google trusts sites less when they link to spammy sites or bad neighborhoods, parts of our system encourage links to good sites.

Q: If Google changed its algorithms for counting outlinks from a page once, could it change again? I really like the idea of sculpting my internal PageRank.
A: While we can’t ever say that things will never change in our algorithms, we do not expect this to change again. If it does, I’ll try to let you know.

Q: How do you use nofollow on your own internal links on your personal website?
A: I pretty much let PageRank flow freely throughout my site, and I’d recommend that you do the same. I don’t add nofollow on my category or my archive pages. The only place I deliberately add a nofollow is on the link to my feed, because it’s not super-helpful to have RSS/Atom feeds in web search results. Even that’s not strictly necessary, because Google and other search engines do a good job of distinguishing feeds from regular web pages.

[*] Nofollow links definitely don’t pass PageRank. Over the years, I’ve seen a few corner cases where a nofollow link did pass anchortext, normally due to bugs in indexing that we then fixed. The essential thing you need to know is that nofollow links don’t help sites rank higher in Google’s search results.

Gone to PubCon and SXSW + Lots of Videos!

Expect light blogging for a week or so because I’m traveling. I posted my 2009 travel schedule, but I’m doing a keynote at PubCon in Austin and then I’ll stick around for South by Southwest. It’s my first time at SXSW, so if you see me, say howdy!

For the PubCon keynote, we’re going to try something different. I’ll talk for 20-30 minutes, but we’ll also do a question and answer session where we take questions from the audience, from Twitter, and from this Google Moderator page.

If you can’t attend PubCon, we’ll still feed your search-info addiction with some videos. Peter Linsley just posted his recreated Google Image Search presentation from SMX West. I took questions recently and so you can watch three different videos that I did. For example, here’s a video about nofollow:

We’ll be releasing one new video each weekday for a while, so keep your eyes on the new Google webmaster videos channel on YouTube.

DroboCare from Drobo: bleah

I bought a Drobo about a year ago. Recently I got this pop-up window:

DroboCare warranty service by Drobo

Wait a second — I bought this storage device, and now they want me to extend my license “to continue to receive the latest updates”? If you go to the URL mentioned in the pop-up, you see that for $49 for a year’s coverage, you get

Continued access to software updates to Drobo Dashboard, Drobo Firmware and DroboShare Firmware including performance enhancements and new features.

Both the program pop-up and the web page imply that I need to pay $49 to continue to get firmware updates. That’s extremely uncool. The ironic part is that apparently Drobo changed their mind in February and they won’t make you pay for firmware updates now. It’s been a month; why does the DroboCare web page still imply that you have to pay $49/year for firmware updates? They need to fix that ASAP, because people appear to be confused by the language.

Let me tell you a little story: back in the 90s, before eBay existed, I was a poor college student who wanted to connect a CD recorder to his computer. I bought a used Adaptec SCSI card over Usenet. When the SCSI card arrived, the four floppy disks of driver software were scrambled. When I tried to get drivers from Adaptec, I learned that Adaptec charged for the drivers for that SCSI card. I never bought another Adaptec product again. The End.

So I don’t plan to buy any more Drobos in the future. Why would I buy from a company that tried to charge me for firmware updates for my consumer hardware? Sure, Drobo changed their mind after people complained, but the fact that Drobo even considered it will make me avoid them in the future.

Update: read this blog comment by Jillian Mansolf from Drobo. Evidently Drobo’s Sarbanes-Oxley auditors classified Drobo as a software company. The auditors “wanted us to recognize revenue for Drobo over the ‘life’ of the warranty (forever) if we included performance enhancements through free software updates.” DroboCare only pertains to hardware as of January, so a future version of the DroboDashboard software will remove this pop-up. Read the comment for more about this from Drobo.
