Talk like a Googler: parts of a url

Let’s dissect the parts of a URL (uniform resource locator). I’ll tell you how we typically refer to different parts of a URL at Google. Here’s a valid URL which has lots of components:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

Here are some of the components of the url:

  • The protocol is http. Other protocols include https, ftp, etc.
  • The host or hostname is video.google.co.uk.
  • The subdomain is video.
  • The domain name is google.co.uk.
  • The top-level domain or TLD is uk. The uk domain is also referred to as a country-code top-level domain or ccTLD. For google.com, the TLD would be com.
  • The second-level domain (SLD) is co.uk.
  • The port is 80, which is the default port for web servers. Other ports are possible; a web server can listen on port 8000, for example. When the port is 80, most people leave out the port.
  • The path is /videoplay. Path typically refers to a file or location on the web server, e.g. /directory/file.html
  • This URL has parameters. The name of one parameter is docid and the value of that parameter is -7246927612831078230. URLs can have lots of parameters. The parameters start after a question mark (?) and are separated from each other with an ampersand (&).
  • See the “#00h02m30s”? That’s called a fragment or a named anchor. The Googlers I’ve talked to are split right down the middle on which way to refer to it. Disputes on what to call it can be settled with arm wrestling, dance-offs, or drinking contests. :) Typically the fragment is used to refer to an internal section within a web document. In this case, the named anchor means “skip to 2 minutes and 30 seconds into the video.” I think right now Google standardizes urls by removing any fragments from the url.
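
If you want to poke at these pieces yourself, here is a minimal sketch using Python's standard urllib.parse module. It is only an illustration, not how Google parses URLs. The variable names are my own, and the subdomain/domain split at the end is a naive assumption that happens to work for this one hostname; real code would need a list of public suffixes such as co.uk.

```python
from urllib.parse import urlsplit, parse_qs, urldefrag

url = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"
parts = urlsplit(url)

protocol = parts.scheme          # "http"
host = parts.hostname            # "video.google.co.uk"
port = parts.port                # 80 (an int; None when the URL omits the port)
path = parts.path                # "/videoplay"
params = parse_qs(parts.query)   # {"docid": ["-7246927612831078230"], "hl": ["en"]}
fragment = parts.fragment        # "00h02m30s"

# Naive split of the hostname into labels. This is an assumption that only
# holds for this example; it does not know which suffixes are registrable.
labels = host.split(".")
subdomain = labels[0]            # "video"
domain = ".".join(labels[1:])    # "google.co.uk"
tld = labels[-1]                 # "uk"

# Dropping the fragment, similar in spirit to the standardization mentioned above.
url_without_fragment, _ = urldefrag(url)

print(protocol, host, port, path, params, fragment)
print(subdomain, domain, tld)
print(url_without_fragment)
```

Note that parse_qs returns a list of values for each parameter name, because the same name can appear more than once in a URL.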

What is a static url vs. a dynamic url? Technically, we consider a static url to be a document that can be returned by a webserver without the webserver doing any computation. A dynamic url is a document that requires the webserver to do some computation before returning the web document.

Some people simplify static vs. dynamic urls to an easier question: “Does the url have a question mark?” If the url has a question mark, it’s usually considered dynamic; no question mark in the url often implies a static url. That’s not a hard and fast rule though. For example, urls that look static like http://news.google.com/ may require some computation by the web server. Most people just refer to urls as static or dynamic based on whether they have a question mark though.
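
The question-mark shortcut is easy to write down as code. This is a rough sketch of the heuristic only; the function name looks_dynamic is made up for illustration, and as noted above the result says nothing about whether the web server actually does any computation.

```python
def looks_dynamic(url: str) -> bool:
    """Apply the 'does the url have a question mark?' rule of thumb."""
    # Ignore anything after '#', so a question mark inside the fragment
    # does not count.
    before_fragment = url.split("#", 1)[0]
    return "?" in before_fragment


print(looks_dynamic("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"))  # True
print(looks_dynamic("http://news.google.com/"))  # False, even though the server may still do work
```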

Explaining algorithm updates and data refreshes

A thread on WMW started Dec. 20th asking whether there was an update, so I’m taking a break from wrapping presents for an ultra-quick answer: no, there wasn’t.

To answer in more detail, let’s review the definitions. You may want to review this post or re-watch this video (session #8 from my videos). I’ll try to summarize the gist in very few words though:

Algorithm update: Typically yields changes in the search results on the larger end of the spectrum. Algorithms can change at any time, but noticeable changes tend to be less frequent.

Data refresh: When data is refreshed within an existing algorithm. Changes are typically toward the less-impactful end of the spectrum, and are often so small that people don’t even notice. One of the smallest types of data refreshes is an:

Index update: When new indexing data is pushed out to data centers. From the summer of 2000 to the summer of 2003, index updates tended to happen about once a month. The resulting changes were called the Google Dance. The Google Dance occurred over the course of 6-8 days because each data center in turn had to be taken out of rotation and loaded with an entirely new web index, and that took time. In the summer of 2003 (the Google Dance called “Update Fritz”), Google switched to an index that was incrementally updated every day (or faster). Instead of a monolithic monthly event, Google would refresh some of its index pretty much every day, which generated much smaller day-to-day changes that some people called everflux.

Over the years, Google’s indexing has been streamlined, to the point where most regular people don’t even notice the index updating. As a result, the terms “everflux,” “Google Dance,” and “index update” are hardly ever used anymore (or they’re used incorrectly :) ). Instead, most SEOs talk about algorithm updates or data updates/refreshes. Most data refreshes are index updates, although occasionally a data refresh will happen outside of the day-to-day index updates. For example, updated backlinks and PageRanks are made visible every 3-4 months.

Okay, here’s a pop quiz to see if you’ve been paying attention:

Q: True or false: an index update is a type of data refresh.
A: Of course an index update is a type of data refresh! Pay attention, I just said that 2-3 paragraphs ago. :) Don’t get hung up on “update” vs. “refresh” since they’re basically the same thing. There are algorithms, and there is the data that the algorithms work on. A large part of changing data is our index being updated.

I know for a fact that there haven’t been any major algorithm updates to our scoring in the last few days, and I believe the only data refreshes have been normal (index updates). So what are the people on WMW talking about? Here’s my best MEGO guess. Go re-watch this video. Listen to the part about “data refreshes on June 27th, July 27th, and August 17th 2006.” Somewhere on the web (can’t remember where, and it’s Christmas weekend and after midnight, so I’m not super-motivated to hunt down where I said it) in the last few months, I said to expect those (roughly monthly) updates to become more of a daily thing. That data refresh became more frequent (roughly daily instead of every 3-4 weeks or so) well over a month ago. My best guess is that any changes people are seeing are because that particular data is being refreshed more frequently.
