Let’s dissect the parts of a URL (uniform resource locator). I’ll tell you how we typically refer to different parts of a URL at Google. Here’s a valid URL which has lots of components:
http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s
Here are some of the components of the url:
- The protocol is http. Other protocols include https, ftp, etc.
- The host or hostname is video.google.co.uk.
- The subdomain is video.
- The domain name is google.co.uk.
- The top-level domain or TLD is uk. The uk domain is also referred to as a country-code top-level domain or ccTLD. For google.com, the TLD would be com.
- The second-level domain (SLD) is co.uk.
- The port is 80, which is the default port for web servers. Other ports are possible; a web server can listen on port 8000, for example. When the port is 80, most people leave out the port.
- The path is /videoplay. Path typically refers to a file or location on the web server, e.g. /directory/file.html
- This URL has parameters. The name of one parameter is docid and the value of that parameter is -7246927612831078230. URLs can have lots of parameters. Parameters start with a question mark (?) and are separated with an ampersand (&).
- See the “#00h02m30s”? That’s called a fragment or a named anchor. The Googlers I’ve talked to are split right down the middle on which way to refer to it. Disputes on what to call it can be settled with arm wrestling, dance-offs, or drinking contests. 🙂 Typically the fragment is used to refer to an internal section within a web document. In this case, the named anchor means “skip to 2 minutes and 30 seconds into the video.” I think right now Google standardizes URLs by removing any fragments from the URL.
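For readers who like to see it mechanically, here is how the example URL above breaks into those pieces — a quick sketch using Python’s standard urllib.parse (the component names differ slightly from the Google-speak above: urlsplit says “scheme” for protocol and “netloc” for host:port):

```python
from urllib.parse import urlsplit, parse_qs

url = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"
parts = urlsplit(url)

print(parts.scheme)    # http  (the "protocol")
print(parts.hostname)  # video.google.co.uk
print(parts.port)      # 80
print(parts.path)      # /videoplay
print(parts.query)     # docid=-7246927612831078230&hl=en
print(parts.fragment)  # 00h02m30s

# The query string decodes into name/value pairs:
print(parse_qs(parts.query))
# {'docid': ['-7246927612831078230'], 'hl': ['en']}
```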
What is a static url vs. a dynamic url? Technically, we consider a static url to be a document that can be returned by a webserver without the webserver doing any computation. A dynamic url is a document that requires the webserver to do some computation before returning the web document.
Some people simplify static vs. dynamic urls to an easier question: “Does the url have a question mark?” If the url has a question mark, it’s usually considered dynamic; no question mark in the url often implies a static url. That’s not a hard and fast rule though. For example, urls that look static like http://news.google.com/ may require some computation by the web server. Most people just refer to urls as static or dynamic based on whether it has a question mark though.
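That question-mark rule of thumb is easy to sketch in code — hedged exactly as in the text above, since a URL with no query string may still be computed server-side:

```python
from urllib.parse import urlsplit

def looks_dynamic(url: str) -> bool:
    # Rough rule of thumb: a non-empty query string (the "?" part)
    # suggests a dynamic URL. It's a heuristic, not a guarantee --
    # http://news.google.com/ is computed server-side yet looks static.
    return urlsplit(url).query != ""

print(looks_dynamic("http://news.google.com/"))                        # False
print(looks_dynamic("http://video.google.co.uk/videoplay?docid=123"))  # True
```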
P.S. I hope that wasn’t too boring, but I wanted to mention how we deal with fragments/named anchors, and to lay the groundwork for at least one other post.
Fragment is a new one to me. I’ve always called them named anchors, and I’ve known some people who referred to them as “bookmarks”.
Ah, see, “bookmarks” is new to me. Maybe it will be like hoagie vs. grinder vs. poboy; it depends on where you come from. 🙂
I could talk like a Googler, or I could talk like an implementor of RFCs 2616 and 2396 instead 😉
protocol -> scheme
“Although many URL schemes are named after protocols, this does not imply that the only way to access the URL’s resource is via the named protocol.”
parameters -> query
The word “parameter” is used for those parts of the URI which follow a semicolon; the parts which follow a question mark are called the “query” portion.
And “fragment” is correct.
And everything between the ? and the # is the query string — or is it not referred to at Google?
Matt, isn’t a dynamic URL one that can change each time the page is downloaded? A static URL would always stay the same.
I find most confuse dynamic pages with dynamic URLs.
Well, yeah, that was boring. Let’s do “Hello World” in HTML next time. Joking aside, I am sure few in your diverse audience will find that new.
One more thing, when you say “how we call X at Google” the reader anticipates something new as in this case special names for URL parts but these are all standard terms 🙂
Of course, this again opens the age old concern…..
Generally, do static URLs stand a better chance of reaching higher in the SERPs – all factors remaining the same?
How many dynamic URL parameters are considered safe – at what point does it start to degrade the rankings, all factors remaining the same?
Does the TRUST rank of a site make it more immune to any degradation from having many parameters? (A few years ago when Amazon changed to static URLs, their page rankings skyrocketed.)
There does appear to be one interesting twitch on Google – whereupon certain dynamic URLs in some blogs will inherit the PageRank of the static homepage.
Also on Google Serps, Search Queries are included from various sites (this is NOT necessarily a bad thing because it has opened up an avenue to more search options – sort of like suggestions)
I call it the fragment or fragment identifier. Page 14 of Uniform Resource Identifiers (URI): Generic Syntax says:
On Page 28, the address
http://www.ics.uci.edu/pub/ietf/uri/#Related is broken down as
follows:
http scheme
www.ics.uci.edu authority
/pub/ietf/uri/ path
Related fragment
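A quick sketch with Python’s urllib.parse reproduces the RFC’s breakdown (note the authority is the www.ics.uci.edu part, without the scheme):

```python
from urllib.parse import urlsplit

p = urlsplit("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(p.scheme)    # http
print(p.netloc)    # www.ics.uci.edu   (the "authority")
print(p.path)      # /pub/ietf/uri/
print(p.fragment)  # Related
```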
In any case, named anchor would refer to the anchor, not the name. Anchor name would make more sense.
I’ve always used anchors – it’s what O’Reilly (and I think the W3C) calls them.
Interesting that Google discards anchors as I have a site that has multiple courses on one page referenced from the front page with anchors.
like http://www.foo.com/bouncer-traning.html#door-supervisor-training
I’m trying to use the anchor text to give extra info to search engines about what the course is – can you confirm whether it’s worth putting key phrases in anchor text?
I’ve always called it an anchor and I’m available for arm wrestling any day 🙂
The original RFC that defines a URL makes it clear that the bit after the # can be a fragment or an anchor id.
http://www.ietf.org/rfc/rfc1738.txt
The choice is yours, and you really should not strongarm people into using just your choice.
But, either way, the effect is the same: the f/a id is info to a user agent telling it where the “start” is in the supplied data stream (“start” for purposes of displaying, anyway – not rendering).
So Google is quite right in discarding the f/a aspect of a URL in order to identify the actual content.
Google needs to truncate fragments (yes, fragment is the canonical term, because it points to a fragment of the resource, and named anchors are outdated in favour of DOM IDs) because most Web servers return a 404 on request of a URL with a fragment. All search engines do that, and all user agents.
Whether or not Google saves the fragment to assign the linked page area (block or element) a higher relevance for the anchor text of the link or perhaps even keywords in the DOM-ID (formerly name anchor), that is the interesting question.
I must apologise for this quite off-topic comment, but I’ve recently come across a bug in Firefox, whereby if I put a named anchor/anchor ID/fragment etc. in the URL it affects the rendering of the page. Possibly due to the inconsistency in naming of this element, I’ve struggled to google for a result. Anyone come across this before? Without the fragment everything is fine; with a fragment that doesn’t exist everything is fine; but with a fragment that exists as an id= somewhere on the page, it blows one of the margins near the top of the page.
Cheers!
Matt, I assume rewritten URLs are treated as static, and also are seen as A Good Thing? It’s something I’ve begun to look into so excuse me if it’s a bit of a naive question.
I would be rewriting something like http://www.mydomain.co.uk/getproduct.aspx?category=a&id=1234 to http://www.mydomain.co.uk/products/product-type-a/1234. Is this good practice as far as Google is concerned? I’m assuming so!
Beyond that should I then also make sure the dynamic pages are excluded using robots.txt?
When logging out from Webmaster Tools (to switch accounts if logged in as private and needing to log in with a Webmaster Tools account), the logout redirects to https://www.google.com:80/…
the problem is with port 80 – https is at 443, right?
Anyway, I just checked and it seems that you have fixed that bug. As far as I know, it was only for some older Safari browsers that it was good to point out exactly which port the browser should be redirected to.
If you’ve built pages in Dreamweaver et al, you’ve seen the icon of the ship’s anchor for named anchors/fragments.
Also, I’ve ordered a sub or a footlong but never a grinder, poboy, or hoagie…of course my So Cal self washed it down with a soda and not a pop or a coke 🙂
Hi Matt.
I wanna ask you about this part of the URI: “&hl=en”
W3C requires us to put a special char (&) and not a simple ampersand.
I see that you have considered it OK to put a simple ampersand.
I will appreciate your opinion on that subject.
“Fragments”? I’d never heard the term – I thought this was something that had come up in the last few years. They are on A tags (which I thought originally stood for “Anchor” tag?) and I have never heard them called anything other than Anchors (or Bookmarks – I think that is an MSOffice thing) … but I may be outdated; I’m talking early 90s here (pre the article cited above).
Dave (original) :
I think you are getting confused with the HTTP spec for when to use POST vs GET (Basically POST should be used whenever the content is expected to change on every request, for example when appending items to a database, GET whenever else – hence GET requests can be cached by a proxy, which is why we need to append a unique ID sometimes.)
Good overview. Would like to see more in this series.
By the way, “… 27612831078230&hl=en#00h02m30s” refers to an illegal anchor, if there were an anchor in the page (I suppose in the Google Video example there isn’t), because names may not start with a number.
“ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (“-“), underscores (“_”), colons (“:”), and periods (“.”).”
says the W3C.
http://www.w3.org/TR/html4/types.html#type-cdata
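That W3C rule translates directly into a small regular expression — a sketch in Python; the pattern below is just a transcription of the quoted grammar:

```python
import re

# HTML 4 rule quoted above: a letter first, then any number of
# letters, digits, hyphens, underscores, colons, and periods.
HTML4_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9\-_:.]*$")

print(bool(HTML4_NAME.match("Related")))    # True
print(bool(HTML4_NAME.match("00h02m30s")))  # False -- starts with a digit
```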
wow. cool! I always wondered what a URL meant…
Reminded me of the parse_url php function.
I was thinking about fragments (anchors) and how I’ve never seen one in a SERP. Some pages, like those at the W3C for example, are often full of these little references within long pages. Any plans to do anything with these, as in make them searchable via some operator? Could save a little scrolling or CTRL-F’ing 😀
I think the terms “dynamic” and “static” are a bit last century. A very large percentage of all “static” web pages require computation and are generated using some kind of database or CMS. Especially with mod_rewrite being so popular these days, a lot of dynamic pages are disguised as static pages.
A better distinction would be that URLs containing a question mark depend on user input, and are built specifically for that user; URLs that don’t are generic but can still be dynamic. This solves your news.google.com contradiction as well.
correction to my response: the special char was “&amp;” and not a simple “&”
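To illustrate that point: Python’s standard html module applies exactly this escaping (the &amp; form belongs in HTML source; the browser turns it back into a plain & before making the request):

```python
import html

url = "http://video.google.co.uk/videoplay?docid=-7246927612831078230&hl=en"
# Inside HTML markup the raw "&" should be written as "&amp;";
# the browser unescapes it back to "&" before requesting the URL.
print(html.escape(url))
# http://video.google.co.uk/videoplay?docid=-7246927612831078230&amp;hl=en
```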
I’m curious why Google makes a distinction between static and dynamic URLs? I guess not why, but *how* Google makes that distinction? If it isn’t simply based on the query string, how does Google know whether or not something dynamic is happening behind the scenes?
Nice definitions. The only iffy part for me is the static/dynamic thing. I mostly use parameters to indicate paid search sources. In paid search, these are often called tags. They don’t imply any difference to page content, though. So identifying all parameters as implying dynamic rather than static is wrong, at least for paid search. Indeed, strip the parameters from almost every client that I have, and you’ll mostly end up at the same web page as before.
The URLs where you don’t? Those are the dynamic ones. Annoyingly, there’s nothing intrinsic to the parameter or the way it is encoded to specify that it does or does not affect the page you reach. IME, the dynamic URLs are either ageing CMSs that weren’t designed to be SE friendly or are shopping sites, typically product or category pages.
IME, users will gleefully bookmark pages reached with long parameter lists. This occasionally results in Google indexing two results for one real page – which does worry me, because of the duplicate content implications, and diminished PageRank (because the pages are treated as separate).
This isn’t because I was jibing at you/Google about (mis)treating paid search tags as differentiating pages a few weeks ago, is it? See http://blog.merjis.com/2007/07/16/click-fraud-google-adwords-and-gclid/ and look for the H3 “Spiders and the gclid of doom”. 🙂
Cheers, JeremyC.
Myself and the people around me always referred to “fragments” as “internal bookmarks.”
I’ll win this battle with a kick-flip, transition to a windmill, and a unique stall. -Nick
Did you mean to write it URl instead of URL?
I think the correct way to write it is “URl” but I am not sure why.
One little discussed aspect of URL(I)’s is the impact they have on inbound links
from other sites, blogs, within email, etc., as well as how longer or clumsy URLs have increased the usage of URL shortening services (which have their own downside). And woe to the site that re-deploys its content using a new delivery system that changes the URLs from what they used to be, rendering all inbound links useless unless steps have been taken to recapture them. I watched in horror as a large cable TV channel took their site with several thousand pages of content and several thousand inbound links to that content and had every inbound link return a 404 because they didn’t prepare for their URL change at all.
Eric
“Bookmark” is the terminology Front Page has always used for anchors; that’s where I first saw it.
Ok – here’s a burning question. How does a Googler pronounce “URL” ? I’ve always said it as “earl”. My partner who sits next to me only refers to it as “yurl”. The title of this post, using ‘a’ instead of ‘an’ seems to indicate that Googlers spell it out – U-R-L.
Nice post Matt, actually a nice reference and can even be used as a training tool for new people looking into the online world.
You can always tell a lot about a webpage from the URL pre-any mod-rewrite, and you can even add other bits of info such as the way the page is coded, .html, .php, .xml, .swf etc.
Like you said, as long as the page doesn’t end with a .exe, I’m sure it’s got a chance to hit the SERPs.
Matt, speaking of URL, I’ve always been wondering about this question and I’ve seen different people feeling strongly on different answers:
What’s better, subdomains or subdirectories?
Example:
http://shirts.clothingstore.com/ vs http://www.clothingstore.com/shirts/
Does Google treat subdomains as a separate domain within its own, and all the rankings the homepage has won’t influence the subdomain except with all of its cross linking? Or does Google understand “shirts.clothingstore.com is a part of clothingstore.com” and treat it like a subdirectory anyway?
Thanks,
–Paul.
Matt,
So we can easily tell a URL is potentially dynamic if there are parameters (i.e. if there’s a “?”), but if there’s not, does Google – or other search engines for that matter – use other methods to determine dynamic content or otherwise classify it?
I believe it would be nearly impossible for a crawler to determine whether or not a page was “static” vs. being a rewritten URL.
P.S. All URLs are URIs, but URIs may not necessarily be URLs.
Dave’s question is the crux of any confusion about dynamic vs static. Outside of visible parameters in the URL, is there a way to know if the page has been “computed” by the web server (or application server)? But the real question is: does it matter? I am hoping the post that will build on this one will shed light on this bit of voodoo.
Matt,
As you talk about the different parts of the URL, I was wondering if Googlebot breaks down a URL into its components as it crawls it. I wonder because, as I was reviewing analytics on my site for the pages that Googlebot is crawling, it is going to pages that do not and have never existed. There are pieces of the URL that are correct, like the subdomain, the domain and the path, but just not in the combination that Googlebot is trying to access.
There is some talk on the Google Webmaster Help group calling it Googlebot’s active imagination. But if Googlebot breaks up the URL into pieces, then maybe it is putting it back together in a different way to access other pages? I just don’t see how Googlebot can go to a page that has no link to it, and reading your post made me wonder if Googlebot is doing something with the parts of the URL you wrote about.
So this article does not have an end. It does not even mention Apache mod_rewrite, though it’s really popular. It’s a pity there is no good CMS with static pages; those would work faster.
It never hurts to share with your readers insights on how Googlers talk about the Web. Your post may well become a standard reference for many people who want to teach the basic terminology of URLs to staff members and clients. After all, we have to discuss these things, too.
Great Post Matt. I’ve been using anchors on my site to tell the AJAX code what to display, but I didn’t want Google to think that it was a different URL. It’s good to hear that I’m not doing anything improper.
It’s an anchor. I am willing to settle any disputes with a match in Guitar Hero 80’s edition.
Matt – to spice your post a little bit here is a regex to break a URL into its parts.
http://www.foad.org/~abigail/Perl/url3.regex
The author created a Perl program to do this. http://mjtsai.com/blog/2003/08/18/url_regex_generator/
He basically extracted the BNF grammars from the relevant RFCs and turned them into regular expressions. Cool, isn’t it?
I used some of the logic a few years ago to create a function that extracts second level domains from any URL.
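A sketch of that second-level-domain extraction in Python. Real code needs the full Public Suffix List to know, for example, that co.uk is a registry suffix; the two-entry suffix set here is purely illustrative:

```python
# Hypothetical, tiny stand-in for the real Public Suffix List.
PUBLIC_SUFFIXES = {"com", "co.uk"}

def registrable_domain(hostname: str) -> str:
    """Return the 'domain name' part: the longest known public
    suffix plus one more label to its left."""
    labels = hostname.lower().split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return hostname  # unknown suffix: give up and return as-is

print(registrable_domain("video.google.co.uk"))  # google.co.uk
print(registrable_domain("video.google.com"))    # google.com
```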
Great post.
And just so I’m clear, you are saying that Google likes (and reads) urls with parameters trailing “?” but essentially ignores fragments after “#”? Do you think this is likely to change, given that in the web2.0y world more and more folks are using fragments to help deliver unique content within a single page?
When you place parameters such as tags for web reporting like WebTrends, such as ?WT.mc_t=abc&WT.mc_n=blogsignup, is that looked at as a dynamic page and, more importantly, does that have a negative effect on showing up in SERPs?
Hi Matt:
Why would Google want to discard the “fragment” on the URL you said:
“Typically the fragment is used to refer to an internal section within a web document. In this case, the named anchor means “skip to 2 minutes and 30 seconds into the video.” ”
Unless I’m totally confused – wouldn’t the URL be indexed more accurately if it included the instructions to “skip to 2 minutes and 30 seconds into the video” as that part of the video might be highly specific. Suppose I’ve created a web page – where I’m looking to direct a user to a specific piece of information in that video which happens to be at “2 minutes and 30 seconds into the video”. Just a thought……
You should rewrite/translate this post when it’s “Talk like a Pirate Day” … 😉
P.S. Ditto what others said that “anchors” is what I (and W3C, etc.) use.
Well, video.google.co.uk is the FQDN hostname. video is a hostname in the domain google.co.uk. co.uk is a domain, uk is the TLD, google IS a subdomain, and video is a hostname 🙂
Some hosters define video as a subdomain, but AFAIK a subdomain has separate nameservers that respond to DNS queries for that subdomain.
Or, since video.google.co.uk is a CNAME and has no nameservers defined, you can’t say it is a proper subdomain 🙂
smart:~$ host -t ns google.co.uk
google.co.uk name server ns1.google.com.
google.co.uk name server ns2.google.com.
google.co.uk name server ns3.google.com.
google.co.uk name server ns4.google.com.
smart:~$ host -t ns video.google.co.uk
video.google.co.uk is an alias for video.google.com.
video.google.com is an alias for video.l.google.com.
QED.
I like the sound of “named anchor” but I’ve been calling it the fragment identifier because of specifications I’ve read.
I’d add more detail, but I think excessive use of italics tags prevented my previous replies from showing up.
All this debate about static URLs and dynamic URLs is a bit of a red herring I think.
The original discussions, about a decade ago, were all about dynamic content and static content.
Then things become more clear.
I use “named anchors” and those should be dropped from the stored/indexed URL data by any bot that sees them in on-page links in any of the documents they have spidered.
In a web browser, when such a link is clicked, the named anchor part of the URL isn’t even transmitted to the web server. It is purely used inside the browser to jump to a part of the document – after that document has been returned and rendered.
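Python’s urllib.parse even ships a helper for exactly this split — a small sketch, reusing the course-page URL from an earlier comment:

```python
from urllib.parse import urldefrag

url = "http://www.foo.com/bouncer-traning.html#door-supervisor-training"
# The browser keeps the fragment to itself; only the part before "#"
# goes into the HTTP request it sends to the server.
requested, fragment = urldefrag(url)
print(requested)  # http://www.foo.com/bouncer-traning.html
print(fragment)   # door-supervisor-training
```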
Matt, for humans, surely it’s as simple as: if the URL changes with each download, then it’s dynamic; if it never changes, then it’s static. It might be different for robots, I don’t know.
That’s not necessarily true, though, Dave. In the case of most well-built shopping carts, for example, the URL of a particular product never changes, but the content may be retrieved from a database or something similar.
I think to simplify this: if server-side processing is required to generate content, the URL is dynamic. If no server-side processing is required, the URL is static.
Matt, if you’re gonna mod your post to use this, I get $0.50 per page view. 😉
re: “Parameters start with a question mark (?) and are separated with an ampersand (&).”
There’s a w3c recommendation that CGI implementors support semicolons as well as ampersands as parameter separators:
http://www.w3.org/TR/REC-html40/appendix/notes.html#ampersands-in-uris
You don’t see it much in the real world though…
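A hand-rolled sketch of a parser that honours the W3C note by accepting both separators (note this toy version doesn’t percent-decode names or values):

```python
import re

def parse_query(qs: str) -> dict:
    # Per the W3C note above, accept ";" as well as "&" between pairs.
    params = {}
    for pair in re.split(r"[&;]", qs):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params.setdefault(name, []).append(value)
    return params

print(parse_query("docid=-7246927612831078230&hl=en"))
# {'docid': ['-7246927612831078230'], 'hl': ['en']}
print(parse_query("docid=-7246927612831078230;hl=en"))  # same result
```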
Hi Matt,
This post is so informative. However, I have a small doubt: if a page is created using include files, will it be treated as a static or a dynamic page?
Example: products.php, where it has three include files: header.php, menu.php and footer.php. Will this be called a dynamic URL or a static URL?
Hmmmm, if that’s true, MWA, “dynamic URL” isn’t the correct term and is misleading (who coined that phrase?). Anything that is dynamic is changing by definition. A URL that doesn’t change is static by definition.
Maybe some of your readers don’t know that a URL like http://domain.com/article-name is easier for search engines to index and brings more benefit than a URL like http://domain.com/post-number.
Best Regards 😉
I wonder if Matt is going to tell us if there are any ranking differences between dynamic and static URLs? I personally believe there are, but I’ll see if he answers.
Duh, just now I see the big difference between ? and # !
I’d like to throw in my 2c here 🙂
It is clearly difficult to define what exactly we mean by ‘static’ (and why it is useful to anyone). Personally I think that anything with a ? is way too broad and will mark some static URLs as dynamic.
A better definition would be something like ‘another user requesting the same page in the short term will receive the same response from the server’. This includes the content of the page as well as the headers (Last-Modified etc). The short term could be something like 2-8 hours.
I help out Google by sending these headers based on when my data has changed, and then rewrite php to html so that all the pages appear to the outside world as html. If there is a dynamic/static flag in Google then I am sure my pages rank as static even though they are regenerated each time. Heck, if you include the headers as part of the page, then most pages are regenerated on each request.
Rules of thumb…
Static URLs don’t contain a question mark – dynamic URLs do.
Static content is stored on the server in the same format it is delivered to the browser – dynamic content is not.
Static URLs can address static or dynamic content.
Dynamic URLs tend to address dynamic content, but may address static content.
Regarding parts of a URL, see:
http://www.ietf.org/rfc/rfc2396.txt
which is the “Uniform Resource Identifiers (URI): Generic Syntax”. I love this document. 🙂
> That’s called a fragment or a named anchor. The Googlers I’ve talked to are split right down the middle on which way to refer it. Disputes on what to call it can be settled with arm wrestling, dance-offs, or drinking contests.
The above RFC calls it a fragment identifier (the fragment being part of the content, not part of the URI). Does this settle the argument? Or do Googlers just like any excuse for an arm wrestle.
The RFC also discusses path parameters, Matt, which are used by many CMSs but don’t feature in your example URL above. A post on how Google treats path parameters would be interesting.
“The protocol is http” – that would be the “url scheme”, if you want to get technical.
You can also judge whether the content is dynamic by the file extension. My main pages use shtml extensions so the current date could be displayed.
Thanks for all the useful info!
That’s true, and that isn’t true all at the same time.
The URL itself “changes” in the sense that certain parameters within it change, depending on the dynamic content being generated. In the case of a cart which makes use of custom 404s, a lot of the URLs themselves don’t really exist… they’re created on the fly and then the 404 sends the user to the correct content.
In other words, it’s dynamic in the sense that it’s “created by a system based on certain parameters.”
That’s not 100% accurate. 404 pages can be used in conjunction with .html “pages” that are generated on the fly (see the example above). An .html “page” can be dynamically created and generated, if one has the mad programming skillz to do so.
“You can also judge whether the content is dynamic by the file extension. My main pages use shtml extensions so the current date could be displayed.”
A little mod_rewrite magic on Apache can fix virtually any dynamic URL woe :D. You can also use it to render file extensions meaningless :-P.
… Comes in very handy when you have lots of variables being passed around or have built some sort of include structure but still want your site’s URLs to appear static and/or ‘clean’.
Matt discussed mod_rewrite in one of his videos, which brings me to a question: Does Google frown on the use of mod_rewrite for search engine friendliness? I gotta ask just to be sure :-p.
Hi Matt,
Sorry if this spoils all the fun with the arm wrestling, dance-offs and drinking contests (yeah… right).
I’ve mostly heard these “fragment” or a “named anchors” referred to as “fragment identifiers”.
See W3.org notes:
http://www.w3.org/DesignIssues/Fragment.html
They’re a challenge for search engines, I think… because AJAX rich sites often use these “fragment identifiers” to trigger additional features and content… which you guys then cannot crawl and index. Something to put on your wishlist perhaps 😉
Cheers,
Radzster
My head hurts 🙂
Surely a Dynamic URL is simply one that can and does change when accessed? If the URL never changes and always returns the same page then surely it’s not dynamic.
For example, some forums produce thread URLs that are dynamic (can change), whereas others are always the same (static).
Are you people alternatively referring to webpages as URLs and addresses as URLs? Because if you are, please stop.
Matt, a bit off topic, but what is your take on this:
http://www.seomoz.org/blog/the-art-of-buying-links-under-the-radar
Barry, the above is a URL that leads to a Webpage 🙂
Matt: “The domain name is google.co.uk.”. As far as I can tell, this is wrong, or at least misleading. Just because video.google.co.uk cannot be registered with a registrar does not mean it is not a domain name.
Also, I think parameters can be separated by semicolons (instead of ampersands), although I can’t seem to find a normative reference about this right now.
Hey all, I’m out of town and mostly out of touch this weekend, but I’ll try to take a stab at replying to comments when I get back.
Matt, I am surprised that you do not put canonicalization (is that what y’all call it?) in the list. Although I don’t think it was intended to be a complete list, I do think it deserves a mention, as it’s an aspect of URLs that Googlers have their own created name for.
Thanks Matt! It helps when you clear things up like this.
Also the video was quite enjoyable; it’s always nice to see your cheerful smile 🙂
Your friend Mike
I’ve heard URLs do that sometimes. Please confirm or deny, Dave. 😀
Matt- I think this was a really helpful and great post.
Now can you videotape the arm wrestling or drinking contest for the winner of fragment vs. named anchor? I’ll root for fragment!
Have a great weekend and fun with all the SES’ers
Shouldn’t this post be titled “Talk like an Internet professional”? Google has absolutely nothing to do with the creation of the Internet protocols, and if anything, Googlers are merely using the correct terminology laid down by the smart people who built the Internet.
Nothing against Googlers, but give credit where it’s due rather than appropriating someone else’s kudos!
This is sound information for anyone trying to explain what a URL is to an SEO or PPC prospect. Or anyone for that matter who has no clue what a URL is.
I’m still unclear on what parameters are within the URL. Can you explain that some more?
Thanks
Dana
London SEO, take a chill pill pleeeeeeeeeeeeeease.
hey Dave; I don’t know what Matt’s take is on that seomoz article on “link buying” and making sure you buy links that are “undetectable” by Google, but I would think his take “and” Google’s take on it would be that it’s “link spam”. Afterall; he is saying that you need to buy links for the good visitors you may get, but on the other hand you need to buy links that Google cannot detect.
I call that spam. Period.
sheesh. I accidentally hit the submit before finished.
Thanks for clarifying the “fragment/named anchor” part of a url Matt. I’ve told countless sites you are better off creating new pages instead of pointing to a section of one page with named anchors. This confirms why.
Um, the whole static/dynamic thing… surely it breaks into two parts? Pages and the URL that leads to them. Each of these can be static or dynamic.
The technology of how a URL delivers a web page is, I think, irrelevant. I can configure an Apache server to respond to the requests for index.shtml, index.asp, index.htm in exactly the same way – and it’s up to me to choose how that content is served – whether from a file, included files or completely out of a database. So I can (and do) gleefully deliver index.asp from a completely static file, or index.html as a purely database driven page, indistinguishable from a file.
Static pages don’t change significantly with previous behaviour (e.g. referrer info, or pages previously seen) or the content of any parameter. Dynamic pages have content that changes, for example, as a result of the search or advert that leads to the page, or as a consequence of asynchronous processes such as RSS inclusion of news feeds. Refreshing a dynamic page could lead to new content – typical also of AJAX pages.
Static URLs and Dynamic URLs differ in that one or more parameters affects the page that you end up seeing. For many (most?) web stores and some (usually older CMSs), you select a page or product by keeping a root URL the same, and changing a parameter; changing “?pageid=XXXXX” or similar. You get a different page delivered as a consequence of changing *only* a parameter.
If changing or removing parameters leads to the same page content (modulo dynamic parts such as AJAX), then the parameters do not make the URL dynamic. If changing or removing parameters leads to different content (e.g. you see the root of the web store, or a different product) then it is a dynamic URL. If a cookie modifies the content you asked for… that’s dynamic content, too.
The question, I think, for a search engine, is that you need to take users back to what you found. So truncating all parameters will be a problem – some pages need parameters to identify the content. However, some parameters do not change the content that you reach – the same page content can be found through multiple URLs, differing only in inconsequential parameters that do not affect content.
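JeremyC’s point about inconsequential parameters can be sketched in a few lines of Python. This is only an illustration – the parameter names in the drop-list (sessionid, ref, sort) are hypothetical, not something from the post or from how any search engine actually canonicalizes URLs:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters assumed NOT to affect page content.
INCONSEQUENTIAL = {"sessionid", "ref", "sort"}

def normalize(url):
    """Drop the parameters that (by assumption) don't change the content."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in INCONSEQUENTIAL]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(normalize("http://example.com/store?pageid=42&sessionid=abc"))
# http://example.com/store?pageid=42
```

Two URLs that normalize to the same string would then be candidates for the same index entry.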
Hmm. Thinking about this, there may be an extension to HTML usage that would help Google and other SEs to disambiguate. Suppose there was, for example, a meta tag that offered the canonical page URL. If that doesn’t include the parameters, then Google can strip all parameters from any index reference to that page, removing the possible calamity of duplicate pages and improving ranking for pages that otherwise have multiple index entries.
It’s years since I read the Dublin Core… Isn’t that the “Identifier”?
Matt – you’re testing the acceptability of asking webmasters to move closer to the Dublin Core? There’s definitely some stuff in there that could help improve indexes for co-operating sites.
Cheers, JeremyC.
Thanks for breaking it down like that Matt
Doug Heil, as they say, if it looks like spam, smells like spam and tastes like spam, it’s probably spam.
I’m really APPALLED that Randfish, who was recently invited to Googleplex, has written a piece that details a POSSIBLE way to damage the relevancy of the Google SERPs.
The guy obviously has no morals and couldn’t care less who suffers, so long as he profits.
Hey original Dave, a standard technique in InfoSec is to allow open discussion of security protocols. It is regarded as generally leading to better security practice. If legitimate researchers were to publish webspam techniques, but without the code/software to do so, that surely *helps* Google and other SE’s by demonstrating what they need to do to defend against people using the techniques? This has been taken as standard procedure in InfoSec, with pretty well established protocols for informing the vendor. Those don’t appear to exist for search, yet, but it is a different type of industry, relatively less mature… it’ll probably come 🙂
Again, from InfoSec, a fairly widely held idea is that if you aren’t precise about how things work, the only people who really know the procedures are the policy owner and those trying to defeat the security. Google, for commercial reasons, keeps the ranking technology as a secret. So the guys outside Google who best know how ranking works will necessarily uncover ways to break it. Is it better to tell everyone, or just let the Bad Guys use it without blowing the whistle? It’s a real question of ethics, and I don’t think it’s as black and white as you paint it 🙂
Google could make it easier by providing a forum, or listening to a forum, in which spam researchers can reveal techniques that prejudice rank. Maybe they do, and I don’t know it. Personally, I’m as much concerned by the way in which Google ranks as by the efforts of black hatters. There are, perhaps unconscious, biases in the ranking system – at the simplest, the equation of informational content, link references and user interest is not always true, and I have cases in paid search where I can demonstrate that PageRank brings up pages of results that most users do not want. There is no way to communicate that to Google, that I know of. And even if I did, why would I want to… what’s my incentive to play nice? And… Google says that paid search does not affect organic rank. But what if paid search offers the best insight into user needs for certain classes of search? Perhaps even a way to lose (some) spammy content?
Man, this is off-topic. Matt – apologies. Feel free to delete. I’ll probably pick up this topic on my blog, anyway. The ethics and economics of search. Juicy stuff 🙂
Cheers, JeremyC.
yeah, it’s off-topic but important.
You miss the point, Jeremy. We are not saying that education is a bad thing. We are saying that a so-called whitehat SEO who is “out there” is promoting the implementation of se spam regarding the buying of links. What gets frustrating in the industry is the constant promotion of SEOs no matter how much they hurt the reputation of the industry. Google is NOT a part of the industry. If you are not a designer/SEO type, you are not a part of the industry. We have guidelines to follow from Google, as we all want our sites and clients to do well in Google. Spam hurts that effort as it hurts the entire industry. People who promote spam hurt the entire industry as well. YES: we need some “best practice” standards, but in the meantime we all need to call a spade a spade when appropriate, so people ‘trying’ to learn know se spam when they see it… and they also know who is promoting it.
Not boring, but not really what I would like to know. More interested in knowing Matt Cutts’ ideas of how small business people such as Realtors, Mortgage Brokers, Contractors, etc. can best promote their individual business websites without doing the typical reciprocal link thing and beyond blogging.
Thanks,
From here in Morristown New Jersey
Morristown NJ Real Estate Guy
Aaron Wall recently posted an article which might interest you:
How to: Buy Links Without Being Called a Spammer
yeah, another type of article writing that talks about “link buying” to trick Google.
I feel it’s almost high time for Google to start de-valuing ALL links, paid or not. At least not counting them. It’s a damn circus out there now. It’s just not like the internet used to be at all. It used to be totally democratic, with no other intentions involved but linking to a site because you wanted to “help” your own visitors, as that site is a good resource for them. It’s not like that anymore, sadly. It’s more like a children’s playground these days, with the biggest kids being OUR industry. Digg is doing very well by not liking our industry and keeping out the SEO types when they can. ALL other social networks need to do the same, as our industry is the most Unprofessional industry I’ve ever been involved in, going on 50 years now.
Google is “not” a part of the industry I’m talking about, but they play a big part in how it shapes and moves forward, along with all major engines.
JeremyC, what Doug said and this;
The “buying of links under the radar” is a black hat method that hurts Google, its users, other sites, the SEO industry and quite likely the site using the attempted method of manipulating PR (votes).
There will be MANY who believe it’s just fine to go outside the Google guidelines and cheat their way up the SERPs (don’t ever play cards with these types). In a way, it is fine if you have low ethics, like to sleep with one eye open and DON’T scream foul when you wake up one morning to find your traffic has halved over-night. Trouble is, they don’t; they scream loud and long about how Google is out to get ’em while denying going outside the guidelines.
Make no mistake, randfish is a black hat who is out for personal gain at anyone’s expense.
I sometimes wished Google would index “Named anchor”, “Document fragment”, “Hash” whatever just so that it would be easier to index AJAX and Flash sites on a “page” by “page” basis but then I think about what would happen were Google to do that and I am glad they don’t. 😉
Sure it would work great for AJAX/Flash where the “hash” can be used to identify a “page” but then if it were used as it normally is, to identify a fragment of a given page, it would then appear as duplicate content where more than one anchor existed on the same page or actually, where any indexed page existed for which there was another “page” identified by the original URL plus the hash value.
Keniki, dude! Don’t leave six comments in a row. You wouldn’t go to a party and talk 6x more than anyone else, right? If you’ve got a grudge with someone, take it to your own blog or some other forum. Plus your comments have nothing to do with the post topic; deleting..
I doubt anyone would invite Kenki to a “party”. I hear the barman at his local even asks him to leave during happy hour 🙂
A URL is made up of several parts. The first part is the protocol, which tells the web browser what sort of server it will be talking to in order to fetch the page.
Matt, typing in your URL from memory brought me to mattcuts.com today, a useless site full of clickable ads. I was surprised to see, that even an expert like you can be molested by domain grabbers. I guess, like most of us, before you started your web project, you did not think of registering a few possible variations and typos of your domain name.
I think it would be of advantage to many legitimate webmasters and their visitors if you would write a little warning about this kind of phenomenon, which I feel is hurting the web and its users, and how to avoid it. Of course, in a better world no web ad company would buy or market advertising on this type of site…
Very informative post, I can see how this can be applied to other URLs both at Google and elsewhere. The anchors look very cool and handy 🙂
I’ve seen the same thing as Michael. matcutts.com has it as well.
I never thought of domain squatting as molestation, though. That’s new.
A website that is integrated with database-driven content can be a dynamic site, and a web page that has content inserted into it by way of a dynamic script like PHP or JavaScript is a dynamic page.
Generally, a dynamic path would look like:
http://www.domain.com/forums/thread.php?threadid=12345&sort=date
By using web application software you can convert a dynamic path to a static one.
And the static web page path would become something like:
http://www.domain.com/forums/thread.php/threadid/123.html
Yes, you will see a boost in the SERPs and in the search engine visibility of your web page. So, static web page paths are recommended over dynamic ones.
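For reference, the kind of dynamic-to-static conversion described above is often done with an internal rewrite rule, assuming an Apache server with mod_rewrite enabled. The rule below is only a sketch; the pattern and target are illustrative, not the exact scheme any particular forum software uses:

```apache
# Sketch: serve a "static-looking" path from the real dynamic script.
# A request for /forums/thread/123.html is rewritten internally to
# /forums/thread.php?threadid=123 without changing the visible URL.
RewriteEngine On
RewriteRule ^forums/thread/([0-9]+)\.html$ /forums/thread.php?threadid=$1 [L,QSA]
```

The browser (and search engine) only ever sees the clean path; the server maps it back to the script behind the scenes.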
You heard wrong, the barman said to my Horse, “Why the long face”.
He’s not even on the same continent as Naylor. They’re about as polar opposite as two people can get…and I mean that quite literally.
Where did you get that idea, Keniki?
Yep, that right, I live in Antarctica.
I’m getting a vision now of Kenki scrambling for a map 🙂
Same place Kenki gets all his ideas from, his imagination!
One vote for “named anchor”
Thanks; this was a very educational post. I never knew that the “?” is what (generally) makes something dynamic. Thanks!
RE: I think right now Google standardizes urls by removing any fragments from the url.
V.Interesting. That was on my test plan anyway so you saved me some work and have opened up a world of possibilities for cool tools.
@Keniki – Yes I believe that to be the case 😉
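If you want to reproduce the fragment-stripping behaviour Matt mentions in your own tools, Python’s standard library has it built in (this is just a local sketch, not how Google does it internally):

```python
from urllib.parse import urldefrag

# The example URL from the post, fragment and all.
url = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"

# urldefrag() returns the URL with the fragment removed, plus the fragment.
base, fragment = urldefrag(url)
print(base)      # http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en
print(fragment)  # 00h02m30s
```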
Interesting and very detailed. Do you prefer sub-domains or folders?
Kenki, you are still dribbling. I have to ask you, are you in some way retarted? You seem to have no control over yourself.
Kenki, for someone who has continually hurled insults at me, continued to accuse me of things I have never done and even threatened me with physical violence, you sure are sensitive!
Ever heard the saying: “those in glass houses shouldn’t throw stones”?
My question was actually a serious one, based on your history here.
Ummmm…riiiiiiiight. If you’re referring to whatever I put in the URI field, I’ve used two (a blog that I created and another site). Liz is up around 7 now.
If you’re referring to the nickname, what difference does that make?
You’re copping out now because you got caught in your own BS and you know you don’t have an answer. So just drop it, or explain with proof (not anything conjectural or hearsay) how Dave (original) = Dave Naylor. I’m serious, too. If you’re going to make such a statement, back it up and don’t come up with some silly-assed excuse about how you think you’re better than me because of X, Y, and Z. The question’s a legit question, and you’re ducking it.
Nope. Not a thing. 1 site was created after I started posting comments here, so I just switched it.
You’re stretching now, and more importantly you’re stalling on the Naylor thing. Where’s the proof? Come on…quit stalling and produce it or stop accusing others of things they clearly aren’t.
Kenki, the current heavy weight World champion has a black hat, I hear!
MWA, you won’t get much from hollow man.
Hi Matt,
I’m not sure who else to ask, so can I ask a very simple question?
What should you do with multiple domain names to not be penalised?
i.e. you buy aadvark.com and you also buy aadvark.co.uk
Then you also buy iLoveMyAadvark.com etc. because you don’t want a competitor taking a domain too similar to your own, and you really love your aadvark 😉
Or say you have lots of topics that are dear to your heart and think each deserves an appropriate domain name so it gets found easier.
Now, let’s say you can’t manage all these sites, only 1 of them, say with a blog with categories.
What should you do?
Put a single page on each satellite linking to the central blog?
Put a redirect (in what way?) to make the same blog appear on each domain (is that regarded as mirror spam content?)
Or what?
What is the best most ethical and SE friendly way of covering all these domains with the content you have?
I know in an ideal world you would have unique content for all, but life isn’t always that easy 😉 I’m not trying to spam or dominate or mislead. I just want to put my content in one place, so I can manage it, but I need the domains to attract people with those requirements. The domain names aren’t false teasers. All that content really is on that central site.
Many thanks
Peter
Matt, I think for this the best way is to use HTTP rewriting (mod_rewrite)…
With my site it works well…
Thanks for this tip!
And what about parked domains? In my hosting company, when I park a domain, a subdomain is also created in the main domain. How do I have to manage this situation?
Sweet. Am off to print this out and give it to some affiliate managers who need to know the difference between a URL and domain. Cheers
Could anyone settle an argument…in the UK I come across people now and then who pronounce URL as a word, like “earl”….it drives me mad!…if I could quote someone like Matt Cutts confirming my view that it should be “U” “R” “L”, it would make my day!!
This is great info. Thanks for the post!
It’s U. R. L., since it is an abbreviation.
The other sounds like people saying “hurl”…..
Hi,
What effect does it have if I place characters like ~ ^ * + % in a URL? Can anybody tell me about this?
I tried google.cn, and it came up with a Chinese interface; however, google.com.cn redirected me to Google in English. Anyone got an idea why?
Fragment is a new one to me. I’ve always called them named anchors, and I’ve known some people who referred to them as “bookmarks”.
So does Google have a way to distinguish between ISAPI or mod_rewritten parameterised URLs and their “?=” versions?
If so can you please explain how Google treats these?
It’s funny, but I have always tried to find an easier way to explain static vs. dynamic URLs (as easy as that sounds), but the way you explained the difference was perfect! You’re slick, Matt!!
I’m still wondering which type of URL search engines like most: static URLs or dynamic URLs? How does it affect SEO? Any explanation here?
Why does the URL (not a directory) sometimes end with the / mark, like http://www.mattcutts.com/blog/seo-glossary-url-definitions/?
I type http://www.mattcutts.com/blog/seo-glossary-url-definitions and it redirects to http://www.mattcutts.com/blog/seo-glossary-url-definitions/.
What’s the difference?
How important is the subdomain URL name in SEO ranking?
Say the end of a subdomain URL is .html or no extension, for example. Is there a difference in how to end a subdomain URL in terms of SEO?
Would it be possible for someone to indicate how important it is to have such separators in things like directories within a domain (for example widgets.com/widget-information) – how much would a site suffer if it just used directories without such characters (widgets.com/WidgetInformation/)?
Dear Matt
Could you please let me know: if I rewrite my dynamic URLs into static ones, would it make any changes in my rankings??
Thanks in advance
Hi Matt
Re: Fragment Identifiers and Server vs Client Side Session State
I’m subscribed to your blog in Reader and it’s a good read. Thank you.
I’ve been developing in AJAX for some time and want to explore
bundling my entire site content and functionality into a single client-
side application.
Google did this with GMail etc. The problem: this is deep-web stuff.
Googlebot is blind to it. Noscript tags can’t be used to expose the
content since the app would use fragment identifiers and Googlebot
would index everything against one url.
So I want to create a stripped site-map meets plain-text version of
the content for the Googlebot user-agent. But then this would clash
with the Google webmaster guidelines?
What can we do?
Thanks tremendously,
Joran
Matt, first off I am a big fan of your work. I had a question regarding URLs and the negative effects of dashes; for example, how does Google view the two following URLs, custombanners.c versus custom-banners.c? Any insight into this issue would be most appreciated. Thank you in advance for your time to respond.
Hi Matt,
I have spent this evening reading some of your blog posts, interesting reading I must say… Our developers have put a comma in some of our sites’ URLs… is this a good or a bad idea…?
Many thanks… Eamonn
I have read that Google loves static URLs. But I am NOT rewriting my dynamic URLs into static ones. I don’t want to lose my ranking.
Hi Matt,
We learned the difference between a subdomain and a top-level domain from your blog.
And how about “http://name.subdomain.a.com” vs “http://subdomain.a.com/name/”?
What about sub.sub.example.com domains… how could I do that?
Hello Matt,
Thanks for the breakdown, I remember having to sit through a lecture on this! It’s definitely not the most riveting of topics!
I just thought I’d clarify that both static and dynamic URLs can contain question marks. I know that some people may read this article and think that a question mark within the URL almost certainly means that the URL is dynamic, when this is not the case.
Take the two sites:
http://www.seo-angels.co.uk/services.asp?service_no=12
http://www.seo-angels.co.uk/index-inclusion
You may be thinking that the first is a dynamic website and the second a static one, but it is possible, and often the case, that static-looking URLs can be database driven and dynamic-looking URLs can be static.
WordPress and other blogging platforms that are database driven have dynamic URLs, but because there is a facility to edit them, it is impossible to detect this just by looking at the URL itself.
For example a page on our website…
http://www.seo-angels.co.uk/seo/free-website-analysis
…is a dynamic URL, although it would appear to be static using the ? mark rule. I just thought I’d clear that one up.
Nick
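Nick’s point – that a question mark is only a surface feature of the URL – can be made concrete with a tiny check. The sketch below just tests whether a query string is present; it can’t tell you whether the server did any computation to produce the page:

```python
from urllib.parse import urlsplit

def has_query(url):
    """The naive "? mark rule": does the URL carry a query string?"""
    return urlsplit(url).query != ""

# The rule only inspects the URL's surface form; it says nothing about
# how the page is actually generated on the server.
print(has_query("http://www.seo-angels.co.uk/services.asp?service_no=12"))  # True
print(has_query("http://www.seo-angels.co.uk/seo/free-website-analysis"))   # False
```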
I understood what you mean, but I am confused about how to remove old URLs from search engines. If I configure the .htaccess and redirect all old URLs via .htaccess, or make custom 301 or 404 pages, that will stop duplicate content, but what about the indexed duplicates from different Google datacenters?
I still have many of these old links coming from different Google datacenters.
Well, nicely done, thanks.
Hi Matt, how has this post been impacted by the latest updates by Google with in-site links? I was reading a post about using named anchors on the Google blog in September that pretty much says # characters are now ok to use.
So dynamic and static URLs are indexed with the same priority?
Does Google consider .co as Colombia, or is it treated like .com? I have a client who has a .co address.
Please, is it possible to have a SUBDOMAIN of another subdomain…
meaning…
http://www.subdomain1.subdomain2.domain.edu.uk.
Please, how do I go about it if it is possible?
What Robert Brewer said back in 2007 is right.
Tantek Celik mentioned how hard it is these days to determine what everyone is talking about when dealing with URLs in a recent post.
It would be nice of you to edit your post to match RFC documents. “http” is just the scheme; the protocol would be “http:”. Likewise, “parameters” are not mentioned in standard documents; what you mean is “search” (DOM); but ideally, you should talk about the “query”, which doesn’t include the leading question mark. Finally, “#00h02m30s” isn’t a fragment; it is either a fragmentId (URL RFC), a hash (DOM), or “00h02m30s” is the fragment.
I know your editing the post won’t restore the Tower of Babel, but it might at least help to get in a place where each name has only one meaning.
My preferred combination is HTTP’s ( scheme / host / port / path / query / fragment ), because it strips down the URL to the least number of characters; but any combination that uses terms found in RFCs and W3C documents is welcome.
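As a side note, Python’s standard urlsplit already uses roughly the RFC vocabulary this commenter prefers (scheme / host / port / path / query / fragment); a quick sketch using the URL from the post:

```python
from urllib.parse import urlsplit

# Split the post's example URL into its RFC-named components.
parts = urlsplit("http://video.google.co.uk:80/videoplay"
                 "?docid=-7246927612831078230&hl=en#00h02m30s")

print(parts.scheme)    # http
print(parts.hostname)  # video.google.co.uk
print(parts.port)      # 80
print(parts.path)      # /videoplay
print(parts.query)     # docid=-7246927612831078230&hl=en
print(parts.fragment)  # 00h02m30s
```

Note that the query excludes the leading “?” and the fragment excludes the leading “#”, matching the terminology argued for above.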
Please confirm for me: which one is better, http://www.seo.com/seo or http://www.seo.com/seo.html or http://www.seo.com/seo.php, from the point of view of good ranking in Google search?