Fun with zip codes

Noticed via Tara Calishain’s excellent ResearchBuzz, here’s a great site (maps.huge.info) that lets you view zip code outlines in Google Maps. It’s pretty addictive to type in zip codes (assuming you’re in the U.S., and you have the particular bent of mind that makes you enjoy stuff like Google Maps). For the lazy among you, here’s how it looks:
Zip codes in New York
Notice that it’s doing something cool: after you’ve typed in the first zip code, you can type in more zip codes and it will draw multiple zip codes at once.

Suppose you wanted to do stuff like this yourself. First you’d want the Google Maps API. Next, you’d need a list of all the zip codes in the U.S. This is harder than it should be. From http://www.usps.com/ncsc/faq/:

Q: Where can I get a database or directory of all ZIP Codes and/or ZIP+4 codes along with the corresponding city, state, county, etc.? Is there an FTP site for downloading?
A: The information you are seeking is not available via download but is provided through our National Customer Support Center at (800) 238-3150 ….
[and then later] The Postal Service does not maintain any ZIP Code maps. The only related product we have at this time is the TIGER/ZIP+4 File.

And “TIGER/ZIP+4 File” is a link to a 404 page. Oy, thanks a lot! Grrr. Okay, so pop on over to the U.S. Census Bureau. For example, this page has a link to many good resources, including http://www.census.gov/tiger/tms/gazetteer/zips.txt which has a list of zip codes along with the city, state, latitude, and longitude of each zip code (the lat/lon is for the zip code center).
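If you want to pull that file into a script, here’s a minimal Python sketch. The column layout is my assumption (comma-separated, quoted text fields, zip code in the second column, then state, place name, longitude, latitude), and the function name and sample row are made up for illustration, so check the real file and adjust the indices before trusting it.

```python
import csv

# ASSUMPTION: rows are comma-separated with quoted text fields, laid out as
# state FIPS, zip, state abbreviation, place name, longitude, latitude, ...
# Verify against the real zips.txt and adjust these indices if it differs.
ZIP_COL, STATE_COL, NAME_COL, LON_COL, LAT_COL = 1, 2, 3, 4, 5


def load_zip_centers(lines):
    """Return {zip: (name, state, lat, lon)} from an iterable of text lines."""
    centers = {}
    for row in csv.reader(lines):
        if len(row) <= LAT_COL:
            continue  # skip blank or short lines
        centers[row[ZIP_COL]] = (
            row[NAME_COL],
            row[STATE_COL],
            float(row[LAT_COL]),
            float(row[LON_COL]),
        )
    return centers


# Made-up row for illustration only; this is not real Census data.
sample_rows = ['"00","99999","ZZ","EXAMPLEVILLE",100.5,40.25,1234,0.001']
print(load_zip_centers(sample_rows))
```

With the real file you would pass an open file object instead of the sample list, for example `load_zip_centers(open("zips.txt"))`.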

Technically, the Census Bureau calls their data Zip Code Tabulation Areas or ZCTAs. Why? As they put it,

This new entity was developed to overcome the difficulties in precisely defining the land area covered by each ZIP Code.

For example, zip codes represent postal routes and can’t always be represented with polygons. You can almost feel the contempt seething from the Census Bureau toward the U.S. Postal Service. I can believe that when Census and Postal personnel get in the same room, arguments break out and tempers flare. Sort of like Herbert Kornfeld, the Accounts Receivable Supervisor at The Onion who is always rumbling with Accounts Payable.

Okay, where were we? Postal Service: not helpful at all. Census Bureau: a great, easy-to-parse file. Okay, how about the boundary info with the shape of a zip code/ZCTA? How about this: http://www.census.gov/geo/www/cob/zt_metadata.html Oh, snap! Census Bureau 2, Postal Service 0!

Shall we delve a little deeper? Let’s do it; zip codes are fun. By the way, if you’re not a complete and utter nerd, or you don’t have hours to kill digging into files, the huge.info site sells a DVD with cleaner data in an easier format.

Okay, let’s examine the zip code for 94043 (where Google’s headquarters are located). Go to http://www.census.gov/geo/www/cob/z52000.html#ascii and save the file for California and decompress the zip file. The files are available in three formats, but we’re sticking with straight ASCII. Inside the zip file there are two files, a tiny one and a big one. In the tiny one, look for the zip code you’re interested in (94043 in this example):

2465
"94043"
"94043"
"Z5"
"5-Digit ZCTA"

This entry means that in the big file, the zip code 94043 is represented with an ID of 2465. If you look in the big file, you’ll find the data for that polygon in latitude/longitude format:

2465 -0.122065371904532E+03 0.374225573395062E+02
-0.122077973000000E+03 0.374482390000000E+02
-0.122077373000000E+03 0.374500390000000E+02
…. ….
-0.122077973000000E+03 0.374482390000000E+02
END

The first coordinate appears to be the center of the ZCTA, so chop off the first line and the END line and put the resulting lines in a file called 94043. Now fire up gnuplot and type these commands:

set terminal png
set output "94043.png"
unset key   # hide the legend (older gnuplot versions spelled this "set nokey")
set title "Zip code: 94043"
set xlabel "Longitude"
set ylabel "Latitude"
set grid
plot '94043' with lines
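If you’d rather not chop the file up by hand, the lookup and trimming steps above can be scripted. Here’s a rough Python sketch, assuming the two files look exactly like the snippets shown earlier (an ID line followed by quoted fields in the small file; an ID-plus-center header, coordinate pairs, and an END line in the big file). The function names are mine, and the sample data is just the lines quoted above with the elided middle points left out.

```python
def find_polygon_id(index_lines, zip_code):
    """Return the ID on the line just before the quoted zip code, or None."""
    previous = None
    for raw in index_lines:
        line = raw.strip()
        if not line:
            continue
        is_match = line.startswith('"') and line.strip('"') == zip_code
        if is_match and previous is not None and previous.isdigit():
            return int(previous)
        previous = line
    return None


def extract_polygon(polygon_lines, polygon_id):
    """Return boundary points as (lon, lat) floats, skipping the center point."""
    points = []
    inside = False
    for raw in polygon_lines:
        fields = raw.split()
        if not inside:
            # The header line has three fields: ID, center lon, center lat.
            if len(fields) == 3 and fields[0] == str(polygon_id):
                inside = True
            continue
        if fields and fields[0] == "END":
            break
        if len(fields) == 2:
            points.append((float(fields[0]), float(fields[1])))
    return points


# Sample data copied from the snippets above (middle points omitted).
INDEX_SAMPLE = ['2465', '"94043"', '"94043"', '"Z5"', '"5-Digit ZCTA"']
POLYGON_SAMPLE = [
    "2465 -0.122065371904532E+03 0.374225573395062E+02",
    "-0.122077973000000E+03 0.374482390000000E+02",
    "-0.122077373000000E+03 0.374500390000000E+02",
    "-0.122077973000000E+03 0.374482390000000E+02",
    "END",
]

polygon_id = find_polygon_id(INDEX_SAMPLE, "94043")
for lon, lat in extract_polygon(POLYGON_SAMPLE, polygon_id):
    print(lon, lat)  # one "lon lat" pair per line, which gnuplot can plot
```

Redirect that output into a file called 94043 and the gnuplot commands above should work unchanged.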

Alrighty, let’s see how we did. Here’s the zip code on the original application:

Original map of 94043 zip code

And here’s our Gnuplot map:
Gnuplot plot of 94043 zip code

There are just too many fun things to do in the world. :)

Update: If you enjoy playing with zip codes, Gary Price pointed me to http://www.melissadata.com/Lookups/index.htm where you can look up neat things like demographics and business counts.

UI fun: Remove result

One request we sometimes hear is for the ability to modify Google results, especially to block unwanted sites. A few eagle-eyed people may have noticed a user-interface experiment on Google that adds the ability to remove results. Here’s what you’d see. Imagine that you did the search [lynx paw clipart], and you notice one particular result that looks spammy:
search snippet
You check the cached page, and you notice that if you turn off Cascading Style Sheets, there’s a bunch of spammy text:
spammy text

At that point, your options would normally be to 1) ignore that result, or 2) report the url to Google via our spam report form.

But if you’re in this experiment, you’ll have newfound powers. Click the “Remove result” link and with one click you can drop that url from your search results. It looks like this:
blocking a result
By default, it will only block that url for that particular search. If you’re really annoyed, you can click “More options” and you’ll get two more choices: block this url from all future searches and (my personal favorite) the ability to block the entire host from all future searches. Here’s what it looks like:
More blocking options
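To make the three blocking scopes concrete, here’s a toy Python sketch. It is purely illustrative and says nothing about how the experiment is actually built; the class, its method names, and the example.com URLs are all made up.

```python
from urllib.parse import urlsplit


class ResultFilter:
    """Toy model of the three scopes: per-search, per-url, and per-host."""

    def __init__(self):
        self.per_query = set()  # (query, url) pairs blocked for one search
        self.urls = set()       # urls blocked for all searches
        self.hosts = set()      # hosts blocked for all searches

    def block_for_query(self, query, url):
        self.per_query.add((query, url))

    def block_url(self, url):
        self.urls.add(url)

    def block_host(self, url):
        self.hosts.add(urlsplit(url).hostname)

    def apply(self, query, results):
        """Return the result urls that survive all three block lists."""
        return [
            url
            for url in results
            if (query, url) not in self.per_query
            and url not in self.urls
            and urlsplit(url).hostname not in self.hosts
        ]


results = [
    "http://spam.example.com/clipart",
    "http://spam.example.com/more",
    "http://good.example.org/lynx",
]
f = ResultFilter()
f.block_for_query("lynx paw clipart", "http://spam.example.com/clipart")
print(f.apply("lynx paw clipart", results))  # the blocked url is dropped here
print(f.apply("bobcat clipart", results))    # but not for a different search
```

Calling `f.block_host("http://spam.example.com/clipart")` afterwards would drop both spam.example.com results from every search.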

Before I tackle some questions, I want to send mad props to the people who worked on this in our New York office. This is one of my favorite UI experiments; it’s neat to give users the ability to modify some parts of Google, and I especially like the Ajax-y aspects of it. Okay, let me try to anticipate some questions.

Q: Why don’t I see “Remove result”? I want it! I want it baaad!
A: Dude, did you not read the last paragraph? It’s an experiment; if we showed it to everyone, it would be a beta. ;) Oh, alright. Here’s a tip. Go sign up for My Search History/Personalized Search. You don’t have to store your searches, but you will need a Google Account.

Q: How exactly do I try this out?
A: If you have an email address like johnpublicuser@gmail.com, you should be able to go to http://www.google.com/psearch and sign in with “johnpublicuser@gmail.com” and your password–make sure to click “Remember me on this computer.” If you don’t have a Gmail account, you can create a Google Account at https://www.google.com/accounts/ . Once you are logged into your Google Account, click on “Personalized Search” on the left-hand side:
personalized search
Once you are in Personalized Search, you should see the Remove Result link any time that you’re signed in.

Q: Are you going to use this data to improve general search?
A: It’s too early to say. It’s still an experiment that may not even launch; it depends on how people like it. The form or format could change as well.

Again, thanks to the people who worked on this. I’ll be curious to see whether people enjoy it.

AdWords, AdSense, and the Google blog

I spent a year as an engineer in the ads group, but that was a long time ago (28 dog years, at least). So asking me detailed AdSense or AdWords questions is sometimes like asking a cat how an airplane works:

Q to cat: How does an airplane work?
A from cat: Meow?

So when someone asks me how to carry the clickthrough from one NodeGroup to a different currency while preserving their negative matches (yes, I’m making this question up), I’m usually stuck with meow. I keep intending to use AdWords or check out AdSense, but I haven’t had the cycles yet, and I’m not sure when I will.

So if you read this site and want more advertising info, I would hop over to the AdWords blog or the AdSense blog. They’re crunchy with good information and tips. For example, the link I did for the AdWords blog talks about increasing the capacity of the Site Exclusion tool so you can be more selective about where your ads show up in AdSense.

If you haven’t noticed, the Official Google Blog has been getting more solid, too. Take this post for example; I’m a little surprised we talked about how Google can harness the collective wisdom of Googlers, but I’m glad that we did. Was the original incarnation of the Google blog fluffy? Yeah, it kinda was. To be fair, it was launched in the middle of an IPO quiet period, so there were limits to what the blog could discuss. I’m glad that the blog is talking about issues of substance, from Kai-Fu Lee to fair use. Crunchy is good.

Geez. I wrote this up this morning, and completely missed the new Google Group dedicated to talking about AdWords. Cool. Also, I should point out that if you position this book next to this book, they will in fact explode.

Alerting site owners to problems

We’ve started a pilot program to alert sites that we consider to be outside our quality guidelines. Some of this was already discussed on Threadwatch, but let’s put the info in one place in convenient Q&A format:

Q: Are you writing to every site that receives a spam penalty?
A: No. Right now we’re running this as a test.

Q: What sort of sites are you writing to?
A: This is not targeted to sites like buy-my-cheap-viagra-here-while-consolidating-your-debt-and-buy-some-posters-about-online-casinos.com, but more for sites that have good content, but may not be as savvy about what their SEO was doing or what that “Make thousands of doorway pages for $39.95” software was doing.

Q: Are these sites penalized forever?
A: No, they can return to the index if they correct or remove the pages that were violating our guidelines. See my previous post about how to do a reinclusion request for advice on that.

Q: Are you emailing webhosts too? Where are you getting email addresses from?
A: We’re not trying to email webhosts, just the site owner or webmaster. Our primary way of finding who to contact is via email addresses from the web. If there really aren’t any, we use a few addresses like webmaster@domain.com and support@domain.com. We may try a contact address by doing a whois search as another backup, but we will avoid emailing the technical or admin contact from whois.

Q: Can you give me an idea of what an example email might look like?
A: Sure. Here’s an example one for hidden text.

Dear site owner or webmaster of http://www.chefrevival.com.au/,

While we were indexing your webpages, we detected that some of your
pages were using techniques that were outside our quality guidelines,
which can be found here: http://www.google.com/webmasters/guidelines.html
In order to preserve the quality of our search engine, we have
temporarily removed some webpages from our search results. Currently
pages from http://www.chefrevival.com.au/ are scheduled to be removed for at least 30 days.

Specifically, we detected the following practices on your webpages:
On http://www.chefrevival.com.au/, we noticed the following hidden text: “Chef Revival Chef Uniforms – A range of stylish, comfortable and durable chef uniforms designed to withstand the pressures of today’s kitchens, Chef apron Chef Jackets Chef Pant Chef trouser Chef headwear Chef Apron Chef Shirt Chef Neckties, Chef aprons Chef Jackets Chef Pants Chef trousers Chef headwears Chef Aprons Chef Shirts Chef Neckties, traditional check chefwear clothes”

We would prefer to have your pages in Google’s index. If you wish to be
reincluded, please correct or remove all pages that are outside our
quality guidelines. When you are ready, please submit a reinclusion
request at http://www.google.com/support/bin/request.py

You can select “I’m a webmaster inquiring about my website” and
then “Why my site disappeared from the search results or dropped in
ranking,” click Continue, and then make sure to type “Reinclusion
Request” in the Subject: line of the resulting form.

Sincerely,
Google Search Quality Team

Q: Matt, are you excited about this?
A: Heck yeah. I’m glad we’re trying to proactively contact webmasters and site owners when there’s an issue with their site in Google. I’m so excited that I split an infinitive in that sentence, didn’t I? Doh! :)

Filing a reinclusion request

Update November 4th, 2007: Hey everyone, the official Google documentation on how to file a reconsideration request is here: http://www.google.com/support/webmasters/bin/answer.py?answer=35843 and we now refer to it as a “reconsideration request.” Why? Well, not every spam penalty results in removal from Google’s index, so “reconsideration” is more accurate than “reinclusion.” I’ll leave the rest of the post up because much of the info below is still useful.

——————

Hmm. Everybody wants to hear about SEO-ish stuff instead of gadgets. I’ll still subject you to pure geekery now and then, but let’s tackle how to do a reinclusion request.

First off, what’s a reinclusion request and why would you want to do one? If you’ve been experimenting with SEO, or you employ an SEO company that might be doing things outside Google’s guidelines, and your site has taken a precipitous drop recently, you may have a spam penalty. A reinclusion request asks Google to remove any potential spam penalty.

The first step is to take a long, hard look at your website. Is there hidden text, hidden links, or cloaking on your site, especially on the front page? Are there doorway pages that do a JavaScript or some other redirect to a different page? Were you trying to use some automated program to get links or scrape Google? Whatever you find that you think may have been against Google’s guidelines, correct or remove those pages.

Now where should you send a reinclusion request? This has changed in the last few months from an email address to a web form. The best location to go is http://www.google.com/support/bin/request.py . You can select “I’m a webmaster inquiring about my website” and then select “Why my site disappeared from the search results or dropped in ranking.” Click Continue, and on the page that shows up, make sure to type “Reinclusion Request” in the Subject: line of the resulting form. Upper- or lower-case doesn’t matter, but make sure you use the words “reinclusion request” in the subject line so it gets routed to the right place. (See the newer instructions at the top of this post.)

Now we come to the heart of things: what goes into a reinclusion request. Fundamentally, Google wants to know two things: 1) that any spam on the site is gone or fixed, and 2) that it’s not going to happen again. I’d recommend giving a short explanation of what happened from your perspective: what actions may have led to any penalties and any corrective action that you’ve taken to prevent any spam in the future. If you employed an SEO company, it indicates good faith if you tell us specifics about the SEO firm and what they did–it assists us in evaluating reinclusion requests. Note that SEO and mostly-affiliate sites may need to provide more evidence of good faith before a site will be reincluded; such sites should be quite familiar with Google’s quality guidelines.

Okay, so you found the hidden text that your webmaster put on your front page, you removed it, and you sent your reinclusion request off to Google. How long do you have to wait now? That depends on when Google reviews the request and on the type of spam penalty you have. In the days of monthly index updates it could take 6-8 weeks for a site to be reincluded after a site was approved, and the severest spam penalties can take that long to clear out after an approval. For less severe stuff like hidden text, it may only take 2-3 weeks, depending on when someone looks at the request and if the request is approved.

There’s an interesting thread started by stuntdubl here. I’d add the following things to that thread:

  • Don’t bother mentioning that you spend money on AdWords or you’re an AdSense publisher. The person who will look at your reinclusion request doesn’t care if you have a business relationship with Google. Remember, we need to know 1) that the spam has been corrected or removed and 2) that it isn’t going to happen again.
  • I would request reinclusion for one domain at a time. It looks bad if you had 20+ sites all thrown out at once, and you send a reinclusion request for 20 domains in one email.

That’s what I can think of right now. For the 1-2 people who have asked about their sites in comments–that’s the right procedure to follow. Hope that helps.
