Simplifying Apache configuration?

I love the Apache web server. It’s blazingly fast, it’s ubiquitous, and it can do a ton of neat stuff. Most of the world’s webservers run on Apache. The only thing that I don’t like about Apache is some of the configuration and set-up. If you ask 10 webmasters what’s the trickiest technical thing to do, about 5-6 of them will say things like “configuring a web server to do redirects, mod_rewrite, and setting up .htaccess.” For example, WMW has a guide to changing dynamic urls to static urls with mod_rewrite, but it’s still pretty complicated. Notice intelligent people debating the finer points of regular expressions in places like here, here, here, and here.
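To give a feel for what those regular-expression debates are about, here is a minimal Python sketch that imitates one static-to-dynamic rewrite. The rule in the comment, the `product.php` script, and the `rewrite` helper are all made up for illustration; the sketch also assumes the pattern is matched against the path without its leading slash, as happens for rules placed in an .htaccess file.

```python
import re

# Hypothetical rule, written the way it might appear in an .htaccess file:
#   RewriteRule ^product/([0-9]+)\.html$ /product.php?id=$1 [L]
PATTERN = re.compile(r"^product/([0-9]+)\.html$")
TARGET = r"/product.php?id=\1"  # \1 plays the role of mod_rewrite's $1


def rewrite(path):
    """Return the rewritten path, or None when the pattern does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # expand() substitutes the captured group into the target template
    return match.expand(TARGET)


if __name__ == "__main__":
    print(rewrite("product/42.html"))
    print(rewrite("product/abc.html"))
```

Even in this tiny case, one forgotten backslash before the dot or one missing anchor changes which URLs match, and that is the kind of mistake people trip over.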

A few of us were talking about this at Google. Should we include an .htaccess tool in Sitemaps so that you don’t need a UNIX command-line to generate password hashes? Maybe a tool to take a list of desired redirects or rewrites, then output the correct syntax that you could cut and paste into a web server config file?
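As a rough illustration of the password-hash half of that idea, here is a small Python sketch that builds one line for an Apache basic-auth password file without touching a command line. It uses the `{SHA}` format (Base64 of an unsalted SHA-1 digest), which Apache accepts but which is weak by modern standards, so treat it as a demonstration only. The function name `htpasswd_sha_line` and the sample user are assumptions of mine, not part of any existing tool.

```python
import base64
import hashlib


def htpasswd_sha_line(username, password):
    """Build a 'user:{SHA}...' line for an Apache password file.

    Illustration only: {SHA} is unsalted SHA-1, which is easy to crack.
    """
    if ":" in username:
        raise ValueError("user names in a password file cannot contain ':'")
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


if __name__ == "__main__":
    # Sample user name and password, chosen only for the demonstration
    print(htpasswd_sha_line("alice", "password"))
```

A hosted tool would wrap something like this in a web form and hand back the line to paste into the password file.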

Then I remembered: Summer of Code 2006 just started taking project proposals! Not only is the Apache Software Foundation a mentoring organization, but they even have a wiki for project suggestions.

So if you’re a student and want to earn $4500 for hacking on some code this summer (and beef up your resume with Open Source experience *and* get more familiar with Apache), why not try making it easier to configure Apache? A light-weight project might be a program that takes easy input and outputs the correct configuration code that can be cut-and-pasted into Apache config files. A project for a more skilled person might be directly changing Apache to allow simpler configuration. If you do propose a project to simplify Apache config files, let me know. 🙂
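To make the light-weight version concrete, here is a minimal Python sketch, assuming the “easy input” is just a list of old-path/new-URL pairs and the output is a set of mod_alias `Redirect` lines. The function name `render_redirects` and the example URLs are hypothetical, and a real tool would need to handle quoting, patterns, and much more.

```python
def render_redirects(pairs, status=301):
    """Turn (old_path, new_url) pairs into mod_alias Redirect lines."""
    lines = []
    for old_path, new_url in pairs:
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        if any(ch.isspace() for ch in old_path + new_url):
            # Paths containing whitespace would need quoting; not handled here
            raise ValueError("whitespace in paths is not handled in this sketch")
        lines.append(f"Redirect {status} {old_path} {new_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    wanted = [
        ("/old-page.html", "http://www.example.com/new-page.html"),
        ("/about.htm", "http://www.example.com/about/"),
    ]
    print(render_redirects(wanted))
```

The output is plain text on purpose, so it can be pasted straight into a config or .htaccess file.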

(Note: I didn’t actually discuss this with the Apache folks; it’s just an area where I see webmasters make mistakes. If you’re doing Summer of Code, you’ll want to chat with the Apache folks first, because they might have other things that they need more.)

28 Responses to Simplifying Apache configuration?

  1. Isn’t most of the URL rewriting just knowing how to do regular expressions? htaccess files were meant to give you a lot of control over a directory. If an alternative is created, it had better not interfere with the control I get with regular expressions. Regular expressions are the bomb and always have been for string comparisons. People just need to start thinking in [A-Za-z0-9]+ :)!

  2. >>Should we include an .htaccess tool in Sitemaps so that you don’t need a UNIX command-line to generate password hashes?

    Matt, some of us who are technically challenged (nice way of saying idjuts) don’t use command line, but do use whatever the shared hosting provides with .htaccess and sometimes those can create a problem we have no control over.

    Well, over the weekend I printed out what Google’s currently got indexed (and Supplemental and no longer indexed) for a few of my sites, and it seems I’ve hijacked myself using 301’s with some pages. Not 302’s, but 301’s.

    No way will I tell publicly how I did it (for obvious reasons), I’m just going through and fixing it up, but it’s a bit messed up at Google and Yahoo knocked me right on my patootie with the sites involved.

    I’m thinking it may be some kind of an Apache glitch with some configurations that’s affecting more people than just myself using cheap shared hosting who seem to be losing pages.

  3. I would love for sitemaps to offer a little more support in educating us about .htaccess, it is just one of those things we can really mess up bad. I screw around with it and get it working but never know if what I do is compliant or correct.

  4. For starters, can someone simplify setting up mod_gzip? 😉

    Really, great post on mod_gzip. There are just way too many things that Apache makes tougher than necessary, but most of it is in the spirit of making it work on more operating systems than anyone ever really cares about. Heck, even just making it Linux compatible means it has to work on how many 1000’s of very different types of systems now? Everyone has their own Linux flavor, and none of them work the same way.

  5. You mentioned open source so I thought I’d toss this in with it…
    I run osCommerce sites…with BigDaddy it forced me to look hard at how the system created links…it created multiple URIs to the same info…so it looked like duplicate content…MSN did fine, Google did good in the beginning but when Yahoo picked up links from space and beyond…it affected Google as Google then picked up these links from hell…Well, I have reworked all the links and made sure there is only one way to any page…set up a custom 404 page (yes it gives a 404 status in the header) to pick up customers with old/bad links…and php pages will give 404 error if parameters are not correct…

    Now for the questions…is there anything else I can do to clean up the supplemental problem created by this? And is there anything I can do to speed up the process of Googlebot picking up these changes?

    As for the Apache thing…heck I am already bald on top…do you think $4500 will be enough for a full transplant?

  6. Speaking of which, I just bought a new Compaq Presario and had to install PHP, Apache, MySQL 5.0 and PHPMyAdmin. PHP… OK, still having problems with php_curl (where do you want my .dlls oh mighty XP? I’ve tried it all over, but still no luck!)… Apache, OK, just took the ini from my PC. MySQL 5.0… gave me tons of problems. Took me a few hours before I found a bug report on the config.exe, then bypassed it to finally get it working. PHPMyAdmin was definitely the easiest…

  7. >>Should we include an .htaccess tool … ?

    Yes, great idea! I’ve struggled a lot getting .htaccess to do what I want, esp with respect to redirecting old to new pages to avoid duplication.

  8. Pffffffft @ whining about Apache / gzip / htaccess / regex. RTFM’s chumps.

    Disclaimer:
    .NET developer & MS Whore 😉

  9. Matt I am so happy you brought this up. I would love to see an .htaccess tool from Google Sitemaps. I have been learning about .htaccess syntax over the past year and am sure I still only know the minimum.

    It’s nice to see that the really intelligent people that comment here are also having problems with this aspect of webmastering. Makes me feel better about myself as a webmaster.

    Let us know when the Sitemaps crew has this all done. Tomorrow right? 😉

  10. Hmm, was actually thinking of this the other day, including on the Samba configuration side of things. I’ve seen “Swat” and the “Webmin” stuff, and they’re ok. But, they’re also REALLY “cludgy”. If you look at OS X’s administration tools, those are elegant and easy to use to setup machines. There hasn’t been a really decent configuration tool that allows powerful manipulations of the configuration files, while allowing simple manipulation.

    SO, on to my thoughts on how to implement such a thing. Personally, I’d do a java/swing type system. First, this lets you run the app on any system you’d wish. Keep the configuration handler stuff separate from the interface code, so you could use a toolkit like Echo2 to generate the files through a web-face. Then, do things like using say a tree-style interface for the configuration – have categories, such as “Virtual Hosts”, “Modules”, etc.

    This is just a preliminary thought, and with a good help system, this could be a really nice system overall.

    SO, yes, I agree – configuring most of this stuff is a pain, and it would be handy to have an easy way to configure these systems.

  11. What good is an .htaccess tool if the same people will then have problems making the appropriate changes to the code for displaying the same URLs? There are already free tools available to create htaccess files and rules (just do a Google search for htaccess generator). Having an .htaccess file is only half the rent coming in and it is certainly not the holy grail everyone is looking for. The website code also needs to be changed to display those nice-looking URLs, and if you are having a problem writing the htaccess file rules, you will also have problems changing the code in software not written by yourself.

    Chris

  12. I read posts like this and think, geez, maybe I don’t give myself enough credit. Is it really that hard to configure Apache? I’ve been using it for about five or six years, and never really found it all that difficult. It’s exceedingly simple next to the needless complexity of BIND or Sendmail or AWStats or…

    It reminds me of a Slashdot discussion bemoaning how difficult SQL was to learn. SQL? Hard? You gotta be kidding. It’s about as straightforward as something like that can get. “Select this from that where this equals something order by this limit to this many.” Just understand the order in which everything goes, and you’re golden.

  13. .htaccess, regex, mod_rewrite, etc., are responsible for billions of dollars in lost productivity every year. It’s mind-boggling that this backward, user-hostile technology still exists. Anything that Google could do to kill it off as quickly as possible would be a Great Thing for the world.

  14. @Ben
    Isn’t anyone who uses MS a whore? I mean, with all of the viruses and all – it just seems fitting.

    Using .htaccess has proven to be a great method to create dynamic URLs, as well as filter what is typed in the URL. It’s all about Regular Expressions. Problem is, some of what is out in the great Internet is very poor (performance wise). There are good ways and bad ways to write a Regex – and I have seen a lot of bad ones.

    If you can’t use that, PHP/Apache users could always use the ForceType and Apache lookback to get the pages.

  15. 😛 Nate

    I’ve had the (dis)pleasure of using a near-identical port of mod_rewrite for IIS, and I fully agree rewriting with it sucks. Doing it in a search engine friendly way sucks extra hard because of all the checks and measures to ensure that the rewritten url is the url it should be (eg: /page/1_this-is-the-page.aspx vs /page/1_this-site-sucks.aspx), adding more data to the db and another round trip to the db.

    I’ve also done a lot with rewriting using .NET’s Application_BeginRequest function, which intercepts requests for aspx’s and allows you to rewrite urls amongst other things.

    There is more overhead with the second method and it’s limited to filetypes passed to the aspnet process, but it is so much easier when you actually have the full weight of a programming language to work with instead of just text patterns and basic operators.

    So if someone wants to make mod_rewrite easier/better, think about losing the stripped down operators and options and having it handled by php (unless it can already be done that way).

  16. Cheers! I would love the Apache configuration to be easier. I recently have made a migration to Lighttpd for Rails, which has a much simpler configuration. I still have many PHP apps and open source projects running out of Apache however 😉

  17. >>Should we include an .htaccess tool … ?

    It’s really great idea!

    I have been learning about .htaccess over the last year (having problems with one of my sites) and now i can notice: The .htaccess syntax is very powerful. You can do some amazing things with it.

  18. Kate Morris, I haven’t even suggested it to the Sitemaps folks yet. They’ve been busy though, so when I see them I’ll ask.

    Jason McIntosh, Samba configuration files seem like they’re a little harder than they need to be too. Hmmm.

    Mike Jackson, fair point, but at the point that your Mom/Dad/Sister/Brother/Cousin/Neighbor/Pet wants to set up a web site, it’s definitely much harder for those folks.

  19. I’m a little late to the game on this one but I wanted to throw in my support for a .htaccess tool with a friendly UI.

    I taught myself (as did most people who are reading this, me thinks) how to use the wonders of htaccess and config files. It was a bit painful but doable.

    Where a htaccess tool would be great is as part of a shared hosting package on the different hosting providers. It could be set up to give some access to the power of this without completely opening up the guts of the server to someone who may or may not have the wheels to work it safely.

  20. A tool for easy conf(s) is just what the *rest of the* world needs. Not everyone is technical but most can write articles, put up sites to place their ideas, family picture galleries, etc. A lot of content does come from the *normal* people who don’t live and breathe .htaccess.
    I am a student myself and also participating in SoC 2006, and hope to work on this. I have used Apache for about 2 years but never felt very troubled with conf. phpMyAdmin surely is helpful and something similar for Apache is beautiful. Most tech guys will not understand the need for this but whenever you think that there are more college professors writing articles, or housewives writing stories or young guys making picture galleries; you will understand that these are *web content* yet the producers are not that tech savvy. So a tool must exist to help them.
    I surely look forward to working on something similar in SoC 2006.

  21. A great idea. I am also a student looking to participate in SoC 2006, and I had a few thoughts about this particular project.

    Obviously, the first thing that needs to be done is to identify maybe the top 10 or 20 of these tricky, yet common tasks that people are having trouble with. I know that using mod_rewrite, for instance, is constantly coming up on webmaster forums, despite the fact that many people are trying to do the exact same thing. One example is trying to make dynamic applications look static so that they’ll be crawled. Most of the tasks that come to my mind — password protecting a directory is another example — seem pretty simple to handle.

    For the writing of configuration files, the application is pretty useless unless it can be accessed remotely. One could start out by writing a local application in Java/C/C++ and then, to provide a nice user experience, you could implement front-ends in a whole slew of webapp technologies (PHP comes to mind, but JSP or Perl or anything similar would work, too). The whole thing could be done using one of these webapp platforms, but that presupposes that the user has already configured one of them correctly and that seems a bit circular to me 🙂 Also, having a command-line tool would let developers write nice GUIs for their favorite OS to serve users running Apache locally.

    If something like this were implemented, it would make life a lot easier for professional and amateur webmasters alike, and, in doing so, would be a great way to promote adopting Apache.

    Dan

  22. Hello,

    We are currently developing a configuration tool for Apache. We have a meeting soon with the author of mod_rewrite and mod_ssl to see if he is interested in getting involved. Our challenge is to build a fully functional but easy UI, which is not easy at all.

    erkan

  23. Cheers!

    The idea is really great!

    Most people will(or would like to) configure apache remotely, so any local app will be only a half-solution or even 1/8 solution.
    I’d suggest developing mod for apache that will provide simple web interface. Like if you go to http://www.yoursite.com/yourfolder/.htaccess and get it right there.

    Of course it’s going to be a serious challenge to compete with a simplicity of plain text files.

    Looking to participate in SoC 2006
    Andrew

  24. Hey Matt,
    I am also planning to develop a module that would make doing mod_rewrite a breeze, maybe using C++ so that it can be incorporated into other GUIs like WebMin, Comanche etc.
    Actually, i have just submitted my proposal to SOC 2006, but whether i get approved or not does not matter, i will still do it.

    Any ideas would be greatly welcome.

    erkan, i got loads of free time and i would love to help you anywhere i can.

  25. Matt, I really appreciate your thoughts on this issue. I have answered countless questions on these issues over the years, so I know this is an area that causes problems.

    But, you may want to consider that by exposing the very concept of “htaccess” through a tool such as Google sitemaps (or any other Google branded thing) you create extra attention towards it.

    That will mean that people who would otherwise never even know such a tool existed will suddenly:

    (1) start asking in forums what it does,
    (2) hear all kinds of half-baked success stories,
    (3) start experimenting with it with no real need,
    (4) make lots of errors, and
    (5) write a lot of posts in forums calling for help

    (and possibly cursing forums, htaccess, apache, and Google all the way)

    The Apache config files are very powerful tools. You can actually disable access to your whole site by accident if you mess it up – and I guess that’s where Google’s interest lies.

    But, both Google and the few people that dare to answer questions on these issues would really be better off if we turned this question upside down:

    How do we make it less necessary for people to mess with these files in the first place?

    These are actually issues that the fine people at Google can solve (mostly) in-house, and without even thinking about Apache. It’s about

    a) handling long URLs better,
    b) handling URLs with parameters better
    c) handling site and page relocations better (301, 302, meta …)

    If these issues were handled better and/or more consistently, all we would need is communication, and fewer people would feel a need to mess with htaccess files. I would gladly help in the communication effort, as that would mean answering the same questions over and over again fewer times.

  26. I’m using a package program that installs PHP, MySQL, Apache and phpMyAdmin easily. You can download Phpdev at this address: http://sourceforge.net/projects/phpdev5

  27. If anyone ever takes up this challenge (still needed), feel free to use the article “Ultimate htaccess article” as a reference. The day is coming..

  28. Great post Matt!
    I will try this on my site map. The .htaccess syntax is very powerful. You can do some amazing things with it.
    Thanks again Matt for great articles!
