How to fetch a URL with curl or wget silently

January 3, 2007

Cron jobs need quiet operation; if a command generates output, you’ll get an email from cron with the command output. So if you want to fetch a file silently with wget or curl, use a command like this:

curl --silent --output output_filename http://example.com/urltofetch.html

wget --quiet --output-document output_filename http://example.com/urltofetch.html

There are shorter versions of these options, but the long options make code or cron jobs easier to understand when you come back to them. Be aware that URLs containing “&” cause trouble: in most shells (bash, csh, tcsh) an unquoted “&” ends the command and runs it in the background, so wget never sees the rest of the URL. Put single or double quotes around the URL to avoid this.
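As a rough sketch of the short options and of the quoting point, here is one way to wrap both commands. The query string on the URL and the helper names are made up for illustration; they are not from the original commands.

```shell
#!/bin/sh
# Hypothetical URL; the query string with '&' is an illustrative assumption.
url='http://example.com/urltofetch.html?arg=value&other=1'

# Short-option equivalents of the long options:
#   curl: -s = --silent, -o = --output
#   wget: -q = --quiet,  -O = --output-document
# Both helpers take a URL and an output filename (helper names are hypothetical).
fetch_with_curl() { curl -s -o "$2" "$1"; }
fetch_with_wget() { wget -q -O "$2" "$1"; }

# $url was assigned in single quotes and is expanded inside double quotes,
# so the '&' stays part of one argument instead of ending the command.
count_args() { echo "$#"; }
arg_count=$(count_args "$url")
echo "the URL is passed as $arg_count argument"

# Usage (needs network access, so left commented out):
#   fetch_with_curl "$url" output_filename
#   fetch_with_wget "$url" output_filename
```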

bob rains January 3, 2007 at 9:08 pm

I have nothing to say other than “Hey Look I’m the first comment”

It’s the little things that keep me going.

Steve January 3, 2007 at 9:49 pm

Of course if you want it deathly quiet, just add ‘>/dev/null 2>&1’ to the end…

ZZPrices January 3, 2007 at 10:10 pm

PHP executable works well too. (It’s what I prefer to use.)

alek January 3, 2007 at 10:14 pm

If you want really, really silent wget/curl, then add a “>/dev/null 2>&1” … but it is often a good idea for errors/exception conditions to show up *somewhere* in case things go awry. Along those lines, wget has an “--append-output” option that may be useful.

Matt Cutts January 3, 2007 at 10:30 pm

Good points, alek and Steve.

Xig January 4, 2007 at 12:35 am

Thanks Matt, that’s my first file in my new HOWTO folder :)

Maurice January 4, 2007 at 12:57 am

I tend to do these sorts of automated jobs as Perl scripts.

I can then make sure the Perl script logs its actions to a logfile for later debugging – it’s also a little clearer when you or a third party comes back to it in six months’ time.

The last one I did was a handler for parsing our incoming SMSes (delivered as XML) into our SMS campaign manager.

BTW, get the Perl Cookbook from O’Reilly; it saves so much time and has loads of useful tools you can steal^h^h^h^h^h adapt.

Milan Kryl January 4, 2007 at 2:36 am

After some more howtos you can change your blog title to:

Matt Cutts: Gadgets, Google, HowTo and SEO

:-)

feddy January 4, 2007 at 5:46 am

I understand the ‘> /dev/null’ part, but what is the purpose of the ‘2>&1’?

Matt Sandy January 4, 2007 at 8:30 am

I simply use file_get_contents() for everything GET; I save curl for POST.

Steve January 4, 2007 at 10:19 am

feddy: 2>&1 redirects stderr to wherever stdout currently points. Since stdout has already been sent to /dev/null, everything ends up in /dev/null…
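The order of the two redirections matters, which a small experiment can show. This is only a sketch; `noisy` is a made-up stand-in for wget or curl that writes one line to each stream.

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr is pointed at the same place:
# nothing is left for the command substitution to capture.
silent=$(noisy >/dev/null 2>&1)

# Reversed order: stderr is pointed at the ORIGINAL stdout first, and only
# then is stdout sent to /dev/null, so the stderr line still gets through.
leaky=$(noisy 2>&1 >/dev/null)

echo "silent captured: '$silent'"
echo "leaky captured:  '$leaky'"
```

In a cron job, the second form would still produce output, and therefore mail.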

feddy January 5, 2007 at 9:47 am

Matt Sandy, do you know how you could add proxy/Tor support with file_get_contents()?

Matt Sandy January 5, 2007 at 11:51 am

feddy, as far as I know you can’t use a proxy with that function, but if you really need more functionality then go about it the curl way.

Matt January 5, 2007 at 2:47 pm

Another useful one to know is: wget --spider

I have some protected pages inside my framework that need to be run at intervals. --spider makes wget behave as a web spider: it won’t download any pages, it’ll just check to see if they are there.

You can also disable output by passing everything to /dev/null

* * * * * wget --spider http://www.example.com >/dev/null 2>&1
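Sending everything to /dev/null also hides failures. One possible middle ground, sketched here on the assumption that wget exits with a non-zero status when the check fails, is a small wrapper that stays silent on success and prints a single line on failure. The name `check_url` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: quiet on success, one line on stderr on failure.
check_url() {
    if wget --spider --quiet "$1"; then
        return 0
    else
        status=$?   # exit status of the wget call in the condition
        echo "check failed (wget exit status $status): $1" >&2
        return "$status"
    fi
}

# Usage (needs network access, so left commented out):
#   check_url 'http://www.example.com'
```

Run from cron, a wrapper like this would only generate mail when the page cannot be reached.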

Nick January 7, 2007 at 1:05 pm

I had another problem. I had restricted access to my site for curl, wget and every other suspicious bot, so the only way to achieve a cron job like this was by using the -A option, which sends a User-Agent header.

eg.

10 * * * * curl -A Firefox http://……. > /dev/null 2>&1

Anjanesh January 8, 2007 at 6:22 pm

Oh… for a minute there I thought wget --spider http://example.com would give me all the links spidered right from the default site page. Something like Google’s site:http://example.com

cgiproxy guy January 21, 2007 at 7:06 am

wget is a program with incredible untapped potential for most people

I personally like wget --delete-after http://website.com

It deletes the output after the execution; not as nice as --quiet or >/dev/null 2>&1, but very powerful still.

Also, wget will work with Tor; it’s just a question of having the Tor proxy set up right on your server and digging for the additional options.

Jeff Huckaby April 10, 2007 at 7:21 pm

Verbosity can be good. wget or curl with their respective “quiet” options will silence some of their output but not all. They will still likely show critical errors, which is why you may want the redirects to /dev/null. However, we often see cases where you need some errors but not others. wget has a -nv flag that is not verbose but not quiet. You can also use /etc/cron.d/filename on most Linux systems to fine-tune your cron jobs. You can specify a mail address within the file you place in this directory. This can be useful to alert someone in case of a problem.

Also, don’t overlook security. Run your crons with a user with as few privileges as possible. If you simply need to wget a file, then a normal user with no login privileges will often suffice.

Next, don’t forget the --tries=number option, which has wget retry after a failure. Note that the default is 20 tries, except for fatal errors such as “connection refused” or “not found” (404), which are not retried. There is also --retry-connrefused, which retries even when a connection is refused; that is useful for overloaded servers.

Then there is the --timeout option. Always use this option if you are fetching URLs frequently. The default read timeout is 900 seconds. That’s 15 minutes! I’ve seen many servers with dozens of cron jobs piled up because they poll every 5 minutes but the server is slow, so each one waits 10 minutes or so to get its data. The problem quickly snowballs out of control.

In brief, we recommend:
1. Use the least privileged user possible to run the cron job.
2. Explicitly set timeouts that suit your application.
3. Decide what level of error reporting you need and use -q, -nv, and/or /etc/cron.d as required.
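Put together, the three recommendations might look something like the following. This is a sketch under several assumptions: the mail address, the `nobody` user, the schedule, the output path, and the timeout and retry numbers are all placeholders. The script writes the sample file to a temporary path rather than to /etc/cron.d so that it can be inspected safely.

```shell
#!/bin/sh
# Sketch only: write a sample cron.d-style file to a temporary location.
cron_file=$(mktemp)

cat >"$cron_file" <<'EOF'
# Hypothetical contents for a file such as /etc/cron.d/fetch-page
# Any output from the job below is mailed to this (assumed) address.
MAILTO=admin@example.com

# min hour dom mon dow user   command
*/5 * * * * nobody wget -nv --tries=3 --timeout=30 --output-document /tmp/urltofetch.html 'http://example.com/urltofetch.html'
EOF

cat "$cron_file"
```

Note that -nv still prints basic information as well as errors, so a job like this is likely to generate mail on every run; -q would silence it completely.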

These tips are mostly for wget but curl has many of the same options.
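For curl, a rough equivalent of the timeout and retry advice could look like this. The helper name and the specific numbers are assumptions chosen for illustration; --connect-timeout, --max-time, --retry and --show-error are standard curl options.

```shell
#!/bin/sh
# Hypothetical helper: fetch URL $1 into file $2 with explicit limits.
fetch_with_limits() {
    # --silent hides the progress meter; --show-error keeps error messages.
    # --connect-timeout caps the connection phase, --max-time the whole
    # transfer, and --retry repeats the request after transient failures.
    curl --silent --show-error \
         --connect-timeout 10 \
         --max-time 60 \
         --retry 3 \
         --output "$2" "$1"
}

# Usage (needs network access, so left commented out):
#   fetch_with_limits 'http://example.com/urltofetch.html' output_filename
```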

Lastly, one more security tip. We often create a “wgetforuser”, which is wget with permissions that let ordinary users run it. We then restrict the main wget so that only root can use it. This helps against (but does not prevent) some attacks where a wget command is passed into an insecure web application.

David Spector May 28, 2009 at 4:55 pm

I’m hoping I can use “wget http://…?arg=value&pwd=password” in crontab to call a server I wrote to get a particular action done (it sends me the results in an email).
