<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: How to fetch a url with curl or wget silently</title>
	<atom:link href="http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/</link>
	<description>neat fun stuff</description>
	<pubDate>Sun, 12 Oct 2008 10:58:00 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<item>
		<title>By: Jeff Huckaby</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-101338</link>
		<dc:creator>Jeff Huckaby</dc:creator>
		<pubDate>Wed, 11 Apr 2007 02:21:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-101338</guid>
		<description>Verbosity can be good. wget and curl with their respective "quiet" options will silence some of their output, but not all of it. They will still likely show critical errors, which is why you may want the redirects to /dev/null. However, we often see cases where you need some errors but not others. wget has a -nv flag that turns off verbose output without being completely quiet, so errors are still reported. You can also use /etc/cron.d/filename on most Linux systems to fine-tune your cron. You can specify a mail address within the file you place in this directory. This can be useful to alert someone in case of a problem.

Also, don't overlook security. Run your cron jobs as a user with as few privileges as possible. If you simply need to wget a file, then a normal user with no login privileges will often suffice.

Next, don't forget the --tries=number option. This will have wget retry in case of a failure. Note that the default is 20 tries, except for fatal errors such as a refused connection, which are not retried. There is also --retry-connrefused, which will retry even when a connection is refused; this is useful for overloaded servers.

Then there is the --timeout option. Always use this option if you are fetching URLs frequently. The default read timeout is 900 seconds. That's 15 minutes! I've seen many servers with dozens of cron jobs piled up because they are polling every 5 minutes but the server is slow, so they are waiting 10 minutes or so to get the data. The problem quickly snowballs out of control.

In brief, we recommend:
1. Run the cron job as the least privileged user possible.
2. Explicitly set timeouts that suit your application.
3. Decide what level of error reporting you need and use -q, -nv and/or /etc/cron.d as required.

These tips are mostly for wget but curl has many of the same options.

Finally, one more security tip. We often create a "wgetforuser", a copy of wget with permissions that let ordinary users run it. We then restrict the main wget so that only root can use it. This mitigates (but does not prevent) some attacks where a wget command is passed into an insecure web application.</description>
		<content:encoded><![CDATA[<p>Verbosity can be good. wget and curl with their respective &#8220;quiet&#8221; options will silence some of their output, but not all of it. They will still likely show critical errors, which is why you may want the redirects to /dev/null. However, we often see cases where you need some errors but not others. wget has a -nv flag that turns off verbose output without being completely quiet, so errors are still reported. You can also use /etc/cron.d/filename on most Linux systems to fine-tune your cron. You can specify a mail address within the file you place in this directory. This can be useful to alert someone in case of a problem.</p>
<p>Also, don&#8217;t overlook security. Run your cron jobs as a user with as few privileges as possible. If you simply need to wget a file, then a normal user with no login privileges will often suffice.</p>
<p>Next, don&#8217;t forget the --tries=number option. This will have wget retry in case of a failure. Note that the default is 20 tries, except for fatal errors such as a refused connection, which are not retried. There is also --retry-connrefused, which will retry even when a connection is refused; this is useful for overloaded servers.</p>
<p>Then there is the --timeout option. Always use this option if you are fetching URLs frequently. The default read timeout is 900 seconds. That&#8217;s 15 minutes! I&#8217;ve seen many servers with dozens of cron jobs piled up because they are polling every 5 minutes but the server is slow, so they are waiting 10 minutes or so to get the data. The problem quickly snowballs out of control.</p>
<p>In brief, we recommend:<br />
1. Run the cron job as the least privileged user possible.<br />
2. Explicitly set timeouts that suit your application.<br />
3. Decide what level of error reporting you need and use -q, -nv and/or /etc/cron.d as required.</p>
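<p>A minimal sketch of how the recommendations above might look for wget. The function name fetch_quietly and the defaults of 30 seconds and 3 tries are assumptions for illustration, not values from this comment; --timeout, --tries, -nv and -O are standard wget options.</p>

```shell
#!/bin/sh
# fetch_quietly URL [TIMEOUT_SECONDS] [TRIES]
# Hypothetical wrapper for a cron job. The defaults of 30 seconds and
# 3 tries are illustrative assumptions; pick values that suit your job.
fetch_quietly() {
    url=$1
    timeout=${2:-30}
    tries=${3:-3}
    # -nv drops the progress output but still reports errors,
    # -O /dev/null discards the downloaded body.
    wget -nv --timeout="$timeout" --tries="$tries" -O /dev/null "$url"
}
```

<p>It could be called as fetch_quietly http://www.example.com/ 10 2 from a script that the cron job runs as an unprivileged user.</p>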
<p>These tips are mostly for wget but curl has many of the same options.</p>
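<p>As a rough curl counterpart, here is a sketch under the same kind of assumptions. The function name curl_quietly and its default values are hypothetical; -s, -S, --max-time, --retry and -o are existing curl options, but check your curl version&#8217;s manual for their exact behavior.</p>

```shell
#!/bin/sh
# curl_quietly URL [MAX_SECONDS] [RETRIES]
# Hypothetical curl counterpart of a quiet cron fetch.
# The defaults of 30 seconds and 2 retries are illustrative assumptions.
curl_quietly() {
    url=$1
    max_time=${2:-30}
    retries=${3:-2}
    # -s hides the progress meter, -S still prints error messages,
    # -o /dev/null discards the response body.
    curl -sS --max-time "$max_time" --retry "$retries" -o /dev/null "$url"
}
```

<p>It could be called as curl_quietly http://www.example.com/ 10 1.</p>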
<p>Finally, one more security tip. We often create a &#8220;wgetforuser&#8221;, a copy of wget with permissions that let ordinary users run it. We then restrict the main wget so that only root can use it. This mitigates (but does not prevent) some attacks where a wget command is passed into an insecure web application.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cgiproxy guy</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-94078</link>
		<dc:creator>cgiproxy guy</dc:creator>
		<pubDate>Sun, 21 Jan 2007 15:06:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-94078</guid>
		<description>wget is a program with incredible untapped potential for most people.

I personally like wget --delete-after http://website.com

It deletes the output after the execution. It is not as nice as -q (quiet) or redirecting to /dev/null 2&#62;&#38;1, but it is still very powerful.

Also, wget will work with Tor; it is just a question of having the Tor proxy set up right on your server and digging for the additional commands.</description>
		<content:encoded><![CDATA[<p>wget is a program with incredible untapped potential for most people.</p>
<p>I personally like wget --delete-after <a href="http://website.com" rel="nofollow">http://website.com</a></p>
<p>It deletes the output after the execution. It is not as nice as -q (quiet) or redirecting to /dev/null 2&gt;&amp;1, but it is still very powerful.</p>
<p>Also, wget will work with Tor; it is just a question of having the Tor proxy set up right on your server and digging for the additional commands.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anjanesh</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93320</link>
		<dc:creator>Anjanesh</dc:creator>
		<pubDate>Tue, 09 Jan 2007 02:22:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93320</guid>
		<description>Oh...for a min there I thought &lt;b&gt;wget --spider http://example.com&lt;/b&gt; would give me all the links spidered, starting right from the site's default page. Something like Google's site:http://example.com</description>
		<content:encoded><![CDATA[<p>Oh&#8230;for a min there I thought <b>wget --spider <a href="http://example.com" rel="nofollow">http://example.com</a></b> would give me all the links spidered, starting right from the site&#8217;s default page. Something like Google&#8217;s site:http://example.com</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93283</link>
		<dc:creator>Nick</dc:creator>
		<pubDate>Sun, 07 Jan 2007 21:05:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93283</guid>
		<description>I had another problem. I had blocked curl, wget and every other suspicious bot from my site, so the only way to achieve a cron job like this was by using the -A option, which sets the User-Agent header.

e.g.

10 * * * * curl -A Firefox http://.......  &#62; /dev/null 2&#62;&#38;1</description>
		<content:encoded><![CDATA[<p>I had another problem. I had blocked curl, wget and every other suspicious bot from my site, so the only way to achieve a cron job like this was by using the -A option, which sets the User-Agent header.</p>
<p>e.g.</p>
<p>10 * * * * curl -A Firefox <a href="http://......" rel="nofollow">http://&#8230;&#8230;</a>.  &gt; /dev/null 2&gt;&amp;1</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93244</link>
		<dc:creator>Matt</dc:creator>
		<pubDate>Fri, 05 Jan 2007 22:47:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93244</guid>
		<description>Another useful one to know is: wget --spider

I have some protected pages inside my framework that need to be run at intervals. --spider makes wget behave as a web spider (it won't download any pages, it'll just check to see if they are there).

You can also disable output by passing everything to /dev/null:

* * * * * wget --spider http://www.example.com &#62;/dev/null 2&#62;&#38;1</description>
		<content:encoded><![CDATA[<p>Another useful one to know is: wget --spider</p>
<p>I have some protected pages inside my framework that need to be run at intervals. --spider makes wget behave as a web spider (it won&#8217;t download any pages, it&#8217;ll just check to see if they are there).</p>
<p>You can also disable output by passing everything to /dev/null:</p>
<p>* * * * * wget --spider <a href="http://www.example.com" rel="nofollow">http://www.example.com</a> &gt;/dev/null 2&gt;&amp;1</p>
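<p>If the result of the check matters, one possible approach is to look at wget&#8217;s exit status instead of its output. The sketch below assumes a hypothetical helper named check_url; it relies only on the -q and --spider options and on wget returning a non-zero status when the request fails.</p>

```shell
#!/bin/sh
# check_url URL
# Hypothetical helper: report whether wget --spider could reach the URL.
check_url() {
    # -q suppresses wget's own output; the exit status carries the result.
    if wget -q --spider "$1"; then
        echo "ok: $1"
    else
        echo "unreachable: $1"
        return 1
    fi
}
```

<p>It could be called as check_url http://www.example.com from a script, which can then decide whether to alert someone.</p>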
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Sandy</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93236</link>
		<dc:creator>Matt Sandy</dc:creator>
		<pubDate>Fri, 05 Jan 2007 19:51:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93236</guid>
		<description>feddy, as far as I know you can't use a proxy with that function, but if you really need more functionality then go about it the curl way.</description>
		<content:encoded><![CDATA[<p>feddy, as far as I know you can&#8217;t use a proxy with that function, but if you really need more functionality then go about it the curl way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: feddy</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93233</link>
		<dc:creator>feddy</dc:creator>
		<pubDate>Fri, 05 Jan 2007 17:47:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93233</guid>
		<description>Matt Sandy, do you know how you could add proxy/Tor support with file_get_contents()?</description>
		<content:encoded><![CDATA[<p>Matt Sandy, do you know how you could add proxy/Tor support with file_get_contents()?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93196</link>
		<dc:creator>Steve</dc:creator>
		<pubDate>Thu, 04 Jan 2007 18:19:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93196</guid>
		<description>feddy: 2&#62;&#38;1 redirects stderr to stdout so that everything ends up in stdout and therefore to /dev/null...</description>
		<content:encoded><![CDATA[<p>feddy: 2&gt;&amp;1 redirects stderr to stdout so that everything ends up in stdout and therefore to /dev/null&#8230;</p>
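<p>A small sketch of why the order of the two redirections matters. The function noisy is a hypothetical stand-in for curl or wget that writes one line to each stream; nothing here touches the network.</p>

```shell
#!/bin/sh
# noisy is a hypothetical stand-in for curl or wget:
# it writes one line to stdout and one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout is sent to /dev/null first, then stderr is pointed at the same place.
silenced=$(noisy >/dev/null 2>&1)

# Reversed order: stderr is pointed at the original stdout before stdout moves,
# so the error line still gets through.
leaked=$(noisy 2>&1 >/dev/null)

echo "silenced=[$silenced]"
echo "leaked=[$leaked]"
```

<p>The first echo prints silenced=[] and the second prints leaked=[to stderr], which is why the cron lines in this thread put 2&gt;&amp;1 after the redirect to /dev/null.</p>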
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt Sandy</title>
		<link>http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93187</link>
		<dc:creator>Matt Sandy</dc:creator>
		<pubDate>Thu, 04 Jan 2007 16:30:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.mattcutts.com/blog/how-to-fetch-a-url-with-curl-or-wget-silently/#comment-93187</guid>
		<description>I simply use file_get_contents() for every GET; I save curl for POST.</description>
		<content:encoded><![CDATA[<p>I simply use file_get_contents() for every GET; I save curl for POST.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
