Linux Users' Group of Davis
Page last updated:
2011 May 25 13:46

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] how to modify .htaccess to prevent wget or the likes from downing my site?

On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
> Hello all:
> I first asked this question to the support of my web host, and they
> redirected me to this link:
> http://www.webhostingtalk.com/showthread.php?t=437549
> and the snippet on that page looks like:
> SetEnvIfNoCase User-Agent "^Wget" bad_bot
> <Limit GET POST>
>    Order Allow,Deny
>    Allow from all
>    Deny from env=bad_bot
> </Limit>

This snippet blocks wget only if wget deigns to identify itself as
wget in its User-Agent string.
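
To make that concrete, here is a rough shell sketch of the test the
snippet performs. It is only a simulation, not how Apache implements
it: the function name ua_is_blocked is hypothetical, and the
"Wget/1.12 (linux-gnu)" string is an assumed example of what wget sends
by default (the version will differ on your machine).

```shell
#!/bin/sh
# Hypothetical helper: mimics SetEnvIfNoCase User-Agent "^Wget" bad_bot
# by testing a User-Agent string against ^Wget, ignoring case.
ua_is_blocked() {
    printf '%s\n' "$1" | grep -qiE '^Wget'
}

# Compare an assumed default wget User-Agent with the one sent by -U Mozilla.
for ua in "Wget/1.12 (linux-gnu)" "Mozilla"; do
    if ua_is_blocked "$ua"; then
        echo "$ua -> 403 Forbidden"
    else
        echo "$ua -> allowed"
    fi
done
```

The only input to the decision is a string that the client chooses to
send.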

> I copied and pasted it to the .htaccess under /public_html. Still, I
> am able to use this command to fetch my site:
> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com

Yup. With -U Mozilla, wget identified itself as Mozilla in the
User-Agent string. That means a User-Agent check like this one has no
way of knowing that someone's using wget to download from your site.

> However, if I tried the same wget with a slight change in the command
> line (without " -U Mozilla ")
> wget --wait=20 --limit-rate=20K -r -p www.my_site.com
> I get this:
> --2011-05-25 14:30:36--  http://www.my_site.com/
> Resolving www.my_site.com... xxx.xx.xxx.xx
> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
> HTTP request sent, awaiting response... 403 Forbidden
> 2011-05-25 14:30:37 ERROR 403: Forbidden.

Wget deigned to identify itself as wget this time.

> Now I have three questions:

> 1. Why didn't the code in .htaccess prevent the downloading? Did I
> miss something?

(See my explanation above.)

> 2. Do we have other tools acting like wget, how can we prevent them
> all from downing the site content?

There are other tools that act like wget. You can't prevent them *all*
from downloading, though you could blacklist specific ones by their
User-Agent strings, the way you did with wget. Of course, they may also
change the User-Agent string, and then you have no way of telling them
apart from a browser by that header.

> 3. If someone is downloading, can we have some log file that can
> expose the downloader's info?

Your web server logs will have their IP address, but I doubt you could
do anything useful with that information. If your users log in to the
site, you could try to keep track of their downloads yourself somehow,
but that could be very complex depending on what you're trying to
prevent.
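
If you do want to look, something along these lines might give a quick
summary of who is fetching the most. It is a sketch under assumptions:
it expects Apache's "combined" log format (client IP first, User-Agent
as the last quoted field), the function name requests_by_client is
made up for illustration, and the log path in the usage comment is a
guess, since shared hosts put logs in different places.

```shell
#!/bin/sh
# Hypothetical helper: counts requests per (client IP, User-Agent) pair
# in an Apache "combined" format access log, busiest first.
# Splitting on double quotes, field 1 starts with the IP and field 6 is
# the User-Agent. Lines containing escaped quotes will be misparsed.
requests_by_client() {
    awk -F'"' '{ split($1, f, " "); print f[1] "\t" $6 }' "$1" |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; it varies by host):
#   requests_by_client /var/log/apache2/access.log | head
```

Keep in mind that the User-Agent column is whatever the client claimed
to be, so a recursive wget run with -U Mozilla shows up as "Mozilla"
here too.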


In other words, the protection you're asking for is basically impossible
against a determined downloader.

vox-tech mailing list


Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
