Linux Users' Group of Davis (LUGOD)
Page last updated:
2011 May 25 13:52

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] how to modify .htaccess to prevent wget or the likes from downloading my site?

Thanks a lot, Ken! I love this mailing list!
It's not that disappointing a fact, though; I guess I can still enumerate
downloaders and their common user-agent strings. Do you have a link or
something that lists popular downloaders? I still want to protect my
site to some extent.
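Enumerating downloaders amounts to a User-Agent blacklist. The `SetEnvIfNoCase User-Agent "^Wget" bad_bot` line in the thread tests the User-Agent header against a case-insensitive regular expression. The minimal Python sketch below mimics that kind of check for a small blacklist; the `BLACKLIST` entries and the `is_blacklisted` helper are hypothetical illustrations, not Apache's implementation and not a maintained list of downloaders. It also demonstrates Ken's point that a client sending `-U Mozilla` slips through.

```python
import re

# Hypothetical, illustrative blacklist. Each entry is a regular expression
# matched case-insensitively against the User-Agent header, similar in
# spirit to Apache's SetEnvIfNoCase. Not an authoritative list.
BLACKLIST = [
    r"^Wget",         # the pattern from the .htaccess snippet in the thread
    r"^curl",
    r"HTTrack",       # unanchored: HTTrack's agent string may start with "Mozilla"
    r"^libwww-perl",
]

_PATTERNS = [re.compile(p, re.IGNORECASE) for p in BLACKLIST]


def is_blacklisted(user_agent: str) -> bool:
    """Return True if any blacklist pattern matches the User-Agent string."""
    return any(p.search(user_agent) for p in _PATTERNS)


if __name__ == "__main__":
    for ua in ["Wget/1.12 (linux-gnu)", "wget", "Mozilla", "curl/7.21.0"]:
        # A real server would answer 403 Forbidden for a blacklisted agent.
        status = 403 if is_blacklisted(ua) else 200
        print(f"{ua!r}: {status}")
```

Running it reports 403 for every string except `'Mozilla'`, which is exactly the User-Agent that `wget -U Mozilla` sends; any blacklist of this kind only stops clients that identify themselves honestly.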


On Wed, May 25, 2011 at 3:10 PM, Chanoch (Ken) Bloom <kbloom@gmail.com> wrote:
> On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
>> Hello all:
>> I first asked my web host's support about this, and they
>> pointed me to this link:
>> http://www.webhostingtalk.com/showthread.php?t=437549
>> and the snippet on that page looks like this:
>> SetEnvIfNoCase User-Agent "^Wget" bad_bot
>> <Limit GET POST>
>>    Order Allow,Deny
>>    Allow from all
>>    Deny from env=bad_bot
>> </Limit>
> This snippet will only block wget if wget deigns to identify itself as
> wget by saying so in the user-agent string.
>> I copied and pasted it into the .htaccess file under /public_html. Still, I
>> am able to use this command to fetch my site:
>> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com
> Yup. With -U Mozilla, you told Wget to identify itself as Mozilla in the
> user-agent string. That means you have no way at all of knowing that
> someone's trying to use Wget to download from your site.
>> However, if I try the same wget command with one small change (without
>> "-U Mozilla"):
>>  wget --wait=20 --limit-rate=20K -r -p www.my_site.com
>> I get this:
>> --2011-05-25 14:30:36--  http://www.my_site.com/
>> Resolving www.my_site.com... xxx.xx.xxx.xx
>> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
>> HTTP request sent, awaiting response... 403 Forbidden
>> 2011-05-25 14:30:37 ERROR 403: Forbidden.
> Wget deigned to identify itself as wget this time.
>> Now I have three questions:
>> 1. Why didn't the code in .htaccess prevent the downloading? Did I
>> miss something?
> (See my explanation above.)
>> 2. Are there other tools that act like wget? How can we prevent them
>> all from downloading the site content?
> There are other tools that act like wget. You can't prevent them *all*
> from downloading, though you could blacklist specific ones the way you
> did with Wget. Of course, they may also change their User-Agent
> string, in which case you have no way of telling at all.
>> 3. If someone is downloading, can we get a log file that
>> exposes the downloader's info?
> Your web server logs will have their IP address, but I doubt you could
> do anything useful with that information. If your users log in to the
> site, you could try to keep track of that yourself somehow, but that
> could be very complex depending on what you're trying to prevent.
> ...
> In other words, the protection you're asking for is basically impossible
> against a determined downloader.
> --Ken
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
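On question 3: Apache's standard "combined" log format (`%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"`) records the client address and the User-Agent for every request. The minimal Python sketch below assumes the site's access log uses that format and counts requests per (IP, User-Agent) pair so that a recursive fetch stands out. The `parse_combined` and `top_clients` names and the sample lines are hypothetical, and, as Ken notes, the User-Agent field is only whatever the client chose to send.

```python
import re
from collections import Counter

# One line of Apache's "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
# Simplified: does not handle escaped quotes inside quoted fields.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def top_clients(lines, n=5):
    """Count requests per (client IP, User-Agent) pair, most frequent first."""
    counts = Counter()
    for line in lines:
        entry = parse_combined(line)
        if entry is not None:
            counts[(entry["host"], entry["agent"])] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample lines; 192.0.2.0/24 is a documentation address range.
    sample = [
        '192.0.2.10 - - [25/May/2011:14:30:36 -0400] "GET / HTTP/1.0" 403 202 "-" "Wget/1.12 (linux-gnu)"',
        '192.0.2.10 - - [25/May/2011:14:31:02 -0400] "GET / HTTP/1.0" 200 5120 "-" "Mozilla"',
        '192.0.2.10 - - [25/May/2011:14:31:22 -0400] "GET /about.html HTTP/1.0" 200 3400 "-" "Mozilla"',
    ]
    for (host, agent), count in top_clients(sample):
        print(count, host, agent)
```

With the made-up sample it prints `2 192.0.2.10 Mozilla` followed by `1 192.0.2.10 Wget/1.12 (linux-gnu)`. One address switching agents right after a 403 is a hint of a scripted download, not proof of one.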



LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617
Contact Us

LUGOD is a 501(c)7 non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
