Linux Users' Group of Davis (LUGOD)
 
Page last updated: 2011 May 25 13:52

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] how to modify .htaccess to prevent wget or the likes from downloading my site?



Thanks a lot, Ken! I love this mailing list!
It's not that disappointing a fact, though; I guess I can still enumerate
downloaders and their common user-agent strings. Do you have a link or
something that lists popular downloaders? I still want to protect my
site to some extent.

Thanks,
Hai
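Ken's explanation in the quoted reply comes down to a string match on whatever the client sends. This Python sketch mimics the `SetEnvIfNoCase User-Agent "^Wget" bad_bot` / `Deny from env=bad_bot` pair: a case-insensitive regular-expression search of the User-Agent header, answering 403 on any match. It approximates Apache's behavior rather than reproducing it, and every blacklist entry other than `^Wget` (curl, HTTrack, python-requests) is an illustrative assumption, not a vetted list of popular downloaders.

```python
import re

# Hypothetical blacklist in the spirit of the .htaccess rule
#   SetEnvIfNoCase User-Agent "^Wget" bad_bot
# Each pattern is a case-insensitive regex searched anywhere in the
# header unless anchored with ^. Entries other than ^Wget are
# illustrative assumptions, not an authoritative list.
BAD_BOT_PATTERNS = [
    r"^Wget",
    r"^curl",
    r"HTTrack",
    r"^python-requests",
]

_BAD_BOT_RE = [re.compile(p, re.IGNORECASE) for p in BAD_BOT_PATTERNS]


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the User-Agent matches any blacklisted pattern."""
    return any(rx.search(user_agent) for rx in _BAD_BOT_RE)


def status_for(user_agent: str) -> int:
    """Mimic 'Deny from env=bad_bot': 403 for a match, 200 otherwise."""
    return 403 if is_bad_bot(user_agent) else 200


if __name__ == "__main__":
    # Wget's default self-identification is caught...
    print(status_for("Wget/1.12 (linux-gnu)"))
    # ...but "wget -U Mozilla" sends just "Mozilla" and gets through.
    print(status_for("Mozilla"))
```

Any client that sends a browser-like string, as `wget -U Mozilla` did, sails through, so a longer blacklist only catches downloaders that identify themselves honestly.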


On Wed, May 25, 2011 at 3:10 PM, Chanoch (Ken) Bloom <kbloom@gmail.com> wrote:
> On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
>> Hello all:
>>
>> I first asked this question to the support of my web host, and they
>> redirected me to this link:
>> http://www.webhostingtalk.com/showthread.php?t=437549
>>
>> and the snippet on that page looks like:
>>
>>
>> SetEnvIfNoCase User-Agent "^Wget" bad_bot
>>
>> <Limit GET POST>
>>    Order Allow,Deny
>>    Allow from all
>>    Deny from env=bad_bot
>> </Limit>
>
> This snippet will only block wget, if wget deigns to identify itself as
> wget by saying so in the user-agent string.
>
>>
>> I copied and pasted it to the .htaccess under /public_html. Still, I
>> am able to use this command to fetch my site:
>>
>> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com
>
> Yup. Wget decided to identify itself as Mozilla in the user-agent
> string. That means you have no way at all of knowing that someone's
> trying to use Wget to download from your site.
>
>> However, if I try the same wget with a slight change in the command
>> line (without " -U Mozilla ")
>>
>>  wget --wait=20 --limit-rate=20K -r -p www.my_site.com
>>
>> I get this:
>>
>> --2011-05-25 14:30:36--  http://www.my_site.com/
>> Resolving www.my_site.com... xxx.xx.xxx.xx
>> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
>> HTTP request sent, awaiting response... 403 Forbidden
>> 2011-05-25 14:30:37 ERROR 403: Forbidden.
>
> Wget deigned to identify itself as wget this time.
>
>> Now I have three questions:
>
>> 1. Why didn't the code in .htaccess prevent the downloading? Did I
>> miss something?
>
> (See my explanation above.)
>
>> 2. Are there other tools that act like wget? How can we prevent them
>> all from downloading the site content?
>
> There are other tools that act like wget. You can't prevent them *all*
> from downloading, though you could blacklist specific ones the way you
> did with Wget. Of course, they may also decide to change the User-Agent
> string, and then you have no way of telling at all.
>
>> 3. If someone is downloading, can we have some log file that can
>> expose the downloader's info?
>
> Your web server logs will have their IP address, but I doubt you could
> do anything useful with that information. If your users log in to the
> site, you could try to keep track of that yourself somehow, but that
> could be very complex depending on what you're trying to prevent.
>
> ...
>
>
> In other words, the protection you're asking for is basically impossible
> against a determined downloader.
>
> --Ken
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
>
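On question 3: a web server access log in Apache's "combined" format records, for each request, the client IP address and the User-Agent string the client claimed. A minimal Python sketch, assuming that log format (and no escaped quotes inside fields), tallies requests per (IP, User-Agent) pair; the sample lines and the 192.0.2.10 address (a documentation range) are hypothetical.

```python
import re
from collections import Counter

# Simplified pattern for Apache's "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Assumption: no field contains an escaped double quote.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally_clients(lines):
    """Count requests per (client IP, claimed User-Agent) pair."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if m:  # silently skip lines in some other format
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts


# Hypothetical sample lines (192.0.2.0/24 is reserved for documentation).
SAMPLE = [
    '192.0.2.10 - - [25/May/2011:14:30:37 -0400] "GET / HTTP/1.0" '
    '403 202 "-" "Wget/1.12 (linux-gnu)"',
    '192.0.2.10 - - [25/May/2011:14:31:02 -0400] "GET / HTTP/1.0" '
    '200 5120 "-" "Mozilla"',
    '192.0.2.10 - - [25/May/2011:14:31:22 -0400] "GET /about.html HTTP/1.0" '
    '200 2048 "-" "Mozilla"',
]

if __name__ == "__main__":
    # Print the busiest (IP, User-Agent) pairs first.
    for (ip, agent), n in tally_clients(SAMPLE).most_common():
        print(f"{n:4d}  {ip}  {agent}")
```

In the hypothetical sample, one address appears once as Wget and twice as `Mozilla`: the IP is whatever connected to the server, but the User-Agent column records only what the client chose to send.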




Hosting provided by:
Sunset Systems
Sunset Systems offers preconfigured Linux systems, remote system administration and custom software development.

LUGOD: Linux Users' Group of Davis
PO Box 2082, Davis, CA 95617

LUGOD is a 501(c)(7) non-profit organization
based in Davis, California
and serving the Sacramento area.
"Linux" is a trademark of Linus Torvalds.
