#22035 – Unknown robot (identified by empty user agent string)

Posted in ‘Akeeba Admin Tools for Joomla!’
This is a public ticket. Everybody will be able to see its contents. Do not include usernames, passwords or any other sensitive information.
Monday, 09 February 2015 01:42 CST
 Hi,

It seems that someone is targeting one of the sites I manage. A bot has started eating up a huge amount of resources. According to AWStats, which comes with the hosting, it shows up as:
Unknown robot (identified by empty user agent string)
I can see the entries for Google, Yahoo and Alexa, which are fine. I also see two more, as follows:
Unknown robot (identified by 'bot*')
and
Unknown robot (identified by 'crawl')
But these last two are not crawling very much.

I suppose I can just add 'crawl' and 'bot*' to the list of bots to block in the Htaccess Maker. But how can I block the one identified by an empty user agent?
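Before adding bare fragments such as 'crawl' or 'bot' to a blocklist, it helps to check which legitimate user agents they would also catch. The sketch below assumes each blocklist entry is matched as a case-insensitive substring (similar to a `RewriteCond %{HTTP_USER_AGENT} ... [NC]` line); how the Htaccess Maker actually matches its list is not stated in this ticket, and `BLOCKLIST` and `matching_entries` are hypothetical names for illustration.

```python
import re

# Hypothetical blocklist: the two fragments AWStats reported, treated as
# case-insensitive substrings (similar to RewriteCond ... [NC]).
BLOCKLIST = ["crawl", "bot"]


def matching_entries(user_agent: str) -> list[str]:
    """Return every blocklist entry that occurs anywhere in the user agent."""
    return [
        entry
        for entry in BLOCKLIST
        if re.search(re.escape(entry), user_agent, re.IGNORECASE)
    ]


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
bingbot = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

print(matching_entries(googlebot))
print(matching_entries(bingbot))
```

Under that assumption, both the Googlebot and bingbot user agents contain "bot", so a plain 'bot' entry would block them along with the unwanted crawler.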

I started by blocking the IP it came from, but obviously they switch IPs. I then started blocking an IP range, but they changed ISP.
I cannot keep blocking everything, or I will lose too much "real" traffic, since they are coming from the same country as my target audience.

Are there any tricks or tips to make crawling the site harder for bots? And how can I block bots identified by an empty user agent?

Thank you in advance.
Custom Fields
Which documentation pages did you read? most of the suggested results
Which troubleshooter articles did you read? None
Have you searched the tickets before posting? Yes
Joomla! version (in x.y.z format) 3.2.2
PHP version (in x.y.z format) 5.5.17
MySQL/database version 5.5.32-31.0
Host (who is hosting your site, not your domain) SiteGround
Admin Tools version (x.y.z format) 3.4.3
user46704
Monday, 09 February 2015 04:26 CST
Hello Julien,

you can add these custom rules to the Htaccess Maker:
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^(.*)$ - [F,L]

That way, you'll block empty user agents.
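To see which visitors the condition above actually catches, here is a minimal sketch that applies the same `^$` pattern to a few sample user agent strings. It assumes Python's `re.search` behaves like mod_rewrite's unanchored regex match for this simple pattern, and that a missing User-Agent header expands to an empty `%{HTTP_USER_AGENT}`; `is_blocked` and the sample strings are illustrative only.

```python
import re

# Same pattern as the RewriteCond above: matches only an empty string.
EMPTY_UA = re.compile(r"^$")


def is_blocked(user_agent: str) -> bool:
    """Return True if the empty-user-agent rule would answer 403 Forbidden."""
    # A request without a User-Agent header is assumed to reach the rule as "",
    # so a missing header and an empty header are treated the same way.
    return EMPTY_UA.search(user_agent) is not None


samples = {
    "": "no user agent",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": "Googlebot",
    "Mozilla/5.0 (Windows NT 6.1; rv:35.0) Gecko/20100101 Firefox/35.0": "Firefox",
}

for ua, label in samples.items():
    print(f"{label:15} -> {'blocked' if is_blocked(ua) else 'allowed'}")
```

Only the empty string matches; any crawler or browser that sends a non-empty user agent, Googlebot included, passes through this rule untouched.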


Davide Tampellini

Developer and Support Staff



Italian: native

English: good



Please keep in mind my timezone and cultural differences when reading my replies. Thank you!



tampe125
Monday, 09 February 2015 05:31 CST
Thanks Davide,

I will try it and monitor whether it works for me.

Cheers
user46704
Monday, 09 February 2015 05:47 CST
Hi Davide,

I checked with the tech support at SiteGround, and they told me that this RewriteCond will block all bots from indexing the website, including Google, Yahoo, etc., which means it will kill my SEO.

Is that correct?
user46704
Monday, 09 February 2015 06:01 CST
It depends on whether Google's crawlers use an empty User Agent or not.
If it's empty, they will be blocked; otherwise, they can index your site as usual.


tampe125
Monday, 09 February 2015 21:15 CST
But surely it would be foolish for Google (or any "good" search engine) to crawl sites using an empty User Agent?
That would serve no purpose, as they would get blocked by many servers and CDN services such as Cloudflare.

My god, all this bad-bot stuff is just a pure waste of time and resources for everybody.

What is difficult is that two trusted sources do not seem to agree on something so technical, which should be black or white.

SG is telling me it will block everything regardless of the user agent, and you are telling me it will block ONLY empty user agents.
And what is at stake is whether the website gets indexed and drives business, or becomes a site that no one can find on the internet.

Gee! I think I need another coffee!
user46704
Tuesday, 10 February 2015 02:04 CST
No, the rule I provided will only block visitors with an empty user agent.
For your information, Google sets its own user agent.


tampe125
This ticket is closed, therefore read-only. You can no longer reply to it. If you need to provide more information, please open a new ticket and mention this ticket's number.
