#22035 Unknown robot (identified by empty user agent string)

Posted in ‘Admin Tools for Joomla! 4 & 5’
This is a public ticket

Environment Information

Joomla! version: n/a
PHP version: n/a
Admin Tools version: n/a

Latest post by tampe125 on Tuesday, 10 February 2015 02:04 CST

user46704
Hi,

It seems that someone is going after one of the sites I manage. A bot started eating up a huge amount of resources. According to the AWStats that comes with the hosting, it shows up as:
Unknown robot (identified by empty user agent string)
I can see the entries for Google, Yahoo and Alexa, which are fine. I also see two others:
Unknown robot (identified by 'bot*')
and
Unknown robot (identified by 'crawl')
But these last two are not crawling too much.

I suppose I can just add 'crawl' and 'bot*' to the list of bots to block in the Htaccess Maker. But how can I block the one identified by an empty user agent?

I started by blocking the IP it came from, but obviously they switched IPs. I then blocked an IP range, but they changed ISP.
I cannot keep blocking everything or I will lose too much "real" traffic, since they are coming from the same country as my target audience.

Are there any tricks or tips to make it harder for bots to crawl the site? And how can I block bots identified by an empty user agent?

Thank you in advance.

tampe125
Akeeba Staff
Hello Julien,

You can add these custom rules to the Htaccess Maker:
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^(.*)$ - [F,L]

That way you'll block requests with an empty user agent.
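For reference, here is the same rule annotated line by line, as a minimal sketch for a standalone .htaccess file. The `<IfModule>` wrapper and the `RewriteEngine On` line are assumptions added here; they may be redundant if the Htaccess Maker output already enables the rewrite engine before the custom rules.

```apache
# Sketch: refuse requests whose User-Agent header is empty or missing.
# Assumption: the IfModule guard and RewriteEngine line are additions for a
# standalone .htaccess; they may duplicate what the Htaccess Maker already emits.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # ^$ matches only an empty string. A missing User-Agent header also
    # expands to an empty string, so both cases match; any non-empty
    # User-Agent (e.g. Googlebot's) fails this condition and is unaffected.
    RewriteCond %{HTTP_USER_AGENT} ^$
    # "-" means no URL substitution; F returns 403 Forbidden (F implies L).
    RewriteRule ^(.*)$ - [F,L]
</IfModule>
```

Because the condition is anchored at both ends with nothing in between, it cannot match a crawler that identifies itself with any user agent string at all.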

Davide Tampellini

Developer and Support Staff

🇮🇹 Italian: native 🇬🇧 English: good • 🕐 My time zone is Europe / Rome (UTC +1)
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

user46704
Thanks Davide,

I will try it and monitor whether it works for me.

Cheers

user46704
Hi Davide,

I checked with the tech support at SiteGround, and he told me that this RewriteCond will block all bots from indexing the website, including Google, Yahoo, etc., which means it will kill my SEO.

Is that correct?

tampe125
Akeeba Staff
It depends on whether Google's crawlers use an empty User Agent or not.
If it is empty, they will be blocked; otherwise they can index your site as usual.

Davide Tampellini

user46704
But surely it would be foolish to think that Google (or any "good" search engine) would crawl a site using an empty User Agent?
That would serve no purpose, as they would get blocked by many servers or CDN services such as Cloudflare.

My god, all this bad-bot stuff is just a pure waste of time and resources for everybody.

What is difficult is that two trusted sources do not seem to agree on something so technical, which should be black and white.

SG is telling me it will block everything regardless of the user agent, and you are telling me it will block ONLY empty user agents.
And what is at stake is a website that gets indexed and drives business versus a site that no one can find on the internet.

G! I think I need another coffee!

tampe125
Akeeba Staff
No, the rule I provided before will only block visitors with an empty user agent.
For your information, Google sets its own user agent.

Davide Tampellini
