#26149 About bot list suggestion

Posted in ‘Admin Tools for Joomla! 4 & 5’


Latest post by BigStef on Monday, 19 September 2016 13:40 CDT

BigStef
Hi, over the last few weeks I have had problems with bots eating resources on my server. SiteGround suggested that I add a rule to my .htaccess. Here it is:

# Bad Bots Guard provided by SiteGround, added via ticket on 17-09-2016

RewriteCond %{HTTP_USER_AGENT} ^(.*)msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)SolomonoBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)bingbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Mail.Ru [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)XoviBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.*)exabot [NC]
RewriteRule .* - [F]

# Bad Bots Guard provided by SiteGround afterwards via chat

RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|robot)
RewriteCond %{HTTP_USER_AGENT} !(bing|Google|msn|MSR|Twitter|Yandex) [NC]
RewriteRule ^/?.*$ "http\:\/\/127\.0\.0\.1" [R,L]

I just wanted to pass this on to you; maybe it could be useful for the bot list in the .htaccess feature of Admin Tools?

BTW 1: The rules SiteGround gave me are written like this:
RewriteCond %{HTTP_USER_AGENT} ^(.*)Yandex [NC]

Yours are written like this:
SetEnvIf user-agent "Yandex" stayout=1

Which one is more efficient? On StackExchange, they only say that RewriteCond relies on a resource-intensive module. If so, would it be better for me to simply add their bots to your list in the AT backend? Can you confirm?

BTW 2: When I upgrade Admin Tools to your latest version, the bot list may have changed. Do I have to regenerate my .htaccess after each upgrade of AT?

Stephan Herby PAO Production New Caledonia - Canada - France

nicholas
Akeeba Staff
Manager
You don't need to add these custom directives in your .htaccess file since you are already using the .htaccess Maker. Just enable the "Block access from specific user agents" feature. It does exactly what SiteGround was trying to help you with.

"Which one is more efficient? On StackExchange, they only say that RewriteCond relies on a resource-intensive module. If so, would it be better for me to simply add their bots to your list in the AT backend? Can you confirm?"

Our method is slightly more efficient because it uses substring matches instead of regular expressions. Substring matches are very cheap, whereas regular expressions need to be parsed and then evaluated by the regular expression engine, which is slower. That said, the difference is less than 1 msec per request, which is practically insignificant.
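
To make the comparison concrete, here is a minimal sketch of the two styles for one bot taken from the list above. The SetEnvIf line follows the form quoted earlier in this ticket. The Require block is an assumption: it is one way Apache 2.4 can refuse flagged requests, and not necessarily what the .htaccess Maker writes.

```apache
# Style 1: mod_rewrite, as in the SiteGround rules.
# (The leading ^(.*) in those rules is redundant, because an unanchored
# pattern already matches anywhere in the string.)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule .* - [F]

# Style 2: mod_setenvif. Flag the request with an environment variable...
SetEnvIf User-Agent "MJ12bot" stayout=1

# ...then refuse flagged requests. ASSUMPTION: Apache 2.4 syntax, shown for
# illustration only; the directives Admin Tools generates may differ.
<RequireAll>
    Require all granted
    Require not env stayout
</RequireAll>
```

Note that SetEnvIf matches case-sensitively; SetEnvIfNoCase is the case-insensitive counterpart of the [NC] flag.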

"When I upgrade Admin Tools to your latest version, the bot list may have changed. Do I have to regenerate my .htaccess after each upgrade of AT?"

If the changelog has items that talk about the .htaccess Maker you need to regenerate your .htaccess. The last time we had an update like that was in Admin Tools 4.0.0.b1, the beta version preceding the 4.0.0 release.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷 Greek: native 🇬🇧 English: excellent 🇫🇷 French: basic • My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

BigStef
Hi Nicholas, and thank you very much for shedding light on this point.
So, if I understand correctly, I can just add SiteGround's bot list to the .htaccess Maker of AT and it will be all right? (Some of them are not there yet, and it is true that I have noticed lower resource usage since I added their rules.)


nicholas
Akeeba Staff
Manager
Yes. However, I recommend against adding bingbot and Yandex to that list. They belong to two search engines: Microsoft Bing (the default search engine on Windows) and Yandex (the biggest search engine on the Russian-speaking Internet), respectively.

Likewise, the last four lines of code you pasted are really bad. They basically tell every crawler outside a short list of exceptions not to crawl your site. Worse, they redirect those crawlers to an invalid URL, which makes search engines consider your site a spam source. I wouldn't recommend inflicting this kind of carnage on any site.
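
To see why, here are the same four lines again with comments on how mod_rewrite evaluates them. The comments are an annotation added for clarity; the directives themselves are unchanged from the rules pasted above.

```apache
# Condition A: the User-Agent header is empty...
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
# ...OR condition B: it contains "bot", "crawl" or "robot".
# There is no [NC] flag here, so this match is case-sensitive.
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|robot)
# AND condition C: it contains none of these names (case-insensitive).
# Only user agents matching one of them are spared; any other user agent
# containing one of the lowercase words above satisfies (A OR B) AND C.
RewriteCond %{HTTP_USER_AGENT} !(bing|Google|msn|MSR|Twitter|Yandex) [NC]
# Matching requests are redirected (R without a status code means 302)
# to http://127.0.0.1, the loopback address, which leads nowhere useful.
RewriteRule ^/?.*$ "http\:\/\/127\.0\.0\.1" [R,L]
```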


BigStef
Thanks for all this information.
It's going to be useful :)

