Как заблокировать вредоносных ботов.

11.03.2020
Сложность
14 мин.
1345

Заблокировать вредоносных ботов можно по средствам .htaccess файла и в конифугурации nginx. Для пользователей хостинга, чаще всего подходит именно .htaccess, т.к. к конфигурации nginx сервера не всегда есть доступ.


  1. Зайдите в свой аккаунт через FTP или SSH.
  2. Создайте файл .htaccess в корневой директории сайта, если такого файла нет.
  3. Внесите в начале файла блокировки (они представлены ниже), после чего файл сохраните.
  4. Возьмите любого заблокированного бота и проверьте curl доступность сайта:
    curl -I -L --user-agent "SemrushBot" http://your_site.ru/
    Если блокировка работает, то ответ будет "HTTP/1.1 403 Forbidden".

Блокировки для внесения в файл

RewriteEngine on
# Начало блокировкам ботов
RewriteCond %{HTTP_USER_AGENT} ^.*(AhrefsBot|SemrushBot|Имя_вашего_бота).*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*Indy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*NEWT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SeaMonkey$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Acunetix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^binlar" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^BlackWidow" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Bolt 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^BOT for JCE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Bot mailto\:craftbot@yahoo\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^casper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^checkprivacy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ChinaClaw" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^clshttp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^cmsworldmap" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Custo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Default Browser 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^diavol" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^DIIbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^DISCo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^dotbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Download Demon" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^eCatch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EirGrabber" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailCollector" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailSiphon" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailWolf" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Express WebPictures" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^extract" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ExtractorPro" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EyeNetIE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^feedfinder" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^FHscan" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^FlashGet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^flicky" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^g00g1e" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GetRight" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GetWeb\!" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Go\!Zilla" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Go\-Ahead\-Got\-It" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^grab" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GrabNet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Grafula" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^harvest" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^HMView" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Image Stripper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Image Sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^InterGET" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Internet Ninja" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^InternetSeer\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^jakarta" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Java" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^JetCar" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^JOC Web Spider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^kanagawa" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^kmccrew" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^larbin" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^LeechFTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^libwww" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mass Downloader" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^microsoft\.url" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MIDown tool" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^miner" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mister PiX" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MSFrontPage" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Navroad" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NearSite" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Net Vampire" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetAnts" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetZIP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^nutch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Octopus" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Offline Explorer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Offline Navigator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^PageGrabber" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Papa Foto" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pavuk" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pcBrowser" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^PeoplePal" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^planetwork" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^psbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^purebot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pycurl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^RealDownload" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ReGet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Rippers 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^sitecheck\.internetseer\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SiteSnagger" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^skygrid" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SmartDownload" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SuperBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SuperHTTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Surfbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^tAkeOut" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Teleport Pro" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Toata dragostea mea pentru diavola" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^turnit" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^vikspider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^VoidEYE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Web Image Collector" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebAuto" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebBandit" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebCopier" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebFetch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebGo IS" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebLeacher" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebReaper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebSauger" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Website eXtractor" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Website Quester" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebStripper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebWhacker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebZIP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Widow" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WPScan" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WWW\-Mechanize" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WWWOFFLE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Xaldon WebSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Zeus" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^zmeu" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "360Spider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "CazoodleBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "discobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "EasouSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ecxi" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "GT\:\:WWW" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "HTTP\:\:Lite" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "HTTrack" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "id\-search" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "IDBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "IRLbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ISC Systems iRc Search 2\.1" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "LinksCrawler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "LinksManager\.com_bot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "linkwalker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "lwp\-trivial" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MFC_Tear_Sample" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Missigua Locator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MJ12bot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "panscient\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PECL\:\:HTTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PHPCrawl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PleaseCrawl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SBIder" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SearchmetricsBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Snoopy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Steeler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "URI\:\:Fetch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "urllib" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Web Sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "webalta" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WebCollage" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Wells Search II" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WEP Search" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "XoviBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "YisouSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "zermelo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC,OR]
# Конец блокировкам ботов
# Начало блокировки по HTTP запросам
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?semalt\.com" [NC,OR]
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?kambasoft\.com" [NC,OR]
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?savetubevideo\.com" [NC]

# Конец блокировки по HTTP запросам
RewriteRule ^.* - [F,L]
  1. Создайте файл с кастомными блокировками:
  2. touch /etc/nginx/vhosts-includes/custom_blocks.conf
  3. Внесите в начале файла блокировки, после чего файл сохраните.
    ### blocking bad bots (case insensitive).
    if ($http_user_agent ~* (YOUR_CUSTOM_BOT_NAME|SemrushBot|AhrefsBot|Riddler|aiHitBot|trovitBot|Detectify|BLEXBot|LinkpadBot|Mozilla.*Indy|Mozilla.*NEWT|SeaMonkey|Acunetix|binlar|BlackWidow|Bolt\s0|BOT\sfor\sJCE|Bot\smailto\:craftbot@yahoo\.com|casper|checkprivacy|ChinaClaw|clshttp|cmsworldmap|Custo|Default\sBrowser\s0|diavol|DIIbot|DISCo|dotbot|Download\sDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|Express\sWebPictures|extract|ExtractorPro|EyeNetIE|feedfinder|FHscan|FlashGet|flicky|g00g1e|GetRight|GetWeb\!|Go\!Zilla|Go\-Ahead\-Got\-It|grab|GrabNet|Grafula|harvest|HMView|Image\sStripper|Image\sSucker|InterGET|Internet\sNinja|InternetSeer\.com|jakarta|Java|JetCar|JOC\sWeb\sSpider|kanagawa|kmccrew|larbin|LeechFTP|libwww|Mass\sDownloader|microsoft\.url|MIDown\stool|miner|Mister\sPiX|MSFrontPage|Navroad|NearSite|Net\sVampire|NetAnts|NetSpider|NetZIP|nutch|Octopus|Offline\sExplorer|Offline\sNavigator|PageGrabber|Papa\sFoto|pavuk|pcBrowser|PeoplePal|planetwork|psbot|purebot|pycurl|RealDownload|ReGet|Rippers\s0|sitecheck\.internetseer\.com|SiteSnagger|skygrid|SmartDownload|sucker|SuperBot|SuperHTTP|Surfbot|tAkeOut|Teleport\sPro|Toata\sdragostea\smea\spentru\sdiavola|turnit|vikspider|VoidEYE|Web\sImage\sCollector|WebAuto|WebBandit|WebCopier|WebFetch|WebGo\sIS|WebLeacher|WebReaper|WebSauger|Website\seXtractor|Website\sQuester|WebStripper|WebWhacker|WebZIP|Widow|WPScan|WWW\-Mechanize|WWWOFFLE|Xaldon\sWebSpider|Zeus|zmeu|360Spider|CazoodleBot|discobot|EasouSpider|ecxi|GT\:\:WWW|heritrix|HTTP\:\:Lite|HTTrack|ia_archiver|id\-search|IDBot|Indy\sLibrary|IRLbot|ISC\sSystems\siRc\sSearch\s2\.1|LinksCrawler|LinksManager\.com_bot|linkwalker|lwp\-trivial|MFC_Tear_Sample|Microsoft\sURL\sControl|Missigua\sLocator|MJ12bot|panscient\.com|PECL\:\:HTTP|PHPCrawl|PleaseCrawl|SBIder|SearchmetricsBot|Snoopy|Steeler|URI\:\:Fetch|urllib|Web\sSucker|webalta|WebCollage|Wells\sSearch\sII|WEP\sSearch|XoviBot|YisouSpider|zermelo|ZyBorg)) {
    return 403;
    }
  4. Проверьте конфигурацию nginx:
    nginx -t
  5. Если ошибок в конфигурации нет, перезагрузите службу nginx:
    service nginx reload
    Если reload не сработал:
    service nginx restart
  6. Возьмите любого заблокированного бота и проверьте curl доступность сайта:
    curl -I -L --user-agent "SemrushBot" http://your_site.ru/
    Если блокировка работает, то ответ будет "HTTP/1.1 403 Forbidden".
Были ли сведения полезными?
6 
Продолжая использовать этот сайт и пользуясь нашими услугами, Вы соглашаетесь с Правилами и условиями веб-сайта и использованием файлов cookie на нашем веб-сайте. Также ознакомьтесь с нашей Политикой конфиденциальности, согласно которой, в заявленной степени, Вы соглашаетесь на обработку Ваших персональных данных.