How to Stop Search Engine Robots from Creating Magento Sessions

This article covers:

  1. Limit bot access to Magento 1
  2. Stop search engine robots from creating Magento sessions

Audiences of this article:

  • Magento store owners who are affected by excessive bot traffic and whose session files are growing at an alarming rate.

Limit Bot Access in Robots.txt

Here are a few lines you can add to your existing robots.txt file to limit bot access:

# # Crawl-delay parameter: the number of seconds a crawler should wait between successive requests to the same server.
# # Set a crawl delay if bot traffic is straining your server. Please note that Google ignores the Crawl-delay setting in robots.txt; set its crawl rate in Google Webmaster Tools instead.
# # Crawl-delay belongs inside a User-agent group, so place it under the relevant User-agent line of your existing file.
Crawl-delay: 10
# # Deny a specific bot all access, e.g. the Moz bot
User-agent: rogerbot
Disallow: /

Alternatively, if a bot does not respect the robots.txt file, you can deny it access by adding restrictions to your .htaccess file:

# # Take the Baidu bot as an example
# # These rules require mod_rewrite, with "RewriteEngine on" set earlier in the file
RewriteCond %{HTTP_USER_AGENT} baiduspider [NC]
RewriteRule .* - [F]
# # The [NC] flag makes the match case-insensitive
# # The [F] flag returns a "403 Forbidden" status code to the restricted bot

Stop Magento from Creating Sessions for Bot Traffic

First copy the file Varien.php from:

app/code/core/Mage/Core/Model/Session/Abstract/Varien.php

to:

app/code/local/Mage/Core/Model/Session/Abstract/Varien.php

In this file, modify the session start function:

public function start($sessionName=null)
{
    // Add this check to stop bots from creating sessions
    if ($this->isBot()) {
        return false;
    }
    if (isset($_SESSION) && !$this->getSkipEmptySessionCheck()) {
        return $this;
    }
    ...

Then add the bot validation function to the bottom of the file:

public function isBot()
     {
         // PHP variable names are case-sensitive, so use $isBot consistently
         $isBot = false;
         $bot_regex = '/BotLink|bingbot|AhrefsBot|ahoy|AlkalineBOT|anthill|appie|arale|araneo|AraybOt|ariadne|arks|ATN_Worldwide|Atomz|bbot|Bjaaland|Ukonline|borg-bot\/0.9|boxseabot|bspider|calif|christcrawler|CMC\/0.01|combine|confuzzledbot|CoolBot|cosmos|Internet Cruiser Robot|cusco|cyberspyder|cydralspider|desertrealm, desert realm|digger|DIIbot|grabber|downloadexpress|DragonBot|dwcp|ecollector|ebiness|elfinbot|esculapio|esther|fastcrawler|FDSE|FELIX IDE|ESI|fido|KIT-Fireball|fouineur|Freecrawl|gammaSpider|gazz|gcreep|golem|googlebot|griffon|Gromit|gulliver|gulper|hambot|havIndex|hotwired|htdig|iajabot|INGRID\/0.1|Informant|InfoSpiders|inspectorwww|irobot|Iron33|JBot|jcrawler|Teoma|Jeeves|jobo|image.kapsi.net|KDD-Explorer|ko_yappo_robot|label-grabber|larbin|legs|Linkidator|linkwalker|Lockon|logo_gif_crawler|marvin|mattie|mediafox|MerzScope|NEC-MeshExplorer|MindCrawler|udmsearch|moget|Motor|msnbot|muncher|muninn|MuscatFerret|MwdSearch|sharp-info-agent|WebMechanic|NetScoop|newscan-online|ObjectsSearch|Occam|Orbsearch\/1.0|packrat|pageboy|ParaSite|patric|pegasus|perlcrawler|phpdig|piltdownman|Pimptrain|pjspider|PlumtreeWebAccessor|PortalBSpider|psbot|Getterrobo-Plus|Raven|RHCS|RixBot|roadrunner|Robbie|robi|RoboCrawl|robofox|Scooter|Search-AU|searchprocess|Senrigan|Shagseeker|sift|SimBot|Site Valet|skymob|SLCrawler\/2.0|slurp|ESI|snooper|solbot|speedy|spider_monkey|SpiderBot\/1.0|spiderline|nil|suke|http:\/\/www.sygol.com|tach_bw|TechBOT|templeton|titin|topiclink|UdmSearch|urlck|Valkyrie libwww-perl|verticrawl|Victoria|void-bot|Voyager|VWbot_K|crawlpaper|wapspider|WebBandit\/1.0|webcatcher|T-H-U-N-D-E-R-S-T-O-N-E|WebMoose|webquest|webreaper|webs|webspider|WebWalker|wget|winona|whowhere|wlm|WOLP|WWWC|none|XGET|Nederland.zoek|AISearchBot|woriobot|NetSeer|Nutch|YandexBot|YandexMobileBot|SemrushBot|FatBot|MJ12bot|DotBot|AddThis|baiduspider|m2e/i';
         // A missing or empty User-Agent header is treated as a bot
         $userAgent = empty($_SERVER['HTTP_USER_AGENT']) ? false : $_SERVER['HTTP_USER_AGENT'];
         $isBot = !$userAgent || preg_match($bot_regex, $userAgent);
         return $isBot;
     }
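
Before deploying the override, it can help to try the pattern against a few User-Agent strings outside Magento. The following is a minimal sketch, not part of the Magento code: isBotUserAgent() is a hypothetical standalone helper that mirrors the isBot() logic, the pattern is a shortened subset of the full list, and the sample User-Agent strings are illustrative.

```php
<?php
// Hypothetical standalone helper mirroring the isBot() logic,
// so a pattern can be tried outside Magento.
function isBotUserAgent($userAgent, $botRegex)
{
    // An empty or missing User-Agent is treated as a bot, as in isBot().
    if (empty($userAgent)) {
        return true;
    }
    return preg_match($botRegex, $userAgent) === 1;
}

// Shortened pattern for illustration only; swap in the full list to test it.
$botRegex = '/googlebot|bingbot|baiduspider|AhrefsBot|YandexBot/i';

// Illustrative User-Agent strings: a crawler, a desktop browser, and an empty header.
$samples = array(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    '',
);

foreach ($samples as $ua) {
    $label = ($ua === '') ? '(empty)' : $ua;
    echo $label, ' => ', (isBotUserAgent($ua, $botRegex) ? 'bot' : 'not a bot'), PHP_EOL;
}
```

Because the pattern is unanchored and case-insensitive, short tokens in the full list match anywhere in the string (Motor, for instance, also matches Motorola), so it is worth running a few real browser User-Agent strings from your own logs through the check as well.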

Now clear your Magento cache and session folders, then check whether the excessive bot sessions have stopped. To extend the regex with any new bots you identify on your server, look at the access log or use the netstat command.
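
One way to spot new bots in the access log is to tally requests per User-Agent. The sketch below assumes the common "combined" log format, where the User-Agent is the last double-quoted field on each line; countUserAgents() and the log path are hypothetical and need adjusting to your server.

```php
<?php
// Count User-Agent strings in access-log lines. Assumes the "combined" log
// format, where the User-Agent is the last double-quoted field on the line.
function countUserAgents(array $lines)
{
    $counts = array();
    foreach ($lines as $line) {
        // Capture the final "..." field at the end of the line.
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $ua = $m[1];
            $counts[$ua] = isset($counts[$ua]) ? $counts[$ua] + 1 : 1;
        }
    }
    arsort($counts); // most frequent first, keys preserved
    return $counts;
}

// Hypothetical log location; adjust to your server.
$logFile = '/var/log/apache2/access.log';
if (is_readable($logFile)) {
    $lines = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    // Print the 20 most frequent User-Agent strings with their hit counts.
    foreach (array_slice(countUserAgents($lines), 0, 20, true) as $ua => $hits) {
        echo $hits, "\t", $ua, PHP_EOL;
    }
}
```

User-Agent strings near the top of the output that are not regular browsers are candidates for the regex list.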
