A Blog about Linux, Open Source and Code! 

Author:  Gremlette
November 21, 2008



 

 

Laycat, Kyklo, what next? …and it even admits it is ‘cloaking’ itself

When I was looking through my November website logs, Laycat and Kyclo were among the highest-visiting robots, above even Yahoo and Google. Of course, I googled them to see what on earth they were, and sure enough other people were also complaining that these were their highest visitors.

It is a relatively small cross-section of web designers and developers that actually look through their records, and we’re among them; the hits from Kyclo and Laycat were too big to ignore. Only a handful of people at the time reported on this particular robot; some said that they were getting a minimum of 550 hits (e.g. http://jagf.net/blog/?tag=laycat).

For a short period Laycat.com issued a web crawler notice on their site saying that they were simply gathering information for a new search engine… and that was good enough for some, since a poster had copied and pasted the robot notice on a forum. The robots are sporadic, keep changing names and hit A LOT, and the links to their website did not show any information on the multiple occasions they were checked, which is why this post was originally written. It looked a bit dodgy.

Now that this post has been brought to the attention of Laycat/Kyclo, the very plain robot information page is back online; the admin at Laycat assured me that it must have been temporary downtime when I was looking.

There are currently 3 known robots, all named differently, operating under the same people (rather odd, and how many more are there?): kyklo.com, aceleo.com and laycat.com. Not to tell someone else how to run their operation, but couldn’t you simply use 3 different server names at one domain, for example kyklo.laycat.com, aceleo.laycat.com and laycat.laycat.com? This might make people slightly less suspicious of 3 different robots with completely different names linking back to the same place.

http://www.kyklo.com and http://www.aceleo.com both redirect to http://www.laycat.com/. Don’t expect anything too fancy: it’s just a plain robot information notice blurb, with no site, no branding or company information, nor anything further. Despite being asked for further details on several occasions, they will not oblige, and instead insist that we change our public and, might I say, rightfully free opinion of it without further information. I’m sorry, but if that’s the way I ran my life I’d be a devout Christian who thought science was just the devil’s way of trying to trick us, because I’d be ignoring all evidence and putting my faith in someone else’s words.

The admin at Laycat have been extremely bitter and resentful about their bots being mentioned on here in a skeptical light. Their initial contact was immediately followed by the post being re-titled, and by their admin being thanked for the 3 links above and for their robots text being re-issued online… and I got told I was being ‘Nasty’!

Without further aggravation from us, the Laycat admin continued to bombard us with very long comment posts laced with further derogatory comments, calling us ‘undocumented trolls’ and using childish tactics such as posting the word counts of his posts, because we had said that the length of his comments may have had something to do with the Akismet spam filter canning them. He ripped our post and comments apart line by line (just like what would normally be considered “a troll” on most forums/blogs) with negatively verbose responses. We were painted as simpletons, writing rubbish just to drive people through our affiliate links (hardly advert city here, with a maximum of 4 links placed for layout aid vs 30+ links to our own site and services). We just won’t stand for that. Tell us we’re wrong by all means, but provide proof of it; don’t just bombard the comments with links and excuses.

Laycat (also aceleo and Kyklo… I was told by Laycat that it is kyclo, not kyklo, even though the Kyklo website is kyklo.com) have an absolutely stinking attitude, to say the least. Given Laycat’s response, the dawn of a new search engine being the reason for these robots has become highly unlikely in our minds, and if it has that sort of childish mentality at the head of it, then frankly we don’t need it. Considering the type of responses that were given, we find it far more likely that this new search engine will be the next “Web Ripper” and not a search engine at all. Due to the nature of our site in comparison to the nature of his comments, we have been forced to remove ALL comments, re-write this post appropriately and close further comments. If admin@laycat.com would like to comment further on this post, we invite him to use our contact form http://www.symsysit.com/core/Symsys-Contact-Details.php to do so. Beware though: if you fill your email to us with lots of links, a massive character count, swear words etc., then our web spam filter will probably pick it up as well.

As repeated in all of Laycat’s comments, it is highly recommended that their bots be blocked by IP banning and robots.txt block lists if you think they may be malicious. I am only repeating the advice given by the Laycat admin here, and just to please him, since he thinks we have such a controlling effect on our readers, I must molly-coddle you all by saying, “We encourage you to make up your own mind and this post is purely for informational purposes, we are not the definitive voice on the internet”. Laycat, do you feel reassured that we still don’t like your bots but have told our readers to make up their own minds? Readers, do you feel reassured that you’re not being “ordered” to believe what we tell you to?

Laycat, Kyklo, Aceleo malicious? … I say HELL YES … well, the admin certainly is!

Paranoid? … YES  :)  lol, maybe just bored. At the end of the day, it is your site, and you should be able to control, to some extent, who drives through taking your information, be it on the Internet or not. I’m now off to put on my tin hat, install barbed wire fencing around my house and instruct my datacenter to restrict all traffic to and from my server, just because I feel like it!

Our crawler has visited your web site?

Do you have any questions?
1) Why is your robot visiting my web site? Laycat crawler is a web documents indexing robot.

5) What is the search engine this web crawler is working for? The search engine this crawler is working for is currently in an early
development stage, and will go public as soon as we achieve the beta stage.

His job is to retrieve millions of pages from the world wide web
in order to feed a search engine. 

6) Why is your crawler using an anonymous user agent? 

Many documents found on the internet are generated dynamicaly, and may present
different content to crawlers than they would to regular visitors by examining
the user agent string. Examples of pages adding links to gambling, adult
content web sites when a crawler is visiting are plethora.

This practice is called cloaking, and the goal is to fool crawlers and
search engines in order to make them index some different content
than a normal person would actually see.

This is what we might call search engine spamming.

To avoid that kind of practice, the crawler uses an anonymous user agent,
and it will remain that way until we have enough data to do it the best way.
At this point we will of course consider using a dedicated user agent.

Most antivirus software use the same method as we do when scanning web pages.

There is no real need for a webmaster to detect a crawler using the
user agent string since this crawler respects the Robot Exclusion Standard,
and webmasters can decide to allow him to visit or not using this standard.

Please also note that the crawler will never fetch more than one page every
two seconds on a same IP address, thus never eating server's resources.
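
The "never more than one page every two seconds" claim in the notice above is something you can check against your own raw access logs rather than take on trust. Here is a minimal sketch, assuming Apache common/combined log lines (client IP first, timestamp between square brackets). The function name min_gap_seconds and the sample lines are made up for illustration; they are not real Laycat traffic.

```python
from datetime import datetime


def min_gap_seconds(log_lines, ip):
    """Return the smallest gap in seconds between consecutive requests from ip.

    Assumes Apache common/combined log lines, where the client IP is the
    first field and the timestamp sits between '[' and ']'.
    Returns None if the IP made fewer than two requests.
    """
    stamps = []
    for line in log_lines:
        if line.split(" ", 1)[0] != ip:
            continue
        raw = line[line.index("[") + 1 : line.index("]")]
        stamps.append(datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z"))
    stamps.sort()
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    return min(gaps) if gaps else None


# Made-up sample lines using a documentation-only IP address.
sample = [
    '192.0.2.10 - - [21/Nov/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '192.0.2.10 - - [21/Nov/2008:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '192.0.2.10 - - [21/Nov/2008:10:00:05 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
print(min_gap_seconds(sample, "192.0.2.10"))
```

If the number printed for a crawler's IP is below 2, the crawler is fetching faster than the notice promises. Bear in mind that a crawler working from several IP addresses would need checking per address.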

Filed under: Robots + Htaccess


 





Author:  Gremlette
October 18, 2008



 

 

Robots txt bot list update Oct 08

The time has come to do another website clean-up. This generally involves sitewide link and accessibility checks, making sure the sitemap is correct etc. It can be a rather sporadic event that I turn my mind to robots.txt and htaccess. In order to keep things relatively simple, this page is a bad bot list recompiled from old and new records for robots.txt only.

PLEASE NOTE: this will not prevent bad bots completely by any means! There are many bots that ignore robots.txt altogether, so the fact that this text file includes older bots is actually a good thing, since they are the ones more likely to still be reading the file.

I had the thought again today that sometimes it may not be the best idea to shut out so many of these bots. I know we don’t want the site ripped off (not that robots.txt is going to make a difference to that). I know we don’t want spam marketing emails galore. I just have to point out that some of the higher Google-ranked websites are turning out not to be the ones that enforce exhaustive security on robots and htaccess. There is Cautious, Meticulous, and downright Anal and Overboard.

If you look through this list, you will see that when one version of a robot is discovered, another is released, and the change can be as simple as a name change, no matter how small, to break through the disallow list. Always remember: nothing is infallible, and if someone wants IN, they WILL get in. End of. That said, it is still a good idea to list many of these robots. Some are above-board marketing companies with ethics that will honour the txt file (well, we like to think so). A lot of these are part of commercial and open software releases that anyone can use, so if you don’t want their program to work on your site, you can tell it so.
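
To see how little it takes for a renamed bot to slip past a disallow list, here is a small sketch using Python's standard-library robots.txt parser. Two assumptions to note: "WebZ1P" is a made-up rename for illustration, and this shows how Python's own matcher behaves; every real crawler implements its own matching, so treat it as an illustration rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# A one-entry block list in the same style as the list on this page.
rules = """\
User-agent: WebZIP
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(agent):
    """True if this parser would let `agent` fetch the home page."""
    return parser.can_fetch(agent, "/index.html")


for agent in ("WebZIP", "WebZIP/9.9", "WebZ1P"):
    print(agent, "allowed" if allowed(agent) else "blocked")
```

With this parser the bare name also covers the versioned variant, while the one-character rename is let straight through, which is exactly why the list keeps growing.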

Best effort has been made to remove any duplicate entries and get it into a general alphabetical order. I never use anyone else’s code without checking it over, checking syntax etc., and neither should you.
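
If you would rather not hunt for duplicates by eye, a rough sketch like the following can do the tidy-up. It assumes every entry carries the same "Disallow: /" rule, as in the list on this page; the function names tidy_agents and render_blocklist are my own for this example.

```python
def tidy_agents(robots_txt):
    """Return the User-agent names found in robots_txt, de-duplicated
    (ignoring case) and sorted alphabetically (ignoring case)."""
    seen = {}
    for line in robots_txt.splitlines():
        # Split on the first colon only, so names containing ':' survive.
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip():
            name = value.strip()
            seen.setdefault(name.lower(), name)  # keep the first spelling seen
    return sorted(seen.values(), key=str.lower)


def render_blocklist(agents):
    """Write one 'Disallow: /' block per agent name."""
    return "\n".join(f"User-agent: {name}\nDisallow: /" for name in agents) + "\n"


messy = """\
User-agent: b2w/0.1
Disallow: /
User-agent: BackDoorBot
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: aipbot
Disallow: /
"""
print(render_blocklist(tidy_agents(messy)))
```

Still read the result over before uploading it; the script only catches exact repeats, not near-duplicates or entries that were wrong to begin with.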

How do you find the latest bots? With a watchful eye on your raw access logs. Many robots/crawlers include links to their source so that you can find out who it is and what the robot is doing, and decide for yourself whether to add it to your disallow list or not. If you are the sort that wants to ‘look’ like your web hits are through the roof on, say, ‘webalizer’, then you may as well allow the lot… I prefer to only monitor real humans and the crawlers I DO want, not lies and statistics.
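
A watchful eye can be helped along with a few lines of script. The sketch below tallies hits per user agent, assuming the Apache combined log format, where the user agent is the last double-quoted field on each line. The name top_user_agents and the sample lines (including "ExampleBot") are invented for the example.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last double-quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(log_lines, n=10):
    """Return the n most frequent user agent strings with their hit counts."""
    counts = Counter()
    for line in log_lines:
        match = UA_AT_END.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Made-up sample lines using a documentation-only IP range.
sample = [
    '192.0.2.10 - - [18/Oct/2008:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [18/Oct/2008:09:00:03 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.44 - - [18/Oct/2008:09:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"',
    '192.0.2.10 - - [18/Oct/2008:09:02:00 +0000] "GET /b HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]
for agent, hits in top_user_agents(sample):
    print(hits, agent)
```

On a real site you would pass it an open log file instead of the sample list, then look up any unfamiliar name near the top before deciding whether it belongs on the disallow list.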

DON’T FORGET:

Disallow EVERYTHING, including Google, from your cgi-bin, private, secure etc. folders

Example:

User-agent: *
Disallow: /cgi-bin/
 

The ‘Google Hacking Database’ has become quite a popular pastime for many.
For example, ‘secret’ folders are easily reached by searching for intitle:index.of.secret in Google.

It is all the same for ‘secure’, ‘cgi-bin’, /tmp and so on. Just go and see for yourself.

 

There is a handy little database on robotstxt.org which tells you more about some of the robots, by name, and what they do.

 

 

User-agent: 216.34.209.23
Disallow: /
User-agent: aipbot
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: asterias
Disallow: /

User-agent: b2w/0.1
Disallow: /
User-agent: BackDoorBot
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: Black.Hole
Disallow: /
User-agent: BlackWidow
Disallow: /
User-agent: BlowFish
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: Bot mailto:craftbot@yahoo.com
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: becomebot
Disallow: /

User-agent: Cegbfeieh
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: ChinaClaw
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: Custo
Disallow: /
User-agent: cosmos
Disallow: /

User-agent: DISCo
Disallow: /
User-agent: DISCo Pump 3.0
Disallow: /
User-agent: DISCo Pump 3.2
Disallow: /
User-agent: DISCoFinder
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: Download Demon
Disallow: /
User-agent: Download Demon/3.2.0.8
Disallow: /
User-agent: Download Demon/3.5.0.11
Disallow: /
User-agent: dumbot
Disallow: /

User-agent: eCatch
Disallow: /
User-agent: eCatch/3.0
Disallow: /
User-agent: EirGrabber
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: Express WebPictures
Disallow: /
User-agent: Express WebPictures (www.express-soft.com)
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: EyeNetIE
Disallow: /
User-agent: Enterprise_Search
Disallow: /
User-agent: Enterprise_Search/1.0
Disallow: /
User-agent: es
Disallow: /

User-agent: FairAd Client
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: FlashGet
Disallow: /
User-agent: FlashGet WebWasher 3.2
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: FrontPage
Disallow: /
User-agent: Fasterfox
Disallow: /

User-agent: Gaisbot
Disallow: /
User-agent: GetRight
Disallow: /
User-agent: GetRight/2.11
Disallow: /
User-agent: GetRight/3.1
Disallow: /
User-agent: GetRight/3.2
Disallow: /
User-agent: GetRight/3.3
Disallow: /
User-agent: GetRight/3.3.3
Disallow: /
User-agent: GetRight/3.3.4
Disallow: /
User-agent: GetRight/4.0.0
Disallow: /
User-agent: GetRight/4.1.0
Disallow: /
User-agent: GetRight/4.1.1
Disallow: /
User-agent: GetRight/4.1.2
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: GetRight/4.2b (Portuguxeas)
Disallow: /
User-agent: GetRight/4.2c
Disallow: /
User-agent: GetRight/4.3
Disallow: /
User-agent: GetRight/4.5
Disallow: /
User-agent: GetRight/4.5a
Disallow: /
User-agent: GetRight/4.5b
Disallow: /
User-agent: GetRight/4.5b1
Disallow: /
User-agent: GetRight/4.5b2
Disallow: /
User-agent: GetRight/4.5b3
Disallow: /
User-agent: GetRight/4.5b6
Disallow: /
User-agent: GetRight/4.5b7
Disallow: /
User-agent: GetRight/4.5c
Disallow: /
User-agent: GetRight/4.5d
Disallow: /
User-agent: GetRight/4.5e
Disallow: /
User-agent: GetRight/5.0beta1
Disallow: /
User-agent: GetRight/5.0beta2
Disallow: /
User-agent: GetWeb!
Disallow: /
User-agent: Go!Zilla
Disallow: /
User-agent: Go!Zilla (www.gozilla.com)
Disallow: /
User-agent: Go!Zilla 3.3 (www.gozilla.com)
Disallow: /
User-agent: Go!Zilla 3.5 (www.gozilla.com)
Disallow: /
User-agent: Go-Ahead-Got-It
Disallow: /
User-agent: GrabNet
Disallow: /
User-agent: Grafula
Disallow: /
User-agent: grub
Disallow: /
User-agent: grub-client
Disallow: /

User-agent: HMView
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: HTTrack 3.0
Disallow: /
User-agent: Harvest
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: humanlinks
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /
User-agent: IconSurf
Disallow: /
User-agent: Image Stripper
Disallow: /
User-agent: ImageWalker/2.0
Disallow: /
User-agent: Image Sucker
Disallow: /
User-agent: Indy Library
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: InterGET
Disallow: /
User-agent: Internet Ninja
Disallow: /
User-agent: InternetSeer.com
Disallow: /
User-agent: Internet Ninja 4.0
Disallow: /
User-agent: Internet Ninja 5.0
Disallow: /
User-agent: Internet Ninja 6.0
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /

User-agent: JOC Web Spider
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: JetCar
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Kenjin.Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: Keyword.Density
Disallow: /

User-agent: LNSpiderguy
Disallow: /
User-agent: LeechFTP
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LinkWalker/2.0
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: larbin
Disallow: /
User-agent: larbin (samualt9@bigfoot.com)
Disallow: /
User-agent: larbin samualt9@bigfoot.com
Disallow: /
User-agent: larbin_2.6.2 (kabura@sushi.com)
Disallow: /
User-agent: larbin_2.6.2 (larbin2.6.2@unspecified.mail)
Disallow: /
User-agent: larbin_2.6.2 (listonATccDOTgatechDOTedu)
Disallow: /
User-agent: larbin_2.6.2 (vitalbox1@hotmail.com)
Disallow: /
User-agent: larbin_2.6.2 kabura@sushi.com
Disallow: /
User-agent: larbin_2.6.2 larbin2.6.2@unspecified.mail
Disallow: /
User-agent: larbin_2.6.2 larbin@correa.org
Disallow: /
User-agent: larbin_2.6.2 listonATccDOTgatechDOTedu
Disallow: /
User-agent: larbin_2.6.2 vitalbox1@hotmail.com
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: looksmart
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /

User-agent: MJ12bot
Disallow: /
User-agent: MIDown tool
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: Mass Downloader
Disallow: /
User-agent: Mass Downloader/2.2
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: Mata.Hari
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: Microsoft.URL
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: Mister PiX version.dll
Disallow: /
User-agent: Mister Pix II 2.01
Disallow: /
User-agent: Mister Pix II 2.02a
Disallow: /
User-agent: Mister.PiX
Disallow: /
User-agent: moget
Disallow: /
User-agent: moget/2.1
Disallow: /

User-agent: naver
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: NPBot
Disallow: /
User-agent: Navroad
Disallow: /
User-agent: NearSite
Disallow: /
User-agent: Net Vampire
Disallow: /
User-agent: Net Vampire/3.0
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NetAnts/1.10
Disallow: /
User-agent: NetAnts/1.23
Disallow: /
User-agent: NetAnts/1.24
Disallow: /
User-agent: NetAnts/1.25
Disallow: /
User-agent: NetMechanic
Disallow: /
User-agent: NetSpider
Disallow: /
User-agent: NetZIP
Disallow: /
User-agent: NetZip Downloader 1.0 Win32(Nov 12 1998)
Disallow: /
User-agent: NetZip-Downloader/1.0.62 (Win32; Dec 7 1998)
Disallow: /
User-agent: NetZippy+(http://www.innerprise.net/usp-spider.asp)
Disallow: /

User-agent: Octopus
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Offline Explorer/1.2
Disallow: /
User-agent: Offline Explorer/1.4
Disallow: /
User-agent: Offline Explorer/1.6
Disallow: /
User-agent: Offline Explorer/1.7
Disallow: /
User-agent: Offline Explorer/1.9
Disallow: /
User-agent: Offline Explorer/2.0
Disallow: /
User-agent: Offline Explorer/2.1
Disallow: /
User-agent: Offline Explorer/2.3
Disallow: /
User-agent: Offline Explorer/2.4
Disallow: /
User-agent: Offline Explorer/2.5
Disallow: /
User-agent: Offline Navigator
Disallow: /
User-agent: Offline.Explorer
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /

User-agent: pavuk
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: pcBrowser
Disallow: /
User-agent: psbot
Disallow: /
User-agent: PageGrabber
Disallow: /
User-agent: Papa Foto
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: QueryN.Metasearch
Disallow: /

User-agent: RMA
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: ReGet
Disallow: /
User-agent: RealDownload
Disallow: /
User-agent: RealDownload/4.0.0.40
Disallow: /
User-agent: RealDownload/4.0.0.41
Disallow: /
User-agent: RealDownload/4.0.0.42
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: SBIder
Disallow: /
User-agent: SBIder/SBIder-0.8.2-dev
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SlySearch
Disallow: /
User-agent: SmartDownload
Disallow: /
User-agent: SmartDownload/1.2.76 (Win32; Apr 1 1999)
Disallow: /
User-agent: SmartDownload/1.2.77 (Win32; Aug 17 1999)
Disallow: /
User-agent: SmartDownload/1.2.77 (Win32; Feb 1 2000)
Disallow: /
User-agent: SmartDownload/1.2.77 (Win32; Jun 19 2001)
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: sootle
Disallow: /
User-agent: Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux
Disallow: /
User-agent: SuperBot
Disallow: /
User-agent: SuperBot/3.0 (Win32)
Disallow: /
User-agent: SuperBot/3.1 (Win32)
Disallow: /
User-agent: SuperHTTP
Disallow: /
User-agent: SuperHTTP/1.0
Disallow: /
User-agent: Surfbot
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: spanner
Disallow: /
User-agent: SurveyBot
Disallow: /
User-agent: suzuran
Disallow: /

User-agent: tAkeOut
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Teleport Pro/1.29
Disallow: /
User-agent: Teleport Pro/1.29.1590
Disallow: /
User-agent: Teleport Pro/1.29.1634
Disallow: /
User-agent: Teleport Pro/1.29.1718
Disallow: /
User-agent: Teleport Pro/1.29.1820
Disallow: /
User-agent: Teleport Pro/1.29.1847
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: The.Intraformant
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: turingos
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: TurnitinBot/1.5
Disallow: /
User-agent: Titan
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: URLy.Warning
Disallow: /

User-agent: VCI
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: VoidEYE
Disallow: /

User-agent: WWW-Collector-E
Disallow: /
User-agent: WWWOFFLE
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: Web Sucker
Disallow: /
User-agent: Web.Image.Collector
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebAuto/3.40 (Win98; I)
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: WebCapture 2.0
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebCopier v.2.2
Disallow: /
User-agent: WebCopier v2.5
Disallow: /
User-agent: WebCopier v2.6
Disallow: /
User-agent: WebCopier v2.7a
Disallow: /
User-agent: WebCopier v2.8
Disallow: /
User-agent: WebCopier v3.0
Disallow: /
User-agent: WebCopier v3.0.1
Disallow: /
User-agent: WebCopier v3.2
Disallow: /
User-agent: WebCopier v3.2a
Disallow: /
User-agent: WebEMailExtrac
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebFetch
Disallow: /
User-agent: webfetch/2.1.0
Disallow: /
User-agent: WebGo IS
Disallow: /
User-agent: WebLeacher
Disallow: /
User-agent: WebReaper
Disallow: /
User-agent: WebReaper [info@webreaper.net]
Disallow: /
User-agent: WebReaper [webreaper@otway.com]
Disallow: /
User-agent: WebReaper v9.1 - www.otway.com/webreaper
Disallow: /
User-agent: WebReaper v9.7 - www.webreaper.net
Disallow: /
User-agent: WebReaper v9.8 - www.webreaper.net
Disallow: /
User-agent: WebReaper vWebReaper v7.3 - www,otway.com/webreaper
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: WebSauger 1.20b
Disallow: /
User-agent: WebSauger 1.20j
Disallow: /
User-agent: WebSauger 1.20k
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebStripper/2.03
Disallow: /
User-agent: WebStripper/2.10
Disallow: /
User-agent: WebStripper/2.12
Disallow: /
User-agent: WebStripper/2.13
Disallow: /
User-agent: WebStripper/2.15
Disallow: /
User-agent: WebStripper/2.16
Disallow: /
User-agent: WebStripper/2.19
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: WebWhacker
Disallow: /
User-agent: WebZIP/2.75 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/3.65 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/3.80 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/4.1 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/4.21
Disallow: /
User-agent: WebZIP/4.21 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/5.0
Disallow: /
User-agent: WebZIP/5.0 PR1 (http://www.spidersoft.com)
Disallow: /
User-agent: WebZIP/7.0
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: wget
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: Wget/1.5.2
Disallow: /
User-agent: Wget/1.7
Disallow: /
User-agent: Wget/1.8
Disallow: /
User-agent: Wget/1.8.1
Disallow: /
User-agent: Wget/1.8.1+cvs
Disallow: /
User-agent: Wget/1.8.2
Disallow: /
User-agent: Wget/1.9-beta
Disallow: /
User-agent: Widow
Disallow: /
User-agent: WebmasterWorldForumBot
Disallow: /
User-agent: Website Quester - www.asona.org
Disallow: /
User-agent: Website Quester - www.esalesbiz.com/extra/
Disallow: /
User-agent: Website eXtractor
Disallow: /
User-agent: Website eXtractor (http://www.asona.org)
Disallow: /

User-agent: Xaldon WebSpider
Disallow: /
User-agent: Xaldon WebSpider 2.5.b3
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Zeus
Disallow: /
User-agent: Zeus 11389 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 11652 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 18018 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 26378 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 30747 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 39206 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 41641 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 44238 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 51070 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 51674 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 51837 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 63567 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 6694 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 71129 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 82016 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 82900 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 84842 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 90872 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 94934 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 95245 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 95351 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus 97371 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus Link Scout
Disallow: /

Now, you may agree that this list is BEYOND ridiculous, and it will not include the probably ten or so malicious robots released only yesterday.
I think it would be a far better and wiser approach to simply make a robots.txt that bans the lot apart from a safe good-robot list, if there is such a thing!
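
As a rough sketch of that "ban the lot apart from a good list" idea, the snippet below builds such a robots.txt and checks it with Python's standard-library parser. The choice of Googlebot and Slurp as the "good" robots is purely my assumption for the example, and allowlist_robots_txt is a made-up helper name; pick your own list.

```python
from urllib.robotparser import RobotFileParser


def allowlist_robots_txt(good_bots):
    """Build a robots.txt that allows the named bots and disallows all others."""
    # An empty Disallow line means "nothing is disallowed" for that agent.
    blocks = [f"User-agent: {bot}\nDisallow:\n" for bot in good_bots]
    # Everyone not named above falls through to the catch-all block.
    blocks.append("User-agent: *\nDisallow: /\n")
    return "\n".join(blocks)


text = allowlist_robots_txt(["Googlebot", "Slurp"])
print(text)

# Check the result the way a well-behaved crawler might read it.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("Googlebot", "/index.html"))
print(parser.can_fetch("WebZIP", "/index.html"))
```

The file comes out a handful of lines long instead of hundreds. As with the long list, it only works on robots that actually read and honour robots.txt; the rest still need htaccess or IP blocking.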


Filed under: Code, Robots + Htaccess


 




