Blocking Twiceler

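For the past few days my Bluehost account kept exceeding its CPU quota, and today it was even suspended temporarily because of it. I went through the access logs and found a robot called twiceler behaving strangely: it requests the same page several times per second (a sample is attached at the end of this post; if you know about this sort of thing, please take a look). My guess is that it is the source of the problem, though I am not certain. Apparently someone is building a new search engine called Cuill and has released this robot, Twiceler. For now I have no choice but to block it, by adding the following to robots.txt: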

User-agent: twiceler
Disallow: /
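As a quick sanity check, the rule above can be tested locally with Python's standard `urllib.robotparser`. This is only a sketch: the helper name `is_allowed` is made up for illustration, and it shows how a well-behaved parser reads the rule, not what Twiceler itself actually does.

```python
from urllib.robotparser import RobotFileParser

# The two lines added to robots.txt in this post.
ROBOTS_TXT = """\
User-agent: twiceler
Disallow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if a compliant crawler with this token may fetch url.

    Hypothetical helper for illustration. agent_token is the short robot
    name (e.g. "Twiceler"), not the full User-Agent header string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    page = "http://vonye.com/111/"
    print("Twiceler allowed:", is_allowed(ROBOTS_TXT, "Twiceler", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

The match on the robot name is case-insensitive in this parser, so `twiceler` in robots.txt covers `Twiceler-0.9`. Other robots are unaffected because the file has no `User-agent: *` section.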

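After reading up a little on Cuill, I found that some people even consider it a strong challenger to Google. The company itself claims that Cuill indexes web pages faster than Google and at a tenth of the cost.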

……
38.117.64.101 - - [05/Oct/2007:21:59:38 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:40 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:40 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:40 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:41 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:41 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:42 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:42 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:42 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:43 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:43 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
38.117.64.101 - - [05/Oct/2007:21:59:43 -0600] "GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
……
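To put a number on "several times per second", a log excerpt like the one above can be tallied with a few lines of Python. This is a rough sketch, not a full log parser: the function name `hits_per_second` is invented here, and it only assumes that each line carries its timestamp in square brackets and mentions the robot's name somewhere.

```python
import re
from collections import Counter

# The timestamp is the bracketed field, e.g. [05/Oct/2007:21:59:38 -0600].
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")


def hits_per_second(lines, agent_token):
    """Count log lines per timestamp for lines mentioning agent_token.

    Hypothetical helper for illustration. Timestamps in this log have
    one-second resolution, so each key is one second of traffic.
    """
    token = agent_token.lower()
    counts = Counter()
    for line in lines:
        if token not in line.lower():
            continue  # some other visitor
        match = TIMESTAMP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The first four entries from the excerpt above.
_UA = '"Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"'
_REQ = '"GET /111/ HTTP/1.0" 302 0 "http://vonye.com/111" ' + _UA
SAMPLE = [
    "38.117.64.101 - - [05/Oct/2007:21:59:38 -0600] " + _REQ,
    "38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] " + _REQ,
    "38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] " + _REQ,
    "38.117.64.101 - - [05/Oct/2007:21:59:39 -0600] " + _REQ,
]

if __name__ == "__main__":
    for second, hits in sorted(hits_per_second(SAMPLE, "twiceler").items()):
        print(second, hits)
```

Running the same function over the whole access log, instead of `SAMPLE`, would show whether the two to three requests per second seen here are sustained or just a short burst.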


9 Responses to "Blocking Twiceler"

  1. 紫菜 says:

    Really sorry about that, brother. My little spider has caused you so much trouble.

  2. If your spider keeps running wild, I'll boil it into soup.

  3. 野鸭 says:

    I love eating spiders.

  4. Paveo says:

    I noticed it too, but I'm not blocking it ;)

  5. pmme says:

    Some spiders play dirty, and robots.txt won't stop them.

  6. qflash says:

    This spider should be controllable with robots.txt. They say it rereads the robots.txt file every 24-48 hours.

  7. 笑容 says:

    Maybe their spider hadn't learned how to crawl properly back then.

  8. chenlb says:

    I've found that it doesn't obey robots.txt.

    I allow only Googlebot and a few others and disallow everything else, yet I still see the twiceler bot in my backend.

  9. Guest says:

    I figured that after being crawled by it for a while, a site: search on cuil would at least turn up my own site, but still nothing.
