Determining whether an IP belongs to a search engine spider or crawler


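The check works mainly by sending a reverse DNS (PTR) query to a DNS server and using the domain name returned for the given IP to decide whether it belongs to the corresponding search engine or crawler. Resolving the returned name forward again, as the examples do, confirms that it maps back to the same IP.
Either the dig tool or the host tool can run these queries.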


Example:


> dig -x 8.8.8.8 +short
google-public-dns-a.google.com.


> dig google-public-dns-a.google.com +short
8.8.8.8




Example:


> host 8.8.8.8
8.8.8.8.in-addr.arpa domain name pointer google-public-dns-a.google.com.


> host google-public-dns-a.google.com
google-public-dns-a.google.com has address 8.8.8.8
google-public-dns-a.google.com has IPv6 address 2001:4860:4860::8888
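The reverse lookup followed by a forward lookup shown with dig and host can also be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name `verify_crawler_ip`, its parameters, and the idea of passing the expected domain suffixes in as a list are assumptions made for illustration and do not come from the original article. The two lookups can be swapped out, which makes the logic easy to try without network access.

```python
import socket


def _reverse(ip):
    # PTR lookup, equivalent to `dig -x <ip> +short`.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward lookup (A and AAAA), equivalent to `host <hostname>`.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_crawler_ip(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Return True if the PTR name of `ip` ends in an allowed suffix
    and that name resolves back to `ip`.

    `allowed_suffixes` is a hypothetical parameter: the caller supplies
    the domains the search engine documents for its crawler.
    """
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: cannot be verified.
        return False

    # Match whole labels only, so "evilgooglebot.com" does not pass
    # for the suffix "googlebot.com".
    if not any(hostname == s or hostname.endswith("." + s) for s in allowed_suffixes):
        return False

    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A call such as `verify_crawler_ip("66.249.66.1", ["googlebot.com", "google.com"])` would perform both lookups against the system resolver; the result depends on live DNS data.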


Common search engine spiders and their official documentation
Googlebot
http://www.google.com/bot.html


bingbot
http://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0


Baiduspider
http://www.baidu.com/search/spider.htm


Yahoo!
http://help.yahoo.com/help/us/ysearch/slurp


360Spider
http://www.so.com/help/help_3_2.html


YoudaoBot
http://www.youdao.com/help/webmaster/spider/


sogou spider
http://www.sogou.com/docs/help/webmasters.htm#07


EasouSpider
http://www.easou.com/search/spider.html


Applebot
http://www.apple.com/go/applebot


FacebookBot
https://developers.facebook.com/docs/sharing/webmasters/crawler
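To connect a claimed spider name to the domain its reverse DNS name should fall under, a small lookup table can help. The sketch below is illustrative only: the table `CRAWLER_DOMAINS` and the helper names are hypothetical, and the domain suffixes are assumptions that are not listed in the original article. Check them against each vendor's official page before relying on them. Only a few of the spiders above are included.

```python
# Hypothetical table: User-Agent token -> domain suffixes expected in
# the PTR name. The suffixes are assumptions to be confirmed against
# each vendor's documentation.
CRAWLER_DOMAINS = {
    "googlebot": ("googlebot.com", "google.com"),
    "bingbot": ("search.msn.com",),
    "baiduspider": ("baidu.com", "baidu.jp"),
    "yahoo! slurp": ("crawl.yahoo.net",),
}


def expected_domains(user_agent):
    """Return the domain suffixes for the first known token found in
    the User-Agent string, or an empty tuple if none matches."""
    ua = user_agent.lower()
    for token, domains in CRAWLER_DOMAINS.items():
        if token in ua:
            return domains
    return ()


def hostname_matches(hostname, domains):
    """True if `hostname` equals one of `domains` or is a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)
```

The User-Agent header alone is trivially spoofed, so a table like this only selects which suffixes to expect; the DNS lookups still decide whether the IP is genuine.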




Baidu's robots.txt
> curl -i http://www.baidu.com/robots.txt
HTTP/1.1 200 OK
Date: Thu, 31 Mar 2016 04:27:29 GMT
Server: Apache
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Set-Cookie: BAIDUID=178CAA8DA6084CFB2B1131C5BC48270B:FG=1; expires=Fri, 31-Mar-17 04:27:29 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Last-Modified: Thu, 25 Dec 2014 04:29:36 GMT
ETag: "91e-50b02db060c00"
Accept-Ranges: bytes
Content-Length: 2334
Vary: Accept-Encoding,User-Agent
Connection: Keep-Alive
Content-Type: text/plain


User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?


User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: MSNBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Baiduspider-image
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: YoudaoBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou web spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou inst spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou spider2
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou blog
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou News Spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou Orion spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: ChinasoSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sosospider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?




User-agent: yisouspider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: EasouSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: *
Disallow: /
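The per-crawler rules in a file like this can be evaluated with Python's standard `urllib.robotparser`. The sketch below feeds it a shortened excerpt of the response above (three of the groups) instead of fetching the file, so it runs offline. The variable names and the bot name `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt shown above.
ROBOTS_EXCERPT = """\
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# Baiduspider has no rule covering /homepage/, Googlebot does.
baidu_homepage = parser.can_fetch("Baiduspider", "http://www.baidu.com/homepage/")
google_homepage = parser.can_fetch("Googlebot", "http://www.baidu.com/homepage/")

# Any crawler without its own group falls under "User-agent: *".
other_root = parser.can_fetch("SomeOtherBot", "http://www.baidu.com/")

print(baidu_homepage, google_homepage, other_root)  # True False False
```

Note that `urllib.robotparser` matches the User-agent token by substring, so a name such as `Baiduspider-image` would be matched by the earlier `Baiduspider` group in this parser; other parsers may pick the more specific group.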








