1. Results
Each hero gets its own folder, and all of that hero's skins are saved inside it (the folders are created automatically; just run the spider).
2. Code walkthrough
2.1 Third-party modules used
Some of these ship with Python; the others you install yourself with pip install <module-name>. If you run into trouble, see my earlier article on importing third-party libraries, which explains it in detail.
import requests  # fetch data over HTTP
import os        # operating-system module, used to create folders
import jsonpath  # extract fields from JSON data
import re        # regular-expression module, used to clean up skin names
import time      # sleep between requests so our IP doesn't get banned
import random    # random numbers, used together with time
2.2 Request headers and the home-page JS address
User-Agent: as the name suggests, this identifies the client. Setting it disguises the spider as a browser; without it, the server can tell you are a bot. Checking it is one of the most basic anti-scraping measures.
hero_list_url: the URL, found by analyzing the LOL official site, from which we extract each heroId. If you want to learn page analysis, leave a comment and I'll write a separate article on it; this one focuses on the scraping itself.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
}
hero_list_url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
2.3 Function that gets the detail-page URLs
Write a get_id function that builds each hero's detail URL in preparation for downloading. A list comprehension is used as the return value, so all hero detail URLs come back in one list that the download function can iterate over.
def get_id(url):
    response = requests.get(url, headers=headers).json()
    hero_Id = jsonpath.jsonpath(response, '$..heroId')
    time.sleep(random.randint(1, 3))
    base_url = 'http://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'
    return [base_url.format(every_id) for every_id in hero_Id]
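A quick offline sketch of what get_id() hands back. The heroId values here are made up for illustration; the real ones come from hero_list.js:

```python
# Hypothetical heroId values standing in for what jsonpath extracts
hero_Id = ['1', '10', '266']

base_url = 'http://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'
# same list comprehension as in get_id(): one detail-page URL per hero
detail_urls = [base_url.format(every_id) for every_id in hero_Id]
print(detail_urls[0])  # → http://game.gtimg.cn/images/lol/act/img/js/hero/1.js
```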
2.4 Function that extracts and downloads the data
The explanation is in the code comments.
def get_skin(li1):
    for url in li1:  # iterate over the detail-page URLs
        response = requests.get(url, headers=headers)
        result = response.json()['skins']  # all the skin information
        skin_name = []  # empty list to hold the extracted skin names
        skin_url = []   # empty list to hold the extracted skin download links
        time.sleep(random.randint(1, 3))  # sleep 1-3 seconds at random to avoid a ban
        for skin_json in result:
            skin_name.append(skin_json['name'])     # save each skin name
            skin_url.append(skin_json['mainImg'])   # save each skin download link
        hero_folder = 'allhero/' + response.json()['hero']['name'] + response.json()['hero']['title']
        # path the skins are saved under
        if not os.path.exists(hero_folder):
            os.makedirs(hero_folder)
        # create the folder (including the allhero parent) if it does not exist yet
        for i in range(len(skin_url)):
            if not skin_url[i] == '':
                image_path = hero_folder + '/' + re.findall(r'\w+', skin_name[i])[0] + '.png'  # full path, name and format of each image
                with open(image_path, 'wb') as file:
                    print('Downloading {}'.format(skin_name[i]))  # show progress
                    file.write(requests.get(skin_url[i], headers=headers).content)  # download the image
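A note on the re.findall(r'\w+', ...) step: skin names can contain spaces, slashes or other punctuation that is awkward in file names, so only the first run of "word" characters is kept (in Python's default Unicode mode, \w matches Chinese characters too). A small illustration with made-up names, which also shows the trade-off, since a name starting with punctuation gets truncated:

```python
import re

names = ['至死不渝 安妮', 'K/DA ALL OUT 阿狸']  # sample skin names, not real API output
# keep only the first run of word characters from each name
safe = [re.findall(r'\w+', n)[0] for n in names]
print(safe)  # → ['至死不渝', 'K']
```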
3. Complete code
If you run into problems, leave a comment and I'll answer when I see it. Follow me if you like this; I post something interesting almost every day.
import requests
import os
import jsonpath
import re
import time
import random
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
}

def get_id(url):
    response = requests.get(url, headers=headers).json()
    hero_Id = jsonpath.jsonpath(response, '$..heroId')
    time.sleep(random.randint(1, 3))
    base_url = 'http://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'
    return [base_url.format(every_id) for every_id in hero_Id]

def get_skin(li1):
    for url in li1:
        response = requests.get(url, headers=headers)
        result = response.json()['skins']
        skin_name = []
        skin_url = []
        time.sleep(random.randint(1, 3))
        for skin_json in result:
            skin_name.append(skin_json['name'])
            skin_url.append(skin_json['mainImg'])
        hero_folder = 'allhero/' + response.json()['hero']['name'] + response.json()['hero']['title']
        if not os.path.exists(hero_folder):
            os.makedirs(hero_folder)
        for i in range(len(skin_url)):
            if not skin_url[i] == '':
                image_path = hero_folder + '/' + re.findall(r'\w+', skin_name[i])[0] + '.png'
                with open(image_path, 'wb') as file:
                    print('Downloading {}'.format(skin_name[i]))
                    file.write(requests.get(skin_url[i], headers=headers).content)

if __name__ == '__main__':
    hero_list_url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
    li1 = get_id(hero_list_url)
    get_skin(li1)
4. Honor of Kings (王者荣耀)
Here you will hit garbled hero names and skin names; fetching the pages with selenium solves that. The crawling idea itself is simple. The result screenshot and the source code are attached below. (Why do so few people read my blog?)
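The commented-out URLs below reveal the pattern the downloader relies on: every big-skin image lives at a fixed path built from the hero id and a 1-based skin index. A minimal sketch of that construction, assuming the full image host is game.gtimg.cn (the domain appears truncated in the pasted URLs); 506 and 2 are sample values taken from those URLs:

```python
# Sample values: hero id 506, skin index 2 (indices start at 1)
hero_id = '506'
skin_index = 2
pic_link = (f'https://game.gtimg.cn/images/yxzj/img201606/'
            f'skin/hero-info/{hero_id}/{hero_id}-bigskin-{skin_index}.jpg')
print(pic_link)
```

Once this pattern is known, no image URL needs to be scraped at all; only the hero ids and skin names do.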
# url = 'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/150/150-bigskin-6.jpg'
# url = 'https://game.gtimg.cn/images/yxzj/img201606/heroimg/167/167-smallskin-8.jpg'
# url = 'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/167/167-bigskin-8.jpg'
# url = 'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/506/506-bigskin-2.jpg'
# https://pvp.qq.com/web201605/herodetail/167.shtml
# get each hero's name and hero id
import requests, re, os, random, time
from lxml import etree
from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument('headless')  # run Chrome without opening a window
driver = webdriver.Chrome(options=option)
ua_list = [  # renamed from "list" so the built-in isn't shadowed
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36',
]
user_agent = random.choice(ua_list)  # rotate the User-Agent at random
header = {
    'User-Agent': user_agent,
    'Referer': 'https://pvp.qq.com/web201605/herolist.shtml'
}
driver.get('https://pvp.qq.com/web201605/herolist.shtml')
res = driver.page_source
dom1 = etree.HTML(res)
every_id = dom1.xpath('//ul[@class="herolist clearfix"]/li/a/@href')
every_name = dom1.xpath('//ul[@class="herolist clearfix"]/li/a//@alt')
for init_id, name in zip(every_id, every_name):
    hero_id = re.findall(r'herodetail/(.*?)\.shtml', init_id)[0]
    detail_url = f'https://pvp.qq.com/web201605/herodetail/{hero_id}.shtml'
    driver.get(detail_url)  # render the detail page so the skin names are not garbled
    result2 = driver.page_source
    dom2 = etree.HTML(result2)
    skin_names = dom2.xpath('//div[@class="pic-pf"]/ul/@data-imgname')[0]
    every_skin = skin_names.split('|')
    hero_folder = os.getcwd() + '/王者荣耀/' + name
    if not os.path.exists(hero_folder):
        os.makedirs(hero_folder)
    for i, skin_na in enumerate(every_skin):
        pic_link = f'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{hero_id}/{hero_id}-bigskin-{i + 1}.jpg'
        image_path = hero_folder + '/' + skin_na + '.jpg'
        with open(image_path, 'wb') as file:
            print('Downloading %s: %s' % (name, skin_na))
            file.write(requests.get(pic_link, headers=header).content)
Copyright notice: "爬取实例三:爬取lol英雄联盟全阵容皮肤和爬王者荣耀全阵容皮肤" (Scraping example 3: all LOL hero skins and all Honor of Kings hero skins) was contributed by a community member and represents the author's own views. To reprint it, contact the author and credit the source: https://m.elefans.com/xitong/1728795506a1174126.html. The hosting site only provides storage, claims no ownership, and assumes no legal liability; confirmed infringing content will be removed.