Scraping OPPO App Store Reviews with Appium

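Environment setup
Implementation
- Connecting to the app you want to scrape
- Simulating user actions and extracting some fields
  - Tapping the search box and entering the query
  - Opening the detail page
  - Opening the reviews
  - Looping over the reviews
- Parsing and merging the results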

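Environment setup

You can follow this article by an experienced author on Zhihu:
Appium环境搭建超详细教程 (a very detailed Appium environment setup tutorial)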

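Implementation

Connecting to the app you want to scrape

Use Appium Desktop to control the phone or an Android emulator.
For readers who just want to copy and paste, here is the configuration for the OPPO app store: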
{
"platformName": "Android",
"platformVersion": "6.0",
"deviceName": "Andriod2",
"noReset": true,
"appPackage": "com.oppo.market",
"appActivity": ".activity.MainActivity"
}
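
The same capabilities can be built as a Python dict before they are passed to the driver. This is only a minimal sketch: `build_caps` is a hypothetical helper name (not part of Appium), and the default values are simply the ones from the configuration above.

```python
def build_caps(app_package, app_activity, platform_version="6.0",
               device_name="Andriod2", no_reset=True, udid=None):
    """Return a desired-capabilities dict for an Android session.

    build_caps is a hypothetical helper for illustration only.
    """
    caps = {
        "platformName": "Android",
        "platformVersion": platform_version,
        "deviceName": device_name,
        "appPackage": app_package,
        "appActivity": app_activity,
        "noReset": no_reset,
    }
    if udid:
        # Only add a device id when one is explicitly given
        caps["udid"] = udid
    return caps


oppo_caps = build_caps("com.oppo.market", ".activity.MainActivity")
```

The resulting dict has the same keys as the JSON above, so it can be pasted into Appium Desktop or handed to the helper class in the next section.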

Simulating user actions and extracting some fields

Here is a Python helper class for Appium that I wrote myself:

from time import sleep
from appium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from PIL import Image
class App(object):
    def __init__(self,appPackage,appActivity,no_reset=True,platformVersion="6.0",deviceName="android_first",wait_time=None,udid=None):
        self.desired_caps = {
            'platformName': 'Android',
            'platformVersion': platformVersion,
            'deviceName': deviceName,
            "appPackage":appPackage,
            'appActivity':appActivity
        }
        if udid:
            self.desired_caps['udid']=udid
        if no_reset:
            self.desired_caps['noReset']=no_reset
        self.driver = webdriver.Remote('http://localhost:4723/wd/hub', self.desired_caps)
        # Implicit wait: maximum time (wait_time, in seconds) to wait for an element that has not loaded yet
        if wait_time:
            self.driver.implicitly_wait(wait_time)

    def click_by_id(self,id):
        self.driver.find_element_by_id(id).click()

    def close_app(self):
        self.driver.close_app()

    def back(self):
        self.driver.back()

    def click_by_xpath(self,xpath):
        self.driver.find_element_by_xpath(xpath).click()

    def click_by_name(self,name):
        self.driver.find_element_by_name(name).click()

    def click_by_class_name(self,class_name):
        self.driver.find_element_by_class_name(class_name).click()

    def send_keys_by_id(self,id,key):
        self.driver.find_element_by_id(id).send_keys(key)

    def send_keys_by_xpath(self,xpath,key):
        self.driver.find_element_by_xpath(xpath).send_keys(key)

    def send_keys_by_name(self,name,key):
        self.driver.find_element_by_name(name).send_keys(key)

    def send_keys_by_class_name(self,class_name,key):
        self.driver.find_element_by_class_name(class_name).send_keys(key)

    def tap_by_position(self,x_position,y_position,x_length=10,y_length=10,time=500):
        self.driver.tap([(x_position-x_length/2,y_position-y_length*2),(x_position+x_length/2,y_position+y_length*2)],time)

    def get_page_source(self):
        return self.driver.page_source

    def swipe(self,y_start,y_end,x_start=500,x_end=500,duration=1000):
        self.driver.swipe(x_start,y_start,x_end,y_end,duration)

    # Return the element's bounds as a list: [top-left corner, bottom-right corner]
    @staticmethod
    def get_bounds(element):
        rect=element.rect
        return [{'x':rect['x'],'y':rect['y']},{'x':rect['x']+rect['width'],'y':rect['y']+rect['height']}]

    def get_text_by_x_path(self,x_path):
        return str(self.driver.find_element_by_xpath(x_path).text)

    # method is one of: 'xpath', 'id', 'name', 'class_name'
    def swipe_by_element(self,method,value):
        if method=='xpath':
            element=self.driver.find_element_by_xpath(value)
        elif method=='id':
            element=self.driver.find_element_by_id(value)
        elif method=='name':
            element=self.driver.find_element_by_name(value)
        elif method=='class_name':
            element=self.driver.find_element_by_class_name(value)
        else:
            raise ValueError("invalid method: expected 'xpath', 'id', 'name' or 'class_name'")

        bounds=self.get_bounds(element)
        x_start=x_end=(bounds[0]['x']+bounds[1]['x'])/2
        y_start=bounds[1]['y']
        y_end=bounds[0]['y']
        try:
            self.swipe(y_start,y_end,x_start,x_end)
        except Exception:
            self.swipe(500+abs(y_start-y_end),500)

    @staticmethod
    def screenshot_by_element(driver, element, out_image: str):
        """
        Take a screenshot of a single element.
        :param driver: the Appium driver
        :param element: the element to capture
        :param out_image: output path for the cropped image
        """
        # Step 1: capture the full screen
        global_image = '全屏图.png'
        driver.get_screenshot_as_file(global_image)

        # Step 2: compute the element's bounding box
        min_x = element.location['x']
        min_y = element.location['y']
        max_x = min_x + element.size['width']
        max_y = min_y + element.size['height']

        # Step 3: crop the element out of the full-screen image
        im = Image.open(global_image)
        im = im.crop((min_x, min_y, max_x, max_y))  # (left, upper, right, lower)
        print(min_x, min_y, max_x, max_y)
        im.save(out_image)

Below is the main program:

from appium_utils.app import App
import time
import pandas as pd

def keywords_search(keyword):
    app=App('com.oppo.market','.activity.MainActivity',wait_time=30)

    list_title=[]
    time.sleep(10)



    # Tap the search box and enter the keyword
    app.tap_by_position(347,145)
    # The first input may pick up stray text, so type, clear, then type again
    app.send_keys_by_id("com.oppo.market:id/et_search",keyword)
    app.click_by_id("com.oppo.market:id/search_clear")
    app.send_keys_by_id("com.oppo.market:id/et_search",keyword)
    app.click_by_id("com.oppo.market:id/tv_search_text")


    # Open the detail page (first search result)
    app.click_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.widget.ViewAnimator/android.widget.FrameLayout/android.widget.ViewAnimator/android.view.ViewGroup/android.widget.FrameLayout/android.widget.ListView/android.widget.LinearLayout[1]/android.widget.RelativeLayout/android.widget.LinearLayout")
    time.sleep(10)


    # Tap the reviews tab
    app.click_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/android.widget.RelativeLayout[2]/android.widget.LinearLayout/android.widget.TextView[2]")
    time.sleep(10)
    app.swipe(1330, 740)
    k=0

    # Loop over the reviews: read the visible rows, then swipe up and repeat
    while True:
        k=k+1
        for i in range(1,5):
            try:
                element=app.driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/com.heytap.cdo.client.detail.ui.detail.widget.ColorViewPager/android.widget.ListView/android.widget.RelativeLayout[{}]/android.widget.LinearLayout[2]".format(i))
                user_name=app.get_text_by_x_path("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/com.heytap.cdo.client.detail.ui.detail.widget.ColorViewPager/android.widget.ListView/android.widget.RelativeLayout[{}]/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.TextView[1]".format(i))
                desc=app.get_text_by_x_path("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/com.heytap.cdo.client.detail.ui.detail.widget.ColorViewPager/android.widget.ListView/android.widget.RelativeLayout[{}]/android.widget.TextView[2]".format(i))
                publish_time=app.get_text_by_x_path("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/com.heytap.cdo.client.detail.ui.detail.widget.ColorViewPager/android.widget.ListView/android.widget.RelativeLayout[{}]/android.widget.LinearLayout[1]/android.widget.TextView".format(i))
                like_num=app.get_text_by_x_path("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.RelativeLayout[1]/android.widget.ScrollView/android.widget.LinearLayout/com.heytap.cdo.client.detail.ui.detail.widget.ColorViewPager/android.widget.ListView/android.widget.RelativeLayout[{}]/android.widget.TextView[1]".format(i))
                print(user_name+'_'+desc)
                key=hash(user_name+'_'+desc)
                data={"user_name":user_name,"desc":desc,"publish_time":publish_time,"like_num":like_num,"key":key}
                df=pd.DataFrame(data,index=[0])
                df.to_excel(r"data/excel/{}_{}.xlsx".format(keyword,key),index=False)
                app.screenshot_by_element(app.driver,element,"data/photo/{}_{}.png".format(keyword,key))
                list_title.append(key)
            except Exception as e:
                print(e)
                continue
        app.swipe(1330, 540)
        if k>20:
            # app.close_app()
            return

if __name__ == '__main__':
    # keywords_search("平安银行")
    # keywords_search("招商银行")
    keywords_search("中国建设银行")
    #keywords_search("浦发手机银行")

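Tapping the search box and entering the query

Here I simply use the tap_by_position method, which taps the screen at the given coordinates.
One thing to watch out for (possibly a bug in the OPPO app store itself): the first time a value is entered, some stray text gets attached to it.

In the screenshot, the value entered was actually 浦发手机银行, yet 抖音火山版 appeared in front of it. That is why the program types the keyword twice: after the first input it clears the field, then enters the keyword again.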
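
The type, clear, retype workaround can also be written as a small check-and-retry loop instead of a fixed double input. The following is only a sketch under assumptions: `type_clean` is a hypothetical helper, it assumes the element exposes `send_keys`, `clear` and `text` (as Selenium-style elements do), and `FakeField` is a stand-in that imitates the polluted first input so the logic can be tried without a device.

```python
def type_clean(field, text, max_attempts=3):
    """Type text into field; clear and retype until the field holds exactly text.

    Hypothetical helper: returns True on success, False after max_attempts.
    """
    for _ in range(max_attempts):
        field.send_keys(text)
        if field.text == text:
            return True
        field.clear()
    return False


class FakeField:
    """Stand-in for a real element: the first input gets stray text prepended."""

    def __init__(self):
        self.text = ""
        self._first = True

    def send_keys(self, value):
        prefix = "stray" if self._first else ""
        self._first = False
        self.text += prefix + value

    def clear(self):
        self.text = ""


field = FakeField()
ok = type_clean(field, "浦发手机银行")
```

Whether a real search box reports its content through `text` in the same way depends on the app and driver, so this should be verified on the device before relying on it.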

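Opening the detail page

Nothing special here: because the search is an exact match, clicking the first result is enough.

Opening the reviews

Nothing special here either: just click the reviews tab.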

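Looping over the reviews

My approach locates everything by XPath. Review entries in an app store generally share one layout, so only one or a few indices in the XPath need to change. The code substitutes the row index and loops over it (range(1, 5), i.e. rows 1 to 4, for each screen).
The variables in the code are:
element: the block containing the star rating, used for the screenshot
user_name: the reviewer
desc: the review text
publish_time: the review time
like_num: the number of likes
key: a unique key that I define myself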
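
The "same path, different index" idea can be made explicit by keeping the row path in one template and the per-field suffixes in a dict. This is a sketch, not the code used in the article: `ROW` is a deliberately shortened, hypothetical prefix (the article uses the full absolute path starting at /hierarchy), while the suffixes are copied from the XPaths in the main program.

```python
# Hypothetical shortened row prefix; the article uses the full absolute path.
ROW = "//android.widget.ListView/android.widget.RelativeLayout[{index}]"

# Field suffixes taken from the XPaths in the main program
FIELD_SUFFIXES = {
    "element": "/android.widget.LinearLayout[2]",
    "user_name": "/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.TextView[1]",
    "desc": "/android.widget.TextView[2]",
    "publish_time": "/android.widget.LinearLayout[1]/android.widget.TextView",
    "like_num": "/android.widget.TextView[1]",
}


def review_xpaths(index):
    """Return a dict mapping each field name to its XPath for the given row."""
    row = ROW.format(index=index)
    return {name: row + suffix for name, suffix in FIELD_SUFFIXES.items()}


paths = review_xpaths(3)
```

With this layout, a change in the store's view hierarchy only requires editing `ROW` or one suffix rather than five long strings.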

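Each review is then saved as its own Excel file, named keyword_key (this name is needed later), and the star-rating screenshot is saved as well.
The cropped star-rating screenshot is stored under data/photo (the original post shows an example image here).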
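
One caveat about the key: in Python 3 the built-in `hash()` of a string is salted per process by default, so the same review gets a different key on every run and repeated runs will not overwrite earlier files. If a key that stays stable across runs is wanted, a digest can be used instead. The sketch below is a suggested alternative, not what the article's code does; `review_key` is a hypothetical name.

```python
import hashlib


def review_key(user_name, desc):
    """Return a short key that is identical across runs for the same review."""
    raw = (user_name + "_" + desc).encode("utf-8")
    # MD5 is used only as a fingerprint here, not for security
    return hashlib.md5(raw).hexdigest()[:16]


key = review_key("some_user", "some review text")
```

The key is a plain hexadecimal string, so it can be dropped into the keyword_key file name unchanged.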

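Parsing and merging the results

The previous step collected every field except the rating. This step merges all the single-record files and attaches the rating to each one.

Without further ado, here is the code: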

import os
from utils import io  # the author's own helper module; move_file is shown further down
import pandas as pd
from appium_utils.score_utils import score


def process(keyword):
    total_df=pd.DataFrame()
    for root,dirs,files in os.walk(os.path.join("data","excel")):
        for file in files:
            total_path=os.path.join(root,file)
            if keyword in total_path:
                df=pd.read_excel(total_path)
                # The screenshot shares the Excel file's base name (keyword_key)
                photo_path=r"data/photo/{}.png".format(os.path.splitext(file)[0])
                # Arguments: first star's x and y, spacing between stars, then the dark star's r, g, b, d
                df['score']=score(photo_path,10,10,30,215,215,215,255)
                total_df=pd.concat([total_df,df],axis=0)
                # Move the processed file out of the way
                io.move_file(total_path, r"data/checked")
    total_df.to_excel(r"data/final/{}.xlsx".format(keyword),index=False)

if __name__ == '__main__':
    # process("浦发手机银行")
    # process("平安银行")
    # process("招商银行")
    process("中国建设银行")

Here io.move_file moves a file that has been processed into another folder. Its code is:

import os
import shutil

def move_file(file, folder_tgt, suffix=0):
    # Create the target folder if it does not exist yet
    if not os.path.isdir(folder_tgt):
        os.mkdir(folder_tgt)

    # if os.path.isfile(file):
    file_name = os.path.split(file)[1]

    file_type = file_name.split(".")[-1]

    new_name = os.path.join(folder_tgt, file_name)
    while os.path.isfile(new_name):
        suffix += 1
        new_name = os.path.join(
            folder_tgt,
            "{fn}({sfx}).{ft}".format(
                fn=file_name[:-(len(file_type) + 1)],
                sfx=suffix,
                ft=file_type
            )
        )

    shutil.copy(file, new_name)
    os.remove(file)

score computes the rating by inspecting pixels. Its code is:

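from PIL import Image

def score(path,first_x,first_y,delta_x,r,g,b,d):
    level=0
    # Convert to RGBA so getpixel always returns four channels
    image = Image.open(path).convert("RGBA")
    for i in range(5):
        # Sample the centre of the i-th star
        x=first_x+i*delta_x
        y=first_y
        r_i,g_i,b_i,d_i=image.getpixel((x, y))
        # A pixel that differs from the dark-star colour counts as a lit star
        if r_i!=r or g_i!=g or b_i!=b or d_i!=d:
            level=level+1
        else:
            break
    return level

path is the image file path. first_x and first_y are the coordinates of the centre of the first star (being slightly off is fine), and delta_x is the distance between the centres of adjacent stars. r, g, b and d are the channel values of the centre pixel of a dark (unlit) star; d is the fourth channel returned by getpixel, which for an RGBA image is alpha. Colour gradients are not taken into account.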
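
Because the comparison above is an exact match, a slightly different shade of grey would be counted as a lit star. A tolerance-based variant is sketched below as one possible refinement; it is not part of the article's code. `count_stars` and its `tolerance` parameter are hypothetical, and the function takes a `get_pixel` callable so it can be tried with a plain dict instead of a real image.

```python
def count_stars(get_pixel, first_x, y, delta_x, dark, tolerance=10, stars=5):
    """Count leading lit stars; get_pixel((x, y)) must return a channel tuple."""

    def is_dark(pixel):
        # Dark if every channel is within tolerance of the dark-star colour
        return all(abs(p - q) <= tolerance for p, q in zip(pixel, dark))

    level = 0
    for i in range(stars):
        if is_dark(get_pixel((first_x + i * delta_x, y))):
            break
        level += 1
    return level


# Made-up pixel values for illustration: three lit stars, then two grey ones
fake_pixels = {
    (10, 10): (255, 180, 0, 255),
    (40, 10): (255, 180, 0, 255),
    (70, 10): (255, 181, 0, 255),
    (100, 10): (213, 216, 215, 255),
    (130, 10): (215, 215, 215, 255),
}
rating = count_stars(fake_pixels.get, 10, 10, 30, (215, 215, 215, 255))
```

With Pillow, the bound method `Image.open(path).convert("RGBA").getpixel` can be passed as `get_pixel`, keeping the same coordinates and colour as in the score call above.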

The final result looks like this (the original post shows a screenshot of the merged output here).

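Tags: store, Appium, oppo

Copyright notice: this article, "Appium实现爬取oppo应用商店评论", was contributed by a community user, and the views expressed are the author's own. To repost it, contact the author and cite the source: https://m.elefans.com/dongtai/1729956755a1217338.html. This site only provides information storage, does not own the content, and assumes no related legal liability. If any content on this site is found to involve plagiarism, infringement or a violation of laws or regulations, it will be removed once verified.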