Scraping unfinished assignments and exams from my school's Chaoxing site and sending them to QQ Mail
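- Relevant endpoints
- Approach
- Login
- Scraping course information
- Building links from the course information to scrape assignments and exams
- Full code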
The project structure follows the crawler template I wrote earlier (see my post "Crawler template").
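Relevant endpoints
I finished the code several days before writing this note, so I did not bother to screenshot the endpoint details. Below are the notes I took while scraping; you can capture the requests yourself for practice.
Parameters such as courseid, classid and cpi are ID values that never change. The links the site hands out later do not carry them, so they can be stored once and reused.
Every link, however, needs an enc value. enc is randomly generated and differs for each link, so the data cannot be fetched in a single step: each page's HTML has to be fetched and parsed in turn to extract the next link to visit.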
/**
* Name, student ID and phone number endpoint:
* This one takes parameters, but I forgot to record them at the time
* @returns: html
* http://passport2.chaoxing/mooc/accountManage
*
* Course list endpoint:
* This one takes parameters, but I forgot to record them at the time
* @returns: html
* http://mooc1-1.chaoxing/visit/courselistdata
*
* Course home page endpoint:
* @params: can be taken from the course list endpoint
* courseid, classId, cpi
* @returns: html
* https://mooc2-ans.chaoxing/mycourse/stu?courseid=???&classId=???&cpi=???&enc=cda787016ce2c1f162fe6230b02bf948&t=1630417924301&pageHeader=9
*
* Course assignment endpoint:
* @params: taken from the course home page
* @returns: html
* https://mooc1.chaoxing/mooc2/work/list?courseId=?&classId=?&cpi=?&enc=27719c62b0263f5249aa52edea81c125&
*
* Course exam endpoint:
* @params: taken from the course home page
* @returns: html
* https://mooc1.chaoxing/mooc2/exam/exam-list?enc=a13c03b8366c587872c31e0a5d16eac6&openc=5a474025b333b5b147076b1eb3c38ab3&courseid=???&clazzid=???031&cpi=???&ut=s
*
* Course task endpoint:
* @params: taken from the course home page
* @returns: json
* https://mobilelearn.chaoxing/v2/apis/active/student/activelist?fid=1971&courseId=???&classId=???
*/
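Because each link carries its own enc value, it can help to check whether a link pulled from a page actually contains one before requesting it. The following is a minimal sketch using only the standard library; the class name `EncExtractor` and the assumption that enc is always a run of hex digits are mine, inferred from the sample links in these notes rather than from any documentation.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncExtractor {
    // Assumption: enc is a hex string, as in the sample links in the notes.
    // "[?&]enc=" avoids matching other parameters that merely end in "enc" (e.g. "openc").
    private static final Pattern ENC = Pattern.compile("[?&]enc=([0-9a-fA-F]+)");

    /** Returns the enc value found in the URL, or an empty Optional if there is none. */
    public static Optional<String> extractEnc(String url) {
        if (url == null) {
            return Optional.empty();
        }
        Matcher m = ENC.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String link = "https://mooc1.chaoxing/mooc2/work/list?courseId=1&classId=2&cpi=3"
                + "&enc=27719c62b0263f5249aa52edea81c125&";
        System.out.println(extractEnc(link).orElse("(no enc)"));
    }
}
```

A link without enc is a hint that the page layout changed or that the session is not logged in, so it is probably better to stop there than to request it.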
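Approach
Fetch the course information -> build the course home page link -> visit the exam and assignment endpoints -> pick out the unfinished items and add them to a JSON object.
With the endpoints in hand, the analysis can begin. The exam and assignment requests carry an enc parameter that is not fixed but randomly generated by the server, so these endpoints cannot be requested directly. In addition, the links provided on the course home page do not include the course's courseid, classid and cpi.
So the exam and assignment links must first be taken from the course home page, then courseid, classid and cpi appended, and only then requested to obtain the exam and assignment data.
Once the data is retrieved, parse it and output the result.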
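The "append courseid, classid and cpi" step can be sketched as a small helper. This is only an illustration: `LinkBuilder` and `withCourseIds` are hypothetical names, the parameter order mirrors the string concatenation used later in this post, and the IDs are assumed to be plain digits that need no URL encoding.

```java
public class LinkBuilder {
    /**
     * Appends the stored course IDs to a link taken from the course home page.
     * Assumption: the IDs are plain digits, so no URL encoding is applied.
     */
    public static String withCourseIds(String dataUrl, String courseid, String clazzid, String cpi) {
        String sep;
        if (dataUrl.endsWith("?") || dataUrl.endsWith("&")) {
            sep = "";                       // the link already ends with a separator
        } else if (dataUrl.contains("?")) {
            sep = "&";                      // the link already has a query string
        } else {
            sep = "?";                      // the link has no query string yet
        }
        return dataUrl + sep + "courseid=" + courseid + "&cpi=" + cpi + "&clazzid=" + clazzid;
    }

    public static void main(String[] args) {
        System.out.println(withCourseIds("https://h/exam-list?enc=abc", "1", "2", "3"));
    }
}
```

Handling the separator explicitly avoids producing links such as `...&&courseid=` when the extracted link already ends with `&`, as the assignment link in the notes above does.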
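Login
There is not much to say about login; what matters is the URL and the parameters.
The URL comes from packet capture.
For the parameters, work out:
which are common parameters (the same for everyone),
which are per-user parameters (everyone sends them, but the values differ),
and then how the password is encoded (here it is simply Base64).
Done.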
@Override
public boolean login() {
boolean result = false;
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext httpClientContext = HttpClientContext.create();
HttpPost post = new HttpPost("http://passport2.chaoxing/fanyalogin");
// Base64-encode the password
String upwbase64password = base64(password);
List<NameValuePair> paramslist = new ArrayList<NameValuePair>();
paramslist.add(new BasicNameValuePair("fid", "1971"));// constant
paramslist.add(new BasicNameValuePair("uname", uname));
paramslist.add(new BasicNameValuePair("password", upwbase64password));
paramslist.add(new BasicNameValuePair("refer", "http://i.mooc.chaoxing"));// constant
paramslist.add(new BasicNameValuePair("t", "true"));// constant
try {
UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(paramslist, "UTF-8");
post.setEntity(urlEncodedFormEntity);
} catch (UnsupportedEncodingException e1) {
e1.printStackTrace();
}
CloseableHttpResponse response;
try {
response = httpclient.execute(post, httpClientContext);
if (response.getStatusLine().getStatusCode() == 200) {
// Keep the cookies returned by the server in cookieStore so later requests can reuse them
cookieStore = httpClientContext.getCookieStore();
result = true;
}
} catch (ClientProtocolException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return result;
}
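To see what the form body of the login request above looks like without pulling in HttpClient, here is a standard-library sketch. `LoginForm` is a hypothetical name; the parameter names and constant values are copied from the method above, and nothing is actually sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class LoginForm {
    /** Builds the login parameters; the password is Base64-encoded, which is encoding, not encryption. */
    public static Map<String, String> params(String uname, String password) {
        Map<String, String> p = new LinkedHashMap<>();   // keeps insertion order
        p.put("fid", "1971");                            // constant from the post
        p.put("uname", uname);
        p.put("password", Base64.getEncoder()
                .encodeToString(password.getBytes(StandardCharsets.UTF_8)));
        p.put("refer", "http://i.mooc.chaoxing");        // constant from the post
        p.put("t", "true");                              // constant from the post
        return p;
    }

    /** Serializes the parameters as an application/x-www-form-urlencoded body. */
    public static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                body.add(URLEncoder.encode(e.getKey(), "UTF-8")
                        + "=" + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(params("u", "abc")));
    }
}
```

Note that the original method treats any HTTP 200 as a successful login; whether the server also returns 200 for wrong credentials is something to verify against the real response.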
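Scraping course information
Parameters such as courseid, classid and cpi are ID values describing the course ID, the student ID and so on. They never change, and the links the site provides later do not carry them (those links are built dynamically by JavaScript), so they can be stored for reuse.
This method scrapes the user's home page, extracts the course information and stores it in the member variable courselist.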
public void getcourse() {
courselist = new ArrayList<schoolclass>();
HashMap<String, String> params = new HashMap<String, String>();
params.put("courseType", "1");
params.put("courseFolderId", "0");
params.put("courseFolderSize", "0");
HttpResponse response = post("http://mooc1-1.chaoxing/visit/courselistdata", params);
Document document = null;
try {
document = Jsoup.parse(EntityUtils.toString(response.getEntity()));
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Elements elements = document.getElementsByAttributeValue("class", "course clearfix");
elements.forEach(element -> {
String courseid = element.attr("courseId");
String clazzId = element.attr("clazzId");
String cpi = element.attr("personId");
String name = element.child(1).child(0).child(0).child(0).text();
courselist.add(new schoolclass(name, courseid, clazzId, cpi));
});
}
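The `schoolclass` type used above is never shown in this post. A plausible shape, inferred only from the constructor call and the getters used elsewhere in the code (`getName`, `getCourseid`, `getClassid`, `getCpi`), might look like the sketch below. The class name `SchoolClassSketch` is deliberately different to make clear that this is a guess, not the author's class.

```java
public class SchoolClassSketch {
    private final String name;      // course title shown on the course list page
    private final String courseid;  // value of the "courseId" attribute
    private final String classid;   // value of the "clazzId" attribute
    private final String cpi;       // value of the "personId" attribute

    // Argument order follows the call in getcourse(): name, courseid, clazzId, cpi.
    public SchoolClassSketch(String name, String courseid, String classid, String cpi) {
        this.name = name;
        this.courseid = courseid;
        this.classid = classid;
        this.cpi = cpi;
    }

    public String getName() { return name; }
    public String getCourseid() { return courseid; }
    public String getClassid() { return classid; }
    public String getCpi() { return cpi; }

    public static void main(String[] args) {
        SchoolClassSketch sc = new SchoolClassSketch("Math", "1", "2", "3");
        System.out.println(sc.getName() + " " + sc.getCourseid() + " " + sc.getClassid() + " " + sc.getCpi());
    }
}
```

Keeping the fields final fits the observation that these IDs do not change once scraped.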
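Building links from the course information to scrape assignments and exams
This method fetches the information for a single course. There is also a wrapper method that iterates over all courses; see the full code.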
public JSONObject getUncompleteHomeworkAndexame(schoolclass sc) throws ParseException, IOException {
JSONObject jclass = new JSONObject();
JSONArray ja = new JSONArray();
String url = "https://mooc2-ans.chaoxing/mycourse/stu?courseid=" + sc.getCourseid() + "&cpi=" + sc.getCpi()
+ "&clazzid=" + sc.getClassid() + "&pageHeader=8";
HttpResponse httpResponse = get(url, null);
Document document = null;
document = Jsoup.parse(EntityUtils.toString(httpResponse.getEntity()));
Elements homework = document.getElementsByAttributeValue("title", "作业");
Elements exame = document.getElementsByAttributeValue("title", "考试");
String homeworkurl = homework.attr("data-url");
String exameurl = exame.attr("data-url") + "&courseid=" + sc.getCourseid() + "&cpi=" + sc.getCpi() + "&clazzid="
+ sc.getClassid();
HttpResponse homeworkresponse = get(homeworkurl, null);
Document homeworkdocument = Jsoup.parse(EntityUtils.toString(homeworkresponse.getEntity()));
Elements homeworks = homeworkdocument.getElementsByTag("li");
homeworks.forEach(li -> {
if (li.getElementsByClass("status").text().equals("未交")) {
String hwname = li.getElementsByClass("overHidden2 fl").text();
Elements time = li.getElementsByClass("time notOver");
if(!time.isEmpty()) {
String timestr=time.text();
JSONObject jo = new JSONObject();
jo.put("名称", hwname);
jo.put("类型","作业");
jo.put("时间",timestr);
ja.add(jo);
}
}
});
HttpResponse exameresponse = get(exameurl, null);
Document examedocument = Jsoup.parse(EntityUtils.toString(exameresponse.getEntity()));
Elements exames = examedocument.getElementsByTag("li");
exames.forEach(li -> {
if (li.getElementsByClass("status").text().equals("未完成")) {
String hwname = li.getElementsByClass("overHidden2 fl").text();
Elements time = li.getElementsByClass("time notOver");
if(!time.isEmpty()) {
String timestr=time.text();
JSONObject jo = new JSONObject();
jo.put("名称", hwname);
jo.put("类型","考试");
jo.put("时间",timestr);
ja.add(jo);
}
}
});
if(!ja.isEmpty()) {
jclass.put("name",sc.getName());
jclass.put("uncompleted", ja);
}
return jclass;
}
Full code
public class caoxing extends mypachongimpl {
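String password;// password; it is Base64-encoded before being sent
String uname;// account name
String email;// recipient email address
// schoolclass wraps courseid, classid, cpi and so on; I no longer remember why there are two lists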
List<schoolclass> classlist = null;
List<schoolclass> courselist = null;
// Constructors
public caoxing(String uname, String password,String email) {
this.password = password;
this.uname = uname;
this.email = email;
}
public caoxing() {
}
// Acts as the main entry point: parses the JSON built by the other methods and sends it. Call this method directly when using the class
public void domain() {
JSONArray json=null;
try {
json = getUncomplete();// returns the JSON of all unfinished items; the main scraping code lives in this method
} catch (ParseException e1) {
e1.printStackTrace();
} catch (IOException e1) {
e1.printStackTrace();
}
if(json != null && !json.isEmpty()) {// json stays null if getUncomplete() threw
Iterator<JSONObject> iter = json.iterator();
String str="";
while(iter.hasNext()) {
JSONObject next = iter.next();
String classname = (String) next.get("name");
JSONArray uncompletedlist = (JSONArray) next.get("uncompleted");
Iterator<JSONObject> listiter = uncompletedlist.iterator();
String strlist="";
while(listiter.hasNext()) {
JSONObject next2 = listiter.next();
String taskname = (String) next2.get("名称");
String type = (String) next2.get("类型");
String time = (String) next2.get("时间");
strlist+="<br>您的: "+taskname+"存在未完成:"+type+" "+time;// append, so earlier items are not overwritten
}
str+="您的 "+classname+" "+"存在未完成作业或考试,名单如下:"+strlist+"<br>";
}
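mail.sendmail(email,str);// send the email; mail is my own wrapper module. There are plenty of examples on CSDN; I wrapped one into a class and copy-paste it wherever needed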
}
}
/**
* Get all unfinished assignments and exams (including the time)
* @return
* @throws ParseException
* @throws IOException
*/
public JSONArray getUncomplete() throws ParseException, IOException {
if(courselist==null) getcourse();
Iterator<schoolclass> iter =courselist.iterator();
JSONArray ja = new JSONArray();
while (iter.hasNext()) {
schoolclass next = iter.next();
JSONObject jo = getUncompleteHomeworkAndexame(next);
if(!jo.isEmpty()) {
ja.add(jo);
}
}
return ja;
}
/**
* Get the unfinished assignments and exams of a single course (including the time)
* @param sc the schoolclass object to query
* @return a JSONObject
* @throws ParseException
* @throws IOException
*/
public JSONObject getUncompleteHomeworkAndexame(schoolclass sc) throws ParseException, IOException {
JSONObject jclass = new JSONObject();
JSONArray ja = new JSONArray();
String url = "https://mooc2-ans.chaoxing/mycourse/stu?courseid=" + sc.getCourseid() + "&cpi=" + sc.getCpi()
+ "&clazzid=" + sc.getClassid() + "&pageHeader=8";
HttpResponse httpResponse = get(url, null);
Document document = null;
document = Jsoup.parse(EntityUtils.toString(httpResponse.getEntity()));
Elements homework = document.getElementsByAttributeValue("title", "作业");
Elements exame = document.getElementsByAttributeValue("title", "考试");
String homeworkurl = homework.attr("data-url");
String exameurl = exame.attr("data-url") + "&courseid=" + sc.getCourseid() + "&cpi=" + sc.getCpi() + "&clazzid="
+ sc.getClassid();
HttpResponse homeworkresponse = get(homeworkurl, null);
Document homeworkdocument = Jsoup.parse(EntityUtils.toString(homeworkresponse.getEntity()));
Elements homeworks = homeworkdocument.getElementsByTag("li");
homeworks.forEach(li -> {
if (li.getElementsByClass("status").text().equals("未交")) {
String hwname = li.getElementsByClass("overHidden2 fl").text();
Elements time = li.getElementsByClass("time notOver");
if(!time.isEmpty()) {
String timestr=time.text();
JSONObject jo = new JSONObject();
jo.put("名称", hwname);
jo.put("类型","作业");
jo.put("时间",timestr);
ja.add(jo);
}
}
});
HttpResponse exameresponse = get(exameurl, null);
Document examedocument = Jsoup.parse(EntityUtils.toString(exameresponse.getEntity()));
Elements exames = examedocument.getElementsByTag("li");
exames.forEach(li -> {
if (li.getElementsByClass("status").text().equals("未完成")) {
String hwname = li.getElementsByClass("overHidden2 fl").text();
Elements time = li.getElementsByClass("time notOver");
if(!time.isEmpty()) {
String timestr=time.text();
JSONObject jo = new JSONObject();
jo.put("名称", hwname);
jo.put("类型","考试");
jo.put("时间",timestr);
ja.add(jo);
}
}
});
if(!ja.isEmpty()) {
jclass.put("name",sc.getName());
jclass.put("uncompleted", ja);
}
return jclass;
}
/**
* Fetch the course information after login and assign it to the class's member variable
*/
public void getcourse() {
courselist = new ArrayList<schoolclass>();
HashMap<String, String> params = new HashMap<String, String>();
params.put("courseType", "1");
params.put("courseFolderId", "0");
params.put("courseFolderSize", "0");
HttpResponse response = post("http://mooc1-1.chaoxing/visit/courselistdata", params);
Document document = null;
try {
document = Jsoup.parse(EntityUtils.toString(response.getEntity()));
} catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Elements elements = document.getElementsByAttributeValue("class", "course clearfix");
elements.forEach(element -> {
String courseid = element.attr("courseId");
String clazzId = element.attr("clazzId");
String cpi = element.attr("personId");
String name = element.child(1).child(0).child(0).child(0).text();
courselist.add(new schoolclass(name, courseid, clazzId, cpi));
});
}
/**
* This cannot be implemented with get/post, because get and post call this method to initialize cookieStore
*/
@Override
public boolean login() {
boolean result = false;
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext httpClientContext = HttpClientContext.create();
HttpPost post = new HttpPost("http://passport2.chaoxing/fanyalogin");
String upwbase64password = base64(password);
List<NameValuePair> paramslist = new ArrayList<NameValuePair>();
paramslist.add(new BasicNameValuePair("fid", "1971"));
paramslist.add(new BasicNameValuePair("uname", uname));
paramslist.add(new BasicNameValuePair("password", upwbase64password));
paramslist.add(new BasicNameValuePair("refer", "http://i.mooc.chaoxing"));
paramslist.add(new BasicNameValuePair("t", "true"));
try {
UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(paramslist, "UTF-8");
post.setEntity(urlEncodedFormEntity);
} catch (UnsupportedEncodingException e1) {
e1.printStackTrace();
}
CloseableHttpResponse response;
try {
response = httpclient.execute(post, httpClientContext);
if (response.getStatusLine().getStatusCode() == 200) {
cookieStore = httpClientContext.getCookieStore();
result = true;
}
} catch (ClientProtocolException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return result;
}
/**
* Base64-encodes a string (encoding, not real encryption)
* @param str the string to encode
* @return the encoded string
*/
public static String base64(String str) {
String result = null;
// Return null for null or empty input; the null check must come first to avoid a NullPointerException
if (str == null || str.isEmpty()) {
return null;
}
byte[] bytes = str.getBytes();
result = Base64.getEncoder().encodeToString(bytes);
return result;
}
}
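The message assembly in `domain()` concatenates strings inside nested loops, which is where it is easy to overwrite instead of append. As an illustration only, the same idea can be written with a `StringBuilder` over plain collections. `MailBody`, `Task` and the English wording are placeholders of mine, not part of the original project.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MailBody {
    /** One unfinished item: its name, its type (assignment or exam) and the remaining time text. */
    public static final class Task {
        final String name;
        final String type;
        final String time;

        public Task(String name, String type, String time) {
            this.name = name;
            this.type = type;
            this.time = time;
        }
    }

    /** Builds an HTML fragment listing the unfinished items per course; courses with no items are skipped. */
    public static String build(Map<String, List<Task>> uncompletedByCourse) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<Task>> e : uncompletedByCourse.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue;
            }
            sb.append("Course ").append(e.getKey()).append(" has unfinished items:");
            for (Task t : e.getValue()) {
                // Each item is appended, so earlier items are never overwritten.
                sb.append("<br>").append(t.type).append(": ").append(t.name)
                  .append(" (").append(t.time).append(")");
            }
            sb.append("<br>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<Task>> m = new LinkedHashMap<>();
        m.put("Math", Arrays.asList(new Task("HW1", "assignment", "2 days left"),
                                    new Task("Quiz", "exam", "5 hours left")));
        System.out.println(build(m));
    }
}
```

An empty result string is a natural signal to skip sending the email altogether.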