Posted on April 7, 2024

Web Crawler Graduation Thesis

Java Code for Extracting All Links from a Web Page

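Abstract

The Internet is now a vast, widely distributed, global information service center, covering news, advertising, consumer information, financial management, education, government, e-commerce, and many other information services. However, its inherent openness, dynamism, and heterogeneity make it difficult to obtain information from the network accurately and quickly.

The purpose of this thesis is to analyze website content, parse the hyperlinks it contains together with the corresponding body text, and then use the URLs and body text to report the site's content. The result is the design of a program that extracts the links from web pages.

A program that extracts all the links in a web page is a program for collecting information from the Internet. By extracting the links in web pages, it can gather network information for search engines. This approach has the advantage of generating pages simply and quickly; it improves the readability and security of web pages, and the generated pages are also easier for designers to use.

Keywords: web page parsing; Java; links; information extraction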

Java Code for Extracting All Links from a Web Page

Abstract

The Internet is a large, widely distributed, global information service center. It covers news, advertising, consumer information, financial management, education, government, electronic commerce, and many other information services. However, the open, dynamic, and heterogeneous nature of the Internet makes it difficult to obtain network information quickly and accurately.

The purpose of this thesis is to analyze the content of a website, parse its hyperlinks and the corresponding body text, and then report the site's content through the URLs and the text. On this basis, a program for extracting the links from web pages is designed.

Extracting all the links in a web page is a way of collecting information on the Internet. By extracting the links in web pages, the program can gather network information for search engines. This approach generates pages simply and quickly, improves the readability and security of web pages, and produces pages that are easier for designers to use.

Key words: page analysis; Java; link; information extraction
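The thesis's own source code is not part of this excerpt, so the following is only a minimal sketch of the link-extraction step the abstract describes, written under stated assumptions. The class name `LinkExtractor`, the method `extractLinks`, and the example URLs are hypothetical. The sketch finds `href` attributes of `<a>` tags with a regular expression and resolves them against a base URL; a regular expression is a simplification that handles only quoted attribute values, and a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: not taken from the thesis.
final class LinkExtractor {

    // Matches the quoted href value inside an <a ...> tag (single or double quotes).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the distinct links found in html, as absolute URLs, in document order.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            // Skip empty values, in-page fragments, and script pseudo-links.
            if (href.isEmpty() || href.startsWith("#")
                    || href.toLowerCase().startsWith("javascript:")) {
                continue;
            }
            try {
                // Resolve relative links such as "/news" against the page URL.
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Malformed href (for example, one containing spaces): ignore it.
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/news\">News</a> "
                + "<a href='http://example.org/a'>A</a></p>";
        for (String link : extractLinks(html, "http://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```

A `LinkedHashSet` is used here so that repeated links are reported once while keeping the order in which they appear on the page; a crawler that feeds a search engine would typically add the returned URLs to a queue of pages still to be visited.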

Contents

Abstract (Chinese)
Abstract (English)
1 Introduction
1.1 Background
1.2 History and Applications of Web Information Extraction
1.3 Link-Extraction Technology: Current
