How to Scrape UC News Content with a Proxy IP
2022-09-23
A quality IP proxy can handle many kinds of network work; large-scale online data capture, for instance, relies on IP proxies in practice. Below, IPIDEA walks through a tutorial on crawling the content of a news site.
IPIDEA uses the UC news site as the example:
The site has no complicated anti-crawler protection, so we can simply fetch the pages and parse them directly.
from bs4 import BeautifulSoup
from urllib import request

def download(title, url):
    req = request.Request(url)
    response = request.urlopen(req)
    response = response.read().decode('utf-8')
    soup = BeautifulSoup(response, 'lxml')
    tag = soup.find('div', class_='sm-article-content')
    if tag is None:  # article body not found, skip this page
        return 0
    # Strip characters that Windows does not allow in file names
    title = title.replace(':', '')
    title = title.replace('"', '')
    title = title.replace('|', '')
    title = title.replace('/', '')
    title = title.replace('\\', '')
    title = title.replace('*', '')
    title = title.replace('<', '')
    title = title.replace('>', '')
    title = title.replace('?', '')
    with open('D:\\code\\python\\spider_news\\UC_news\\society\\' + title + '.txt',
              'w', encoding='utf-8') as file_object:
        file_object.write('\n')
        file_object.write(title)
        file_object.write('\n')
        file_object.write('该新闻地址:')  # "URL of this news item:"
        file_object.write(url)
        file_object.write('\n')
        file_object.write(tag.get_text())
    # print('正在爬取')  # "crawling..."

if __name__ == '__main__':
    # NOTE: this loop re-requests the same listing page seven times, since the
    # URL does not change with i; pagination was presumably the original intent.
    for i in range(0, 7):
        url = 'https://news.uc.cn/c_shehui/'
        # headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.91 Safari/537.36",
        #            "cookie": "sn=3957284397500558579; _uc_pramas=%7B%22fr%22%3A%22pc%22%7D"}
        # res = request.Request(url, headers=headers)
        res = request.urlopen(url)
        req = res.read().decode('utf-8')
        soup = BeautifulSoup(req, 'lxml')
        # print(soup.prettify())
        tag = soup.find_all('div', class_='txt-area-title')  # headline blocks on the listing page
        # print(tag.name)
        for x in tag:
            news_url = 'https://news.uc.cn' + x.a.get('href')
            print(x.a.string, news_url)
            download(x.a.string, news_url)
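Note that the script above connects to UC directly and never actually goes through a proxy. To route it through a proxy IP, urllib's ProxyHandler can be installed globally, as in the minimal sketch below; the address 127.0.0.1:8080 is only a placeholder for whatever host, port, and credentials your proxy provider assigns you.

from urllib import request

# A minimal sketch: install a global opener so every later request.urlopen()
# call is routed through the proxy. The address below is a placeholder; replace
# it with the proxy endpoint from your provider.
proxy_handler = request.ProxyHandler({
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
})
opener = request.build_opener(proxy_handler)
request.install_opener(opener)

# All subsequent calls in the crawler, e.g. request.urlopen(url), now go
# through the proxy without any other changes to the code above.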
With that, the UC data scrape is complete; check the run results to confirm whether the data was fetched successfully.
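If you want to confirm programmatically rather than by eye, a quick count of the saved files works; the directory below is the one assumed in the script above.

import os

# Count the .txt files the crawler wrote (path taken from the script above)
out_dir = 'D:\\code\\python\\spider_news\\UC_news\\society'
txt_files = [f for f in os.listdir(out_dir) if f.endswith('.txt')]
print(len(txt_files), 'articles saved')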
Disclaimer: This article is a contributed post from the web and does not represent IPIDEA's position. If it raises infringement or compliance concerns, please contact IPIDEA promptly for removal.