Scraping the News from the Campus News Home Page - 白红宇



Published: 2019-06-07


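import requests
from bs4 import BeautifulSoup
from datetime import datetime

url = "http://news.gzcc.cn/html/xiaoyuanxinwen/"
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Each news entry on the index page is an <li> containing a .news-list-title element.
for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:
        t = news.select('.news-list-title')[0].text               # title
        dt = news.select('.news-list-info')[0].contents[0]        # first child of the info line
        dd = news.select('.news-list-info')[0].contents[1].text   # text of the second child
        a1 = news.select('a')[0].attrs['href']                    # link to the detail page

        # Fetch and parse the detail page.
        res1 = requests.get(a1)
        res1.encoding = 'utf-8'
        soup1 = BeautifulSoup(res1.text, 'html.parser')
        content = soup1.select("#content")[0].text
        about = soup1.select('.show-info')[0].text

        # The info line starts with '发布时间:' followed by a 19-character timestamp.
        # strptime (string -> datetime) is the call needed here, and the seconds code is %S.
        published = datetime.strptime(about.lstrip('发布时间:')[:19], '%Y-%m-%d %H:%M:%S')
        now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

        # Default to empty strings so that print() never hits an undefined name
        # when a page has no source, author, or photographer.
        origin = writer = photograph = ''
        # Slice the label off by length; lstrip() removes a set of characters,
        # not a prefix, and could eat the start of the value.
        if '来源：' in about:
            origin = about[about.find('来源：'):].split()[0][len('来源：'):]
        if '作者：' in about:
            writer = about[about.find('作者：'):].split()[0][len('作者：'):]
        if '摄影：' in about:
            photograph = about[about.find('摄影：'):].split()[0][len('摄影：'):]
        print(t, dt, dd, a1, published, now, origin, writer, photograph)

# Standalone check of strptime; the trailing space appears in both the string and the format.
date_str = '2018-03-30 17:10:12 '
datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S ')
print('\n', date_str)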
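The field-extraction step can be tried offline, without requesting any page. The sketch below is only an illustration: the helper name `parse_show_info`, the `LABELS` mapping, and the sample line are all made up for this example and are not taken from the site, whose real info line may be laid out differently.

```python
from datetime import datetime

# Hypothetical mapping from result keys to the labels the scraper looks for.
LABELS = {'origin': '来源：', 'writer': '作者：', 'photograph': '摄影：'}
TIME_PREFIX = '发布时间:'


def parse_show_info(about):
    """Split an info line into a timestamp and labelled fields."""
    info = {'published': None}

    # The timestamp is the 19 characters that follow the prefix.
    start = about.find(TIME_PREFIX)
    if start >= 0:
        begin = start + len(TIME_PREFIX)
        info['published'] = datetime.strptime(about[begin:begin + 19],
                                              '%Y-%m-%d %H:%M:%S')

    for key, label in LABELS.items():
        pos = about.find(label)
        if pos >= 0:
            # Take the whitespace-delimited token that starts with the label,
            # then drop the label by length rather than with lstrip().
            token = about[pos:].split()[0]
            info[key] = token[len(label):]
        else:
            info[key] = ''
    return info


# Made-up sample line for illustration only.
sample = '发布时间:2018-03-30 17:10:12 作者：张三 来源：学校办公室 摄影：李四 点击：12次'
result = parse_show_info(sample)
print(result)
```

Keeping the parsing in one function means a missing field yields an empty string instead of an undefined variable, and the logic can be checked against sample strings before it is pointed at live pages.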

Reposted from: https://www.cnblogs.com/sunset-Panda/p/8697481.html
