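"""
Web crawler demo

Analysis:
1. Import the required packages: requests, pyquery, and json (json is part of the standard library)
2. Send the request
3. Locate the data
4. Clean the data
5. Save the data

Warning: This demo is for learning purposes only. It strictly complies with the
laws and regulations of the People's Republic of China, must not be used for any
commercial or illegal activity, and all data is deleted once the study is done.
"""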
import json
import urllib.parse

import pymysql
import pyquery
import requests
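

def get_connect():
    """Open a MySQL connection and return it together with a cursor."""
    conn = pymysql.connect(
        host='localhost',
        port=2280,
        user='root',
        password='password',
        database='city',
        charset='utf8mb4',  # the stored city and car names contain Chinese text
    )
    cursor = conn.cursor()
    return conn, cursor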


def close_connect(conn, cursor):
    """Close the cursor first, then the connection, skipping any that are unset."""
    if cursor:
        cursor.close()
    if conn:
        conn.close()
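

# A possible batch variant of the insert step, offered as a sketch rather than
# this script's own method: build all parameter tuples first, then pass them to
# cursor.executemany() in a single call. INSERT_SQL and build_rows are
# hypothetical names introduced here; the table and column names are the ones
# used by the insert statement in this script.
INSERT_SQL = (
    'insert into demo(city, num, name, price, cost) '
    'values (%s, %s, %s, %s, %s)'
)


def build_rows(city, cars):
    """Turn the scraped dicts into tuples ordered like the columns in INSERT_SQL."""
    return [
        (city, car['num'], car['name'], car['price'], car['cost'])
        for car in cars
    ]


# Usage, assuming conn and cursor come from get_connect():
#     cursor.executemany(INSERT_SQL, build_rows(city_name, datas))
#     conn.commit()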


city_name = input('Enter the name of the city to crawl: ')
# The city is passed to the site as a URL-encoded cookie value
cookies = {"city_name": urllib.parse.quote(city_name)}
url = 'https://www.dongchedi.com/sales'
# A timeout keeps the script from hanging if the server never responds
res = requests.get(url, cookies=cookies, timeout=10)
print(res, type(res))
if res.status_code == 200:
    print(url)
    print('Request succeeded')
    html = res.text
    # Keep a local copy of the page for offline inspection
    with open('./dongchedi.html', 'w', encoding='utf-8') as f:
        f.write(html)

    # Each entry in the sales ranking is one element with this class name
    query = pyquery.PyQuery(html)
    lis = query.find(".hot-sales-rank_item__UZthc")
    datas = []
    for item in lis.items():
        num = item.find(".hot-sales-rank_item-rank__1r1pX").text()
        name = item.find(".hot-sales-rank_item-info-title__JSnDI.line-1").text()
        price = item.find(".hot-sales-rank_item-info-price__2Gbyg").text()
        # 'cost' holds the sales-volume text; the key matches the DB column name
        cost = item.find(".hot-sales-rank_item-sales-volume__19dk3").text()
        car = {
            'num': num,
            'name': name,
            'price': price,
            'cost': cost
        }
        datas.append(car)

    for car in datas:
        print(car)

    # Save the results as JSON
    string_data = json.dumps(datas, ensure_ascii=False)
    with open('./dongchedi.json', 'w', encoding='utf-8') as f:
        f.write(string_data)

    # Save the results to MySQL; commit once after all rows are queued
    conn, cursor = get_connect()
    try:
        sql = 'insert into demo(city, num, name, price, cost) values (%s, %s, %s, %s, %s)'
        for car in datas:
            cursor.execute(sql, (city_name, car['num'], car['name'], car['price'], car['cost']))
        conn.commit()
    finally:
        close_connect(conn, cursor)
    print('Data written to database')
else:
    print('Request failed')
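

# Step 4 of the docstring ("clean the data") has no counterpart in the script:
# every field is stored as the raw text of its element. The sketch below shows
# one way such a step might look. extract_numbers and clean_car are
# hypothetical helpers, and the sample strings in the usage note are invented
# formats, not text confirmed to appear on the page.
import re


def extract_numbers(text):
    """Return every decimal number found in text, in order, as floats."""
    # Drop thousands separators so '12,345' is read as one number
    return [float(n) for n in re.findall(r'\d+(?:\.\d+)?', text.replace(',', ''))]


def clean_car(car):
    """Return a copy of one scraped record with whitespace stripped and the
    numeric parts of 'price' and 'cost' pulled out into separate keys."""
    cleaned = {key: value.strip() for key, value in car.items()}
    cleaned['price_numbers'] = extract_numbers(cleaned['price'])
    cleaned['cost_numbers'] = extract_numbers(cleaned['cost'])
    return cleaned


# Usage with an invented record:
#     clean_car({'num': ' 1 ', 'name': 'Demo', 'price': '9.98-15.98', 'cost': '12,345'})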