Python 3.5 + requests: what to do about garbled Chinese text when scraping a website

2023-10-10 12:49 | Source: compiled from the web | Views: 265

    import requests
    from bs4 import BeautifulSoup

    url = 'http://quote.eastmoney.com/stocklist.html'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    req = requests.get(url, headers=headers)
    req.encoding = 'utf-8'
    bs = BeautifulSoup(req.content, 'html.parser')  # type: BeautifulSoup
    quotesearch = bs.find('div', attrs={'id': 'quotesearch'})
    print(quotesearch)

Running the code above produces the following output:

???31é·Y(300737) °?·éêy?Y(300738) ?÷??μ??·(300739) óù?ò??(300740) ?a±|1é·Y(300741)

1. Approach one: check the page's encoding

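Open the site and press F12 to bring up the browser developer tools, then look at the head at the top of the page: the declared encoding is 'gb2312' (charset=gb2312). Change line 8 of the code to req.encoding = 'gb2312' and run it again. The output is unchanged and still garbled, because req.encoding only affects how req.text is decoded, while the code still passes the raw bytes in req.content to BeautifulSoup.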
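The charset check described above can also be done in code rather than by eye. The following is a minimal sketch using only the standard library; the helper name declared_charset, the scan_bytes parameter, and the sample page are all hypothetical, and the sample bytes stand in for req.content so that no network access is needed.

```python
import re

# Matches "charset=..." in the raw bytes of an HTML document.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw_html, scan_bytes=2048):
    """Return the charset declared near the top of the page, or None if absent."""
    match = _CHARSET_RE.search(raw_html[:scan_bytes])
    return match.group(1).decode('ascii').lower() if match else None


# Sample data standing in for req.content (not the real site's HTML).
sample = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gb2312"></head>'
    '<body>股票</body></html>'
).encode('gb2312')

print(declared_charset(sample))
```

In a real script, the returned value could be assigned to req.encoding before reading req.text. A regular expression is only a rough heuristic here; a page may declare its encoding in an HTTP header instead, or not at all.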

2. Approach two: change line 9 to bs = BeautifulSoup(req.text, 'html.parser'), that is, replace req.content with req.text. Run the code again: the output is correct, with no garbled characters.

How it works:

resp.text returns Unicode data (a str in Python 3), decoded using resp.encoding. resp.content returns bytes, that is, the raw binary data.

So to read and parse text data, use response.text. To read binary files such as images, response.content is usually the right choice.
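The difference between the two attributes can be reproduced without any network access. The sketch below is an illustration only: the content variable is sample data standing in for resp.content, and text_as is a hypothetical helper that approximates what resp.text does for a given resp.encoding (decode the bytes, replacing anything undecodable).

```python
# Sample data standing in for resp.content: gb2312-encoded bytes.
content = '股票代码'.encode('gb2312')


def text_as(raw_bytes, encoding):
    """Approximate resp.text: decode the raw bytes with the given encoding."""
    return raw_bytes.decode(encoding, errors='replace')


right = text_as(content, 'gb2312')       # matches the page's real encoding
wrong_utf8 = text_as(content, 'utf-8')   # wrong guess: replacement characters
wrong_latin = text_as(content, 'latin-1')  # wrong guess: accented-letter mojibake

print(type(content).__name__, type(right).__name__)
print(right)
print(wrong_utf8)
print(wrong_latin)
```

Only the decode that uses the page's actual encoding recovers the original text, which is why setting req.encoding = 'gb2312' has a visible effect only once the code reads req.text.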

Reposted from: https://blog.csdn.net/weixin_41931602/article/details/81181946
