Sure — you can use a proxy together with multithreading to speed up fetching. Here is example code:
```python
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

proxies = {
    'http': 'http://127.0.0.1:10809',
    'https': 'http://127.0.0.1:10809'
}

def get_page(url):
    """Fetch a page through the proxy; return its HTML, or None on failure."""
    try:
        response = requests.get(url, proxies=proxies, timeout=10)
        if response.status_code == 200:
            return response.text
    except Exception as e:
        print(e)
    return None

def parse_page(html):
    """Extract and print the title and summary of each hot-list entry."""
    if html is None:
        return
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='HotItem-content')
    for item in items:
        title = item.find('h2').text.strip()
        content = item.find('p').text.strip()
        print(f"{title}\n{content}\n")

def crawl(url):
    """Fetch and parse one URL; runs entirely inside a worker thread."""
    parse_page(get_page(url))

if __name__ == '__main__':
    urls = ['https://www.zhihu.com/hot', 'https://www.zhihu.com/hot/rank']
    with ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(crawl, urls)
```
This uses the proxy support of the `requests` library, and the `ThreadPoolExecutor` class from `concurrent.futures` to run the requests on multiple threads. Fetching and parsing are split into separate functions, which keeps the code clear and easy to follow.
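One subtlety worth knowing about `executor.submit`: its arguments are evaluated in the calling thread before the task is queued, so a call like `executor.submit(parse_page, get_page(url))` would perform each fetch serially in the main thread and only parse in a worker. Wrapping fetch-and-parse in a single function (or using `executor.map`) keeps all the work inside the pool. A minimal offline sketch, with a toy `slow_square` function standing in for the network call:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    # Toy stand-in for a network fetch; records which thread ran it.
    return n * n, threading.current_thread().name

with ThreadPoolExecutor(max_workers=2) as executor:
    # map() queues the callable itself, so slow_square runs in worker threads
    # and the results come back in input order.
    results = list(executor.map(slow_square, [1, 2, 3]))

values = [v for v, _ in results]
threads = {name for _, name in results}
print(values)  # [1, 4, 9]
# None of the calls ran on the main thread:
print(all(name != 'MainThread' for name in threads))  # True
```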