Write a Python script that scrapes proxy IPs, verifies whether each one works, and saves the working proxies to IP.txt

小小草 LV1 · posted 2023-09-23 23:40
Here is a simple Python script that scrapes proxy IPs, verifies their availability, and saves the working ones to an IP.txt file:

```python
import requests
from bs4 import BeautifulSoup

# Fetch the proxy IP list from the Xicidaili site
def get_proxy_list():
    url = 'https://www.xicidaili.com/nn/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table', attrs={'id': 'ip_list'})
    trs = table.find_all('tr')[1:]  # skip the header row
    proxy_list = []
    for tr in trs:
        tds = tr.find_all('td')
        ip = tds[1].text.strip()
        port = tds[2].text.strip()
        protocol = tds[5].text.strip().lower()
        proxy = protocol + '://' + ip + ':' + port
        proxy_list.append(proxy)
    return proxy_list

# Verify that a proxy works by requesting a test page through it
def check_proxy(proxy):
    url = 'https://www.baidu.com/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    try:
        response = requests.get(url, headers=headers,
                                proxies={'http': proxy, 'https': proxy},
                                timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False

# Save the working proxy IPs to IP.txt
def save_proxy(proxy_list):
    with open('IP.txt', 'w') as f:
        for proxy in proxy_list:
            if check_proxy(proxy):
                f.write(proxy + '\n')

if __name__ == '__main__':
    proxy_list = get_proxy_list()
    save_proxy(proxy_list)
```

The script first fetches the proxy list from the Xicidaili proxy site using the requests and BeautifulSoup libraries, then tests each proxy by requesting https://www.baidu.com through it, and writes every proxy that responds successfully within the timeout to IP.txt.
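Since each check waits up to five seconds, validating a long list one proxy at a time can be slow. Below is a minimal sketch of a concurrent variant using `concurrent.futures.ThreadPoolExecutor` from the standard library. It reuses the `check_proxy` function defined above; the helper names `filter_proxies`/`save_proxy_concurrent` and the worker count `max_workers=20` are illustrative choices, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Check many proxies in parallel and keep only the working ones.
# Assumes check_proxy() from the script above is already defined.
def filter_proxies(proxy_list, max_workers=20):
    # max_workers=20 is an arbitrary choice; tune it to your bandwidth
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so zip pairs each proxy
        # with its True/False check result
        results = executor.map(check_proxy, proxy_list)
        return [p for p, ok in zip(proxy_list, results) if ok]

def save_proxy_concurrent(proxy_list):
    with open('IP.txt', 'w') as f:
        for proxy in filter_proxies(proxy_list):
            f.write(proxy + '\n')
```

Note that free proxy sites come and go and often rate-limit scrapers; if xicidaili.com is unreachable, any proxy list page with a similar HTML table layout can be substituted by adjusting the parsing in `get_proxy_list`.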