Python Web Scraping for Beginners: Download Every Image on the First Page of a Baidu Image Search

The URL is:

http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1460997499750_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=xxx

The trailing xxx stands for the search keyword, for example 闪电侠 (The Flash).
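Because the keyword is pasted straight into the query string, characters such as spaces, "&" or Chinese text are safer when percent-encoded first. A minimal sketch of that step, assuming the standard-library urllib.parse.quote is acceptable here; BASE_URL and build_search_url are illustrative names, not part of the original script:

```python
from urllib.parse import quote

# The search URL from this article, with the keyword left off the end
BASE_URL = (
    "http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592"
    "&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1460997499750_R&pv=&ic=0&nc=1"
    "&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word="
)


def build_search_url(keyword):
    """Return the search URL with the keyword percent-encoded as UTF-8."""
    return BASE_URL + quote(keyword)


print(build_search_url("闪电侠"))
```

The printed URL ends in word=%E9%97%AA%E7%94%B5%E4%BE%A0, which is the UTF-8 percent-encoding of 闪电侠.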

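The code below prompts for the image to search for, then downloads the results automatically. It only collects a single page of results.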

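# coding: utf8

import os
import re
from urllib.parse import quote

import requests

# Ask the user what to search for, e.g. 闪电侠 (The Flash)
name = input("Enter the image you want to search for: ")

# Percent-encode the keyword so spaces, "&" and non-ASCII characters are safe in the URL
url = "http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1460997499750_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word={}".format(quote(name))

# Fetch the first page of results; the timeout keeps the script from hanging
html = requests.get(url, timeout=10).text

# Images are saved to images/<keyword>/ next to this script
image_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images", name)
os.makedirs(image_path, exist_ok=True)

# Each image link appears in the page source as "objURL":"<link>",
pic_urls = re.findall('"objURL":"(.*?)",', html, re.S)

i = 0
for each in pic_urls:
    # Files are numbered from 0 and always saved with a .jpg extension
    file_name = os.path.join(image_path, str(i) + ".jpg")
    print(each)
    try:
        pic = requests.get(each, timeout=10)
        pic.raise_for_status()  # treat 4xx/5xx responses as failed downloads
    except requests.exceptions.RequestException:
        print("This image could not be downloaded")
        continue
    # The with block closes the file even if the write fails
    with open(file_name, "wb") as f:
        f.write(pic.content)
    i += 1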
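
The script saves every download with a .jpg extension, whatever the link actually points to. As an optional refinement, the extension could be taken from the link itself. This is only a sketch: guess_extension and KNOWN_EXTENSIONS are hypothetical names, and it assumes the link's path ends in the real file type, which is not guaranteed.

```python
import os
from urllib.parse import urlparse

# Extensions we are willing to keep; anything else falls back to the default
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def guess_extension(image_url, default=".jpg"):
    """Return the lowercase extension found in the URL path, or the default."""
    path = urlparse(image_url).path  # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in KNOWN_EXTENSIONS else default


print(guess_extension("http://example.com/pics/flash.PNG?size=large"))
print(guess_extension("http://example.com/pics/download"))
```

In the download loop, the file name would then be built with guess_extension(each) in place of the fixed '.jpg'.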

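The code above uses requests + re to collect all the image links and download them. The approach:

Use requests to fetch the page content
Use an re regular expression to extract the image links from the page
Use requests again to download each image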
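
The extraction step can be tried without any network access. The sketch below runs the same regular expression over a made-up fragment; sample_html only imitates the "objURL" fields that the script looks for, and extract_image_urls is an illustrative helper name.

```python
import re

# Hypothetical fragment imitating the page source; a real page is far longer
sample_html = (
    '{"thumbURL":"http://example.com/t1.jpg",'
    '"objURL":"http://example.com/1.jpg","fromURL":"x"},'
    '{"objURL":"http://example.com/2.png","fromURL":"y"}'
)


def extract_image_urls(html):
    """Return every value that follows "objURL":" up to the next ", sequence."""
    # (.*?) is non-greedy, so each match stops at the first closing quote + comma
    return re.findall(r'"objURL":"(.*?)",', html, re.S)


print(extract_image_urls(sample_html))
# ['http://example.com/1.jpg', 'http://example.com/2.png']
```

The thumbURL entry is skipped because the pattern requires a double quote immediately before objURL.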

Notes

The code was written for Python 3.6; under Python 2, watch out for encoding issues
If the requests package is not installed, install it with pip install requests
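
Both notes can be checked before running the scraper. This is a rough sketch, not part of the original script: check_environment is a hypothetical helper, and the 3.6 minimum simply mirrors the version mentioned above.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 6)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python {}.{} or newer is expected".format(*min_version))
    # find_spec returns None when the package cannot be imported
    if importlib.util.find_spec("requests") is None:
        problems.append("requests is missing; run: pip install requests")
    return problems


for problem in check_environment():
    print(problem)
```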

This article was written by olei; please credit the source when reposting.
