Scrapy Pipelines documentation

Installation

ItemPipeline

class scrapy_pipelines.pipelines.ItemPipeline(settings: scrapy.settings.Settings = None)

Abstract base class for item pipelines

abstract close_spider(spider: scrapy.spiders.Spider)
Parameters

spider (Spider) –

classmethod from_crawler(crawler: scrapy.crawler.Crawler)
Parameters

crawler (Crawler) –

abstract classmethod from_settings(settings: scrapy.settings.Settings)
Parameters

settings (Settings) –

abstract open_spider(spider: scrapy.spiders.Spider)
Parameters

spider (Spider) –

abstract process_item(item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
Parameters
  • item (Item) –

  • spider (Spider) –

Return type

Item
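
The interface above can be illustrated with a minimal, standard-library-only sketch. `ItemPipelineSketch` and `CollectingPipeline` are hypothetical stand-ins that mirror the documented method names; they are not the library's classes. The body of `from_crawler` is an assumption (the reference only shows that it is a concrete classmethod).

```python
from abc import ABC, abstractmethod


class ItemPipelineSketch(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    def __init__(self, settings=None):
        self.settings = settings

    @classmethod
    def from_crawler(cls, crawler):
        # Assumption: the concrete hook delegates to from_settings.
        return cls.from_settings(crawler.settings)

    @classmethod
    @abstractmethod
    def from_settings(cls, settings):
        """Build the pipeline from a settings object."""

    @abstractmethod
    def open_spider(self, spider):
        """Acquire resources when the spider starts."""

    @abstractmethod
    def process_item(self, item, spider):
        """Handle one item and return it."""

    @abstractmethod
    def close_spider(self, spider):
        """Release resources when the spider finishes."""


class CollectingPipeline(ItemPipelineSketch):
    """Hypothetical concrete pipeline that keeps items in a list."""

    @classmethod
    def from_settings(cls, settings):
        return cls(settings)

    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)
        return item

    def close_spider(self, spider):
        self.closed = True
```

A subclass has to implement every abstract method before it can be instantiated; instantiating the base class directly raises `TypeError`.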

Pipeline MongoDB

class scrapy_pipelines.pipelines.mongo.MongoPipeline(uri: str, settings: scrapy.settings.Settings)

A pipeline that saves items into MongoDB asynchronously with txmongo

close_spider(spider: scrapy.spiders.Spider)
Parameters

spider (Spider) –

create_indexes(spider: scrapy.spiders.Spider)
Parameters

spider (Spider) –

classmethod from_crawler(crawler: scrapy.crawler.Crawler)
Parameters

crawler (Crawler) –

Return type

MongoPipeline

classmethod from_settings(settings: scrapy.settings.Settings)
Parameters

settings (Settings) –

Return type

MongoPipeline

item_completed(result: str, item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
Parameters
  • result (str) –

  • item (Item) –

  • spider (Spider) –

Return type

Item

open_spider(spider: scrapy.spiders.Spider)
Parameters

spider (Spider) –

process_item(item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
Parameters
  • item (Item) –

  • spider (Spider) –

Return type

Item

process_item_id(signal: object, sender: scrapy.crawler.Crawler, item: scrapy.item.Item, spider: scrapy.spiders.Spider) → pymongo.results.InsertOneResult
Parameters
  • signal (object) –

  • sender (Crawler) –

  • item (Item) –

  • spider (Spider) –

Return type

InsertOneResult
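
As with any Scrapy item pipeline, the class would typically be enabled through the project's `ITEM_PIPELINES` setting. The fragment below is a sketch: the class path comes from the reference above, the priority `300` is an arbitrary illustrative value, and the MongoDB connection settings the pipeline reads are not listed in this section, so they are left out.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # Class path taken from the reference above; 300 is an arbitrary priority.
    "scrapy_pipelines.pipelines.mongo.MongoPipeline": 300,
}
```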

Items

A customized item for MongoDB

class scrapy_pipelines.items.BSONItem(*args, **kwargs)

PyMongo automatically adds an _id field to the object after inserting it
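
The sketch below illustrates why an item type may need to accommodate `_id`. It uses only the standard library: `StrictItem`, `StrictItemWithId`, and `fake_insert_one` are hypothetical stand-ins, and the idea that `BSONItem` declares an `_id` field is an assumption, not something this section confirms.

```python
import uuid


class StrictItem(dict):
    """Hypothetical stand-in for an item that accepts only declared fields."""

    fields = ("name",)

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class StrictItemWithId(StrictItem):
    """Same stand-in with `_id` declared (assumed to be what a BSON item needs)."""

    fields = ("name", "_id")


def fake_insert_one(document):
    """Mimic the driver side effect: add `_id` to the document if missing."""
    if "_id" not in document:
        document["_id"] = uuid.uuid4().hex  # stand-in for a generated ObjectId
    return document["_id"]
```

With these stand-ins, `fake_insert_one` raises `KeyError` for a `StrictItem` but succeeds for a `StrictItemWithId`, which ends up carrying the generated `_id`.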

Settings

Utilities used in the settings module

scrapy_pipelines.settings.unfreeze_settings(settings: scrapy.settings.Settings) → Generator[scrapy.settings.Settings, None, None]
Parameters

settings (Settings) –

Return type

Generator[Settings, None, None]
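
The `Generator[Settings, None, None]` return type suggests a context manager built with `contextlib.contextmanager`. The following is a sketch of that pattern under that assumption, not the library's implementation: `FakeSettings` is a hypothetical stand-in exposing a `frozen` flag, and `unfreeze_sketch` is an illustrative name.

```python
from contextlib import contextmanager
from typing import Generator


class FakeSettings:
    """Hypothetical stand-in exposing a `frozen` flag."""

    def __init__(self):
        self.frozen = True
        self.values = {}

    def set(self, name, value):
        if self.frozen:
            raise TypeError("Trying to modify an immutable Settings object")
        self.values[name] = value


@contextmanager
def unfreeze_sketch(settings: FakeSettings) -> Generator[FakeSettings, None, None]:
    """Temporarily lift the frozen flag and restore it on exit."""
    original = settings.frozen
    settings.frozen = False
    try:
        yield settings
    finally:
        # Restore the previous state even if the body raised.
        settings.frozen = original
```

Inside a `with unfreeze_sketch(settings) as unfrozen:` block the object can be modified; once the block exits, the original frozen state is back in place.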

Signals

Signals for the pipelines

Installation

Installation

ItemPipeline

The root class for all pipelines

Pipeline MongoDB

Save items into MongoDB

Items

Items used in these pipelines

Settings

Settings for these pipelines

Signals

Signals used in these pipelines
