Scrapy Pipelines documentation

Installation

ItemPipeline
class scrapy_pipelines.pipelines.ItemPipeline(settings: scrapy.settings.Settings = None)

    Abstract base class for all item pipelines.

    abstract close_spider(spider: scrapy.spiders.Spider)
        Parameters: spider (Spider)

    classmethod from_crawler(crawler: scrapy.crawler.Crawler)
        Parameters: crawler (Crawler)

    abstract classmethod from_settings(settings: scrapy.settings.Settings)
        Parameters: settings (Settings)

    abstract open_spider(spider: scrapy.spiders.Spider)
        Parameters: spider (Spider)

    abstract process_item(item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
        Parameters: item (Item), spider (Spider)
        Return type: Item
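The sketch below shows what a concrete subclass might look like, assuming the abstract hooks mirror Scrapy's standard pipeline methods. The LoggingPipeline class and its logging behaviour are illustrative only and not part of the library:

    # Illustrative subclass of ItemPipeline (not part of scrapy-pipelines).
    # Assumes the abstract hooks behave like Scrapy's standard pipeline methods.
    from scrapy import Item, Spider
    from scrapy.settings import Settings

    from scrapy_pipelines.pipelines import ItemPipeline


    class LoggingPipeline(ItemPipeline):
        """Example pipeline that simply logs every item it sees."""

        @classmethod
        def from_settings(cls, settings: Settings):
            # Build the pipeline from the project settings.
            return cls(settings=settings)

        def open_spider(self, spider: Spider):
            spider.logger.info("pipeline opened for %s", spider.name)

        def process_item(self, item: Item, spider: Spider) -> Item:
            spider.logger.debug("processing %r", item)
            return item

        def close_spider(self, spider: Spider):
            spider.logger.info("pipeline closed for %s", spider.name)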
Pipeline MongoDB

class scrapy_pipelines.pipelines.mongo.MongoPipeline(uri: str, settings: scrapy.settings.Settings)

    A pipeline that saves items into MongoDB asynchronously with txmongo.

    close_spider(spider: scrapy.spiders.Spider)
        Parameters: spider (Spider)

    create_indexes(spider: scrapy.spiders.Spider)
        Parameters: spider (Spider)

    classmethod from_crawler(crawler: scrapy.crawler.Crawler)
        Parameters: crawler (Crawler)

    classmethod from_settings(settings: scrapy.settings.Settings)
        Parameters: settings (Settings)

    item_completed(result: str, item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
        Parameters: result (str), item (Item), spider (Spider)
        Return type: Item

    open_spider(spider: scrapy.spiders.Spider)
        Parameters: spider (Spider)

    process_item(item: scrapy.item.Item, spider: scrapy.spiders.Spider) → scrapy.item.Item
        Parameters: item (Item), spider (Spider)
        Return type: Item

    process_item_id(signal: object, sender: scrapy.crawler.Crawler, item: scrapy.item.Item, spider: scrapy.spiders.Spider) → pymongo.results.InsertOneResult
        Parameters: signal (object), sender (Crawler), item (Item), spider (Spider)
        Return type: InsertOneResult
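To use this pipeline from a Scrapy project, enable it through the standard ITEM_PIPELINES setting. The connection setting name below (PIPELINE_MONGO_URI) is an assumption made for illustration; check the project's own settings documentation for the exact keys:

    # settings.py of a Scrapy project: enabling MongoPipeline.
    # ITEM_PIPELINES is standard Scrapy; the URI setting name below is a
    # placeholder and may differ in the actual library.
    ITEM_PIPELINES = {
        "scrapy_pipelines.pipelines.mongo.MongoPipeline": 300,
    }

    # Hypothetical connection setting; consult the Installation and Settings docs.
    PIPELINE_MONGO_URI = "mongodb://localhost:27017/scrapy"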
Items

A customized item for MongoDB.

class scrapy_pipelines.items.BSONItem(*args, **kwargs)

    PyMongo creates _id automatically in the object after inserting.
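A sketch of how such an item might be declared, assuming BSONItem is used like a regular scrapy.Item; the QuoteItem class and its fields are illustrative:

    # Illustrative item definition, assuming BSONItem behaves like scrapy.Item
    # while tolerating the _id key that PyMongo adds after insertion.
    import scrapy

    from scrapy_pipelines.items import BSONItem


    class QuoteItem(BSONItem):
        text = scrapy.Field()
        author = scrapy.Field()
        # No _id field is declared here: MongoDB assigns it on insert.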
Settings

Utilities used in the settings module.

scrapy_pipelines.settings.unfreeze_settings(settings: scrapy.settings.Settings) → Generator[scrapy.settings.Settings, None, None]
    Parameters: settings (Settings)
    Return type: Generator[Settings, None, None]
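The generator return type suggests that unfreeze_settings is meant to be used as a context manager that temporarily makes a frozen Settings object mutable again. A minimal sketch under that assumption:

    # Sketch assuming unfreeze_settings is a context manager that temporarily
    # lifts the frozen flag so a Settings object can be modified.
    from scrapy.settings import Settings

    from scrapy_pipelines.settings import unfreeze_settings

    settings = Settings({"BOT_NAME": "example"})
    settings.freeze()  # frozen settings normally reject further changes

    with unfreeze_settings(settings) as unfrozen:
        unfrozen.set("DOWNLOAD_DELAY", 1.0)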
Signals

Signals for the pipelines.
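A receiver such as MongoPipeline.process_item_id above is typically connected to one of these signals through Scrapy's signal manager. The signal name item_id below is hypothetical and used purely for illustration; check scrapy_pipelines.signals for the objects it actually defines:

    # Sketch of subscribing to a pipeline signal from a Scrapy extension.
    from scrapy_pipelines import signals as pipeline_signals


    class ItemIdLogger:
        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            crawler.signals.connect(
                ext.handle_item_id,
                signal=pipeline_signals.item_id,  # hypothetical signal object
            )
            return ext

        def handle_item_id(self, signal, sender, item, spider):
            # Same receiver signature as MongoPipeline.process_item_id above.
            spider.logger.info("MongoDB stored %r", item)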
Documentation contents

- Installation
- ItemPipeline: the root class for all pipelines
- Pipeline MongoDB: save items into MongoDB
- Items: items used in these pipelines
- Settings: settings for these pipelines
- Signals: signals used in these pipelines