python-webscraping-quickstart

A Python-based web-scraping quick-start project.

The project uses the Selenium and Scrapy frameworks for scraping.

Setup

  • Clone this repository:
 git clone "https://github.com/dileep-gadiraju/python-webscraping-quickstart"
  • After cloning, install the Python packages and start the application by running the following commands from ./src:
pip install -r "requirements.txt"
python app.py

A successful local deployment should show the message "Server is up on port 5001".

Documentation

For scripting and configuration documentation, refer to Documentation.

API Reference

Get all Agents

 GET /general/agents

No parameters required.
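
As an illustration, the call can be sketched with Python's standard library. The base URL assumes the local server from the Setup section on port 5001, and the assumption that the response body is JSON is not confirmed by this README. The function names are hypothetical, and the credentials described under API Authorization are omitted here for brevity.

```python
import json
import urllib.request

# Assumption: the server runs locally on the port mentioned in Setup.
BASE_URL = "http://localhost:5001"


def build_agents_request(base_url=BASE_URL):
    """Build (but do not send) a GET request for /general/agents."""
    return urllib.request.Request(f"{base_url}/general/agents", method="GET")


def fetch_agents(base_url=BASE_URL):
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(build_agents_request(base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_agents()` requires a running server; `build_agents_request()` can be inspected without one.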

Start a Scraping Job

  POST /general/run

The following request body parameters are mandatory:

Parameter  Type    Description
agentId    string  Valid AGENT-ID
type       string  Valid type of job
search     string  My search query
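
A minimal sketch of building this request in Python follows. It assumes the body is sent as JSON, which this README does not state explicitly, and that the server runs locally on port 5001. The function name and the example argument values are hypothetical placeholders, not real agent IDs or job types.

```python
import json
import urllib.request

# Assumption: the server runs locally on the port mentioned in Setup.
BASE_URL = "http://localhost:5001"


def build_run_request(agent_id, job_type, search, base_url=BASE_URL):
    """Build (but do not send) a POST request for /general/run."""
    # The three mandatory parameters listed in the table above.
    payload = {"agentId": agent_id, "type": job_type, "search": search}
    return urllib.request.Request(
        f"{base_url}/general/run",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API expects a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values; replace them with a real agent ID and job type.
request = build_run_request("MY-AGENT-ID", "MY-JOB-TYPE", "my search query")
```

The request can then be sent with `urllib.request.urlopen(request)` once the server is running and credentials have been added.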

Get Job Status

 GET /general/status

Parameter  Type    Description
JobId      string  (required) UUID of a job
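
The status lookup could be sketched as below. How `JobId` is transmitted is not specified in this README; the sketch assumes a URL query parameter, which is common for GET endpoints but should be checked against the server code. The function name and the UUID shown are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the server runs locally on the port mentioned in Setup.
BASE_URL = "http://localhost:5001"


def build_status_request(job_id, base_url=BASE_URL):
    """Build (but do not send) a GET request for /general/status."""
    # Assumption: JobId is passed in the query string.
    query = urllib.parse.urlencode({"JobId": job_id})
    return urllib.request.Request(f"{base_url}/general/status?{query}", method="GET")


# Hypothetical job UUID, for illustration only.
request = build_status_request("123e4567-e89b-12d3-a456-426614174000")
```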

API Authorization

Currently, the project uses HTTP basic authentication.

Set the following environment variables:

Variable             Type    Description
BASIC_HTTP_USERNAME  string  Username for the server
BASIC_HTTP_PASSWORD  string  Password for the server
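
On the client side, a basic authentication header can be built as in the sketch below. The encoding follows the standard HTTP Basic scheme (base64 of `username:password`). Reading the same two environment variables on the client is an assumption made for convenience, and both function names are hypothetical.

```python
import base64
import os


def basic_auth_header(username, password):
    """Return an Authorization header for the HTTP Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def header_from_env(env=os.environ):
    """Build the header from the variables listed above.

    Assumption: the client has the same variables set as the server.
    Raises KeyError if either variable is missing.
    """
    return basic_auth_header(env["BASIC_HTTP_USERNAME"], env["BASIC_HTTP_PASSWORD"])
```

The returned dictionary can be passed as the `headers` argument of `urllib.request.Request` when calling any of the endpoints above.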

Authors