pushkar191098 authored 2 years ago

python-webscraping-quickstart

A Python-based web-scraping quick start project.

For scraping, the project uses the Selenium and Scrapy frameworks.

Setup

Clone this repository:

 git clone "https://github.com/dileep-gadiraju/python-webscraping-quickstart"

After cloning, install the Python packages by running the following command from ./src:

pip install -r "requirements.txt"

Start the Elasticsearch and Kibana services as Docker containers.

(refer to: https://www.elastic.co/guide/en/kibana/current/docker.html)
Import the API collections from ./test into a REST client tool.
Set the required global variables.
Run the command below from ./src to start the server.

python app.py

A successful local deployment should show that the server is up on port 5001.
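
As a quick sanity check after starting the server, the sketch below tests whether anything is accepting TCP connections on port 5001. The helper name `is_port_open` and the default host `127.0.0.1` are assumptions for illustration. It only confirms that the port is reachable, not that the application behind it is healthy.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 5001,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks the default port mentioned above.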

Documentation

For scripting and configuration documentation, refer to Documentation.

API Reference

Get all Agents

 GET /general/agents

No parameters required.
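
A minimal sketch of calling this endpoint with only the Python standard library. The base URL assumes the local deployment on port 5001 described in Setup. The function names are hypothetical, and decoding the response as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 5001, as in the Setup section.
BASE_URL = "http://localhost:5001"


def build_agents_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the list of agents."""
    return urllib.request.Request(f"{base_url}/general/agents", method="GET")


def fetch_agents(base_url: str = BASE_URL):
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(build_agents_request(base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If basic authorization is enabled on the server, the request would also need an `Authorization` header (see API Authorization).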

Start a Scraping Job

  POST /general/run

The following request body parameters are mandatory:

Parameter	Type	Description
`agentId`	`string`	Valid agent ID
`type`	`string`	Valid job type
`search`	`string`	Search query
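
The request could be assembled as in the sketch below. It assumes the body is sent as JSON and that the server runs locally on port 5001. Neither the encoding nor the function name `build_run_request` comes from the project itself. The valid values for `agentId` and `type` depend on the agents configured on the server.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 5001, as in the Setup section.
BASE_URL = "http://localhost:5001"


def build_run_request(agent_id: str, job_type: str, search: str,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /general/run request."""
    payload = {"agentId": agent_id, "type": job_type, "search": search}
    # All three parameters are mandatory, so reject empty values early.
    missing = [name for name, value in payload.items() if not value]
    if missing:
        raise ValueError(f"missing mandatory parameters: {', '.join(missing)}")
    return urllib.request.Request(
        f"{base_url}/general/run",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The resulting object can be passed to `urllib.request.urlopen` to start the job.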

Get Job Status

 GET /general/status

Parameter	Type	Description
`JobId`	`string`	(required) UUID of a job
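
A sketch of building the status URL, assuming `JobId` is passed as a query-string parameter (the project may expect it elsewhere) and that the server runs locally on port 5001. The helper `build_status_url` is hypothetical. It validates the UUID on the client side before any request is made.

```python
import uuid
from urllib.parse import urlencode

# Assumption: the server runs locally on port 5001, as in the Setup section.
BASE_URL = "http://localhost:5001"


def build_status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """Return the status URL for a job, rejecting malformed UUIDs."""
    # uuid.UUID raises ValueError if job_id is not a well-formed UUID.
    canonical = str(uuid.UUID(job_id))
    # Assumption: JobId is sent as a query-string parameter.
    return f"{base_url}/general/status?{urlencode({'JobId': canonical})}"
```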

API Authorization

Currently, the project uses HTTP basic authorization for authentication.

Set the following environment variables:

Variable	Type	Description
`BASIC_HTTP_USERNAME`	`string`	username for server
`BASIC_HTTP_PASSWORD`	`string`	password for server
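
On the client side, a matching `Authorization` header could be derived from the same two variables, as sketched below. This assumes the client has access to the credentials configured for the server. The function name `basic_auth_header` is hypothetical, and the encoding follows the standard HTTP Basic scheme (Base64 of `username:password`).

```python
import base64
import os
from typing import Mapping


def basic_auth_header(environ: Mapping[str, str] = os.environ) -> dict:
    """Build an HTTP Basic Authorization header from the environment."""
    # A KeyError here means one of the variables has not been set.
    username = environ["BASIC_HTTP_USERNAME"]
    password = environ["BASIC_HTTP_PASSWORD"]
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be merged into the headers of any request sent to the API.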

Authors
