From c13bb37e23ea37084729f77ee409e3b9ebaacda0 Mon Sep 17 00:00:00 2001
From: Pushkar Chauhan <pushkar.mahi8@gmail.com>
Date: Fri, 27 May 2022 16:33:15 +0530
Subject: [PATCH] Update README.md

Added Setup, Documentation, API Reference, Authors
---
 README.md | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 075bb0a..c65a4d5 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,78 @@
+
+
+
 # python-crawler-quickstart
-Python based Web crawler Quick Start Project
+
+Python-based web crawler Quick Start Project.
+
+For scraping, the project uses the Selenium and Scrapy frameworks.
+
+
+## Setup
+
+* Clone this repository.
+```
+ git clone "https://github.com/dileep-gadiraju/python-webscraping-quickstart"
+```
+
+* After cloning, install the Python packages by running the following command from `./src`.
+```
+pip install -r "requirements.txt"
+```
+
+* Start the Elasticsearch and Kibana services as Docker containers.
+
+ (refer: https://www.elastic.co/guide/en/kibana/current/docker.html)
+
+* Import the API collections from `./test` into a REST client tool.
+
+* Set the required global variables.
+
+* Run the command below from `./src` to start the server.
+```
+python app.py
+```
+A successful local deployment should show that the server is up on port 5001.
+## Documentation
+
+For scripting and configuration documentation, refer to the `./docs` folder.
+
+## API Reference
+
+#### Get all Agents
+
+```
+ GET /general/agents
+```
+_No parameters required_
+
+#### Start a Scraping Job
+
+```
+ POST /general/run
+```
+_The following request body parameters are mandatory_
+| Parameter | Type     | Description                       |
+| :-------- | :------- | :-------------------------------- |
+| `agentId` | `string` | `Valid AGENT-ID`                  |
+| `type`    | `string` | `Valid Type Of JOB`               |
+| `search`  | `string` | `my search query`                 |
+
+
+#### Get Job Status
+
+```
+ GET /general/status
+```
+
+| Parameter | Type     | Description                       |
+| :-------- | :------- | :-------------------------------- |
+| `JobId`   | `string` | `(required) uuid of a job`        |
+
+## Authors
+
+- [@dileep-gadiraju](https://github.com/dileep-gadiraju)
+- [@Pushkar-Chauhan](https://github.com/Pushkar191098)
+- [@dhiru579](https://github.com/dhiru579)
+- [@ArchakGAmruth](https://github.com/ArchakGAmruth)
+
-- 
GitLab
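
Note on the API Reference added in this patch: the three endpoints could be exercised from Python roughly as follows. This is only a minimal sketch, not code from the repository. It assumes the server is reachable at `http://localhost:5001` (the README names only the port), that `POST /general/run` accepts a JSON body, that `JobId` is passed as a query-string parameter, and that responses are JSON. The helper names (`agents_request`, `run_request`, `status_request`, `send`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server runs locally; the README only states port 5001.
BASE_URL = "http://localhost:5001"


def agents_request(base_url=BASE_URL):
    """Build a GET request for /general/agents (no parameters required)."""
    return urllib.request.Request(f"{base_url}/general/agents", method="GET")


def run_request(agent_id, job_type, search, base_url=BASE_URL):
    """Build a POST request for /general/run with the three mandatory fields."""
    body = json.dumps({"agentId": agent_id, "type": job_type, "search": search})
    return urllib.request.Request(
        f"{base_url}/general/run",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def status_request(job_id, base_url=BASE_URL):
    """Build a GET request for /general/status.

    Assumption: JobId is sent as a query-string parameter.
    """
    query = urllib.parse.urlencode({"JobId": job_id})
    return urllib.request.Request(f"{base_url}/general/status?{query}", method="GET")


def send(request, timeout=10):
    """Send a prepared request and decode the reply as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(agents_request())` would list the agents, and `send(run_request("<agent id>", "<job type>", "my search query"))` would start a job. The shape of each response is not documented in this patch, so inspect what comes back before relying on any field.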