Commit 786adeb1 authored by pushkar191098

Addition: Documentation

Merge request !1: addition: main_server_code, scripts, docs
Showing 671 additions and 49 deletions
@@ -35,7 +35,7 @@ python app.py
Successful local deployment should show `Server is up on port 5001`.
## Documentation
For scripting and configuration documentation, refer to the `./docs` folder
For scripting and configuration documentation, refer to the `./docs` folder. [Go to Documentation](docs/README.md)
## API Reference
...
docs/README.md 0 → 100644
# Configuration README
[Configure config.py](config.md)
[Configure agents](agents.md)
[Configure azure](azure.md)
[Configure Environment Variables](env-variables.md)
[Configure ElasticSearch Log](eslog.md)
[Configure scripts.py](scripts.md)
docs/agents.md 0 → 100644
# Agent Configurations
To register a new agent, add its entry to `/static/agents.json`.
Format:
```
{
"agentId": "MY-AGENT-1",
"description": "Crawler For my_agent_1",
"provider": "AGENT-PROVIDER-X",
"URL": "https://www.my-agent.com",
"scripts": {
"scriptType1": "myAgentScript1",
"scriptType2": "myAgentScript2",
"scriptType3": "myAgentScript3",
...
}
}
```
Example:
```
[
{
"agentId": "APPLIED-SELENIUM",
"description": "Crawler For Applied",
"provider": "Applied",
"URL": "https://www.applied.com",
"scripts": {
"info": "AppliedSelenium",
"pdf": "AppliedSelenium"
}
},
{
"agentId": "GRAINGER-SELENIUM",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerSelenium",
"pdf": "GraingerSelenium"
}
}
]
```
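At run time the server resolves the crawler for a job from this file: `AGENT_SCRIPT_TYPES` in `config.py` maps the requested job type to one of the scriptType keys, and the value under `scripts` names the callable registered in `./src/scripts` (see `scripts.md`). A rough sketch of that lookup, assuming those config names; the helper itself is illustrative, not part of the codebase:
```
import json

import config  # assumed to provide AGENT_CONFIG_PATH and AGENT_SCRIPT_TYPES


def resolve_script_name(agent_id, job_type):
    # Load the agent entries added to /static/agents.json
    with open(config.AGENT_CONFIG_PATH) as f:
        agents = json.load(f)
    for agent in agents:
        if agent['agentId'] == agent_id:
            # AGENT_SCRIPT_TYPES maps the job type to a scriptType key such as 'info' or 'pdf'
            return agent['scripts'][config.AGENT_SCRIPT_TYPES[job_type]]
    return None
```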
docs/azure.md 0 → 100644
# Azure
1. Initialize the BlobStorage object.
```
blob_storage = BlobStorage(overwrite)
```
arguments:
* overwrite : (boolean, default `False`), flag for overwriting existing blobs in BlobStorage.
2. Set the folder used for storage.
```
blob_storage.set_agent_folder(folder_name)
```
arguments:
* folder_name : Name of the folder.
3. Upload the file to BlobStorage.
```
b_status, b_str = blob_storage.upload_file(file_name, data)
```
arguments:
* file_name : Name of the file.
* data : data to be uploaded.
return:
* b_status : (boolean), whether the upload succeeded.
* b_str : the exception message if the upload failed, otherwise `'true'`.
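Putting the three steps together, a minimal usage sketch (the folder name, file name, and payload are placeholders):
```
# assumes BlobStorage has been imported from the project's common package
blob_storage = BlobStorage(overwrite=True)
blob_storage.set_agent_folder('MY-AGENT-1')

with open('report.pdf', 'rb') as f:
    b_status, b_str = blob_storage.upload_file('report.pdf', f.read())

if not b_status:
    print('Upload failed: ' + b_str)
```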
docs/config.md 0 → 100644
# Configure config.py
* [Server configuration](#Server-configuration)
* [Agent configuration](#Agent-configuration)
* [AzureBlob configuration](#AzureBlob-configuration)
* [ElasticSearch DB variables](#ElasticSearch-DB-variables)
* [Logging configuration](#Logging-configuration)
## Server configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `SERVER_HOST` | `string` | host for Server |
| `SERVER_PORT` | `string` | port for Server |
| `SERVER_DEBUG` | `bool` | debugging for Server |
| `SERVER_CORS` | `bool` | CORS policy for Server |
| `SERVER_STATIC_PATH` | `string` | static folder path for Server |
| `API_URL_PREFIX` | `string` | url prefix for Server |
| `API_MANDATORY_PARAMS`| `list` | mandatory parameters for request |
| `BASIC_HTTP_USERNAME` | `string` | username to access Server |
| `BASIC_HTTP_PASSWORD` | `string` | password to access Server |
## Agent configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `AGENT_SCRIPT_TYPES` | `dict` | types of scraping scripts |
| `AGENT_CONFIG_PATH` | `string` | file path for the agent configuration (JSON file) |
| `AGENT_CONFIG_PKL_PATH`| `string` | file path for the agent configuration (pickle file) |
## AzureBlob configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `BLOB_INTIGRATION` | `bool` | enable/disable AzureBlob Storage |
| `BLOB_SAS_TOKEN` | `string` | SAS Token for AzureBlob Storage |
| `BLOB_ACCOUNT_URL` | `string` | Account URL for AzureBlob Storage|
| `BLOB_CONTAINER_NAME` | `string` | Container for AzureBlob Storage |
## ElasticSearch DB variables
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `ELASTIC_DB_URL` | `string` | URL of ElasticSearch Server |
| `ES_LOG_INDEX` | `string` | Info Logging Index in ElasticSearch |
| `ES_JOB_INDEX` | `string` | Job Logging Index in ElasticSearch |
| `ES_DATA_INDEX` | `string` | Data Logging Index in ElasticSearch |
## Logging configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `JOB_OUTPUT_PATH` | `string` | folder_path for JOB output |
| `MAX_RUNNING_JOBS` | `int` | Max No. of Running Jobs |
| `MAX_WAITING_JOBS` | `int` | Max No. of Waiting Jobs |
| `JOB_RUNNING_STATUS` | `string` | Status for Running Jobs |
| `JOB_COMPLETED_SUCCESS_STATUS`| `string` | Status for Successful Jobs |
| `JOB_COMPLETED_FAILED_STATUS` | `string` | Status for Failed Jobs |
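For orientation, a stripped-down `config.py` covering a handful of the variables above might look like the sketch below; every value is an illustrative placeholder, not the project's actual setting.
```
# Server configuration (illustrative values only)
SERVER_HOST = '0.0.0.0'
SERVER_PORT = '5001'
SERVER_DEBUG = False
SERVER_CORS = True
SERVER_STATIC_PATH = 'static'
API_URL_PREFIX = '/api/v1'

# Agent configuration
# job-type keys here are examples; each job type maps to a scriptType key ('info' / 'pdf')
AGENT_SCRIPT_TYPES = {'information': 'info', 'documents': 'pdf'}
AGENT_CONFIG_PATH = 'static/agents.json'

# Job handling
JOB_OUTPUT_PATH = 'output'
MAX_RUNNING_JOBS = 2
MAX_WAITING_JOBS = 5
JOB_RUNNING_STATUS = 'RUNNING'
JOB_COMPLETED_SUCCESS_STATUS = 'COMPLETED'
JOB_COMPLETED_FAILED_STATUS = 'FAILED'
```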
docs/env-variables.md 0 → 100644
# Environment Variables
The following environment variables are supported:
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `BASIC_HTTP_PASSWORD` | `string` | password for the server |
| `BASIC_HTTP_USERNAME` | `string` | username for the server |
| `ELASTIC_DB_URL` | `string` | URL of the ElasticSearch DB |
| `BLOB_SAS_TOKEN` | `string` | Azure Blob Storage SAS token |
| `BLOB_ACCOUNT_URL` | `string` | Azure Blob Storage account URL |
| `BLOB_CONTAINER_NAME` | `string` | Azure Blob Storage container name |
| `MAX_RUNNING_JOBS` | `int` | maximum jobs running at a time |
| `MAX_WAITING_JOBS` | `int` | maximum jobs waiting at a time |
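These variables are typically read at startup to override the corresponding defaults in `config.py`. A minimal sketch of that pattern, assuming the values are pulled from `os.environ` (the fallback values are placeholders):
```
import os

BASIC_HTTP_USERNAME = os.environ.get('BASIC_HTTP_USERNAME', 'admin')
BASIC_HTTP_PASSWORD = os.environ.get('BASIC_HTTP_PASSWORD', 'changeme')
ELASTIC_DB_URL = os.environ.get('ELASTIC_DB_URL', 'http://localhost:9200')
MAX_RUNNING_JOBS = int(os.environ.get('MAX_RUNNING_JOBS', 2))
MAX_WAITING_JOBS = int(os.environ.get('MAX_WAITING_JOBS', 5))
```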
docs/eslog.md 0 → 100644
# ElasticSearch Log
* Initialize Log object.
```
log = Log(agentRunContext)
```
* Types of logs:
1. `log.job` : logs the job status; entries are written to the `general-job-stats` index.
Syntax:
```
log.job(status, message)
```
Examples:
```
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
try:
    # your code goes here
    log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Job Completed')
except Exception:
    log.job(config.JOB_COMPLETED_FAILED_STATUS, 'Job Failed')
```
2. `log.info` : logs job information; entries are written to the `general-app-logs` index.
Syntax:
```
log.info(info_type, message)
```
Examples:
```
log.info('info', 'This is generalization project')
log.info('warning', 'Script is taking more than usual time')
log.info('exception', 'No Products Available')
```
3. `log.data` : logs the scraped job data; entries are written to the `general-acrawled-data` index.
Syntax:
```
log.data(data)
```
Example:
```
data = {
"A" : "123",
"B" : "Generic Project"
}
log.data(data)
```
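Taken together, a script usually emits all three log types over its lifetime. A minimal end-to-end sketch (the function name and record contents are placeholders; the status constants come from `config.py`):
```
import traceback

import config
from common import Log


def run_job(agentRunContext):
    log = Log(agentRunContext)
    log.job(config.JOB_RUNNING_STATUS, 'Job Started')
    try:
        record = {'Item #': '61HH68', 'brand': 'Example Brand'}  # placeholder data
        log.data(record)                       # -> crawled-data index
        log.info('info', 'Scraped 1 product')  # -> general-app-logs
        log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Job Completed')
    except Exception:
        log.info('exception', traceback.format_exc())
        log.job(config.JOB_COMPLETED_FAILED_STATUS, 'Job Failed')
```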
docs/scripts.md 0 → 100644
# Scripts
1. Create a Python file for your script in the matching scriptType folder under `./src/scripts`.
2. Format of the script `my_agent_script.py`:
```
# imports
import traceback

import config
from common import Log


# create a function
def myAgentScript(agentRunContext):
    log = Log(agentRunContext)
    try:
        log.job(config.JOB_RUNNING_STATUS, 'Job Started')
        # Your script
        # Goes here
        log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Successfully Scraped Data')
    except Exception as e:
        log.job(config.JOB_COMPLETED_FAILED_STATUS, str(e))
        log.info('exception', traceback.format_exc())
```
3. Register the script in that scriptType folder's `__init__.py`:
```
from .my_agent_script import myAgentScript
```
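The exported name is what the `scripts` mapping in `/static/agents.json` refers to, so the function name, the `__init__.py` import, and the JSON entry must agree. A sketch of the wiring, with illustrative paths and names:
```
# src/scripts/<scriptType>/__init__.py  (path is illustrative)
from .my_agent_script import myAgentScript

# /static/agents.json then references the same name:
# "scripts": {
#     "<scriptType>": "myAgentScript"
# }
```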
@@ -5,34 +5,36 @@ from azure.storage.blob import BlobServiceClient
class BlobStorage(object):
def __init__(self):
self.blob_service_client = BlobServiceClient(account_url=config.BLOB_ACCOUNT_URL, credential=config.BLOB_SAS_TOKEN)
def __init__(self,overwrite=False):
self.blob_service_client = BlobServiceClient(
account_url=config.BLOB_ACCOUNT_URL, credential=config.BLOB_SAS_TOKEN)
self.root_folder = None
self.overwrite = overwrite
@property
def root_folder(self):
return self._root_folder
@root_folder.setter
def root_folder(self,rf):
def root_folder(self, rf):
self._root_folder = rf
@property
def blob_service_client(self):
return self._blob_service_client
@blob_service_client.setter
def blob_service_client(self,bsc):
def blob_service_client(self, bsc):
self._blob_service_client = bsc
def set_agent_folder(self,agent_id):
self.root_folder = agent_id
def set_agent_folder(self, agent_folder):
self.root_folder = agent_folder
def upload_file(self,file_name,file_contents):
upload_file_path = os.path.join(self.root_folder,file_name)
blob_client = self.blob_service_client.get_blob_client(container=config.BLOB_CONTAINER_NAME,blob=upload_file_path)
blob_client = self.blob_service_client.get_blob_client(container=config.CONTAINER_NAME,blob=upload_file_path)
try:
blob_client.upload_blob(file_contents)
blob_client.upload_blob(file_contents,overwrite=self.overwrite)
except Exception as e:
return False,str(e)
return True,'true'
# scrapy config goes here !
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
import config
from common.elastic_wrapper import Log
@@ -10,8 +9,7 @@ from models import AgentUtils
class AgentRepo:
def __init__(self):
self.agentUtils = AgentUtils()
self.activeThreads = []
self.waitThreads = []
self.executor = ThreadPoolExecutor(max_workers=config.MAX_RUNNING_JOBS)
def list(self, filepath):
self.agentUtils.filepath = filepath
@@ -20,29 +18,6 @@ class AgentRepo:
agent.pop('scripts')
return result
def waitAndStart(self, agentRunContext, target_script):
# log waiting state
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, "JOB in waiting state.")
del log
# code to check and run if activeThreads is empty
while True:
if len(self.activeThreads) < config.MAX_RUNNING_JOBS:
self.activeThreads.append(agentRunContext.jobId)
self.waitThreads.remove(agentRunContext.jobId)
thread = threading.Thread(target=target_script, args=(
agentRunContext,), name=agentRunContext.jobId)
thread.start()
# check if thread alive
while thread.is_alive():
time.sleep(10)
# remove thread after completion
self.activeThreads.remove(agentRunContext.jobId)
break
else:
time.sleep(10)
return None
def run(self, agentRunContext, filepath):
threadStarted = False
agentRunContext.jobId = str(uuid.uuid4())
@@ -53,11 +28,12 @@
if agent['agentId'] == agentRunContext.requestBody['agentId']:
agentRunContext.URL = agent['URL']
threadStarted = True
if len(self.waitThreads) < config.MAX_WAITING_JOBS:
self.waitThreads.append(agentRunContext.jobId)
thread = threading.Thread(target=self.waitAndStart, args=(
agentRunContext, agent['scripts'][config.AGENT_SCRIPT_TYPES[agentRunContext.jobType]]), name=str('wait-'+agentRunContext.jobId))
thread.start()
if self.executor._work_queue.qsize() < config.MAX_WAITING_JOBS:
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, "JOB in waiting state.")
del log
self.executor.submit(
agent['scripts'][config.AGENT_SCRIPT_TYPES[agentRunContext.jobType]], agentRunContext)
else:
return {'message': 'Already many jobs are in Waiting ... Please retry after some time.'}
if threadStarted:
...
@@ -15,4 +15,4 @@ python-dateutil==2.8.1
beautifulsoup4==4.9.3
azure-storage-blob==12.10.0b1
lxml==4.5.1
scrapy==2.6.1
# Scrapy
from .applied_scrapy import AppliedScrapy
from .grainger_scrapy import GraingerScrapy
# Selenium
from .applied_selenium import AppliedSelenium
from .applied_selenium import AppliedSelenium
from .grainger_selenium import GraingerSelenium
import config
import scrapy
from common import Log
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
# search_param=do630 voltage regulator (via category list)
# search_param=do 360 voltage (via product list)
# search_param=61HH68 (via direct product page)
null = 'null'
true = 'true'
false = 'false'
def GraingerScrapy(agentRunContext):
log = Log(agentRunContext)
class GraingerScrapy(scrapy.Spider):
name = 'GraingerScrapy'
user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
main_url = 'https://www.grainger.com/'
def __init__(self, search_param):
self.start_urls = [
"https://www.grainger.com/search?searchQuery="+search_param]
super().__init__()
def parse(self, response):
if 'search?' not in response.url:
yield scrapy.Request(url=response.url, callback=self.collect_data)
else:
if len(response.css('section[aria-label="Category products"]')) > 0:
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split(
'=', 1)[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
products = list(script['category']['category']
['skuToProductMap'].keys())
href = '/product/info?productArray='+','.join(products)
yield scrapy.Request(url=self.main_url+href, callback=self.get_products)
else:
# iterate every categories
for href in response.css('a.route::attr(href)').extract():
yield scrapy.Request(url=self.main_url+href, callback=self.parse_category_page)
def parse_category_page(self, response):
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split('=', 1)
[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
cat_id = script['category']['category']['id']
for i in script['category']['collections']:
coll_id = i['id']
url1 = self.main_url + \
'/experience/pub/api/products/collection/{0}?categoryId={1}'
yield scrapy.Request(url=url1.format(coll_id, cat_id), callback=self.get_products)
def get_products(self, response):
data = response.json()
if 'products' in data.keys():
for i in data['products']:
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
else:
for i in data.values():
if type(i) == dict and 'productDetailUrl' in i.keys():
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
def collect_data(self, response):
data = dict()
main_content = response.css('.product-detail__content--large')
spec = response.css('.specifications')
data = {
'brand': main_content.css('.product-detail__brand--link::text').get().strip(),
'product-heading': main_content.css('.product-detail__heading::text').get().strip(),
'url': response.url
}
for li in main_content.css('.product-detail__product-identifiers-content'):
key = li.css(
'.product-detail__product-identifiers-label::text').get().strip()
value = li.css(
'.product-detail__product-identifiers-description::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
for li in spec.css('.specifications__item'):
key = li.css('.specifications__description::text').get()
value = li.css('.specifications__value::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
log.data(data)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
runner = CrawlerRunner()
d = runner.crawl(
GraingerScrapy, search_param=agentRunContext.requestBody.get('search'))
d.addBoth(lambda _: reactor.stop())
reactor.run()
log.job(config.JOB_COMPLETED_SUCCESS_STATUS,
'Successfully scraped all data')
import os
import time
import traceback
import config
from common import Log, get_driver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
def GraingerSelenium(agentRunContext):
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
log.job(config.JOB_RUNNING_STATUS, 'Script Under Development')
log.job(config.JOB_COMPLETED_SUCCESS_STATUS,
'Successfully scraped all data')
# Scrapy
from .applied_scrapy import AppliedScrapy
from .grainger_scrapy import GraingerScrapy
# Selenium
from .applied_selenium import AppliedSelenium
from .applied_selenium import AppliedSelenium
from .grainger_selenium import GraingerSelenium
import scrapy
from scrapy.pipelines.files import FilesPipeline
# search_param=do630 voltage regulator (via category list)
# search_param=do 360 voltage (via product list)
# search_param=61HH68 (via direct product page)
# variables for eval() to parse
null = 'null'
true = 'true'
false = 'false'
def GraingerScrapy(agentRunContext):
class GeneralFilesItem(scrapy.Item):
file_name = scrapy.Field()
file_urls = scrapy.Field()
files = scrapy.Field()
class GenreralFilesPipeline(FilesPipeline):
def get_media_requests(self, item, info):
for my_url in item.get('file_urls', []):
yield scrapy.Request(my_url, meta={'file_name': item.get('file_name')})
def file_path(self, request, response=None, info=None):
return request.meta['file_name']
class GriengerPDFScrapy(scrapy.Spider):
name = 'GriengerPDFScrapy'
user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
main_url = 'https://www.grainger.com/'
custom_settings = {
'ITEM_PIPELINES': {'grienger_scrapy_pdf.GenreralFilesPipeline': 1},
'FILES_STORE': '/home/test/Music/down/'
}
def __init__(self, agentRunContext):
self.start_urls = [
"https://www.grainger.com/search?searchQuery="+agentRunContext.requestBody['search']]
super().__init__()
def parse(self, response):
if 'search?' not in response.url:
yield scrapy.Request(url=response.url, callback=self.collect_data)
else:
if len(response.css('section[aria-label="Category products"]')) > 0:
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split(
'=', 1)[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
products = list(script['category']['category']
['skuToProductMap'].keys())
href = '/product/info?productArray='+','.join(products)
yield scrapy.Request(url=self.main_url+href, callback=self.get_products)
else:
# iterate every categories
for href in response.css('a.route::attr(href)').extract():
yield scrapy.Request(url=self.main_url+href, callback=self.parse_category_page)
def parse_category_page(self, response):
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split('=', 1)
[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
cat_id = script['category']['category']['id']
for i in script['category']['collections']:
coll_id = i['id']
url1 = self.main_url + \
'/experience/pub/api/products/collection/{0}?categoryId={1}'
yield scrapy.Request(url=url1.format(coll_id, cat_id), callback=self.get_products)
def get_products(self, response):
data = response.json()
if 'products' in data.keys():
for i in data['products']:
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
else:
for i in data.values():
if type(i) == dict and 'productDetailUrl' in i.keys():
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
def collect_data(self, response):
data = dict()
main_content = response.css('.product-detail__content--large')
for li in main_content.css('.product-detail__product-identifiers-content'):
key = li.css(
'.product-detail__product-identifiers-label::text').get().strip()
value = li.css(
'.product-detail__product-identifiers-description::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
for a_tag in response.css('a.documentation__link'):
a_href = a_tag.xpath('./@href').get()
a_name = a_tag.xpath('./@title').get().strip()
filename = data['Item #']+'-'+a_name+'.'+a_href.split('.')[-1]
item = GeneralFilesItem()
item['file_name'] = filename
item['file_urls'] = ['https:'+a_href]
yield item
import os
import shutil
import time
import traceback
import config
from common import Log, get_driver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
def single_product(log, driver, download_dir, new_output_dir):
try:
doc_section = driver.find_elements(
By.XPATH, '//ul[@class="documentation__content"]//li')
for link in doc_section:
download_link = link.find_element_by_tag_name(
'a').get_attribute('href')
product_name = str(driver.current_url).split('-')[-1].strip()
try:
product_name = product_name.split('?')[1].strip()
except:
pass
driver.switch_to.new_window()
driver.get(download_link)
time.sleep(5)
file_name = os.listdir(download_dir)[0]
new_file_name = product_name + "-" + file_name
os.rename(os.path.join(download_dir, file_name),
os.path.join(download_dir, new_file_name))
shutil.move(os.path.join(download_dir, new_file_name),
os.path.join(new_output_dir, new_file_name))
log.info('info', '{0} Downloaded'.format(new_file_name))
time.sleep(2)
driver.close()
driver.switch_to.window(driver.window_handles[2])
except:
log.info('exception', traceback.format_exc())
def multi_product(log, wait, driver, download_dir, new_output_dir):
# Collecting details for all products available
wait.until(EC.visibility_of_element_located(
(By.XPATH, '//div[@class = "multi-tiered-category"]')))
all_product = driver.find_elements_by_xpath(
'//div[@class = "multi-tiered-category"]//ul//li/a')
all_product = [i.get_attribute('href') for i in all_product]
c_url = driver.current_url
for p_url in all_product:
driver.switch_to.new_window()
driver.get(p_url)
time.sleep(2)
try:
wait.until(EC.element_to_be_clickable(
(By.XPATH, '//div[@id="feedbackBrowseModal"]//div[@class="modal-footer"]//a[@class = "close"]')))
driver.find_element_by_xpath(
'//div[@id="feedbackBrowseModal"]//div[@class="modal-footer"]//a[@class = "close"]').click()
time.sleep(2)
except:
pass
for a_tag in driver.find_elements(By.XPATH, "//tbody//a"):
product_url = str(a_tag.get_attribute('href'))
driver.switch_to.new_window()
driver.get(product_url)
time.sleep(2)
single_product(log, driver, download_dir, new_output_dir)
driver.close()
driver.switch_to.window(driver.window_handles[1])
driver.close()
driver.switch_to.window(driver.window_handles[0])
driver.get(c_url)
time.sleep(5)
def GraingerSelenium(agentRunContext):
log = Log(agentRunContext)
try:
download_dir_id = str(agentRunContext.jobId)
download_dir = os.path.join(
os.getcwd(), 'temp', 'temp-' + download_dir_id)
# Creating an output directory for storing PDFs
try:
os.mkdir(os.path.normpath(os.getcwd() +
os.sep + os.pardir) + '\\output')
except:
pass
output_dir = os.path.normpath(
os.getcwd() + os.sep + os.pardir) + '\\output\\'
os.mkdir(output_dir + download_dir_id)
new_output_dir = os.path.join(output_dir, download_dir_id)
driver = get_driver(download_dir)
driver.maximize_window()
driver.get(agentRunContext.URL)
wait = WebDriverWait(driver, 20)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
# Entering the search query
driver.find_element_by_xpath(
'//input[@aria-label="Search Query"]').send_keys(agentRunContext.requestBody['search'])
time.sleep(2)
driver.find_element_by_xpath(
'//button[@aria-label="Submit Search Query"]').click()
time.sleep(5)
check_url = str(driver.current_url)
# If the search results contain multiple products
if '?search' in check_url:
multi_product(log, wait, driver, download_dir, new_output_dir)
# If the search resolves to a single product
else:
single_product(log, driver, download_dir, new_output_dir)
log.job(config.JOB_RUNNING_STATUS, 'Downloaded All Invoices')
except Exception as e:
log.job(config.JOB_COMPLETED_FAILED_STATUS, str(e))
log.info('exception', traceback.format_exc())
driver.quit()
@@ -18,5 +18,25 @@
"info": "AppliedScrapy",
"pdf": "AppliedScrapy"
}
},
{
"agentId": "GRAINGER-SELENIUM",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerSelenium",
"pdf": "GraingerSelenium"
}
},
{
"agentId": "GRAINGER-SCRAPY",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerScrapy",
"pdf": "GraingerScrapy"
}
}
]