Commit 786adeb1 authored by pushkar191098

Addition: Documentation

Merge request !1: addition: main_server_code, scripts, docs
Showing 671 additions and 49 deletions
@@ -35,7 +35,7 @@ python app.py
Successful local deployment should show `Server is up on port 5001`.
## Documentation
For scripting and configuration documentation, refer to the `./docs` folder
For scripting and configuration documentation, refer to the `./docs` folder. [Go to Documentation](docs/README.md)
## API Reference
...
docs/README.md 0 → 100644
# Configuration README
[Configure config.py](config.md)
[Configure agents](agents.md)
[Configure azure](azure.md)
[Configure Environment Variables](env-variables.md)
[Configure ElasticSearch Log](eslog.md)
[Configure scripts.py](scripts.md)
docs/agents.md 0 → 100644
# Agent Configurations
To register a new agent, add its entry to `/static/agents.json`.
Format:
```
{
"agentId": "MY-AGENT-1",
"description": "Crawler For my_agent_1",
"provider": "AGENT-PROVIDER-X",
"URL": "https://www.my-agent.com",
"scripts": {
"scriptType1": "myAgentScript1",
"scriptType2": "myAgentScript2",
"scriptType3": "myAgentScript3",
...
}
}
```
Example:
```
[
{
"agentId": "APPLIED-SELENIUM",
"description": "Crawler For Applied",
"provider": "Applied",
"URL": "https://www.applied.com",
"scripts": {
"info": "AppliedSelenium",
"pdf": "AppliedSelenium"
}
},
{
"agentId": "GRAINGER-SELENIUM",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerSelenium",
"pdf": "GraingerSelenium"
}
}
]
```
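At run time the server resolves the crawler for a job from this file: `AGENT_SCRIPT_TYPES` in `config.py` maps the requested job type to one of the scriptType keys, and the value under `scripts` names the callable registered in `./src/scripts` (see `scripts.md`). A rough sketch of that lookup, assuming those config names; the helper itself is illustrative, not part of the codebase:
```
import json

import config  # assumed to provide AGENT_CONFIG_PATH and AGENT_SCRIPT_TYPES


def resolve_script_name(agent_id, job_type):
    # Load the agent entries added to /static/agents.json
    with open(config.AGENT_CONFIG_PATH) as f:
        agents = json.load(f)
    for agent in agents:
        if agent['agentId'] == agent_id:
            # AGENT_SCRIPT_TYPES maps the job type to a scriptType key such as 'info' or 'pdf'
            return agent['scripts'][config.AGENT_SCRIPT_TYPES[job_type]]
    return None
```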
docs/azure.md 0 → 100644
# Azure
1. Initialize the BlobStorage object.
```
blob_storage = BlobStorage(overwrite)
```
arguments:
* overwrite : (boolean, default `False`), flag for overwriting existing blobs in BlobStorage.
2. Set the folder used for storage.
```
blob_storage.set_agent_folder(folder_name)
```
arguments:
* folder_name : Name of the folder.
3. Upload the file to BlobStorage.
```
b_status, b_str = blob_storage.upload_file(file_name, data)
```
arguments:
* file_name : Name of the file.
* data : data to be uploaded.
return:
* b_status : (boolean), whether the upload succeeded.
* b_str : the exception message if the upload failed, otherwise `'true'`.
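Putting the three steps together, a minimal usage sketch (the folder name, file name, and payload are placeholders):
```
# assumes BlobStorage has been imported from the project's common package
blob_storage = BlobStorage(overwrite=True)
blob_storage.set_agent_folder('MY-AGENT-1')

with open('report.pdf', 'rb') as f:
    b_status, b_str = blob_storage.upload_file('report.pdf', f.read())

if not b_status:
    print('Upload failed: ' + b_str)
```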
docs/config.md 0 → 100644
# Configure config.py
* [Server configuration](#Server-configuration)
* [Agent configuration](#Agent-configuration)
* [AzureBlob configuration](#AzureBlob-configuration)
* [ElasticSearch DB variables](#ElasticSearch-DB-variables)
* [Logging configuration](#Logging-configuration)
## Server configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `SERVER_HOST` | `string` | host for Server |
| `SERVER_PORT` | `string` | port for Server |
| `SERVER_DEBUG` | `bool` | debugging for Server |
| `SERVER_CORS` | `bool` | CORS policy for Server |
| `SERVER_STATIC_PATH` | `string` | static folder path for Server |
| `API_URL_PREFIX` | `string` | url prefix for Server |
| `API_MANDATORY_PARAMS`| `list` | mandatory parameters for request |
| `BASIC_HTTP_USERNAME` | `string` | username to access Server |
| `BASIC_HTTP_PASSWORD` | `string` | password to access Server |
## Agent configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `AGENT_SCRIPT_TYPES` | `dict` | types of scraping scripts |
| `AGENT_CONFIG_PATH` | `string` | file path for the agent configuration (JSON file) |
| `AGENT_CONFIG_PKL_PATH`| `string` | file path for the agent configuration (pickle file) |
## AzureBlob configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `BLOB_INTIGRATION` | `bool` | enable/disable AzureBlob Storage |
| `BLOB_SAS_TOKEN` | `string` | SAS Token for AzureBlob Storage |
| `BLOB_ACCOUNT_URL` | `string` | Account URL for AzureBlob Storage|
| `BLOB_CONTAINER_NAME` | `string` | Container for AzureBlob Storage |
## ElasticSearch DB variables
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `ELASTIC_DB_URL` | `string` | URL of ElasticSearch Server |
| `ES_LOG_INDEX` | `string` | Info Logging Index in ElasticSearch |
| `ES_JOB_INDEX` | `string` | Job Logging Index in ElasticSearch |
| `ES_DATA_INDEX` | `string` | Data Logging Index in ElasticSearch |
## Logging configuration
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `JOB_OUTPUT_PATH` | `string` | folder_path for JOB output |
| `MAX_RUNNING_JOBS` | `int` | Max No. of Running Jobs |
| `MAX_WAITING_JOBS` | `int` | Max No. of Waiting Jobs |
| `JOB_RUNNING_STATUS` | `string` | Status for Running Jobs |
| `JOB_COMPLETED_SUCCESS_STATUS`| `string` | Status for Successful Jobs |
| `JOB_COMPLETED_FAILED_STATUS` | `string` | Status for Failed Jobs |
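For orientation, a stripped-down `config.py` covering a handful of the variables above might look like the sketch below; every value is an illustrative placeholder, not the project's actual setting.
```
# Server configuration (illustrative values only)
SERVER_HOST = '0.0.0.0'
SERVER_PORT = '5001'
SERVER_DEBUG = False
SERVER_CORS = True
SERVER_STATIC_PATH = 'static'
API_URL_PREFIX = '/api/v1'

# Agent configuration
# job-type keys here are examples; each job type maps to a scriptType key ('info' / 'pdf')
AGENT_SCRIPT_TYPES = {'information': 'info', 'documents': 'pdf'}
AGENT_CONFIG_PATH = 'static/agents.json'

# Job handling
JOB_OUTPUT_PATH = 'output'
MAX_RUNNING_JOBS = 2
MAX_WAITING_JOBS = 5
JOB_RUNNING_STATUS = 'RUNNING'
JOB_COMPLETED_SUCCESS_STATUS = 'COMPLETED'
JOB_COMPLETED_FAILED_STATUS = 'FAILED'
```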
docs/env-variables.md 0 → 100644
# Environment Variables
The following environment variables are supported:
| Variables | Type | Description |
| :-------- | :------- | :------------------------- |
| `BASIC_HTTP_PASSWORD` | `string` | password for the server |
| `BASIC_HTTP_USERNAME` | `string` | username for the server |
| `ELASTIC_DB_URL` | `string` | URL of the ElasticSearch DB |
| `BLOB_SAS_TOKEN` | `string` | Azure Blob Storage SAS token |
| `BLOB_ACCOUNT_URL` | `string` | Azure Blob Storage account URL |
| `BLOB_CONTAINER_NAME` | `string` | Azure Blob Storage container name |
| `MAX_RUNNING_JOBS` | `int` | maximum jobs running at a time |
| `MAX_WAITING_JOBS` | `int` | maximum jobs waiting at a time |
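These variables are typically read at startup to override the corresponding defaults in `config.py`. A minimal sketch of that pattern, assuming the values are pulled from `os.environ` (the fallback values are placeholders):
```
import os

BASIC_HTTP_USERNAME = os.environ.get('BASIC_HTTP_USERNAME', 'admin')
BASIC_HTTP_PASSWORD = os.environ.get('BASIC_HTTP_PASSWORD', 'changeme')
ELASTIC_DB_URL = os.environ.get('ELASTIC_DB_URL', 'http://localhost:9200')
MAX_RUNNING_JOBS = int(os.environ.get('MAX_RUNNING_JOBS', 2))
MAX_WAITING_JOBS = int(os.environ.get('MAX_WAITING_JOBS', 5))
```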
docs/eslog.md 0 → 100644
# ElasticSearch Log
* Initialize Log object.
```
log = Log(agentRunContext)
```
* Types of logs:
1. `log.job` : logs the job status; entries are written to the `general-job-stats` index.
Syntax:
```
log.job(status, message)
```
Examples:
```
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
try:
    # your code goes here
    log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Job Completed')
except Exception:
    log.job(config.JOB_COMPLETED_FAILED_STATUS, 'Job Failed')
```
2. `log.info` : logs job information; entries are written to the `general-app-logs` index.
Syntax:
```
log.info(info_type, message)
```
Examples:
```
log.info('info', 'This is generalization project')
log.info('warning', 'Script is taking more than usual time')
log.info('exception', 'No Products Available')
```
3. `log.data` : logs the scraped job data; entries are written to the `general-acrawled-data` index.
Syntax:
```
log.data(data)
```
Example:
```
data = {
"A" : "123",
"B" : "Generic Project"
}
log.data(data)
```
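Taken together, a script usually emits all three log types over its lifetime. A minimal end-to-end sketch (the function name and record contents are placeholders; the status constants come from `config.py`):
```
import traceback

import config
from common import Log


def run_job(agentRunContext):
    log = Log(agentRunContext)
    log.job(config.JOB_RUNNING_STATUS, 'Job Started')
    try:
        record = {'Item #': '61HH68', 'brand': 'Example Brand'}  # placeholder data
        log.data(record)                       # -> crawled-data index
        log.info('info', 'Scraped 1 product')  # -> general-app-logs
        log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Job Completed')
    except Exception:
        log.info('exception', traceback.format_exc())
        log.job(config.JOB_COMPLETED_FAILED_STATUS, 'Job Failed')
```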
docs/scripts.md 0 → 100644
# Scripts
1. Create a Python file for your script in the matching scriptType folder under `./src/scripts`.
2. Format of the script `my_agent_script.py`:
```
# imports
import traceback

import config
from common import Log


# create a function
def myAgentScript(agentRunContext):
    log = Log(agentRunContext)
    try:
        log.job(config.JOB_RUNNING_STATUS, 'Job Started')
        # Your script
        # Goes here
        log.job(config.JOB_COMPLETED_SUCCESS_STATUS, 'Successfully Scraped Data')
    except Exception as e:
        log.job(config.JOB_COMPLETED_FAILED_STATUS, str(e))
        log.info('exception', traceback.format_exc())
```
3. Register the script in that scriptType folder's `__init__.py`:
```
from .my_agent_script import myAgentScript
```
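The exported name is what the `scripts` mapping in `/static/agents.json` refers to, so the function name, the `__init__.py` import, and the JSON entry must agree. A sketch of the wiring, with illustrative paths and names:
```
# src/scripts/<scriptType>/__init__.py  (path is illustrative)
from .my_agent_script import myAgentScript

# /static/agents.json then references the same name:
# "scripts": {
#     "<scriptType>": "myAgentScript"
# }
```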
@@ -5,34 +5,36 @@ from azure.storage.blob import BlobServiceClient
class BlobStorage(object):
def __init__(self):
self.blob_service_client = BlobServiceClient(account_url=config.BLOB_ACCOUNT_URL, credential=config.BLOB_SAS_TOKEN)
def __init__(self,overwrite=False):
self.blob_service_client = BlobServiceClient(
account_url=config.BLOB_ACCOUNT_URL, credential=config.BLOB_SAS_TOKEN)
self.root_folder = None
self.overwrite = overwrite
@property
def root_folder(self):
return self._root_folder
@root_folder.setter
def root_folder(self,rf):
def root_folder(self, rf):
self._root_folder = rf
@property
def blob_service_client(self):
return self._blob_service_client
@blob_service_client.setter
def blob_service_client(self,bsc):
def blob_service_client(self, bsc):
self._blob_service_client = bsc
def set_agent_folder(self,agent_id):
self.root_folder = agent_id
def set_agent_folder(self, agent_folder):
self.root_folder = agent_folder
def upload_file(self,file_name,file_contents):
upload_file_path = os.path.join(self.root_folder,file_name)
blob_client = self.blob_service_client.get_blob_client(container=config.BLOB_CONTAINER_NAME,blob=upload_file_path)
blob_client = self.blob_service_client.get_blob_client(container=config.CONTAINER_NAME,blob=upload_file_path)
try:
blob_client.upload_blob(file_contents)
blob_client.upload_blob(file_contents,overwrite=self.overwrite)
except Exception as e:
return False,str(e)
return True,'true'
# scrapy config goes here !
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
import config
from common.elastic_wrapper import Log
@@ -10,8 +9,7 @@ from models import AgentUtils
class AgentRepo:
def __init__(self):
self.agentUtils = AgentUtils()
self.activeThreads = []
self.waitThreads = []
self.executor = ThreadPoolExecutor(max_workers=config.MAX_RUNNING_JOBS)
def list(self, filepath):
self.agentUtils.filepath = filepath
@@ -20,29 +18,6 @@ class AgentRepo:
agent.pop('scripts')
return result
def waitAndStart(self, agentRunContext, target_script):
# log waiting state
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, "JOB in waiting state.")
del log
# code to check and run if activeThreads is empty
while True:
if len(self.activeThreads) < config.MAX_RUNNING_JOBS:
self.activeThreads.append(agentRunContext.jobId)
self.waitThreads.remove(agentRunContext.jobId)
thread = threading.Thread(target=target_script, args=(
agentRunContext,), name=agentRunContext.jobId)
thread.start()
# check if thread alive
while thread.is_alive():
time.sleep(10)
# remove thread after completion
self.activeThreads.remove(agentRunContext.jobId)
break
else:
time.sleep(10)
return None
def run(self, agentRunContext, filepath):
threadStarted = False
agentRunContext.jobId = str(uuid.uuid4())
@@ -53,11 +28,12 @@
if agent['agentId'] == agentRunContext.requestBody['agentId']:
agentRunContext.URL = agent['URL']
threadStarted = True
if len(self.waitThreads) < config.MAX_WAITING_JOBS:
self.waitThreads.append(agentRunContext.jobId)
thread = threading.Thread(target=self.waitAndStart, args=(
agentRunContext, agent['scripts'][config.AGENT_SCRIPT_TYPES[agentRunContext.jobType]]), name=str('wait-'+agentRunContext.jobId))
thread.start()
if self.executor._work_queue.qsize() < config.MAX_WAITING_JOBS:
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, "JOB in waiting state.")
del log
self.executor.submit(
agent['scripts'][config.AGENT_SCRIPT_TYPES[agentRunContext.jobType]], agentRunContext)
else:
return {'message': 'Already many jobs are in Waiting ... Please retry after some time.'}
if threadStarted:
...
@@ -15,4 +15,4 @@ python-dateutil==2.8.1
beautifulsoup4==4.9.3
azure-storage-blob==12.10.0b1
lxml==4.5.1
scrapy==2.6.1
# Scrapy
from .applied_scrapy import AppliedScrapy
from .grainger_scrapy import GraingerScrapy
# Selenium
from .applied_selenium import AppliedSelenium
from .applied_selenium import AppliedSelenium
from .grainger_selenium import GraingerSelenium
import config
import scrapy
from common import Log
from scrapy.crawler import CrawlerRunner
from twisted.internet import reactor
# search_param=do630 voltage regulator (via category list)
# search_param=do 360 voltage (via product list)
# search_param=61HH68 (via direct product page)
null = 'null'
true = 'true'
false = 'false'
def GraingerScrapy(agentRunContext):
log = Log(agentRunContext)
class GraingerScrapy(scrapy.Spider):
name = 'GraingerScrapy'
user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
main_url = 'https://www.grainger.com/'
def __init__(self, search_param):
self.start_urls = [
"https://www.grainger.com/search?searchQuery="+search_param]
super().__init__()
def parse(self, response):
if 'search?' not in response.url:
yield scrapy.Request(url=response.url, callback=self.collect_data)
else:
if len(response.css('section[aria-label="Category products"]')) > 0:
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split(
'=', 1)[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
products = list(script['category']['category']
['skuToProductMap'].keys())
href = '/product/info?productArray='+','.join(products)
yield scrapy.Request(url=self.main_url+href, callback=self.get_products)
else:
# iterate every categories
for href in response.css('a.route::attr(href)').extract():
yield scrapy.Request(url=self.main_url+href, callback=self.parse_category_page)
def parse_category_page(self, response):
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split('=', 1)
[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
cat_id = script['category']['category']['id']
for i in script['category']['collections']:
coll_id = i['id']
url1 = self.main_url + \
'/experience/pub/api/products/collection/{0}?categoryId={1}'
yield scrapy.Request(url=url1.format(coll_id, cat_id), callback=self.get_products)
def get_products(self, response):
data = response.json()
if 'products' in data.keys():
for i in data['products']:
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
else:
for i in data.values():
if type(i) == dict and 'productDetailUrl' in i.keys():
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
def collect_data(self, response):
data = dict()
main_content = response.css('.product-detail__content--large')
spec = response.css('.specifications')
data = {
'brand': main_content.css('.product-detail__brand--link::text').get().strip(),
'product-heading': main_content.css('.product-detail__heading::text').get().strip(),
'url': response.url
}
for li in main_content.css('.product-detail__product-identifiers-content'):
key = li.css(
'.product-detail__product-identifiers-label::text').get().strip()
value = li.css(
'.product-detail__product-identifiers-description::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
for li in spec.css('.specifications__item'):
key = li.css('.specifications__description::text').get()
value = li.css('.specifications__value::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
log.data(data)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
runner = CrawlerRunner()
d = runner.crawl(
GraingerScrapy, search_param=agentRunContext.requestBody.get('search'))
d.addBoth(lambda _: reactor.stop())
reactor.run()
log.job(config.JOB_COMPLETED_SUCCESS_STATUS,
'Successfully scraped all data')
import os
import time
import traceback
import config
from common import Log, get_driver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
def GraingerSelenium(agentRunContext):
log = Log(agentRunContext)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
log.job(config.JOB_RUNNING_STATUS, 'Script Under Development')
log.job(config.JOB_COMPLETED_SUCCESS_STATUS,
'Successfully scraped all data')
# Scrapy
from .applied_scrapy import AppliedScrapy
from .grainger_scrapy import GraingerScrapy
# Selenium
from .applied_selenium import AppliedSelenium
from .applied_selenium import AppliedSelenium
from .grainger_selenium import GraingerSelenium
import scrapy
from scrapy.pipelines.files import FilesPipeline
# search_param=do630 voltage regulator (via category list)
# search_param=do 360 voltage (via product list)
# search_param=61HH68 (via direct product page)
# variables for eval() to parse
null = 'null'
true = 'true'
false = 'false'
def GraingerScrapy(agentRunContext):
class GeneralFilesItem(scrapy.Item):
file_name = scrapy.Field()
file_urls = scrapy.Field()
files = scrapy.Field()
class GenreralFilesPipeline(FilesPipeline):
def get_media_requests(self, item, info):
for my_url in item.get('file_urls', []):
yield scrapy.Request(my_url, meta={'file_name': item.get('file_name')})
def file_path(self, request, response=None, info=None):
return request.meta['file_name']
class GriengerPDFScrapy(scrapy.Spider):
name = 'GriengerPDFScrapy'
user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
main_url = 'https://www.grainger.com/'
custom_settings = {
'ITEM_PIPELINES': {'grienger_scrapy_pdf.GenreralFilesPipeline': 1},
'FILES_STORE': '/home/test/Music/down/'
}
def __init__(self, agentRunContext):
self.start_urls = [
"https://www.grainger.com/search?searchQuery="+agentRunContext.requestBody['search']]
super().__init__()
def parse(self, response):
if 'search?' not in response.url:
yield scrapy.Request(url=response.url, callback=self.collect_data)
else:
if len(response.css('section[aria-label="Category products"]')) > 0:
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split(
'=', 1)[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
products = list(script['category']['category']
['skuToProductMap'].keys())
href = '/product/info?productArray='+','.join(products)
yield scrapy.Request(url=self.main_url+href, callback=self.get_products)
else:
# iterate every categories
for href in response.css('a.route::attr(href)').extract():
yield scrapy.Request(url=self.main_url+href, callback=self.parse_category_page)
def parse_category_page(self, response):
script = [i.strip() for i in response.css('script::text').extract(
) if i.strip().startswith('window.__PRELOADED_STATE__')][0]
script = eval(script.split('=', 1)
[-1].split('window.__UI_CONFIG__')[0].strip()[:-1])
cat_id = script['category']['category']['id']
for i in script['category']['collections']:
coll_id = i['id']
url1 = self.main_url + \
'/experience/pub/api/products/collection/{0}?categoryId={1}'
yield scrapy.Request(url=url1.format(coll_id, cat_id), callback=self.get_products)
def get_products(self, response):
data = response.json()
if 'products' in data.keys():
for i in data['products']:
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
else:
for i in data.values():
if type(i) == dict and 'productDetailUrl' in i.keys():
yield scrapy.Request(url=self.main_url+i['productDetailUrl'], callback=self.collect_data)
def collect_data(self, response):
data = dict()
main_content = response.css('.product-detail__content--large')
for li in main_content.css('.product-detail__product-identifiers-content'):
key = li.css(
'.product-detail__product-identifiers-label::text').get().strip()
value = li.css(
'.product-detail__product-identifiers-description::text').extract()
value = [str(i).strip() for i in value] if len(
value) > 1 else str(value[0]).strip()
data[key] = value
for a_tag in response.css('a.documentation__link'):
a_href = a_tag.xpath('./@href').get()
a_name = a_tag.xpath('./@title').get().strip()
filename = data['Item #']+'-'+a_name+'.'+a_href.split('.')[-1]
item = GeneralFilesItem()
item['file_name'] = filename
item['file_urls'] = ['https:'+a_href]
yield item
import os
import shutil
import time
import traceback
import config
from common import Log, get_driver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
def single_product(log, driver, download_dir, new_output_dir):
try:
doc_section = driver.find_elements(
By.XPATH, '//ul[@class="documentation__content"]//li')
for link in doc_section:
download_link = link.find_element_by_tag_name(
'a').get_attribute('href')
product_name = str(driver.current_url).split('-')[-1].strip()
try:
product_name = product_name.split('?')[1].strip()
except:
pass
driver.switch_to.new_window()
driver.get(download_link)
time.sleep(5)
file_name = os.listdir(download_dir)[0]
new_file_name = product_name + "-" + file_name
os.rename(os.path.join(download_dir, file_name),
os.path.join(download_dir, new_file_name))
shutil.move(os.path.join(download_dir, new_file_name),
os.path.join(new_output_dir, new_file_name))
log.info('info', '{0} Downloaded'.format(new_file_name))
time.sleep(2)
driver.close()
driver.switch_to.window(driver.window_handles[2])
except:
log.info('exception', traceback.format_exc())
def multi_product(log, wait, driver, download_dir, new_output_dir):
# Collecting details for all products available
wait.until(EC.visibility_of_element_located(
(By.XPATH, '//div[@class = "multi-tiered-category"]')))
all_product = driver.find_elements_by_xpath(
'//div[@class = "multi-tiered-category"]//ul//li/a')
all_product = [i.get_attribute('href') for i in all_product]
c_url = driver.current_url
for p_url in all_product:
driver.switch_to.new_window()
driver.get(p_url)
time.sleep(2)
try:
wait.until(EC.element_to_be_clickable(
(By.XPATH, '//div[@id="feedbackBrowseModal"]//div[@class="modal-footer"]//a[@class = "close"]')))
driver.find_element_by_xpath(
'//div[@id="feedbackBrowseModal"]//div[@class="modal-footer"]//a[@class = "close"]').click()
time.sleep(2)
except:
pass
for a_tag in driver.find_elements(By.XPATH, "//tbody//a"):
product_url = str(a_tag.get_attribute('href'))
driver.switch_to.new_window()
driver.get(product_url)
time.sleep(2)
single_product(log, driver, download_dir, new_output_dir)
driver.close()
driver.switch_to.window(driver.window_handles[1])
driver.close()
driver.switch_to.window(driver.window_handles[0])
driver.get(c_url)
time.sleep(5)
def GraingerSelenium(agentRunContext):
log = Log(agentRunContext)
try:
download_dir_id = str(agentRunContext.jobId)
download_dir = os.path.join(
os.getcwd(), 'temp', 'temp-' + download_dir_id)
# Creating an output directory for storing PDFs
try:
os.mkdir(os.path.normpath(os.getcwd() +
os.sep + os.pardir) + '\\output')
except:
pass
output_dir = os.path.normpath(
os.getcwd() + os.sep + os.pardir) + '\\output\\'
os.mkdir(output_dir + download_dir_id)
new_output_dir = os.path.join(output_dir, download_dir_id)
driver = get_driver(download_dir)
driver.maximize_window()
driver.get(agentRunContext.URL)
wait = WebDriverWait(driver, 20)
log.job(config.JOB_RUNNING_STATUS, 'Job Started')
# Entering the search query
driver.find_element_by_xpath(
'//input[@aria-label="Search Query"]').send_keys(agentRunContext.requestBody['search'])
time.sleep(2)
driver.find_element_by_xpath(
'//button[@aria-label="Submit Search Query"]').click()
time.sleep(5)
check_url = str(driver.current_url)
# If the search results contain multiple products
if '?search' in check_url:
multi_product(log, wait, driver, download_dir, new_output_dir)
# If the search resolves to a single product
else:
single_product(log, driver, download_dir, new_output_dir)
log.job(config.JOB_RUNNING_STATUS, 'Downloaded All Invoices')
except Exception as e:
log.job(config.JOB_COMPLETED_FAILED_STATUS, str(e))
log.info('exception', traceback.format_exc())
driver.quit()
@@ -18,5 +18,25 @@
"info": "AppliedScrapy",
"pdf": "AppliedScrapy"
}
},
{
"agentId": "GRAINGER-SELENIUM",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerSelenium",
"pdf": "GraingerSelenium"
}
},
{
"agentId": "GRAINGER-SCRAPY",
"description": "Crawler For Grainger",
"provider": "Grainger",
"URL": "https://www.grainger.com",
"scripts": {
"info": "GraingerScrapy",
"pdf": "GraingerScrapy"
}
}
]