Commit bac78a9c authored by pushkar191098's avatar pushkar191098

Addition: training material for scraping.

1 merge request: !1 addition: main_server_code, scripts, docs
Showing with 1421 additions and 8 deletions
@@ -11,7 +11,7 @@ from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
-def single_product(log, driver, download_dir, new_output_dir):
+def single_product(log, driver, download_dir, new_output_dir, win_handle=2):
try:
doc_section = driver.find_elements(
By.XPATH, '//ul[@class="documentation__content"]//li')
@@ -20,7 +20,7 @@ def single_product(log, driver, download_dir, new_output_dir):
'a').get_attribute('href')
product_name = str(driver.current_url).split('-')[-1].strip()
try:
-product_name = product_name.split('?')[1].strip
+product_name = product_name.split('-')[-1].split('?')[:1][0]
except:
pass
driver.switch_to.new_window()
@@ -39,8 +39,8 @@ def single_product(log, driver, download_dir, new_output_dir):
time.sleep(2)
driver.close()
-driver.switch_to.window(driver.window_handles[2])
-except:
+driver.switch_to.window(driver.window_handles[win_handle])
+except Exception as e:
log.info('exception', traceback.format_exc())
@@ -118,14 +118,12 @@ def GraingerSelenium(agentRunContext):
'//button[@aria-label="Submit Search Query"]').click()
time.sleep(5)
check_url = str(driver.current_url)
# If multi_products are there in search params
if '?search' in check_url:
if len(driver.find_elements(By.XPATH, '//div[@class = "multi-tiered-category"]')) > 0:
multi_product(log, wait, driver, download_dir, new_output_dir)
# If single_products are there in search params
else:
-single_product(log, driver, download_dir, new_output_dir)
+single_product(log, driver, download_dir, new_output_dir, 0)
log.job(config.JOB_RUNNING_STATUS, 'Downloaded All Invoices')
File added
%% Cell type:markdown id: tags:
# Scrapy documentation
Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
%% Cell type:markdown id: tags:
---
%% Cell type:markdown id: tags:
## INSTALLATION
You can install Scrapy and its dependencies from PyPI with:
> pip install Scrapy
For more information see [Installation documentation](https://docs.scrapy.org/en/latest/intro/install.html)
%% Cell type:markdown id: tags:
----
%% Cell type:markdown id: tags:
### SAMPLE SPIDER CODE
```
# file_name = quotes_spider.py
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
```
%% Cell type:markdown id: tags:
To run your Scrapy spider:
> scrapy runspider quotes_spider.py -o quotes.json
%% Cell type:markdown id: tags:
## What just happened?
When you ran the command `scrapy runspider quotes_spider.py`, Scrapy looked for a Spider definition inside it and ran it through its crawler engine.
The crawl started by making requests to the URLs defined in the start_urls attribute (in this case, only the URL for quotes in the humor category) and called the default callback method parse, passing the response object as an argument. In the parse callback, we loop through the quote elements using a CSS Selector, yield a Python dict with the extracted quote text and author, look for a link to the next page and schedule another request using the same parse method as callback.
Here you notice one of the main advantages of Scrapy: requests are scheduled and processed asynchronously. This means that Scrapy doesn’t need to wait for a request to be finished and processed; it can send another request or do other things in the meantime. This also means that other requests can keep going even if some request fails or an error happens while handling it.
%% Cell type:markdown id: tags:
---
%% Cell type:markdown id: tags:
### Simplest way to dump all my scraped items into a JSON/CSV/XML file?
To dump into a JSON file:
> scrapy crawl myspider -O items.json
To dump into a CSV file:
> scrapy crawl myspider -O items.csv
To dump into an XML file:
> scrapy crawl myspider -O items.xml
For more information see [Feed exports](https://docs.scrapy.org/en/latest/topics/feed-exports.html)
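%% Cell type:markdown id: tags:
The same exports can also be configured in code through the FEEDS setting (a minimal sketch, assuming Scrapy >= 2.1, where FEEDS was introduced):
```
# in settings.py, or as custom_settings on a spider
FEEDS = {
    'items.json': {'format': 'json'},
    'items.csv': {'format': 'csv'},
}
```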
---
%% Cell type:markdown id: tags:
Scrapy project example: [quotesbot](https://github.com/scrapy/quotesbot)
---
%% Cell type:markdown id: tags:
### Learn to Extract data
The best way to learn how to extract data with Scrapy is trying selectors using the Scrapy shell.
Run:
> scrapy shell 'https://quotes.toscrape.com/page/1/'
Using the shell, you can try selecting elements using CSS with the response object:
> >>> response.css('title')
> [<Selector xpath='descendant-or-self::title' data='<title>Quotes to Scrape</title>'>]
The result of running response.css('title') is a list-like object called SelectorList, which represents a list of Selector objects that wrap around XML/HTML elements and allow you to run further queries to fine-grain the selection or extract the data.
To extract the text from the title above, you can do:
> >>> response.css('title::text').getall()
> ['Quotes to Scrape']
There are two things to note here: one is that we’ve added ::text to the CSS query, to mean we want to select only the text elements directly inside the <title> element.
The other thing is that the result of calling .getall() is a list: it is possible that a selector returns more than one result, so we extract them all. When you know you just want the first result, as in this case, you can do:
> >>> response.css('title::text').get()
> 'Quotes to Scrape'
As an alternative, you could’ve written:
> >>> response.css('title::text')[0].get()
> 'Quotes to Scrape'
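%% Cell type:markdown id: tags:
Besides CSS, the response object also supports XPath directly (Scrapy converts CSS selectors to XPath under the hood):
> >>> response.xpath('//title/text()').get()
> 'Quotes to Scrape'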
---
%% Cell type:markdown id: tags:
## Run Scrapy from a script
You can use the API to run Scrapy from a script, instead of the typical way of running Scrapy via `scrapy crawl`.
Remember that Scrapy is built on top of the Twisted asynchronous networking library, so you need to run it inside the Twisted reactor.
The first utility you can use to run your spiders is `scrapy.crawler.CrawlerProcess`.
This class starts a Twisted reactor for you, configures the logging and sets shutdown handlers; it is the class used by all Scrapy commands.
The example below uses `scrapy.crawler.CrawlerRunner` instead, which leaves reactor management to you: you have to shut down the Twisted reactor yourself after the spider is finished. This can be achieved by adding callbacks to the deferred returned by the `CrawlerRunner.crawl` method.
Here’s an example of its usage, along with a callback to manually stop the reactor after MySpider has finished running.
```
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class MySpider(scrapy.Spider):
    # Your spider definition
    ...


configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()
d = runner.crawl(MySpider)
d.addBoth(lambda _: reactor.stop())
reactor.run()  # the script will block here until the crawling is finished
```
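%% Cell type:markdown id: tags:
For comparison, a minimal CrawlerProcess sketch; it starts and stops the reactor for you, so no manual shutdown callback is needed:
```
import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    # Your spider definition
    ...


process = CrawlerProcess()
process.crawl(MySpider)
process.start()  # blocks until the crawl is finished
```
%% Cell type:markdown id: tags:
Two sample spiders from this training material follow: one scraping product data from applied.com, one scraping search results from in.rsdelivers.com.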
```
import json
import time

from elasticsearch import Elasticsearch

import scrapy
from scrapy import Request


class AppliedSpider(scrapy.Spider):
    name = 'applied'
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'

    def __init__(self, search_param=''):
        self.api_url = 'https://www.applied.com'
        self.start_urls = [
            'https://www.applied.com/search?page=0&search-category=all&override=true&isLevelUp=false&q=' + search_param]
        super().__init__()

    def collect_data(self, response):
        # product url parsing
        # specification data
        spec = dict()
        for trs in response.xpath('//*[@id="specifications"]//table//tr'):
            key = trs.xpath('./td[1]/text()').get().strip()
            value = trs.xpath('./td[2]/text()').get().strip()
            spec[key] = value
        # final data
        data = {
            'company': response.xpath('//h1[@itemprop="brand"]/a/text()').get().strip(),
            'product': response.xpath('//span[@itemprop="mpn name"]/text()').get().strip(),
            'details': response.xpath('//div[@class="details"]//text()').get().strip(),
            'item': response.xpath('//div[@class="customer-part-number"]/text()').get().strip(),
            'description': [x.strip() for x in response.xpath('//div[@class="short-description"]/ul/li/text()').extract()],
            'specification': spec,
            'url': response.url.strip(),
            'timestamp': int(time.time() * 1000)
        }
        yield data

    def parse(self, response):
        # search url parsing
        for scrape_url in response.xpath('//a[@class="hide-for-print more-detail"]/@href').extract():
            # extract product url
            yield Request(self.api_url + scrape_url, self.collect_data)
        # extract next page url and re-run function
        next_page = response.xpath('//a[@class="next"]/@href').get()
        if next_page is not None:
            yield Request(self.api_url + next_page, self.parse)
```
```
import scrapy


class RSSpider(scrapy.Spider):
    crawler = 'RSSpider'
    name = 'RSSpider'
    main_domain = 'https://in.rsdelivers.com'
    start_urls = ['https://in.rsdelivers.com/productlist/search?query=749']

    def parse(self, response):
        for ele in response.css('a.snippet'):
            my_href = ele.xpath('./@href').get()
            yield scrapy.Request(url=self.main_domain + my_href, callback=self.collect_data)

    def collect_data(self, response):
        data = dict()
        meta_data = response.css('div.row-inline::text').extract()
        # the text nodes arrive as key/separator/value triples, hence the stride of 3
        for i in range(0, 100, 3):
            try:
                data[meta_data[i]] = meta_data[i + 2]
            except IndexError:
                # fewer than 100 text nodes; stop at the end of the list
                break
        data['title'] = str(response.css('h1.title::text').get()).strip()
        data['url'] = response.url
        yield data
```
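%% Cell type:markdown id: tags:
Either spider can be run the same way as the earlier example (the filename here is hypothetical; use whatever the spider is saved as):
> scrapy runspider applied_spider.py -o products.json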
%% Cell type:markdown id: tags:
## SELENIUM AUTOMATION AND WEBSCRAPING
%% Cell type:markdown id: tags:
Load the Driver
%% Cell type:code id: tags:
``` python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
my_service = Service('/home/amruth/Music/chromedriver')
driver = webdriver.Chrome(service=my_service)
```
%% Cell type:markdown id: tags:
Extra imports
%% Cell type:code id: tags:
``` python
# supported locator strategies
from selenium.webdriver.common.by import By
# to handle time-related delays
import time
# create creds.py with USERNAME, PASSWORD variables
import creds
```
%% Cell type:markdown id: tags:
To fetch the home URL
%% Cell type:code id: tags:
``` python
driver.get("https://kronos.tarento.com/login")
driver.maximize_window()
```
%% Cell type:markdown id: tags:
Login Content
%% Cell type:code id: tags:
``` python
time.sleep(1)
driver.find_element(By.XPATH, '//*[@type="email"]').send_keys(creds.USERNAME)
time.sleep(1)
driver.find_element(By.XPATH, '//*[@type="password"]').send_keys(creds.PASSWORD)
```
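%% Cell type:markdown id: tags:
Fixed time.sleep() delays are fragile; an explicit wait is more robust. A minimal sketch (assumes the same login page as above):
%% Cell type:code id: tags:
``` python
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait up to 10 seconds for the password field to be present
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.XPATH, '//*[@type="password"]')))
```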
%% Cell type:markdown id: tags:
To select the checkbox (clicking via JavaScript avoids "element not interactable" errors)
%% Cell type:code id: tags:
``` python
time.sleep(2)
driver.execute_script('arguments[0].click();',driver.find_element(By.XPATH, '//*[@type="checkbox"]'))
```
%% Cell type:markdown id: tags:
To click on the login button
%% Cell type:code id: tags:
``` python
driver.find_element(By.XPATH, '//*[@type="submit"]').click()
```
%% Cell type:markdown id: tags:
Scraping data from the page
%% Cell type:code id: tags:
``` python
time.sleep(2)
try:
my_username = driver.find_element(By.XPATH, '//a[@role="button"]').text.strip()
output = 'logged in as:' + my_username
except:
output = 'Login failed'
print(output)
```
%% Output
Login failed
%% Cell type:markdown id: tags:
To close the window and quit the browser (close() closes the current tab; quit() ends the whole session)
%% Cell type:code id: tags:
``` python
driver.close()
driver.quit()
```
%% Cell type:markdown id: tags:
This material covers:
* selenium scripts
* generalisation of pdf_scripts
* scrapy docs
* general refactor
%% Cell type:markdown id: tags:
# SELENIUM-WEBDRIVER-BASICS
%% Cell type:markdown id: tags:
### To install Selenium
> pip install selenium

For more details, refer to this link: https://selenium-python.readthedocs.io/
%% Cell type:markdown id: tags:
#### NOTES
1. Mismatched versions of Chrome and chromedriver will not work together.
2. For Firefox, profile_path is mandatory.
%% Cell type:markdown id: tags:
--------------------------------------------------------------------
%% Cell type:markdown id: tags:
To Initialize the driver
%% Cell type:code id: tags:
``` python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
my_service = Service('/home/amruth/Music/chromedriver')
driver = webdriver.Chrome(service=my_service)
```
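%% Cell type:markdown id: tags:
Optionally, browser options can be passed at construction. A small sketch (reuses my_service from above; '--headless' runs Chrome without a visible window):
%% Cell type:code id: tags:
``` python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

my_options = Options()
my_options.add_argument('--headless')  # run without a visible browser window
driver = webdriver.Chrome(service=my_service, options=my_options)
```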
%% Cell type:markdown id: tags:
To fetch a URL
Syntax: driver.get('my_url')
%% Cell type:code id: tags:
``` python
driver.get('https://www.google.com/')
```
%% Cell type:markdown id: tags:
To get the current URL
%% Cell type:code id: tags:
``` python
driver.current_url
```
%% Output
'https://www.google.com/'
%% Cell type:markdown id: tags:
To maximize the window
%% Cell type:code id: tags:
``` python
driver.maximize_window()
```
%% Cell type:markdown id: tags:
To go back to the previous page
%% Cell type:code id: tags:
``` python
#get a new page
driver.get("https://www.cricbuzz.com/")
```
%% Cell type:code id: tags:
``` python
#back to previous page with back()
driver.back()
```
%% Cell type:markdown id: tags:
To go forward a page
%% Cell type:code id: tags:
``` python
driver.forward()
```
%% Cell type:markdown id: tags:
To refresh the page
%% Cell type:code id: tags:
``` python
driver.refresh()
```
%% Cell type:markdown id: tags:
To take a screenshot
%% Cell type:code id: tags:
``` python
driver.save_screenshot(filename='/home/amruth/Pictures/2.png')
```
%% Output
True
%% Cell type:markdown id: tags:
To get the session ID
%% Cell type:code id: tags:
``` python
driver.session_id
```
%% Output
'52cb5dafe613edf285132391b58ed44a'
%% Cell type:markdown id: tags:
To view page source
%% Cell type:code id: tags:
``` python
driver.page_source
```
%% Cell type:markdown id: tags:
To create and switch to a new tab
%% Cell type:code id: tags:
``` python
driver.switch_to.new_window()
```
%% Cell type:markdown id: tags:
To get list of tabs
%% Cell type:code id: tags:
``` python
driver.window_handles
```
%% Output
['CDwindow-7A88A3B7E81EE88473EFA8F5FB49CD5D',
'CDwindow-585DC24AC56D0BC0A12B3FA2796921EF']
%% Cell type:markdown id: tags:
To close the tab
%% Cell type:code id: tags:
``` python
driver.close()
```
%% Cell type:markdown id: tags:
To switch back to an old tab
%% Cell type:code id: tags:
``` python
driver.switch_to.window(driver.window_handles[0])
```
%% Cell type:markdown id: tags:
To quit the browser
%% Cell type:code id: tags:
``` python
driver.quit()
```
%% Cell type:markdown id: tags:
# *What are Locators?*
* A locator is a command that tells Selenium IDE which GUI element it needs to operate on (say text boxes, buttons, check boxes, etc.).
* Identifying the correct GUI element is a prerequisite to creating an automation script.
%% Cell type:code id: tags:
``` python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
```
%% Cell type:code id: tags:
``` python
my_service = Service(r"C:\Drivers\chromedriver_win32\chromedriver.exe")
driver = webdriver.Chrome(service=my_service)
driver.get("https://www.amazon.in/")
```
%% Cell type:markdown id: tags:
# Types of Locators
%% Cell type:markdown id: tags:
# 1. By Tag Name
%% Cell type:markdown id: tags:
Syntax: driver.find_element(By.TAG_NAME, 'tag_name')
%% Cell type:code id: tags:
``` python
driver.find_elements(By.TAG_NAME, 'input')
```
%% Cell type:code id: tags:
``` python
driver.find_element(By.TAG_NAME, 'input')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.TAG_NAME, 'a')
```
%% Cell type:code id: tags:
``` python
# no <button> tag is available on this page, so find_element will throw an error
driver.find_element(By.TAG_NAME, 'button')
```
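%% Cell type:markdown id: tags:
A sketch of handling the missing element explicitly rather than letting the error propagate (NoSuchElementException is what find_element raises):
%% Cell type:code id: tags:
``` python
from selenium.common.exceptions import NoSuchElementException

try:
    button = driver.find_element(By.TAG_NAME, 'button')
except NoSuchElementException:
    button = None  # element not present on the page
```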
%% Cell type:markdown id: tags:
# 2. By Name
%% Cell type:markdown id: tags:
Syntax: driver.find_element(By.NAME, 'my_name')
%% Cell type:code id: tags:
``` python
# <input type="text" id="twotabsearchtextbox" value="" name="field-keywords"
# autocomplete="off" placeholder="" class="nav-input nav-progressive-attribute" dir="auto" tabindex="0" aria-label="Search">
driver.find_element(By.NAME, 'field-keywords')
```
%% Cell type:code id: tags:
``` python
# <input data-addnewaddress="add-new" id="unifiedLocation1ClickAddress" name="dropdown-selection"
# type="hidden" value="add-new" class="nav-progressive-attribute">
driver.find_element(By.NAME, 'dropdown-selection')
```
%% Cell type:markdown id: tags:
# 3. By ID
%% Cell type:markdown id: tags:
* IDs are generally unique to an element.
Syntax: driver.find_element(By.ID, 'my_id')
%% Cell type:code id: tags:
``` python
# <input type="text" id="twotabsearchtextbox" value="" name="field-keywords"
# autocomplete="off" placeholder="" class="nav-input nav-progressive-attribute" dir="auto" tabindex="0" aria-label="Search">
driver.find_element(By.ID, 'twotabsearchtextbox')
```
%% Cell type:code id: tags:
``` python
# <div id="nav-cart-count-container">
driver.find_element(By.ID, 'nav-cart-count-container')
```
%% Cell type:markdown id: tags:
# 4. By Class Name
%% Cell type:markdown id: tags:
Syntax: driver.find_element(By.CLASS_NAME, 'class_name')
%% Cell type:code id: tags:
``` python
# single word class_name
# <div class="nav-search-field ">
driver.find_elements(By.CLASS_NAME, 'nav-search-field')
```
%% Cell type:code id: tags:
``` python
# <div class="nav-left">
driver.find_elements(By.CLASS_NAME, 'nav-left')
```
%% Cell type:code id: tags:
``` python
# multi word class_name
```
%% Cell type:code id: tags:
``` python
# If the class_name has spaces in it, passing the exact class_name will not work.
driver.find_element(By.CLASS_NAME, 'nav-search-submit nav-sprite')
```
%% Cell type:code id: tags:
``` python
# put a dot "." instead of the spaces (will work)
driver.find_element(By.CLASS_NAME, 'nav-search-submit.nav-sprite')
```
%% Cell type:markdown id: tags:
# 5. By Link Text
%% Cell type:markdown id: tags:
* The text enclosed within an anchor tag is used to identify a link or hyperlink.
Syntax: driver.find_element(By.LINK_TEXT, 'text')
%% Cell type:code id: tags:
``` python
# LINK_TEXT needs an exact match, so the partial text 'Best' finds nothing
driver.find_elements(By.LINK_TEXT, 'Best')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.LINK_TEXT, 'Best Sellers')
```
%% Cell type:markdown id: tags:
# 6. By Partial Link Text
%% Cell type:markdown id: tags:
* The partial text enclosed within an anchor tag is used to identify a link or hyperlink.
Syntax: driver.find_element(By.PARTIAL_LINK_TEXT, 'text')
%% Cell type:code id: tags:
``` python
driver.find_element(By.PARTIAL_LINK_TEXT, 'Best')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.PARTIAL_LINK_TEXT, 'Best Sellers')
```
%% Cell type:markdown id: tags:
# 7. By XPATH
%% Cell type:markdown id: tags:
* The element is identified with an XPath built from an HTML attribute, its value, and the tagName.
* XPath is of two types: absolute and relative.
* For absolute XPath, we have to traverse from the root to the element.
* For relative XPath, we can start from any position in the DOM.
* An XPath expression should follow a particular rule: //tagname[@attribute='value']. The tag name is optional; if it is omitted, the expression becomes //*[@attribute='value'].
Syntax: driver.find_element(By.XPATH, '//XPATH')
%% Cell type:code id: tags:
``` python
# //tag_name
driver.find_elements(By.XPATH, '//input')
```
%% Cell type:code id: tags:
``` python
# //tag_name[@attribute = "value"]
driver.find_elements(By.XPATH, '//input[@type="text"]')
```
%% Cell type:code id: tags:
``` python
# //*[@attribute = "value"]
# * matches any tag name; elements are selected by the attribute and value alone
driver.find_elements(By.XPATH, '//*[@id="nav-xshop"]')
```
%% Cell type:code id: tags:
``` python
# //*[@attribute = "value"]/tag_name
# / selects a direct child
driver.find_elements(By.XPATH, '//div[@class="nav-fill"]/div')
```
%% Cell type:code id: tags:
``` python
# //*[@attribute = "value"]//tag_name
# // selects all descendants
driver.find_elements(By.XPATH, '//div[@class="nav-fill"]//div')
```
%% Cell type:code id: tags:
``` python
# //tagname[. = "text"]
driver.find_elements(By.XPATH, '//a[. = "Best Sellers"]')
```
%% Cell type:code id: tags:
``` python
# //tag_name/..
# .. means parent of the tag_name
driver.find_elements(By.XPATH, '//input/..')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.XPATH, '//*[@id="nav-tools"]/a')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.XPATH, '//*[@id="nav-tools"]/a[1]')
```
%% Cell type:code id: tags:
``` python
driver.find_elements(By.XPATH, '//*[@id="nav-tools"]/a[last()]')
```
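%% Cell type:markdown id: tags:
XPath also offers functions such as contains(); a small sketch matching the same "Best Sellers" links as above by partial text:
%% Cell type:code id: tags:
``` python
# //tag_name[contains(., "text")]
# matches elements whose text contains the given substring
driver.find_elements(By.XPATH, '//a[contains(., "Best")]')
```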
%% Cell type:markdown id: tags:
# 8. By CSS Locator
%% Cell type:markdown id: tags:
* The element is identified with a CSS selector built from an HTML attribute, value, or tagName.
Syntax: driver.find_elements(By.CSS_SELECTOR, 'input#txt')
%% Cell type:code id: tags:
``` python
# tag_name
driver.find_elements(By.CSS_SELECTOR, 'input')
```
%% Cell type:code id: tags:
``` python
# tag_name.class1.class2
driver.find_elements(By.CSS_SELECTOR, 'input.nav-input.nav-progressive-attribute')
```
%% Cell type:code id: tags:
``` python
# tag_name#id
driver.find_elements(By.CSS_SELECTOR, 'input#twotabsearchtextbox')
```
%% Cell type:code id: tags:
``` python
# parent_tag_name > child_tag_name
driver.find_elements(By.CSS_SELECTOR, 'div > input')
```
%% Cell type:code id: tags:
``` python
# #id
driver.find_elements(By.CSS_SELECTOR, '#twotabsearchtextbox')
```
%% Cell type:code id: tags:
``` python
# #id > parent_tag_name > child_tag_name
driver.find_elements(By.CSS_SELECTOR, '#CardInstanceQNqkNMgnYMdkg9dk0pUzTQ > div > div')
```
%% Cell type:code id: tags:
``` python
# tag_name[attribute = "value"]
driver.find_elements(By.CSS_SELECTOR, 'input[aria-label="Search"]')
```
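%% Cell type:markdown id: tags:
CSS attribute selectors also support prefix matching (standard CSS, not Selenium-specific); a small sketch using the search box's name="field-keywords" from above:
%% Cell type:code id: tags:
``` python
# tag_name[attribute^="prefix"] : attribute value starts with "prefix"
driver.find_elements(By.CSS_SELECTOR, 'input[name^="field"]')
```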