Most software applications built today are web applications that run in the browser. In this age of highly interactive and responsive software processes, where many organizations use some form of Agile methodology, automated checks have become a must-have in testing. Selenium is possibly the most widely used open source solution for automated testing of web-based applications. Although used primarily for UI testing, Selenium at its core is a browser user-agent library.

As noted above, Selenium is a collection of many things, but at its core it is a set of tools for web browser automation that uses the best techniques available to remotely control browser instances and emulate a user’s interaction with the browser. It lets you simulate actions performed by end users, such as clicking a button, entering text into text fields, selecting drop-down values, checking or unchecking checkboxes and clicking links. It can also automate other actions, such as hovering over an element, moving the mouse and executing JavaScript.

SELENIUM HISTORY

Selenium was born in 2004 when Jason Huggins was testing an internal application at Thoughtworks. He created a JavaScript library that could drive interactions with the web page, allowing him to automatically run tests against multiple browsers. He built it out of necessity, to reduce the time spent manually verifying consistent behavior in the front end of the web application. While JavaScript is a good tool for inspecting DOM properties and making certain client-side observations that would otherwise be impossible, it falls short when it comes to naturally replicating a user’s interactions.

Since then, Selenium has grown and matured considerably with the introduction of Selenium IDE, Selenium RC, Selenium WebDriver and Selenium Grid. Selenium WebDriver is now a World Wide Web Consortium (W3C) Recommendation, which means that its protocol is a published web standard endorsed by the W3C. You can read about the detailed changes due to this here.

SELENIUM FEATURES

Some of the key features of Selenium are mentioned below:
– Selenium is an open source project, which means that it is free to use
– Selenium IDE can record and play back automation steps and generate code in C#, Java, Python and Ruby
– Selenium Grid is used to run tests in parallel across multiple machines, browsers and operating systems
– Selenium supports the following programming languages:
   Java, Python, C#, Ruby, JavaScript
– Selenium can run on the following operating systems:
   Windows, Linux, macOS, Android, iOS
– Selenium scripts can run on the following internet browsers:
  Google Chrome, Mozilla Firefox, Internet Explorer, Microsoft Edge, Opera, Safari
– Selenium can be integrated with other tools, libraries and frameworks such as TestNG, JUnit, Maven, Gradle, Ant, Jenkins and Docker
– Selenium WebDriver does not require server installation as it can interact directly with the browsers.

SELENIUM WEBDRIVER ARCHITECTURE

The Selenium WebDriver Architecture follows the popular Client-Server architecture and consists mainly of four components:

  • Selenium Client and Language Bindings
  • JSON Wire Protocol over HTTP
  • Browser Drivers
  • Web Browsers

The diagram below shows the Selenium WebDriver architecture and its components in detail:

Selenium Client and Language Bindings
The Selenium client is responsible for sending the requests that execute Selenium WebDriver commands. The Selenium WebDriver bindings are code libraries developed and maintained by the Selenium project and are available in several programming languages, namely Java, C#, Python, Ruby and JavaScript. For example, a web page can be opened using the Java bindings, and the same operation can be performed using the bindings for C#, Python, Ruby or JavaScript.

JSON Wire Protocol over HTTP
All Selenium WebDriver implementations that communicate with browser drivers use a common wire protocol over HTTP (Hypertext Transfer Protocol). This wire protocol defines a RESTful web service using JSON (JavaScript Object Notation) over HTTP and is commonly known as the JSON Wire Protocol. It is organized as request/response pairs of “commands” and “responses”: every Selenium command issued through the Selenium WebDriver API results in an HTTP call to the browser driver. Note that Selenium 4 replaced the JSON Wire Protocol with the standardized W3C WebDriver protocol, which keeps the same model of JSON commands and responses over HTTP.
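
To make the command-to-request mapping concrete, here is a minimal Java sketch of how a few commands could be paired with an HTTP method and path. The four endpoints shown appear in both the JSON Wire Protocol and the W3C WebDriver specification. The WebDriverCommand enum and its requestLine helper are hypothetical names used only for illustration; they are not part of the Selenium API.

```java
/** Illustrative subset of WebDriver commands and the HTTP endpoints they map to. */
enum WebDriverCommand {
    NEW_SESSION("POST", "/session"),
    NAVIGATE_TO("POST", "/session/{sessionId}/url"),
    GET_TITLE("GET", "/session/{sessionId}/title"),
    DELETE_SESSION("DELETE", "/session/{sessionId}");

    private final String method;
    private final String pathTemplate;

    WebDriverCommand(String method, String pathTemplate) {
        this.method = method;
        this.pathTemplate = pathTemplate;
    }

    /** Renders "METHOD path" for a session id (NEW_SESSION has no placeholder, so the id is unused). */
    String requestLine(String sessionId) {
        return method + " " + pathTemplate.replace("{sessionId}", sessionId);
    }

    public static void main(String[] args) {
        // "abc123" is a made-up session id; real ids are issued by the driver.
        for (WebDriverCommand command : values()) {
            System.out.println(command + " -> " + command.requestLine("abc123"));
        }
    }
}
```

Commands that change browser state (such as navigating) use POST, while commands that only read state (such as fetching the title) use GET.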

Browser Drivers
The browser drivers are servers that implement the JSON Wire Protocol. They know how to convert Selenium commands into calls to a specific browser’s proprietary native APIs without revealing the internal logic of the browser’s functionality. The browser drivers used with the Selenium client libraries include ChromeDriver, FirefoxDriver, EdgeDriver, SafariDriver, OperaDriver, HtmlUnitDriver and GhostDriver. Each driver acts as a server: it receives HTTP requests from the Selenium client and sends HTTP responses back, thereby implementing the client-server architecture through the JSON Wire Protocol.
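
On the driver side, the first job is to work out which command an incoming request represents. The sketch below shows one way such routing could look for a single endpoint. It is an illustration only and does not reflect how ChromeDriver or any real driver is implemented; DriverRouteSketch and route are hypothetical names, and the returned strings merely stand in for the browser-specific calls a real driver would make.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy router: maps an HTTP method and path to a description of the browser action. */
class DriverRouteSketch {

    // Matches /session/<session id>/url and captures the session id.
    private static final Pattern URL_ENDPOINT = Pattern.compile("^/session/([^/]+)/url$");

    /** Returns a description of the action, or an empty Optional if the route is not handled here. */
    static Optional<String> route(String method, String path) {
        Matcher matcher = URL_ENDPOINT.matcher(path);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        String sessionId = matcher.group(1);
        switch (method) {
            case "POST":
                return Optional.of("navigate session " + sessionId);
            case "GET":
                return Optional.of("read current URL of session " + sessionId);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(route("POST", "/session/abc123/url"));
        System.out.println(route("GET", "/session/abc123/url"));
    }
}
```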

Web Browsers
Web browsers are software programs that allow users to locate, access and display web pages and other content created using HTML (Hypertext Markup Language) and XML (Extensible Markup Language). All Selenium commands are executed in the web browsers (Chrome, Firefox, Edge, Safari, Opera and Internet Explorer) through their respective browser drivers, which act as middlemen.

Let’s now understand the flow through an example.

Suppose you write the below Selenium code (using its Java binding) in an IDE (Integrated Development Environment) of your choice.

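import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();   // launches Chrome through ChromeDriver
driver.get("https://www.google.com");    // navigates to the Google home page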


Once you run this code, the Chrome browser is launched and navigates to the Google home page. Internally, every statement is converted into an HTTP request with a JSON payload, as defined by the JSON Wire Protocol, and that request is sent to the browser driver. In this case, the Java client library serializes the command to JSON and sends it to the “ChromeDriver” browser driver executable. For the driver.get call, the request looks roughly like this (the port and session id are assigned at run time):

POST http://localhost:<port>/session/<session id>/url
{"url": "https://www.google.com"}

Every browser driver runs an HTTP server to receive these requests. Once a request reaches the browser driver, the driver passes the command on to the respective web browser through that browser’s own automation interface, and the command is executed in the browser. For an HTTP POST request, an action is performed in the browser; for an HTTP GET request, the requested information is read from the browser. In both cases the browser returns the result to the browser driver, which sends it back to the client library as an HTTP response in the format defined by the JSON Wire Protocol.
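
The round trip described above can be tried without a real browser. The following sketch, which assumes Java 11 or later, starts a stub HTTP server in place of a browser driver and sends it the kind of request a client library would issue for driver.get. Everything here is hypothetical scaffolding: StubDriverRoundTrip, its helper methods and the "demo-session" id are invented for illustration, and the stub only acknowledges the command instead of controlling a browser.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Stand-in for the client/driver exchange: a stub "driver" server plus one navigate request. */
class StubDriverRoundTrip {

    /** Path of the navigate endpoint for a given session id. */
    static String navigatePath(String sessionId) {
        return "/session/" + sessionId + "/url";
    }

    /** JSON body of the navigate command (no escaping: adequate for a plain URL only). */
    static String navigateBody(String url) {
        return "{\"url\":\"" + url + "\"}";
    }

    /** Starts a stub driver on a free local port; it logs each command and replies with a success body. */
    static HttpServer startStubDriver() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/session", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            // A real driver would instruct the browser here; the stub just prints what it received.
            System.out.println(exchange.getRequestMethod() + " " + exchange.getRequestURI() + " " + body);
            byte[] reply = "{\"value\":null}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, reply.length);
            exchange.getResponseBody().write(reply);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends the navigate request to the driver on the given port and returns the response body. */
    static String sendNavigate(int port, String sessionId, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + navigatePath(sessionId)))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(navigateBody(url)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStubDriver();
        try {
            int port = server.getAddress().getPort();
            System.out.println(sendNavigate(port, "demo-session", "https://www.google.com"));
        } finally {
            server.stop(0);
        }
    }
}
```

Binding the stub to port 0 lets the operating system pick a free port, which mirrors the way the port in the request above is only known at run time.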

Credits:
https://www.selenium.dev/documentation/