Selenium
Selenium is a modular toolset for automating web browsers, built on the W3C WebDriver specification that browser vendors are expected to implement. In Rails apps like this one, Selenium WebDriver is often the API that the Capybara gem uses to drive the web browser. Capybara provides a domain-specific language for browser automation, and that same language can also drive the browser through a few other open-source browser drivers. To use Selenium WebDriver, you need to install the language bindings. This is similar to how Anki can use Qt with Python instead of C++ (it does this using the PyQt binding). For Ruby, the Selenium WebDriver bindings are installed as a gem called selenium-webdriver. A related gem is webdrivers, which can fetch chromedriver, the driver provided by the browser vendor that the Selenium WebDriver bindings use. With the newest versions of selenium-webdriver, you may not need webdrivers anymore, because recent releases can locate or download the driver themselves (via Selenium Manager).
Here, we will look at Selenium WebDriver using C# instead of Ruby because I think it will add a lot to the overall explanation.
Example: Logging into Anki Books
    using System;
    using System.Collections.Generic;
    using NUnit.Framework;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    namespace Tests;

    [TestFixture]
    public class LoggingInAsTestUser
    {
        private IWebDriver driver;
        public IDictionary<string, object> Vars { get; private set; }
        private IJavaScriptExecutor js;

        [SetUp]
        public void SetUp()
        {
            // Start a fresh Chrome session before each test.
            driver = new ChromeDriver();
            // Keep retrying element lookups for up to 2 seconds before giving up.
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(2);
            js = (IJavaScriptExecutor)driver;
            Vars = new Dictionary<string, object>();
        }

        [TearDown]
        protected void TearDown()
        {
            // Close the browser and end the WebDriver session.
            driver.Quit();
        }

        [Test]
        public void Test()
        {
            driver.Navigate().GoToUrl("http://localhost:3000/");
            driver.Manage().Window.Size = new System.Drawing.Size(948, 1003);
            driver.FindElement(By.LinkText("Login")).Click();

            // Fill in the login form and submit it.
            IWebElement emailInput = driver.FindElement(By.Id("email"));
            IWebElement passwordInput = driver.FindElement(By.Id("password"));
            emailInput.SendKeys("test@example.com");
            passwordInput.SendKeys("1234asdf!!!!");
            driver.FindElement(By.Id("login")).Click();
            // There is no assertion: the test passes as long as no step throws.
        }
    }
This could be shared setup for tests that need a user to be logged in. driver.Quit() ends the session by closing the web browser, which you can watch as it runs the tests if you want. The test can only pass if the app is running locally on port 3000. If the browser can't even get there, the test fails immediately.
Calling FindElement on the driver finds an element on the page via a locator (a By object), which can be thought of as something like a CSS selector. If you call FindElement on an element on the page instead (an object that implements the IWebElement interface), the search is scoped to descendants of that element. FindElements is similar to FindElement, but it returns a collection of all the elements that matched instead of one. A common way tests like these fail is that FindElement finds no match and throws NoSuchElementException. When several elements match, FindElement returns the first one, which may not be the element you meant. (Capybara's find, by contrast, raises an error on an ambiguous match.) The locators here include one that matches a link (an HTML <a> element) by its text content and three that use an id. Id locators are useful for this kind of thing, and if web standards are being followed, every id on the web page should be unique.
The SendKeys() calls in this test are methods on IWebElement. The Actions API, which is a slightly lower-level way to control the browser in Selenium WebDriver, has its own SendKeys(). That one is pretty much just shorthand for the KeyDown() and KeyUp() methods, which do exactly what they sound like. Those take values from the Keys class as arguments (Keys.Tab is one such value; in the C# bindings Keys is a static class of strings rather than an enum). Calling Click() on an IWebElement clicks it in the middle (the WebDriver spec calls this the element's in-view center point).
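As a rough sketch of the lookups and the Actions API described above, the fixture below reuses the Login link and the email id from the example. It assumes the app is running on localhost:3000, that the login fields sit inside a <form> element, and that no element has the id no-such-id. The <form> and the no-such-id id are illustrative assumptions, not something confirmed about Anki Books. The fixture and test names are made up for this sketch.

```csharp
using System;
using System.Collections.ObjectModel;
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Interactions;

namespace Tests;

[TestFixture]
public class LocatorAndActionsSketch
{
    private IWebDriver driver;

    [SetUp]
    public void SetUp()
    {
        driver = new ChromeDriver();
        driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(2);
        // Same starting point as the login example: open the app, go to the login page.
        driver.Navigate().GoToUrl("http://localhost:3000/");
        driver.FindElement(By.LinkText("Login")).Click();
    }

    [TearDown]
    public void TearDown()
    {
        driver.Quit();
    }

    [Test]
    public void ScopedAndPluralLookups()
    {
        // Assumption: the login fields are wrapped in a <form> element.
        IWebElement form = driver.FindElement(By.TagName("form"));

        // Called on an element, FindElement searches only that element's descendants.
        IWebElement emailInput = form.FindElement(By.Id("email"));
        Assert.That(emailInput.Displayed, Is.True);

        // FindElements returns every match as a read-only collection.
        ReadOnlyCollection<IWebElement> inputs = form.FindElements(By.TagName("input"));
        Assert.That(inputs, Is.Not.Empty);

        // "no-such-id" is a hypothetical id assumed to be absent from the page.
        // With no match, FindElements returns an empty collection and FindElement throws.
        // Each of these lookups waits out the 2-second implicit wait first.
        Assert.That(driver.FindElements(By.Id("no-such-id")), Is.Empty);
        Assert.Throws<NoSuchElementException>(() => driver.FindElement(By.Id("no-such-id")));
    }

    [Test]
    public void TypingWithTheActionsApi()
    {
        IWebElement emailInput = driver.FindElement(By.Id("email"));

        // Click the field to focus it, hold Shift for the first part, then release it.
        new Actions(driver)
            .Click(emailInput)
            .KeyDown(Keys.Shift)
            .SendKeys("test")
            .KeyUp(Keys.Shift)
            .SendKeys("@example.com")
            .Perform();

        // Print what ended up in the field; the part typed with Shift held should be upper-cased.
        Console.WriteLine(emailInput.GetDomProperty("value"));
    }
}
```

Nothing is sent to the browser until Perform() is called, so the chain of Actions calls only describes the sequence. For ordinary typing, SendKeys() on the element itself, as in the login example, is usually enough.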
Article notes
What gem installs the Selenium bindings for Ruby?
selenium-webdriver
Why are the browser drivers generally not included in the standard Selenium distribution?
They are provided by the browser vendors
What refers to both the language bindings and the implementations of the individual browser controlling code in Selenium?
WebDriver
What is the specific WebDriver implementation that backs each web browser called?
A driver