Tutorial: Finding Web Races with EventRacer

Audience: This tutorial is intended for web developers. Some basic knowledge of HTML and JavaScript is required to understand the topic.

Introduction

What is a Web Race?

Tasks in a web application

Example: harmful races

Analyzing with EventRacer

Fixing races with synchronization

Conclusion

Introduction

When a user navigates to a web page, the browser executes sequences of operations coded in the page’s HTML, CSS and JavaScript. In this tutorial, we’ll discuss the possible orders in which the browser can execute these operations, and how unexpected orderings can lead to nasty “web race” bugs.

What is a Web Race?

We say that two web page operations participate in a race if (1) the operations can execute in either order, and (2) the website behaves differently depending on the execution order. As an example of a web race that can lead to unexpected, buggy behavior, we made a simple web site that behaves differently depending on the speed at which the user clicks:

 

http://www.eventracer.org/speed_demo.html
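The two-part definition above can be illustrated with a small sketch. The helper behavesDifferently and the operations below are hypothetical names introduced only for this illustration; they are not part of EventRacer. The sketch assumes that the two operations may run in either order (condition 1) and checks condition 2 by running them in both orders on a fresh state and comparing the results.

```javascript
// Hypothetical helper: runs two operations in both orders, each time on a
// fresh state, and reports whether the final states differ.
function behavesDifferently(makeState, opA, opB) {
  var first = makeState();
  opA(first);
  opB(first);

  var second = makeState();
  opB(second);
  opA(second);

  return JSON.stringify(first) !== JSON.stringify(second);
}

// Illustrative operations on a small shared state.
function showWelcome(state) { state.message = 'Welcome'; }
function showError(state) { state.message = 'Error'; }
function countVisit(state) { state.visits += 1; }

function freshState() { return { message: '', visits: 0 }; }

// Both operations write the same field, so the last writer wins: a race.
var racy = behavesDifferently(freshState, showWelcome, showError);

// These operations touch different fields, so their order does not matter.
var harmless = behavesDifferently(freshState, showWelcome, countVisit);

console.log(racy, harmless);
```

In a real page the "state" is the DOM plus all JavaScript variables, which is why a dedicated tool is needed to find such pairs of operations.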

Web races can lead to bugs that negatively affect the user experience. The chances of a web race bug affecting a particular user depend on several factors:

Certain implementation strategies can decrease the likelihood of web races arising. For example, using well-tested libraries (e.g. jQuery) and following their guidelines partially mitigates the problem. Yet, we have found that one fifth of the websites of Fortune 100 companies still have web races that may negatively impact user experience.

A serious problem is that testing with all possible configurations to discover a web race is completely impractical, even for big companies. In addition, it can be hard to isolate a web race and reproduce it. This makes the use of advanced analysis tools necessary for discovering the sources of such errors. In this tutorial, we demonstrate how to use EventRacer to find and analyze web races.

Tasks in a web application

To understand the root cause of web races, we must first explore the execution model at the core of modern browsers.  The HTML5 specification states that compliant browsers must be based on an event loop that performs different types of tasks.  Tasks go beyond JavaScript execution to encompass all operations required to render a web site.  For example, parsing of an HTML tag is a task that often does not involve any JavaScript. Other kinds of tasks include handling user click events, receiving data over the network, executing scripts, or invoking callbacks due to firing of HTML events like ‘onload’ or ‘onreadystatechange’.  

Is there an order in which tasks must execute?  Certain tasks do have a fixed ordering. For example, consider the following HTML fragment:

<div>Hello world!</div>

Parsing this fragment in fact requires three tasks:

  1. Parse the <div> tag to create a div element in the DOM
  2. Parse the “Hello world!” text and set it as the text content of the div DOM node
  3. Parse the closing </div> tag

According to the HTML5 specification, tasks 1, 2, and 3 must always execute in the above order.  However, other tasks can be interleaved between the above tasks. For example, between tasks 1 and 2, the browser is free to execute a JavaScript event handler due to a user interaction or the completion of some image loading operation.  These tricky interleavings are often the root cause of web race bugs.
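The interleavings described above can be enumerated with a short sketch. This is a deliberately simplified model, not a description of how a browser schedules work: it treats the three parsing tasks as a fixed sequence and inserts a single hypothetical "click handler" task at every possible position. The function name interleavings and the task labels are assumptions made for illustration.

```javascript
// Parsing tasks for <div>Hello world!</div>; their relative order is fixed.
var parseTasks = ['parse <div>', 'parse text', 'parse </div>'];

// Returns every schedule obtained by inserting one extra task at any
// position while keeping the ordered tasks in their original order.
function interleavings(orderedTasks, extraTask) {
  var schedules = [];
  for (var i = 0; i <= orderedTasks.length; i++) {
    var schedule = orderedTasks.slice(0, i)
      .concat([extraTask], orderedTasks.slice(i));
    schedules.push(schedule);
  }
  return schedules;
}

var schedules = interleavings(parseTasks, 'click handler');
schedules.forEach(function (schedule) {
  console.log(schedule.join(' -> '));
});
```

Even with a single extra task there are four possible schedules. A real page has many tasks of each kind, so the number of possible schedules grows very quickly.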

As another example, consider the following HTML:

<input type="button"
  onclick="document.getElementById('x').innerHTML = 'Clicked'; ">
<div id="x">Not clicked yet</div>

The intended behavior is to create a button that, when clicked, changes the “Not clicked yet” text in the div to “Clicked”.  While this code will almost always work as intended, there is a buggy corner case: what happens if the user clicks the button before the div node is present in the DOM?  While unlikely for this small test, according to the HTML5 specification this is a possible behavior.  In such a case, document.getElementById('x') returns null, and hence the click will cause a JavaScript error (a TypeError when the handler tries to set innerHTML on null).

In practice, the likelihood of this error depends on many factors. If many other HTML nodes were declared between the input and div tags, the network delay in loading this HTML could give the user enough time to click the button before the div node is parsed.  In fact, the problem may occur only on certain browsers, or only when other parts of the page have certain content. These complications make testing for such bugs quite tricky.
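One defensive way to write the click handler is to check the result of the lookup before using it. In the sketch below, markClicked is a hypothetical function name, and the doc parameter stands in for the browser's document object so that the logic can be exercised outside a browser.

```javascript
// Hypothetical handler body: updates the div only if it already exists.
// `doc` is expected to behave like the browser's `document` object.
function markClicked(doc) {
  var target = doc.getElementById('x');
  if (target === null) {
    // The div has not been parsed yet; ignore the click instead of throwing.
    return false;
  }
  target.innerHTML = 'Clicked';
  return true;
}

// Minimal stand-ins for a document before and after the div is parsed.
var divX = { innerHTML: 'Not clicked yet' };
var beforeParse = { getElementById: function () { return null; } };
var afterParse = {
  getElementById: function (id) { return id === 'x' ? divX : null; }
};

var earlyClick = markClicked(beforeParse);
var lateClick = markClicked(afterParse);
console.log(earlyClick, lateClick, divX.innerHTML);
```

In the page itself, the button's onclick attribute would call markClicked(document). Whether silently ignoring an early click is acceptable depends on the application; the check only removes the JavaScript error, not the race itself.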

Example: harmful races

Now, consider a bigger example, in which races may cause the web application to malfunction. Assume the developer wants to create two buttons that look like regular images, such that clicking them redirects to different pages. A simple way to do this is the following code:

<html>
<head></head>
<body>
  <img src="http://eventracer.org/img/eventracer-logo.png"
      onload="javascript:init();" onerror="javascript:init();">
  <script>
    function init() {
      var g = document.getElementById("g");
      var y = document.getElementById("y");
      // Add click listeners to the images g and y.
      g.addEventListener("click", function(e){
        window.location.href = "http://www.google.com/";
      });
      y.addEventListener("click", function(e){
        window.location.href = "http://www.yahoo.com/";
      });
    }
  </script>
  <br><br>Example site with races. Click on a search engine icon:
  <img src="google.png" id="g"> <!--button g-->
  <img src="yahoo.png" id="y"> <!--button y-->
</body>
</html>

In this example, there is an init function that attaches click listeners to the two buttons (img tags with ids “g” and “y”). The init function is called when a logo image eventracer-logo.png is loaded, or if it fails to load. At first glance, it may seem that the buttons “g” and “y” will have their click handlers attached when the page loads.

However, if we consider all possible task orderings, there are in fact several possibilities:

This example is interesting because, despite its small size, it can break in several ways. To a web developer it may appear to work, and only after some unrelated modification of the website does it start to break occasionally. For example, if the size of the logo image file changes, some of the broken behaviors may become very frequent for some users.
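To make the effect of task ordering concrete, the following sketch replays the logic of init() against a simplified stand-in for the DOM. Everything except the body of init is an assumption made for illustration: createFakeDocument, runOrdering and the task label 'load logo' are hypothetical, and the model only tracks when the images g and y become visible to getElementById relative to the logo's load event.

```javascript
// Simplified stand-in for the DOM: an element becomes visible to
// getElementById only after the parser has reached it.
function createFakeDocument() {
  var elements = {};
  return {
    // Called when the parser reaches an <img> with the given id.
    parse: function (id) {
      var listeners = [];
      elements[id] = {
        listeners: listeners,
        addEventListener: function (type, handler) { listeners.push(type); }
      };
    },
    getElementById: function (id) {
      return Object.prototype.hasOwnProperty.call(elements, id)
        ? elements[id] : null;
    }
  };
}

// Same lookup and listener logic as init() in the page, with the
// document passed in as a parameter.
function init(doc) {
  var g = doc.getElementById('g');
  var y = doc.getElementById('y');
  g.addEventListener('click', function () {});
  y.addEventListener('click', function () {});
}

// Runs one ordering of the tasks and reports the outcome.
function runOrdering(tasks) {
  var doc = createFakeDocument();
  try {
    tasks.forEach(function (task) {
      if (task === 'load logo') { init(doc); } else { doc.parse(task); }
    });
    return 'ok';
  } catch (e) {
    return e instanceof TypeError ? 'TypeError' : 'other error';
  }
}

var parsedFirst = runOrdering(['g', 'y', 'load logo']);
var loadedFirst = runOrdering(['load logo', 'g', 'y']);
var loadedBetween = runOrdering(['g', 'load logo', 'y']);
console.log(parsedFirst, loadedFirst, loadedBetween);
```

Only the ordering in which both images are parsed before the logo finishes loading completes without an error. In the "between" ordering, g receives its click listener but y does not, so the page ends up half-initialized.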

Analyzing with EventRacer

We have made the above example available online at http://www.eventracer.org/race_bug.html. To analyze it, you can either submit the URL of the page at eventracer.org or, if you have downloaded the instrumented browser as a binary, run:

./auto_explore_site.sh http://www.eventracer.org/race_bug.html

./raceanalyzer ER_actionlog

and then open localhost:8000 in a normal browser to view the races that were found.

When you select to see the races, you will see a table that looks like:

The Type column shows the type of the memory location. The Name column shows the name of the variable. The Num. races column shows the number of races on that variable. The Num. uncovered races column shows how many of those races are guaranteed to be real races; this number is always less than or equal to Num. races. Typically, one would first explore variables that have at least one uncovered race. Finally, the race classes column shows whether the race fits into known classes of races and lists those classes.

These are the three elements (memory locations) from our example page that have races. Window[26].init is the JavaScript function init. Clicking on this memory location shows one race for it, which can be inspected further. The detailed inspection shows the two operations that access this variable, along with their call traces. The first operation (op1) is where the variable init is initialized - this happens when the JavaScript function init is declared. The second operation (op2) is the dispatch of the image's load event, which calls the init function and therefore reads the init variable. Clicking on the race on Window[26].init shows information like this:

EventRacer provides detailed information about each race so that web developers can inspect it closely. One can click on the JavaScript functions that performed the reads and writes of the init variable to look into the code.

In a similar manner, one can inspect the races on the other two memory locations, the DOM nodes g and y (named Tree[0x7f64400a8458]:g and Tree[0x7f64400a8458]:y in our example), and see that these races are harmful as well.

Finally, we should note that not all races are harmful. EventRacer colors memory locations according to how likely we think a harmful race is for that type of memory location. However, to understand the actual effect of each race, a developer must inspect them manually.

Fixing races with synchronization

Our example has three races, and here we suggest ways to fix them in the code. Of course, it is possible to remove the races completely by rearranging the code. If the logo image is placed after the buttons, there will be no races. However, such changes are not always possible, and even when they work, they often slow down page loading.

Here, we suggest one simple way to fix some bad behaviors: by adding synchronization. Let us first solve the races on the “g” and “y” DOM elements. We can do this by changing the script slightly:

    function init() {
      var y = document.getElementById("y");
      if (!y) { setTimeout(init, 50); return; }
      var g = document.getElementById("g");
      ...
    }

With this simple fix, we check if the element y is present in the DOM and if it is not yet present, we delay the initialization and the attachment of the event handler. We do not need to check for g, because when y is present, g must be present as well.
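The retry logic can be tried outside a browser with a small sketch. The names initWithRetry, addImage and schedule are hypothetical and introduced only for this illustration: the document and the timer are passed in as parameters, and the timer is replaced by a manual queue so that the point at which the retry fires is under our control.

```javascript
// Variant of the fixed init() with the document and the timer passed in,
// so that the retry logic can be exercised without a browser.
function initWithRetry(doc, schedule) {
  var y = doc.getElementById('y');
  if (!y) {
    // y is not parsed yet: try again later instead of failing.
    schedule(function () { initWithRetry(doc, schedule); }, 50);
    return;
  }
  // y follows g in the page, so g must be present as well.
  var g = doc.getElementById('g');
  g.addEventListener('click', function () {});
  y.addEventListener('click', function () {});
}

// Stand-in document that records which listeners were attached.
var elements = {};
var attached = [];
var doc = {
  getElementById: function (id) {
    return Object.prototype.hasOwnProperty.call(elements, id)
      ? elements[id] : null;
  }
};
function addImage(id) {
  elements[id] = {
    addEventListener: function (type) { attached.push(id + ':' + type); }
  };
}

// Stand-in timer: callbacks are queued and run manually.
var pending = [];
function schedule(callback, delayMs) { pending.push(callback); }

initWithRetry(doc, schedule);        // runs before g and y are parsed
var retriesQueued = pending.length;  // one retry has been scheduled
addImage('g');
addImage('y');
pending.shift()();                   // the timer fires after parsing
console.log(retriesQueued, attached);
```

In a page, the same function would be called with document and a function that forwards to setTimeout. The first call attaches nothing and schedules a retry; the retry then finds both images and attaches both listeners.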

We can now analyze the fixed webpage and observe that the race on g has disappeared. However, the race on y did not.  The issue here is that while the JavaScript code includes logic to handle the case where y is not yet initialized, EventRacer is not yet able to prove that this logic is correct.  Since there is a real race on JavaScript variable y, depending on whether the init function is invoked before or after the img with “id” y is present in the DOM, EventRacer still reports the race.  It is still useful for the developer to inspect this type of race and ensure that the synchronization logic is correct.  Also, note that EventRacer is able to successfully understand that there is no race on the element g if y was synchronized properly.  This is a unique feature of EventRacer: standard race detectors will show races on both g and y, but EventRacer knows that the race on g is “covered” by the race on y, and hence only shows the “uncovered” race on y at first.

Going back to fixing our example, we can use logic similar to that above to fix the race on the init function. In fact, we can leverage the synchronization logic on y and call init directly from the script instead of at the onload event of the logo image. See the fully fixed example here. It still has the race on y, but that race now reflects useful synchronization.

Conclusion

EventRacer is a tool to analyze websites for races. Races can be harmful bugs, or they can be used to synchronize operations. When a web developer analyzes a website with EventRacer, she should read the code around the race to see whether it correctly handles both orders of the operations in the race.

EventRacer is the first freely available tool for analyzing races in web applications. To help web developers work quickly, EventRacer provides a powerful user interface for exploring the races and investigating the concurrency in web applications.