ARIA Grid: Supporting nonvisual layout and keyboard traversal

Modern web applications contain complex and dense user interface patterns — infinitely scrolling lists of content, menu bars, and complex data tables with interactive controls in cells, to name a few components. With a mouse pointer, a person can easily traverse the controls and items of an application. For a keyboard user, traversing a page via the Tab key becomes more cumbersome as the number of controls and items increases. A modern web page may contain hundreds of tab stops — elements that can be traversed using the Tab key.

At Facebook, we are experimenting with a user interface pattern for traversing a page with a keyboard that we call a logical grid. A logical grid reduces numerous tab stops to a single tab stop within a part of the interface designated as a grid. From the single tab stop, a person can traverse items in the grid using arrow keys. In addition, Accessible Rich Internet Applications (ARIA) Grid semantics express the structure to assistive technology users.

The logical grid is a generic UI design pattern that web applications can adopt to reduce interface complexity and improve keyboard interaction support. Implementors should consult the WAI-ARIA Authoring Practices Guide for specific recommendations on key and behavior mapping. In this post, we provide helpful examples and considerations for applying this pattern.

Applying a logical grid to a to-do list

Consider a to-do list application on the web. In this application, a person can tab to the Mark Finished and Delete buttons of each item. First, let’s look at the HTML markup for this list:


<ul>
  <li>
    <span>Buy milk</span>
    <button>Mark Finished</button>
    <button>Delete</button>
  </li>
  <li>
    <span>Recycle the used batteries</span>
    <button>Mark Finished</button>
    <button>Delete</button>
  </li>
  <li>
    <span>Write thank you cards</span>
    <button>Mark Finished</button>
    <button>Delete</button>
  </li>
</ul>

The linear ordering of DOM elements imposes a natural tabbing order in our list. Tabbing through this UI, a person would focus on the following elements: Mark Finished, Delete, Mark Finished, Delete, Mark Finished, Delete. HTML and browsers give us no easy way to traverse just the Mark Finished buttons or the Delete buttons in a linear order.

Additionally, for each item added to the list, we add two more tabbable controls to the page. Traversing to the Mark Finished button of the 10th item requires tabbing through the 18 buttons of the preceding 9 items first. There is no easy way to traverse the text of each task with the keyboard either, since these text elements have no tabindex.

We have identified three issues:

Linear tabbing paths through the DOM are cluttered; intermediary nodes must be tabbed through from point A to B.
Adding tabbable elements to a user interface increases traversal path lengths for keyboard users.
Non-tabbable elements cannot be reached via keyboard traversal.

Now, let’s imagine this to-do list laid out as a table. Column 1 contains the text of the item; column 2, the Mark Finished buttons; column 3, the Delete buttons. Each row is an item in the to-do list. This layout is depicted schematically below.

         | Column 1                   | Column 2        | Column 3 |
| Row 1  | Buy milk (tabbable)        | [Mark Finished] | [Delete] |
| Row 2  | Recycle the used batteries | [Mark Finished] | [Delete] |
| Row 3  | Write thank you cards      | [Mark Finished] | [Delete] |

Keeping this schematic in mind, someone would arrive at Row 1/Column 1 when tabbing into this grid. From there, she uses the arrow keys to traverse left, right, up, and down through the grid. When a cell is focused, the cell’s focus strategy determines where the focus is placed. In the case of the text in Column 1, the cell itself is focused. In the case of the buttons in columns 2 and 3, the buttons are focused. Their behavior is maintained; pressing Enter activates the button. The user can traverse to the Mark Finished button in Row 1, mark the item finished, then press the down arrow to focus the Mark Finished button in Row 2.

A grid provides us with generic and predictable spatial relationships that afford traversal. The logical grid emerges from the management of focus in the DOM. When focus is given to an item in a grid, the grid toggles into a state of managing focus; pressing Tab exits the grid and it ceases to manage focus. When a grid is managing focus, traversal key presses (up, down, left, right) are translated into focus movement from cell to cell. We can also provide key combinations to move to the start and end of rows and columns.
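The translation from key presses to cell movement can be sketched as a small pure function. This is a minimal illustration and not the LogicalGrid code itself: the function name `nextCoordinate`, the zero-based `[row, column]` coordinates, and the Home/End bindings (with Ctrl jumping to the first or last cell of the grid) are assumptions chosen for the example.

```javascript
// Hypothetical helper: map a key press to the next [row, column] coordinate
// in a rectangular grid. Returns null for keys the grid does not handle.
function nextCoordinate(key, [row, col], rowCount, colCount, ctrlKey = false) {
  switch (key) {
    case 'ArrowUp':
      return [Math.max(row - 1, 0), col];
    case 'ArrowDown':
      return [Math.min(row + 1, rowCount - 1), col];
    case 'ArrowLeft':
      return [row, Math.max(col - 1, 0)];
    case 'ArrowRight':
      return [row, Math.min(col + 1, colCount - 1)];
    case 'Home':
      // Assumed binding: Home goes to the start of the row,
      // Ctrl+Home to the first cell of the grid.
      return ctrlKey ? [0, 0] : [row, 0];
    case 'End':
      // Assumed binding: End goes to the end of the row,
      // Ctrl+End to the last cell of the grid.
      return ctrlKey ? [rowCount - 1, colCount - 1] : [row, colCount - 1];
    default:
      return null; // not a traversal key (for example Tab, which exits the grid)
  }
}

// In the to-do grid, the Mark Finished button of Row 1 sits at [0, 1].
// Pressing the down arrow moves to the Mark Finished button of Row 2.
const target = nextCoordinate('ArrowDown', [0, 1], 3, 3);
```

Movement is clamped at the grid edges in this sketch; an implementation could just as well wrap around or leave focus where it is.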

ARIA Grid semantics

As a web developer, you are probably familiar with the adage that tables are for tabular data, not layout. The semantics of a <table> element do in fact imply the presentation of data. The semantics afforded by ARIA Grid do not have this limitation. ARIA Grid expresses a two-dimensionality — in contrast to the one-dimensional paths through a DOM — without also requiring that the grid cell contain data exclusively. Cells in an ARIA Grid may contain any content, even operable elements.

The roles that ARIA Grid provides should be recognizable: grid, row, gridcell, rowheader, and columnheader. One might apply them to the to-do list example above like this:


<div role="grid">
  <div role="row" class="accessible_elem">
    <div role="columnheader">Items</div>
    <div role="columnheader">Mark Finished actions</div>
    <div role="columnheader">Delete actions</div>
  </div>
  <ul role="presentation">
    <li role="row">
      <span role="gridcell">Buy milk</span>
      <span role="gridcell">
        <button>Mark Finished</button>
      </span>
      <span role="gridcell">
        <button>Delete</button>
      </span>
    </li>
    <li role="row">
      <span role="gridcell">Recycle the used batteries</span>
      <span role="gridcell">
        <button>Mark Finished</button>
      </span>
      <span role="gridcell">
        <button>Delete</button>
      </span>
    </li>
    <li role="row">
      <span role="gridcell">Write thank you cards</span>
      <span role="gridcell">
        <button>Mark Finished</button>
      </span>
      <span role="gridcell">
        <button>Delete</button>
      </span>
    </li>
  </ul>
</div>

The ARIA Grid roles inflect the HTML. The role of the list element <li> is recast as “row.” The inherent “list” role of the <ul> element is eliminated with “presentation.” The <button> elements are wrapped in <span> elements that have the “gridcell” role. The column headers are listed inside a row div using an offscreen hiding technique so that the text is rendered in the DOM but is invisible.

For a visual user, the ARIA Grid semantics are imperceptible. There is no change to the visual user interface. For a visitor using assistive technology such as a screen reader, the column and row information is available to query. The to-do list expresses a structure that it could not have expressed using a one-dimensional list element.

Keep in mind that the semantics of ARIA Grid imply that your application provides support for traversing cells in the grid via arrow keys, as discussed in the to-do list example above. The full list of traversal actions is documented in the Data Grid section of the WAI-ARIA Authoring Practices Guide.

Example implementation in React.js

Our implementation manifested as three wrapping components: LogicalGrid, LogicalGridRow, and LogicalGridCell. Each component corresponds roughly to an ARIA role: grid, row, and grid cell.


<LogicalGrid initialFocusCellCoordinate={[0, 0]}>
  <div>
    <LogicalGridRow rowIndex={0}>
      <LogicalGridCell columnIndex={0}>
        <a href="#">Figs</a>
      </LogicalGridCell>
      <div>
        <div>
          <LogicalGridCell columnIndex={1}>
            <a href="#">Limes</a>
          </LogicalGridCell>
        </div>
      </div>
    </LogicalGridRow>
    <LogicalGridRow rowIndex={1}>
      ...
    </LogicalGridRow>
  </div>
</LogicalGrid>

This would produce output HTML like this:


<div role="grid">
  <div>
    <div role="row">
      <span role="gridcell">
        <a href="#">Figs</a>
      </span>
      <div>
        <div>
          <span role="gridcell">
            <a href="#">Limes</a>
          </span>
        </div>
      </div>
    </div>
  </div>
</div>

Authors specify the index of each row through the rowIndex prop and the index of each cell in each row through the columnIndex prop. In practice, the row indices are often just the 0-based index argument passed to the callback function in an Array map invocation, and the column indices are often constants defined by the cell type. The index values allow the grid controller to construct a two-dimensional map of the cells in the grid.
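To make the idea of a two-dimensional map concrete, here is a minimal sketch of how a controller might collect cells by their indices. The `CellRegistry` class, its method names, and the `COLUMN` constants are hypothetical and only illustrate the bookkeeping; the cells are plain strings here, where a real controller would hold references to components or DOM nodes.

```javascript
// Hypothetical registry: rowIndex -> (columnIndex -> cell).
// A Map of Maps tolerates gaps, since a row only stores the columns it has.
class CellRegistry {
  constructor() {
    this.rows = new Map();
  }

  register(rowIndex, columnIndex, cell) {
    if (!this.rows.has(rowIndex)) {
      this.rows.set(rowIndex, new Map());
    }
    this.rows.get(rowIndex).set(columnIndex, cell);
  }

  unregister(rowIndex, columnIndex) {
    const row = this.rows.get(rowIndex);
    if (!row) return;
    row.delete(columnIndex);
    if (row.size === 0) this.rows.delete(rowIndex); // drop empty rows
  }

  get(rowIndex, columnIndex) {
    const row = this.rows.get(rowIndex);
    return row && row.has(columnIndex) ? row.get(columnIndex) : null;
  }
}

// Column indices as constants defined by the cell type (assumed names).
const COLUMN = { TEXT: 0, MARK_FINISHED: 1, DELETE: 2 };

const items = ['Buy milk', 'Recycle the used batteries', 'Write thank you cards'];
const registry = new CellRegistry();

// Row indices come from the 0-based index passed to the array callback.
items.forEach((text, rowIndex) => {
  registry.register(rowIndex, COLUMN.TEXT, text);
  registry.register(rowIndex, COLUMN.MARK_FINISHED, 'Mark Finished: ' + text);
  registry.register(rowIndex, COLUMN.DELETE, 'Delete: ' + text);
});
```

With the map in place, a coordinate produced by arrow-key handling can be resolved to a cell with `registry.get(row, column)`, and a `null` result signals a gap.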

The three LogicalGrid components were designed to allow product authors to easily layer the grid behavior into their UIs without concern for managing focus or keyboard interaction. The LogicalGrid components can be included in the same module or spread across separate modules as long as they render in an ancestor relationship (grid ← row ← cell), intervening components allowed, in the same React DOM fragment.

Challenges we faced

In the course of developing a generic grid traversal approach using React components, we had to overcome several obstacles. These included:

Managing focus across grid cells and relinquishing this control.
Managing focus inside grid cells.
Muting tabbable elements outside of cells, within the grid DOM tree.
Gaps in the grid.

Managing focus across grid cells

One of the primary features of the LogicalGrid component is managing focus. Each grid has a single entry point — either the initial cell to focus or the cell that last had focus. Focus might arrive through a tab action from outside the grid or a click action to any focusable element of the grid. In both cases, the grid responds by switching to a state of active focus management. Pressing an arrow key while the grid is managing focus will move focus from cell to cell.

A grid can lose focus if the person tabs out of the grid or clicks on an element of the page outside the grid. In this case, the grid must return to a state of not managing focus; it no longer listens for arrow key presses. The grid root element listens to focus and blur events (React polyfills focusin and focusout events, providing what amounts to focus and blur bubbling) and sets the managing focus state accordingly.

The first complicating case we ran into during development was cell or row deletion. When an interaction removes or replaces a piece of the UI that contains the focused DOM node, a blur event is emitted for that node. In these cases, we don’t want the grid to relinquish focus management just because a cell was deleted and its backing node blurred. Instead, we want to move focus to a neighboring cell, or place it on the cell that replaces the deleted cell. Focus recovery is tied to the React component mounting, updating, and unmounting life cycle events, which give us insight into DOM mutation actions.
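One possible recovery policy is sketched below. It is an assumption made for illustration, not a description of the actual component: the function name `recoveryTarget`, the `hasCell` predicate, and the search order (same coordinate first, then the nearest cell in the same row, then the nearest row in the same column) are all hypothetical.

```javascript
// Hypothetical policy for choosing where focus goes after the focused cell
// at [row, col] has been removed. `hasCell(row, col)` reports whether a cell
// currently exists at a coordinate.
function recoveryTarget(hasCell, [row, col], rowCount, colCount) {
  // 1. A replacement cell was rendered at the same coordinate.
  if (hasCell(row, col)) return [row, col];

  // 2. Nearest surviving cell in the same row, checking right before left.
  for (let d = 1; d < colCount; d++) {
    if (col + d < colCount && hasCell(row, col + d)) return [row, col + d];
    if (col - d >= 0 && hasCell(row, col - d)) return [row, col - d];
  }

  // 3. Same column in the nearest surviving row, checking below before above.
  for (let d = 1; d < rowCount; d++) {
    if (row + d < rowCount && hasCell(row + d, col)) return [row + d, col];
    if (row - d >= 0 && hasCell(row - d, col)) return [row - d, col];
  }

  return null; // the grid is empty; focus management can be relinquished
}

// The third to-do item was deleted while its Delete button at [2, 2] had focus.
const remaining = new Set(['0,0', '0,1', '0,2', '1,0', '1,1', '1,2']);
const cellExists = (r, c) => remaining.has(r + ',' + c);
const recovered = recoveryTarget(cellExists, [2, 2], 3, 3);
```

In this example focus lands on the Delete button of the second item, so the person can keep deleting items without re-entering the grid.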

Managing focus inside grid cells

Simply focusing a cell is not in itself sufficient to provide interactivity. The cells in a logical grid might contain anything — text, links, forms, complex widgets. We decided that it would be impossible to provide a complete solution to what ultimately ends up being business logic. So each LogicalGridCell accepts a FocusStrategy object. For cells that contain text, the simplest focus strategy is “focus the cell,” and it is the default. We also provided a second strategy object — “focus the first tabbable element in the cell.” In practice, this often ends up being a single link in a cell.
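The two strategies could look roughly like the objects below. The interface is an assumption: this post names the FocusStrategy concept but not its shape, so `focusCell`, `FocusCellStrategy`, `FocusFirstTabbableStrategy`, and the simplified `TABBABLE_SELECTOR` are hypothetical names for this sketch. The code only relies on standard element methods (`setAttribute`, `focus`, `querySelector`).

```javascript
// Simplified, assumed selector for elements that normally take part in the
// tab order. A production selector would need to handle more cases.
const TABBABLE_SELECTOR =
  'a[href], button:not([disabled]), input:not([disabled]), ' +
  'select:not([disabled]), textarea:not([disabled])';

// Strategy 1: focus the cell itself (suited to cells that contain only text).
const FocusCellStrategy = {
  focusCell(cellElement) {
    // A tabindex of -1 makes the cell focusable from script
    // without adding it to the tab order.
    cellElement.setAttribute('tabindex', '-1');
    cellElement.focus();
    return cellElement;
  },
};

// Strategy 2: focus the first tabbable element inside the cell.
const FocusFirstTabbableStrategy = {
  focusCell(cellElement) {
    const target = cellElement.querySelector(TABBABLE_SELECTOR);
    // Fall back to focusing the cell when it has no tabbable descendant.
    if (!target) return FocusCellStrategy.focusCell(cellElement);
    target.focus();
    return target;
  },
};
```

Keeping the strategy separate from the cell means traversal code only has to call `strategy.focusCell(cell)` and never needs to know what the cell contains.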

We anticipate more strategies will follow as teams begin to adopt the pattern. One pattern might be “focus the first tabbable element and allow the user to use the Tab key to move in a closed tab ring in the cell.” This strategy would be useful in cells that contain form elements. Another strategy might be “press Esc to exit the cell,” which would defer arrow key presses to a widget in the cell such as a spinner.

Muting tabbable elements outside of cells

One of the more challenging aspects of designing a focus manager that is declarative was making sure that it had full control of every element in the grid. Once someone tabs into a grid, a second press of the Tab key (with or without the Shift modifier) should move focus out of the grid to the next focusable item.

In some of our implementations, we encountered rogue focusable elements that were not part of the logical grid structure. For instance, in a large interactive unit like a list of search results, a result type that we had not wrapped in grid cells would render, resulting in tabbable content inside the grid but outside a cell.

We solved this problem by neutralizing all tabbable elements in a grid with a negative tab index, which removes them from the tab order while leaving them focusable from script. In order to restore the original tab index values for these elements (on unmount or deactivation of the grid behavior), we stored the original tab indices in a map-like data structure. Preventing these kinds of side effects was necessary to make the component fully declarative.
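A minimal sketch of this mute-and-restore bookkeeping follows. The helper names `muteTabbables` and `restoreTabbables` are hypothetical, and how the list of tabbable elements is collected is left out; the code only uses the standard `getAttribute`, `setAttribute`, and `removeAttribute` element methods.

```javascript
// Hypothetical helper: take every given element out of the tab order and
// remember what its tabindex attribute was (null when it had none).
function muteTabbables(elements) {
  const original = new Map(); // element -> original tabindex value or null
  for (const el of elements) {
    original.set(el, el.getAttribute('tabindex'));
    el.setAttribute('tabindex', '-1');
  }
  return original;
}

// Hypothetical helper: put the recorded tabindex values back, for example
// when the grid unmounts or its behavior is deactivated.
function restoreTabbables(original) {
  for (const [el, value] of original) {
    if (value === null) {
      el.removeAttribute('tabindex'); // the element had no explicit tabindex
    } else {
      el.setAttribute('tabindex', value);
    }
  }
  original.clear();
}
```

Recording `null` separately from a stored value matters: an element that had no `tabindex` attribute should end up with none again, not with `tabindex="0"`.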

Gaps in the grid

Let’s consider a special use case of a logical grid: a list of homogeneous search results. Since every search result contains a title and a description, our search results would look like this, as perceived through a logical grid:

         | Column 1 | Column 2    |
| Row 1  | Title    | Description |
| Row 2  | Title    | Description |
| Row 3  | Title    | Description |

In our internal search page, every search result has different fields depending on whether it is referring to an internal wiki article, an employee profile, or one of our internal tools. It might look like this, as perceived through a logical grid:

         | Column 1 | Column 2    | Column 3 | Column 4      | Column 5 | Column 6  |
| Row 1  | Title    | Description | Type     | Creation date |          |           |
| Row 2  | Title    |             | Type     |               |          |           |
| Row 3  | Title    | Description | Type     |               | Location | Available |

We wanted each column to contain field values of a single kind: title, description, type, location, etc. As illustrated above, this approach led to gaps in the underlying logical grid model. We endeavored to develop traversal behavior to jump these gaps in a way that feels intuitive. For right/left movement, we find the next or previous column with content in it. If none exists, we wrap to the next or previous row and continue the search. Jumping gaps up and down columns requires an opinion on how best to locate a cell with content. We decided not to skip a row when moving up and down columns. Instead, we move to the adjacent row and, if the cell in the same column is empty, look leftward for the nearest cell with content that precedes the empty cell. One could easily argue that it makes sense to skip rows until the previous or next row with a cell in the same column is found. It may turn out that our component should provide a configuration setting to switch between these approaches to suit the data set.
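The gap-jumping rules described above can be sketched against a plain array model of the search results grid. This is an illustration under assumptions and not the component's code: `moveHorizontal`, `moveVertical`, and the use of `null` to mark a gap are hypothetical, and staying in place when nothing is found is a choice made for the example.

```javascript
// grid[row][col] holds a cell's content, or null where the grid has a gap.
const searchGrid = [
  ['Title', 'Description', 'Type', 'Creation date', null, null],
  ['Title', null, 'Type', null, null, null],
  ['Title', 'Description', 'Type', null, 'Location', 'Available'],
];

// Left/right movement: step is +1 for right, -1 for left.
function moveHorizontal(grid, [row, col], step) {
  let r = row;
  let c = col + step;
  while (r >= 0 && r < grid.length) {
    while (c >= 0 && c < grid[r].length) {
      if (grid[r][c] !== null) return [r, c];
      c += step;
    }
    // No content left in this row: wrap to the adjacent row and keep looking.
    r += step;
    if (r >= 0 && r < grid.length) {
      c = step > 0 ? 0 : grid[r].length - 1;
    }
  }
  return [row, col]; // nothing found: stay on the current cell
}

// Up/down movement: step is +1 for down, -1 for up. Rows are never skipped.
function moveVertical(grid, [row, col], step) {
  const r = row + step;
  if (r < 0 || r >= grid.length) return [row, col];
  // Try the same column first, then search leftward for a cell with content.
  for (let c = Math.min(col, grid[r].length - 1); c >= 0; c--) {
    if (grid[r][c] !== null) return [r, c];
  }
  return [row, col];
}

// Moving down from "Creation date" in Row 1 lands on "Type" in Row 2,
// because columns 4 of Row 2 is a gap.
const below = moveVertical(searchGrid, [0, 3], 1);
// Moving right from "Creation date" wraps to "Title" in Row 2.
const right = moveHorizontal(searchGrid, [0, 3], 1);
```

Swapping `moveVertical` for a variant that skips rows until it finds content in the same column would give the alternative behavior mentioned above, which is why a configuration setting may be worthwhile.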

Applying this UX pattern widely

There are still a few open questions with the logical grid interaction pattern. One of these is discoverability. For people traversing the page with a screen reader, the presence of a grid is announced. One can reasonably expect a grid to have standard key traversal support. For a sighted user, a logical grid is transparent. Currently there is no established visual cue to indicate that a grid exists and that the user has tabbed into it. We have yet to land on an acceptable visual design to indicate a grid and the support of arrow-key traversal.

For that reason, we’re introducing the pattern slowly and with limited exposure. Each time we expand the audience or implement the grid in a new product, we discover new edge cases and learn new lessons. Our hope is that the grid interaction pattern will become a recognizable and expected pattern of traversing through UI components on the web.