Michael Kelly

External Tilesets with Tiled and Phaser

March 3, 2019 phaser gamedev

Tiled is a popular tilemap editor, and Phaser has great built-in support for it. One feature of Tiled that Phaser doesn't support is external tilesets.

In Tiled, a tileset can either be internal, meaning all the data for the tileset is included in the tilemap itself, or external, meaning that the tileset is a standalone file separate from the tilemap. The main benefit of external tilesets is that they can be shared between maps. You can update and change the tileset without having to update per-tilemap copies everywhere.

Phaser, however, requires that tilesets be stored internally in the tilemaps they're used in. I finally ran into a point where I wanted multiple tilemaps in my game and wrote a custom loader that supports external tilesets.

It's called phaser-tiled-json-external-loader, and you can install it via NPM or manually download a JavaScript bundle to load in your game's HTML file. The README has more information on how to install and use the library. I've also got a Glitch project showing the library in action:

Why doesn't Phaser support external tilesets?

I can only speculate based on the code [1]. Phaser loads tilemaps as JSON, and doesn't actually parse that JSON until you attempt to create a tilemap object during a scene's create phase. While parsing the Tiled JSON, it tries to load each tileset:

//  name, firstgid, width, height, margin, spacing, properties
var set = json.tilesets[i];

if (set.source)
{
    console.warn('Phaser can\'t load external tilesets. Use the Embed Tileset button and then export the map again.');
}

By then, we're past the preload step of the scene, and the API for creating tilemaps isn't asynchronous, so going back and loading another external JSON file isn't really an option.

In my opinion, a "proper" fix would be similar to how the images for tilemaps are handled. Even with internal tilesets, the images used in the tilesets must be loaded separately and passed when creating a tileset:

const scene = {
  preload() {
    // The tileset image is not automagically loaded by Phaser
    this.load.image('tilesetImage', 'https://cdn.glitch.com/1780c601-5e7d-42f6-8757-c55452affe65%2Ftiles.png?1551608607854');
    this.load.tilemapTiledJSON('tilemap', 'tilemap.json');
  },

  create() {
    const tilemap = this.make.tilemap({key: 'tilemap'});
    // tilesetImage here is referring to the manually-loaded tileset image above
    const tileset = tilemap.addTilesetImage('tiles', 'tilesetImage');
    tilemap.createStaticLayer('layer1', tileset, 0, 0);
  },
};

Similarly, external tilesets should probably be a new type of thing that you could load in the preload step and associate with one (or many) tilemaps.

I tried to figure out how to write a patch like this to fix Phaser directly, but there are multiple types of tilemaps and tilesets supported in Phaser, and I don't really understand the internals well enough yet.

So how does the loader work?

So if Phaser only supports internal tilesets, and doesn't parse the tilemap until the create step, what if we loaded and inserted the external tilesets into our tilemaps before Phaser tried parsing them? Some people on the Phaser Discord recommended I write a preprocessor to do this (as they had been doing for a while), but I wanted to build something a bit more broadly reusable.

I spent a few hours reading the code for how loaders work in Phaser and found out that there are things called MultiFile loaders that support loading dependent files based on the contents of a manifest-like file. Using the MultiAtlasFile loader as a base, I wrote a new loader that:

  1. Loads the tilemap JSON
  2. Finds all tilesets that have a source property
  3. Processes each source property as relative to the tilemap's URL to get the URL for each tileset
  4. Loads each external tileset
  5. Inserts each loaded tileset back into the tilemap JSON
  6. Adds the modified tilemap JSON into the tilemap cache
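
Steps 2, 3, and 5 are mostly plain data manipulation, so here is a minimal sketch of them outside of Phaser. This is not the library's actual code: the helper names (resolveTilesetUrl, findExternalTilesetUrls, embedTilesets), the example URLs, and the sample data are all made up for illustration, and it assumes the tilemap URL is absolute, since the URL constructor needs an absolute base.

```javascript
// Hypothetical helpers sketching steps 2, 3, and 5 (not the library's code).

// Step 3: resolve a tileset's `source` relative to the tilemap's URL.
function resolveTilesetUrl(tilemapUrl, source) {
  return new URL(source, tilemapUrl).href;
}

// Step 2: collect the URL of every external tileset referenced by the map.
function findExternalTilesetUrls(tilemapJson, tilemapUrl) {
  return tilemapJson.tilesets
    .filter((tileset) => tileset.source)
    .map((tileset) => resolveTilesetUrl(tilemapUrl, tileset.source));
}

// Step 5: replace each external reference with the loaded tileset data,
// keeping the map-specific firstgid. `loadedTilesets` maps URL -> parsed JSON.
function embedTilesets(tilemapJson, tilemapUrl, loadedTilesets) {
  const tilesets = tilemapJson.tilesets.map((tileset) => {
    if (!tileset.source) {
      return tileset; // Already internal, leave untouched
    }
    const url = resolveTilesetUrl(tilemapUrl, tileset.source);
    return {...loadedTilesets[url], firstgid: tileset.firstgid};
  });
  return {...tilemapJson, tilesets};
}

// Example with made-up data
const mapUrl = 'https://example.com/maps/level1.json';
const map = {
  layers: [],
  tilesets: [{firstgid: 1, source: '../tilesets/tiles.json'}],
};
const loaded = {
  'https://example.com/tilesets/tiles.json': {
    name: 'tiles',
    image: 'tiles.png',
    tilewidth: 32,
    tileheight: 32,
  },
};

console.log(findExternalTilesetUrls(map, mapUrl));
console.log(embedTilesets(map, mapUrl, loaded).tilesets[0]);
```

The sketch returns a new tilemap object rather than mutating the original, which keeps the untouched JSON around in case something else needs it.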

I tested with my own game and it seemed to work fine. The remaining steps were to add a webpage-ready bundle for projects that aren't using NPM or a JavaScript bundler, write instructions, and publish the package on NPM.

Caveats

There are still some caveats to this method of loading external tilesets:

But it works for me and I did it for free so whooooooooooo caressssssss


  1. I mean I actually could just ask the maintainer if I really wanted to.

Phaser Tutorial Series: Finite State Machine

February 23, 2019 mozilla phaser gamedev

I've been working on a game using Phaser in my spare time:

One thing that's made adding new features really easy is using finite-state machines to model behavior. Almost everything in the animation above is backed by a state machine: the player, the platform, the grappling hook, the statue, and the fireballs.

This post is going to assume some familiarity with the basics of Phaser, such as the preload/create/update steps, Arcade physics, and keyboard input. You may also be able to follow along if you're not familiar with Phaser, but it's okay if not! This use of state machines isn't specific to Phaser.

What is a finite-state machine? Fuck that let's make games

Let's start with a fairly empty example project. Here it is on Glitch. You can use the remix button to create your own copy and follow along with the tutorial as we go:

Pretty much all of our work is happening in client.js. It starts out looking something like this:

/* global Phaser */

const config = {
  type: Phaser.AUTO,
  width: 400,
  height: 300,
  pixelArt: true,
  zoom: 2,
  physics: {
    default: 'arcade'
  },
  scene: {
    preload() {
      this.load.spritesheet('hero', 'https://cdn.glitch.com/59aa1c5f-c16d-41a1-bfd2-09072e84a538%2Fhero.png?1551136698770', {
        frameWidth: 32,
        frameHeight: 32,
      });
      this.load.image('bg', 'https://cdn.glitch.com/59aa1c5f-c16d-41a1-bfd2-09072e84a538%2Fbg.png?1551136995353');
    },

    create() {
      // Static background
      this.add.image(200, 200, 'bg');

      // The movable character
      this.hero = this.physics.add.sprite(200, 150, 'hero', 0);
    },

    update() {

    },
  }
};

window.game = new Phaser.Game(config);

We're loading some images in the preload step, and adding the background and hero sprite in the create step. The hero is drawn on the background, but nothing else happens.

MAKE IT WALK

Let's add a this.keys variable for reading input from the keyboard. We can use that in the update method to check which keys are being pressed and set the hero's velocity appropriately:

@@ -19,6 +19,8 @@
     },

     create() {
+      this.keys = this.input.keyboard.createCursorKeys();
+
       // Static background
       this.add.image(200, 200, 'bg');

@@ -27,7 +29,20 @@
     },

     update() {
-
+      // Stop movement from last update
+      this.hero.setVelocity(0);
+
+      // Set new velocity based on input
+      if (this.keys.up.isDown) {
+        this.hero.setVelocityY(-100);
+      } else if (this.keys.down.isDown) {
+        this.hero.setVelocityY(100);
+      }
+      if (this.keys.left.isDown) {
+        this.hero.setVelocityX(-100);
+      } else if (this.keys.right.isDown) {
+        this.hero.setVelocityX(100);
+      }
     },
   }
 };

MAKE IT LOOK LIKE IT'S WALKING

Now the hero is moving about the map, but it doesn't look like he's walking. To do that, we'll need to do two things:

  1. Define some animations from our sprite sheet in the create function. Our sheet is split into 32x32 pixel squares, so we can use generateFrameNumbers to generate animation data by giving it start and end indexes for the animation frames. These are numbered from top left to bottom right.
  2. Trigger the proper animations in the update function. We also track whether the player is moving or not, and if they aren't, we stop the current animation to stop the player from walking. Note the true passed to the play function: this tells Phaser to not restart the animation if it's already playing.
@@ -26,22 +26,61 @@

       // The movable character
       this.hero = this.physics.add.sprite(200, 150, 'hero', 0);
+
+      // Animation definitions
+      this.anims.create({
+        key: 'walk-down',
+        frameRate: 8,
+        repeat: -1,
+        frames: this.anims.generateFrameNumbers('hero', {start: 0, end: 3}),
+      });
+      this.anims.create({
+        key: 'walk-right',
+        frameRate: 8,
+        repeat: -1,
+        frames: this.anims.generateFrameNumbers('hero', {start: 4, end: 7}),
+      });
+      this.anims.create({
+        key: 'walk-up',
+        frameRate: 8,
+        repeat: -1,
+        frames: this.anims.generateFrameNumbers('hero', {start: 8, end: 11}),
+      });
+      this.anims.create({
+        key: 'walk-left',
+        frameRate: 8,
+        repeat: -1,
+        frames: this.anims.generateFrameNumbers('hero', {start: 12, end: 15}),
+      });
     },

     update() {
       // Stop movement from last update
+      let moving = false;
       this.hero.setVelocity(0);

       // Set new velocity based on input
       if (this.keys.up.isDown) {
         this.hero.setVelocityY(-100);
+        this.hero.anims.play('walk-up', true);
+        moving = true;
       } else if (this.keys.down.isDown) {
         this.hero.setVelocityY(100);
+        this.hero.anims.play('walk-down', true);
+        moving = true;
       }
       if (this.keys.left.isDown) {
         this.hero.setVelocityX(-100);
+        this.hero.anims.play('walk-left', true);
+        moving = true;
       } else if (this.keys.right.isDown) {
         this.hero.setVelocityX(100);
+        this.hero.anims.play('walk-right', true);
+        moving = true;
+      }
+
+      if (!moving) {
+        this.hero.anims.stop();
       }
     },
   }

MAKE IT UNNECESSARILY VIOLENT

Next, let's make the player swing their sword when we press the space key. This actually involves a few steps:

  1. Check if the space key is pressed.
  2. Stop player movement while the sword is being swung.

    We'll need to know if the hero is currently swinging their sword, so we'll add a swinging variable on this.hero that determines if the swinging animation is still playing.

  3. Determine which direction the player is facing.

    Figuring out the direction requires that we add a new variable called direction to keep track between walking and swinging. Storing this on the this.hero object makes it clear that the direction isn't for, say, an enemy we may add later.

  4. Play the sword-swinging animation for the appropriate direction.
  5. Once the animation is done playing, switch back to the non-sword-swinging sprites and allow movement again.

Doing all of this with the movement code is tricky, and difficult to split into single code changes. You may want to take a bit to look over the diff to understand the changes:

@@ -26,6 +26,8 @@

       // The movable character
       this.hero = this.physics.add.sprite(200, 150, 'hero', 0);
+      this.hero.direction = 'down';
+      this.hero.swinging = false;

       // Animation definitions
       this.anims.create({
@@ -52,6 +54,32 @@
         repeat: -1,
         frames: this.anims.generateFrameNumbers('hero', {start: 12, end: 15}),
       });
+
+      // NOTE: Sword animations do not repeat
+      this.anims.create({
+        key: 'swing-down',
+        frameRate: 8,
+        repeat: 0,
+        frames: this.anims.generateFrameNumbers('hero', {start: 16, end: 19}),
+      });
+      this.anims.create({
+        key: 'swing-up',
+        frameRate: 8,
+        repeat: 0,
+        frames: this.anims.generateFrameNumbers('hero', {start: 20, end: 23}),
+      });
+      this.anims.create({
+        key: 'swing-right',
+        frameRate: 8,
+        repeat: 0,
+        frames: this.anims.generateFrameNumbers('hero', {start: 24, end: 27}),
+      });
+      this.anims.create({
+        key: 'swing-left',
+        frameRate: 8,
+        repeat: 0,
+        frames: this.anims.generateFrameNumbers('hero', {start: 28, end: 31}),
+      });
     },

     update() {
@@ -59,28 +87,43 @@
       let moving = false;
       this.hero.setVelocity(0);

-      // Set new velocity based on input
-      if (this.keys.up.isDown) {
-        this.hero.setVelocityY(-100);
-        this.hero.anims.play('walk-up', true);
-        moving = true;
-      } else if (this.keys.down.isDown) {
-        this.hero.setVelocityY(100);
-        this.hero.anims.play('walk-down', true);
-        moving = true;
-      }
-      if (this.keys.left.isDown) {
-        this.hero.setVelocityX(-100);
-        this.hero.anims.play('walk-left', true);
-        moving = true;
-      } else if (this.keys.right.isDown) {
-        this.hero.setVelocityX(100);
-        this.hero.anims.play('walk-right', true);
-        moving = true;
-      }
-
-      if (!moving) {
-        this.hero.anims.stop();
+      // If we're swinging a sword, wait for the animation to finish
+      if (!this.hero.swinging) {
+        // Swinging a sword overrides movement
+        if (this.keys.space.isDown) {
+          this.hero.swinging = true;
+          this.hero.anims.play(`swing-${this.hero.direction}`, true);
+          this.hero.once('animationcomplete', () => {
+            this.hero.anims.play(`walk-${this.hero.direction}`, true);
+            this.hero.swinging = false;
+          });
+        } else {
+          // Set new velocity based on input
+          if (this.keys.up.isDown) {
+            this.hero.setVelocityY(-100);
+            this.hero.direction = 'up';
+            moving = true;
+          } else if (this.keys.down.isDown) {
+            this.hero.setVelocityY(100);
+            this.hero.direction = 'down';
+            moving = true;
+          }
+          if (this.keys.left.isDown) {
+            this.hero.setVelocityX(-100);
+            this.hero.direction = 'left';
+            moving = true;
+          } else if (this.keys.right.isDown) {
+            this.hero.setVelocityX(100);
+            this.hero.direction = 'right';
+            moving = true;
+          }
+
+          if (!moving) {
+            this.hero.anims.stop();
+          } else {
+            this.hero.anims.play(`walk-${this.hero.direction}`, true);
+          }
+        }
       }
     },
   }

MAKE IT DO MORE?

Okay, so the hero is now swinging their sword. Next we might want to add the ability for them to jump, or handle collision detection, or add some enemy logic to the update loop, or... well, you get the idea. We've barely added some basic functionality to the game and already the update loop is getting difficult to manage.

The core problem here is that, to add some new feature to the player, like a new weapon or ability, we need to think about every other thing the player can do. What happens if the player uses a hookshot while moving? What if they use a jump power while moving? One may freeze the player in place while the other retains their momentum. There's too much state to keep in our heads.

Enter state machines. The idea is to model the player's behavior by assigning them a single "state" to be in. When a player is in a "state", they can "transition" to another state if a condition is met, which replaces the current state with a new one. If we design our states and transitions correctly, we can control the amount of info we need to keep in our head when writing new features.

I find the state machine from the Wikipedia article on state machines to be a great example:

A state machine modelling a turnstile
A state machine diagram for a subway turnstile. The "Locked" state is the initial state.

The diagram above illustrates a subway turnstile that is locked until you drop a coin into it, which unlocks it and allows one person to walk through before becoming locked again. The state machine has two states:
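
As a rough sketch, the turnstile's two states (locked and unlocked) and its two inputs (dropping in a coin and pushing through) fit in a small transition table. The state and input names below are assumptions based on the description above, as is the behavior for the "boring" cases: pushing a locked turnstile leaves it locked, and an extra coin leaves an unlocked one unlocked.

```javascript
// Transition table: for each state, which state each input leads to.
// State and input names are assumptions based on the turnstile description.
const turnstile = {
  locked: {coin: 'unlocked', push: 'locked'},
  unlocked: {coin: 'unlocked', push: 'locked'},
};

function nextState(state, input) {
  return turnstile[state][input];
}

// Feed a sequence of inputs through the machine, starting from "locked"
function run(inputs, initialState = 'locked') {
  return inputs.reduce(nextState, initialState);
}

console.log(run(['push']));          // locked
console.log(run(['coin']));          // unlocked
console.log(run(['coin', 'push']));  // locked
```

The whole behavior of the turnstile lives in that one table, which is the property we want for the hero too.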

In the same way that this diagram models the behavior of the real turnstile, we can create a similar diagram that models how we want our player to behave:

A state machine modelling the hero
I am not the best diagram-maker.

The entire diagram itself is a little messy, but the point is that this model allows us to implement each state in isolation, resulting in cleaner, easier-to-maintain code.

Coding a State Machine

We're going to create a StateMachine class that handles storing the current active state, storing a list of all possible states, and transitioning from the current state to a new state. But transitioning alone doesn't really do anything.

Besides transitioning, we also want to:

There are several options for how to represent a state in our code. One is to use classes, which allows us to inherit from a base State class to get default enter and execute functions.

@@ -1,5 +1,46 @@
 /* global Phaser */

+class StateMachine {
+  constructor(initialState, possibleStates, stateArgs=[]) {
+    this.initialState = initialState;
+    this.possibleStates = possibleStates;
+    this.stateArgs = stateArgs;
+    this.state = null;
+
+    // State instances get access to the state machine via this.stateMachine.
+    for (const state of Object.values(this.possibleStates)) {
+      state.stateMachine = this;
+    }
+  }
+
+  step() {
+    // On the first step, the state is null and we need to initialize the first state.
+    if (this.state === null) {
+      this.state = this.initialState;
+      this.possibleStates[this.state].enter(...this.stateArgs);
+    }
+
+    // Run the current state's execute
+    this.possibleStates[this.state].execute(...this.stateArgs);
+  }
+
+  transition(newState, ...enterArgs) {
+    this.state = newState;
+    this.possibleStates[this.state].enter(...this.stateArgs, ...enterArgs);
+  }
+}
+
+class State {
+  enter() {
+
+  }
+
+  execute() {
+
+  }
+}
+
+
 const config = {
   type: Phaser.AUTO,
   width: 400,

There are two things to note in the code above:

With this state machine implementation, we can replace our nest of if statements with classes for each state we modeled on our diagram:

@@ -27,7 +68,14 @@
       // The movable character
       this.hero = this.physics.add.sprite(200, 150, 'hero', 0);
       this.hero.direction = 'down';
-      this.hero.swinging = false;
+
+      // The state machine managing the hero
+      this.stateMachine = new StateMachine('idle', {
+        idle: new IdleState(),
+        move: new MoveState(),
+        swing: new SwingState(),
+      }, [this, this.hero]);
+

       // Animation definitions
       this.anims.create({
@@ -83,50 +131,79 @@
     },

     update() {
-      // Stop movement from last update
-      let moving = false;
-      this.hero.setVelocity(0);
-
-      // If we're swinging a sword, wait for the animation to finish
-      if (!this.hero.swinging) {
-        // Swinging a sword overrides movement
-        if (this.keys.space.isDown) {
-          this.hero.swinging = true;
-          this.hero.anims.play(`swing-${this.hero.direction}`, true);
-          this.hero.once('animationcomplete', () => {
-            this.hero.anims.play(`walk-${this.hero.direction}`, true);
-            this.hero.swinging = false;
-          });
-        } else {
-          // Set new velocity based on input
-          if (this.keys.up.isDown) {
-            this.hero.setVelocityY(-100);
-            this.hero.direction = 'up';
-            moving = true;
-          } else if (this.keys.down.isDown) {
-            this.hero.setVelocityY(100);
-            this.hero.direction = 'down';
-            moving = true;
-          }
-          if (this.keys.left.isDown) {
-            this.hero.setVelocityX(-100);
-            this.hero.direction = 'left';
-            moving = true;
-          } else if (this.keys.right.isDown) {
-            this.hero.setVelocityX(100);
-            this.hero.direction = 'right';
-            moving = true;
-          }
-
-          if (!moving) {
-            this.hero.anims.stop();
-          } else {
-            this.hero.anims.play(`walk-${this.hero.direction}`, true);
-          }
-        }
-      }
+      this.stateMachine.step();
     },
   }
 };

+class IdleState extends State {
+  enter(scene, hero) {
+    hero.setVelocity(0);
+    hero.anims.play(`walk-${hero.direction}`);
+    hero.anims.stop();
+  }
+
+  execute(scene, hero) {
+    const {left, right, up, down, space} = scene.keys;
+
+    // Transition to swing if pressing space
+    if (space.isDown) {
+      this.stateMachine.transition('swing');
+      return;
+    }
+
+    // Transition to move if pressing a movement key
+    if (left.isDown || right.isDown || up.isDown || down.isDown) {
+      this.stateMachine.transition('move');
+      return;
+    }
+  }
+}
+
+class MoveState extends State {
+  execute(scene, hero) {
+    const {left, right, up, down, space} = scene.keys;
+
+    // Transition to swing if pressing space
+    if (space.isDown) {
+      this.stateMachine.transition('swing');
+      return;
+    }
+
+    // Transition to idle if not pressing movement keys
+    if (!(left.isDown || right.isDown || up.isDown || down.isDown)) {
+      this.stateMachine.transition('idle');
+      return;
+    }
+
+    hero.setVelocity(0);
+    if (up.isDown) {
+      hero.setVelocityY(-100);
+      hero.direction = 'up';
+    } else if (down.isDown) {
+      hero.setVelocityY(100);
+      hero.direction = 'down';
+    }
+    if (left.isDown) {
+      hero.setVelocityX(-100);
+      hero.direction = 'left';
+    } else if (right.isDown) {
+      hero.setVelocityX(100);
+      hero.direction = 'right';
+    }
+
+    hero.anims.play(`walk-${hero.direction}`, true);
+  }
+}
+
+class SwingState extends State {
+  enter(scene, hero) {
+    hero.setVelocity(0);
+    hero.anims.play(`swing-${hero.direction}`);
+    hero.once('animationcomplete', () => {
+      this.stateMachine.transition('idle');
+    });
+  }
+}
+
 window.game = new Phaser.Game(config);

This is a lot to unpack. Some highlights of the changes:

Okay but why?

At first glance it may seem that the state machine code is longer than the old update method and more complex, and to some degree this is true. The reduction in complexity is not due to less code, but is instead due to less cognitive load. When we're working on the move state, we don't have to think about interfering with the idle and swing state logic as much as we previously did.
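
One way to see that isolation is to run a single state without Phaser at all. The sketch below repeats a compact copy of the StateMachine and State classes so it runs on its own, and pairs a stripped-down IdleState (transition logic only, no sprite calls) with a made-up fakeScene helper that stands in for the scene's keys. The fakeScene and stateAfterOneStep names are hypothetical and not part of the game.

```javascript
// Compact copy of the StateMachine and State classes from earlier
class StateMachine {
  constructor(initialState, possibleStates, stateArgs = []) {
    this.initialState = initialState;
    this.possibleStates = possibleStates;
    this.stateArgs = stateArgs;
    this.state = null;
    for (const state of Object.values(this.possibleStates)) {
      state.stateMachine = this;
    }
  }

  step() {
    if (this.state === null) {
      this.state = this.initialState;
      this.possibleStates[this.state].enter(...this.stateArgs);
    }
    this.possibleStates[this.state].execute(...this.stateArgs);
  }

  transition(newState, ...enterArgs) {
    this.state = newState;
    this.possibleStates[this.state].enter(...this.stateArgs, ...enterArgs);
  }
}

class State {
  enter() {}
  execute() {}
}

// Simplified idle state: only the transition logic, no Phaser calls
class IdleState extends State {
  execute(scene) {
    const {left, right, up, down, space} = scene.keys;
    if (space.isDown) {
      this.stateMachine.transition('swing');
      return;
    }
    if (left.isDown || right.isDown || up.isDown || down.isDown) {
      this.stateMachine.transition('move');
    }
  }
}

// Hypothetical stand-in for a Phaser scene: just the keys the state reads
function fakeScene(pressed = []) {
  const keys = {};
  for (const name of ['left', 'right', 'up', 'down', 'space']) {
    keys[name] = {isDown: pressed.includes(name)};
  }
  return {keys};
}

// Build a fresh machine, run one step, and report where it ended up
function stateAfterOneStep(pressed) {
  const machine = new StateMachine('idle', {
    idle: new IdleState(),
    move: new State(),
    swing: new State(),
  }, [fakeScene(pressed)]);
  machine.step();
  return machine.state;
}

console.log(stateAfterOneStep([]));                 // idle
console.log(stateAfterOneStep(['left']));           // move
console.log(stateAfterOneStep(['left', 'space']));  // swing
```

The move and swing states here are empty placeholders, and the idle logic still behaves the same, which is the point: each state only needs to know the names of the states it can transition to.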

Let's say we want to add a dash in the current direction when the Shift key is pressed. Under the old code, we'd have to figure out where in the nest of if statements to check the shift key, and then probably add another level of conditions to avoid moving or attacking during a dash. With a state machine, we can add a new dash state and modify the existing states that can validly transition to a dash:

@@ -74,6 +74,7 @@
         idle: new IdleState(),
         move: new MoveState(),
         swing: new SwingState(),
+        dash: new DashState(),
       }, [this, this.hero]);


@@ -144,7 +145,7 @@
   }

   execute(scene, hero) {
-    const {left, right, up, down, space} = scene.keys;
+    const {left, right, up, down, space, shift} = scene.keys;

     // Transition to swing if pressing space
     if (space.isDown) {
@@ -152,6 +153,12 @@
       return;
     }

+    // Transition to dash if pressing shift
+    if (shift.isDown) {
+      this.stateMachine.transition('dash');
+      return;
+    }
+
     // Transition to move if pressing a movement key
     if (left.isDown || right.isDown || up.isDown || down.isDown) {
       this.stateMachine.transition('move');
@@ -162,7 +169,7 @@

 class MoveState extends State {
   execute(scene, hero) {
-    const {left, right, up, down, space} = scene.keys;
+    const {left, right, up, down, space, shift} = scene.keys;

     // Transition to swing if pressing space
     if (space.isDown) {
@@ -170,6 +177,12 @@
       return;
     }

+    // Transition to dash if pressing shift
+    if (shift.isDown) {
+      this.stateMachine.transition('dash');
+      return;
+    }
+
     // Transition to idle if not pressing movement keys
     if (!(left.isDown || right.isDown || up.isDown || down.isDown)) {
       this.stateMachine.transition('idle');
@@ -204,6 +217,32 @@
       this.stateMachine.transition('idle');
     });
   }
+}
+
+class DashState extends State {
+  enter(scene, hero) {
+    hero.setVelocity(0);
+    hero.anims.play(`swing-${hero.direction}`);
+    switch (hero.direction) {
+      case 'up':
+        hero.setVelocityY(-300);
+        break;
+      case 'down':
+        hero.setVelocityY(300);
+        break;
+      case 'left':
+        hero.setVelocityX(-300);
+        break;
+      case 'right':
+        hero.setVelocityX(300);
+        break;
+    }
+
+    // Wait a third of a second and then go back to idle
+    scene.time.delayedCall(300, () => {
+      this.stateMachine.transition('idle');
+    });
+  }
 }

 window.game = new Phaser.Game(config);

Is this fast?

No idea. I haven't hit issues with my own game. I'm not terribly concerned about performance as my game is just a demo right now, so take that with a grain of salt.

I don't think there are any glaring issues with it performance-wise, but I suspect having a bunch of state machines running each update loop might start to cause issues with their overhead. Some clever engineering could reuse states or even state machines between sprites, which might help.
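
As one possible shape for that reuse, here's a sketch of a variation where the machine is passed into each state call instead of being stored on the state object. That keeps state objects free of per-sprite data, so a single set of them can back many machines. This is an untested idea rather than something measured in the game; SharedStateMachine, the enemy states, and the seesPlayer flag are all made up for the example.

```javascript
// Variation on the StateMachine above: the machine is passed to each state
// call instead of being stored on the state, so state objects hold no
// per-sprite data and can be shared between machines.
class SharedStateMachine {
  constructor(initialState, possibleStates, stateArgs = []) {
    this.possibleStates = possibleStates;
    this.stateArgs = stateArgs;
    this.state = initialState;
    // Unlike the original, the initial state is entered right away
    this.possibleStates[this.state].enter(this, ...this.stateArgs);
  }

  step() {
    this.possibleStates[this.state].execute(this, ...this.stateArgs);
  }

  transition(newState, ...enterArgs) {
    this.state = newState;
    this.possibleStates[this.state].enter(this, ...this.stateArgs, ...enterArgs);
  }
}

// Hypothetical enemy states, created once and shared by every enemy
const enemyStates = {
  patrol: {
    enter() {},
    execute(machine, enemy) {
      if (enemy.seesPlayer) {
        machine.transition('chase');
      }
    },
  },
  chase: {
    enter() {},
    execute(machine, enemy) {
      if (!enemy.seesPlayer) {
        machine.transition('patrol');
      }
    },
  },
};

// Two enemies, two machines, one shared set of state objects
const enemies = [{seesPlayer: false}, {seesPlayer: true}];
const machines = enemies.map(
  (enemy) => new SharedStateMachine('patrol', enemyStates, [enemy])
);
machines.forEach((machine) => machine.step());

console.log(machines.map((machine) => machine.state)); // patrol, chase
```

Whether this actually saves anything meaningful would need profiling; it mostly trades a little convenience (no this.stateMachine) for fewer allocations.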

What else could we do with this?

There are a lot of ideas I haven't touched upon here that are worth exploring:

Final Project

Here's the final version of the code used for this post, available as another Glitch project for your reading and remixing pleasure:

Data Collection at Mozilla: Browser Errors

April 11, 2018 mozilla

I’ve spent the past few months working on a project involving data collection from users of Nightly, the pre-release channel of Firefox that updates twice a day. I’d like to share the process from conception to prototype to illustrate

  1. One of the many ways ideas become reality at Mozilla, and
  2. How we care about and protect user privacy with regards to data collection.

Maybe JavaScript errors are a bad thing

The user interface of Firefox is written in JavaScript (along with XUL, HTML, and CSS). JavaScript powering the UI is “privileged” JavaScript, which is separate from JavaScript in a normal webpage, and can do things that normal webpages cannot do, such as read the filesystem.

When something goes wrong and an error occurs in this privileged JavaScript (let’s call them “browser errors”), it ends up logged to the Browser Console. Most users aren’t looking at the Browser Console, so these errors often go unnoticed.

While working on Shield, I found that our QA cycle [1] involved a lot of time noticing and reporting errors in the Browser Console. Our code would often land on the Nightly channel before QA review, so why couldn’t we just catch errors thrown from our code and report them somewhere? [2]

So let’s a great plan

I told my boss a few times that browser error collection was a problem that I was interested in solving. I was pretty convinced that there was useful info to be gleaned from collecting these errors, but my beliefs aren’t really enough to justify building a production-quality error collection service. This was complicated by the fact that errors may contain info that can personally identify a user:

On top of all that, we didn’t even know how often these errors were occurring in the wild. Was this a raging fire of constant errors we were just ignoring, or was I getting all worried about nothing?

In the end, I proposed a 3-step research project:

  1. Run a study to measure the number of errors occurring in Nightly as well as the distribution of signatures.
  2. Estimate potential load using the study data, and build a prototype service. Grant access to the data to a limited set of employees and discover whether the data helps us find and diagnose errors.
  3. Shut down the prototype after 6 months or so and evaluate if we should build a production version of the system.

I wrote up this plan as a document that could be shared among people asking why this was an important problem to solve. Eventually, my boss ran the idea past Firefox leadership, who agreed that it was a problem worth pursuing.

What even is happening out there

The first step was to find out how many errors we’d be collecting. One tool at our disposal at Mozilla is Shield, which lets us run small studies on targeted subsets of users. In this case, I wanted to collect data on how many errors were being logged on the Nightly channel.

To run the study, I had to fill out a Product Hypothesis Document (PHD) describing my experiment. The PHD is approved by a group in Mozilla with data science and experiment design experience. It’s an important step that checks multiple things:

Once the PHD was approved, I implemented the code for my study and created a Bugzilla bug for final review. Mozilla has a group of “data stewards” who are responsible for reviewing data collection to ensure it complies with our policies. Studies are not allowed to go out until they’ve been reviewed, and the results of the review are, in most cases, public and available in Bugzilla.

In our case, we decided to compute hashes from the error stacktraces and submit those to Mozilla’s data analysis pipeline. That allowed us to count the number of errors and view the distribution of specific errors without accidentally collecting personal data that may be in file paths.

I am perfect and infallible

The last steps after passing review in the bug were to announce the study on a few mailing lists to both solicit feedback from Firefox developers, and to inform our release team that we intended to ship a new study to users. Once the release team approved our launch plan, we launched and started to collect data. Yay!

A few days after launching, Ekr, who had noticed the study on the mailing lists, reached out and voiced some concerns with our study.

While we were hashing errors before sending them, an adversary could precompute the hashes by running Firefox, triggering bugs they were interested in, and generating their own hash using the same method we were using. This, paired with direct access to our telemetry data, would reveal that an individual user had run a specific piece of code.
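
The underlying issue is that a hash is deterministic: the same input always produces the same output, no matter who computes it. Here's a small sketch of that, assuming SHA-256 over the raw stack trace text. The actual hashing method used by the study isn't described here, and the stack trace below is made up.

```javascript
const crypto = require('crypto');

// Assumption: SHA-256 over the raw stack trace text. The study's real hashing
// scheme is not described in this post.
function hashStack(stack) {
  return crypto.createHash('sha256').update(stack).digest('hex');
}

// Made-up stack trace for a made-up bug
const stack = 'exampleFunction@resource://example/module.js:10:5';

// What a user's browser would submit
const submitted = hashStack(stack);

// What an adversary computes on their own machine after triggering the same bug
const precomputed = hashStack(stack);

console.log(submitted === precomputed); // true
```

Hashing hides the contents of the stack trace from someone who has never seen it, but not from someone who can reproduce it.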

It was unclear if knowing that a user had run a piece of code could be considered sensitive data. If, for example, the error came from code involved with private browsing mode, would that constitute knowing that the user had used private browsing mode for something? Was that sensitive enough for us to not want to collect?

We decided to turn the study off while we tried to address these concerns. By that point, we had collected 2-3 days’ worth of data, and decided that the risk wasn’t large enough to justify dropping the data we already had. I was able to perform a limited analysis on that data and determine that we were seeing tens of millions of errors per day, which was enough of an estimate for building the prototype. With that question answered, we opted to keep the study disabled and consider it finished rather than re-tool it based on Ekr’s feedback.

Can I collect the errors now

Mozilla already runs our own instance of Sentry for collecting and aggregating errors, and I have prior experience with it, so it seemed the obvious choice for the prototype.

With roughly 50 million errors per day, I figured we could sample them, sending errors to the collection service at a rate of 0.1%, or about 50,000 per day. The operations team that ran our Sentry instance agreed that an extra 50,000 errors wasn’t an issue.
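
The arithmetic, plus a sketch of what a per-error sampling check could look like. The shouldReport function is hypothetical and is not the code from the Firefox patch; the random source is injectable purely so the decision can be checked deterministically.

```javascript
// Figures from the estimate above
const errorsPerDay = 50000000;
const sampleRate = 0.001; // 0.1%

console.log(Math.round(errorsPerDay * sampleRate)); // 50000

// Hypothetical client-side check: report roughly one error in a thousand.
// `random` is injectable so the decision can be tested deterministically.
function shouldReport(rate, random = Math.random) {
  return random() < rate;
}

console.log(shouldReport(sampleRate, () => 0.0005)); // true
console.log(shouldReport(sampleRate, () => 0.5));    // false
```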

I spent a few weeks writing up a Firefox patch that collected the errors, mangled them into a Sentry-compatible format, and sent them off. Once the patch was ready, I had to get a technical review from a Firefox peer and a privacy review from a data steward. The patch and review process can be seen in the Bugzilla bug.

The process, as outlined on the Data Collection wiki page, involves three major steps:

Requesting Review

First, I had to fill out a form with several questions asking me to describe the data collection. I’m actually a huge fan of this form, because the questions force you to consider many aspects about data collection that are easy to ignore:

“Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?”
It’s really easy to let curiosity or mild suspicion drive big chunks of work. The point of this question is to force you to think of a reason for doing the collection. Collecting data just because it is mildly convenient or interesting isn’t a good enough reason; it needs a purpose.
“What alternative methods did you consider to answer these questions? Why were they not sufficient?”
Data collection can’t simply be the first tool you reach for to answer your questions. If we want to be respectful of user privacy, we need to consider other ways of answering questions that don’t involve collecting data.
“List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.”
The classification system we use for data makes it very clear how to apply our policies to the data you’re collecting. Browser errors, for example, are mostly category 2 data, but may potentially contain category 3 data and as such must be held to a higher standard.
“How long will this data be collected?”
If we can limit the time period in which we collect a piece of data, we can reduce the impact of data collection on users. I didn’t actually know time-limited collection was something to consider until I saw this question for the first time, but in fact several of our data collection systems enforce time limits by default.

Reviewing Request

Data stewards have their own form to fill out when reviewing a collection request. This form helps stewards be consistent in their judgement. Besides reviewing the answers to the review form from above, reviewers are asked to confirm a few other things:

Is the data collection documented in a publicly accessible place?
Sufficiently technical users should be able to see the schema for data being collected without having to read through the Firefox source code. Failing to provide this documentation mandates a failing review.
Is there a way for users to disable the collection?
There must be some way for users to disable the data collection. Missing this is also considered grounds for failure.

It’s important to note that this mechanism doesn’t need to be, say, a checkbox in the preferences UI. Depending on the context of the data collection, an about:config preference or some other mechanism may be good enough.

Rereing Viewquest?

In certain cases, requests may be escalated to Mozilla’s legal team if they involve changes to our privacy policy or other special circumstances. In the case of browser error collection, we wanted a legal review to double-check whether a user having used private browsing mode was considered category 2 or 3 data, as well as to approve our proposal for collecting category 3 data in error messages and file paths.

Our approach was to mimic what Mozilla already does with crashes: we collect the data and restrict access to a subset of employees who are individually approved for access. This helps make the data accessible only to people who need it, and their access is contingent on employment [3]. Legal approved the plan, which we implemented using built-in Sentry access control.

Welcome to errortown

With code and privacy review finished, I landed the patch and waited patiently for Sentry to start receiving errors. And it did!

Since we started receiving the data, I’ve spent most of my time recruiting Firefox developers who want to search through the errors we’re collecting, and refining the data we’re collecting to make it more useful to those developers. Of course, changes to the data collection require new privacy reviews, although the smaller the changes are, the easier it is to fill out and justify the data collection.

But from my standpoint as a Mozilla employee, these data reviews are the primary way I see Mozilla making good on its promise to respect user privacy and avoid needless data collection. A lot of thought has gone into this process, and I can personally attest to its effectiveness.


  1. Firefox uses tons of automated testing, but we also have manual testing for certain features. In Shield's case, the time being wasted was in the manual phase.

  2. Actually, we already do collect crashes as part of the Socorro project, which I currently work on. But Socorro does not collect any info about the browser errors in question.

  3. Only some parts of crash data are actually private, and certain contributors who sign an NDA are also allowed access to that private data. We use centralized authorization to control access.

Using NPM Libraries in Firefox via Webpack

July 11, 2017 mozilla

I work on a system add-on for Firefox called the Shield Recipe Client. We develop it in a monorepo on Github along with the service it relies on and a few other libraries. One of these libraries is mozJexl, an expression language that we use to specify how to filter experiments and surveys we send to users.

The system add-on relies on mozJexl, and for a while we were pulling in the dependency by copying it from node_modules and using a custom CommonJS loader to make require() calls work properly. This wasn't ideal for a few reasons:

While working on another patch, I hit a point where I wanted to pull in ajv to do some schema validation and decided to see if I could come up with something better.

Webpack

I already knew that a few components within Firefox are using Webpack, such as debugger.html and Activity Stream. As far as I can tell, they bundle all of their code together, which is standard for Webpack.

I wanted to avoid this, because we sometimes get fixes from Firefox developers that we upstream back to Github. We also get help in the form of debugging from developers investigating issues that lead back to our add-on. Both of these would be made more difficult by landing webpacked code that is different from the source code we normally work on.

Instead, my goal was to webpack only the libraries that we want to use in a way that provided a similar experience to require(). Here's the Webpack configuration that I came up with:

/* eslint-env node */
var path = require("path");
var ConcatSource = require("webpack-sources").ConcatSource;
var LicenseWebpackPlugin = require("license-webpack-plugin");

module.exports = {
  context: __dirname,
  entry: {
    mozjexl: "./node_modules/mozjexl/",
  },
  output: {
    path: path.resolve(__dirname, "vendor/"),
    filename: "[name].js",
    library: "[name]",
    libraryTarget: "this",
  },
  plugins: [
    /**
     * Plugin that appends "this.EXPORTED_SYMBOLS = ["libname"]" to assets
     * output by webpack. This allows built assets to be imported using
     * Cu.import.
     */
    function ExportedSymbols() {
      this.plugin("emit", function(compilation, callback) {
        for (const libraryName in compilation.entrypoints) {
          const assetName = `${libraryName}.js`; // Matches output.filename
          compilation.assets[assetName] = new ConcatSource(
            "/* eslint-disable */", // Disable linting
            compilation.assets[assetName],
            `this.EXPORTED_SYMBOLS = ["${libraryName}"];` // Matches output.library
          );
        }
        callback();
      });
    },
    new LicenseWebpackPlugin({
      pattern: /^(MIT|ISC|MPL.*|Apache.*|BSD.*)$/,
      filename: `LICENSE_THIRDPARTY`,
    }),
  ],
};

(See also the pull request itself.)

Each entry point in the config is a library that we want to use, with the key being the name we're using to export it, and the value being the path to its directory in node_modules [1]. The output of this config is one file per entry point inside a vendor subdirectory. You can then import these files as if they were normal .jsm files:

Cu.import("resource://shield-recipe-client/vendor/mozjexl.js");
const jexl = new mozjexl.Jexl();

output.library

The key turned out to be Webpack's options for bundling libraries:

By setting output.library to a name like mozJexl, and output.libraryTarget to this, you can produce a bundle that assigns the exports from your entry point to this.mozJexl. In the configuration above, I use the webpack variable [name] to set it to the name for each export, since we're exporting multiple libraries with one config.

ExportedSymbols

Assuming that the bundle will work in a chrome environment, this is very close to being a JavaScript code module. The only thing missing is this.EXPORTED_SYMBOLS to define what names we're exporting. Luckily, we already know the name of the symbols being exported, and we know the filename that will be used for each entry point.

I used this info to write a small Webpack plugin that prepends an eslint-disable comment to the start of each generated file (since we don't want to lint bundled code) and appends this.EXPORTED_SYMBOLS to the end of each generated file:

function ExportedSymbols() {
  this.plugin("emit", function(compilation, callback) {
    for (const libraryName in compilation.entrypoints) {
      const assetName = `${libraryName}.js`; // Matches output.filename
      compilation.assets[assetName] = new ConcatSource(
        "/* eslint-disable */", // Disable linting
        compilation.assets[assetName],
        `this.EXPORTED_SYMBOLS = ["${libraryName}"];` // Matches output.library
      );
    }
    callback();
  });
}
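To picture what the plugin's output looks like to a consumer, here is a rough simulation that runs in plain Node.js. Everything in it is a stand-in: bundleBody is a made-up one-line "bundle", wrapAsset joins the same three pieces as plain strings instead of using ConcatSource, and fakeImport is a hypothetical, heavily simplified imitation of Cu.import.

```javascript
// Stand-in for a webpack bundle built with output.library = "mozjexl" and
// output.libraryTarget = "this".
const bundleBody = 'this["mozjexl"] = { Jexl: function Jexl() {} };';

// The same three pieces the plugin concatenates, joined as plain strings.
function wrapAsset(libraryName, source) {
  return [
    "/* eslint-disable */",
    source,
    `this.EXPORTED_SYMBOLS = ["${libraryName}"];`,
  ].join("\n");
}

// Hypothetical imitation of Cu.import: run the file with a fresh `this`,
// then copy only the names listed in EXPORTED_SYMBOLS onto the caller.
function fakeImport(source, target) {
  const moduleScope = {};
  new Function(source).call(moduleScope);
  for (const name of moduleScope.EXPORTED_SYMBOLS) {
    target[name] = moduleScope[name];
  }
  return target;
}

const imported = fakeImport(wrapAsset("mozjexl", bundleBody), {});
const jexl = new imported.mozjexl.Jexl();
```

The point is that the importer only needs two things from the file: values assigned onto this, and an EXPORTED_SYMBOLS array naming which of them to hand over.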

Licenses

During code review, mythmon brought up an excellent question: how do we retain licensing info for these files when we sync to mozilla-central? Turns out, there's a rather popular Webpack plugin called license-webpack-plugin that collects license files found during a build and outputs them into a single file:

new LicenseWebpackPlugin({
  pattern: /^(MIT|ISC|MPL.*|Apache.*|BSD.*)$/,
  filename: `LICENSE_THIRDPARTY`,
}),

(Why MIT/ISC/MPL/etc.? I just used what I thought were common licenses for libraries we were likely to use.)
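As a quick sanity check of what that pattern accepts, here it is run against a handful of license identifiers. The identifiers are just examples I picked, and exactly which string the plugin tests the pattern against is an assumption on my part.

```javascript
const pattern = /^(MIT|ISC|MPL.*|Apache.*|BSD.*)$/;

// Example license identifiers; not taken from any real dependency tree.
const licenses = [
  "MIT",
  "ISC",
  "MPL-2.0",
  "Apache-2.0",
  "BSD-3-Clause",
  "GPL-3.0",
  "LGPL-2.1",
];

const included = licenses.filter((license) => pattern.test(license));
```

The trailing .* on MPL, Apache, and BSD is what lets versioned identifiers through, while the ^ and $ anchors keep something like LGPL from matching on a substring.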

Future Improvements

This is already a useful improvement over our old method of pulling in dependencies, but there are some potential improvements I'd eventually like to get to:


  1. Did you know that Webpack will automatically use the main module defined in package.json as the entry point if the path points to a directory with that file?

Q is Scary

June 8, 2017 mozilla

q is the hands-down winner of my "Libraries I'm Terrified Of" award. It's a Python library for outputting debugging information while running a program.

On the surface, everything seems fine. It logs everything to /tmp/q (configurable), which you can watch with tail -f. The basic form of q is passing it a variable:

import q

foo = 7
q(foo)

Take a good long look at that code sample, and then answer me this: What is the type of q?

If you said "callable module", you are right. Also, that is not a thing that exists in Python.

Also, check out the output in /tmp/q:

0.0s <module>: foo=7

It knows the variable name. It also knows that it's being called at the module level; if we were in a function, <module> would be replaced with the name of the function.

You can also divide (/) or bitwise OR (|) values with q to log them as well. And you can decorate a function with it to trace the arguments and return value. It also has a method, q.d(), that starts an interactive session.

And it does all this in under 400 lines, the majority of which is either a docstring or code to format the output.

Spooky.

How in the Hell

So first, let's get this callable module stuff out of the way. Here are the last two lines in q.py:

# Install the Q() object in sys.modules so that "import q" gives a callable q.
sys.modules['q'] = Q()

Turns out sys.modules is a dictionary with all the loaded modules, and you can just stuff it with whatever nonsense you like.

The Q class itself is super-fun. Check out the declaration:

# When we insert Q() into sys.modules, all the globals become None, so we
# have to keep everything we use inside the Q class.
class Q(object):
    __doc__ = __doc__  # from the module's __doc__ above

    import ast
    import code
    import inspect
    import os
    import pydoc
    import sys
    import random
    import re
    import time
    import functools

"When we insert Q() into sys.modules, all the globals become None"

What? Why?! I mean I can see how that's not an issue for modules, which are usually the only things inside sys.modules, but still. I tried chasing this down, but the entire sys module is written in C, and that ain't my business.

Most of the other bits inside Q are straightforward by comparison; a few helpers for outputting stuff cleanly, overrides for __truediv__ and __or__ for those weird operator versions of logging, etc. If you've never heard of callable types [1] before, that's the reason why an instance of this class can be both called as a function and treated as a value.

So what's __call__ do?

Ghost Magic

def __call__(self, *args):
    """If invoked as a decorator on a function, adds tracing output to the
    function; otherwise immediately prints out the arguments."""
    info = self.inspect.getframeinfo(self.sys._getframe(1), context=9)

    # ... snip ...

Welcome to the inspect module. Turns out, Python has a built-in module that lets you get all sorts of fun info about objects, classes, etc. It also lets you get info about stack frames, which store the state of each subroutine in the chain of subroutine calls that led to running the code that's currently executing.

Here, q is using a CPython-specific function sys._getframe to get a frame object for the code that called q, and then using inspect to get info about that code.

# info.index is the index of the line containing the end of the call
# expression, so this gets a few lines up to the end of the expression.
lines = ['']
if info.code_context:
    lines = info.code_context[:info.index + 1]

# If we see "@q" on a single line, behave like a trace decorator.
for line in lines:
    if line.strip() in ('@q', '@q()') and args:
        return self.trace(args[0])

...and then it just does a text search of the source code to figure out if it was called as a function or as a decorator. Because it can't just guess by the type of the argument being passed (you might want to log a function object), and it can't just return a callable that can be used as a decorator either.

trace is pretty normal, whatever that means. It just logs the intercepted arguments and return value / raised exception.

# Otherwise, search for the beginning of the call expression; once it
# parses, use the expressions in the call to label the debugging
# output.
for i in range(1, len(lines) + 1):
    labels = self.get_call_exprs(''.join(lines[-i:]).replace('\n', ''))
    if labels:
        break
self.show(info.function, args, labels)
return args and args[0]

The last bit pulls out labels from the source code; this is how q knows the name of the variable that you pass in. I'm not going to go line-by-line through get_call_exprs, but it uses the ast module to parse the function call into an Abstract Syntax Tree, and walks through that to find the variable names.


It goes without saying that you should never do any of this. Ever. Nothing is sacred when it comes to debugging, though, and q is incredibly useful when you're having trouble getting your program to print anything out sanely.

Also, if you're ever bored on a nice summer evening, check out the list of modules in the Python standard library. It's got everything:


  1. Check out this page and search for "Callable Types" and/or __call__.

Build and Sign WebExtensions with CircleCI

May 4, 2017 mozilla

Once Planet Mozilla updated with my last post, I got a few bug reports and feature requests for mailman-admin-helper, along with a pull request (Thanks, TheOne!). Clearly I'm not the only person who isn't a fan of our mailing list admin.

Before landing anything, I decided to see if I could get automatic builds running so that I wouldn't have to pull and build pull requests myself when I want to test them. What I ended up with, however, does a bit more than that; it also runs lints, and even signs and uploads new releases when I push a new tag.

We use CircleCI on Normandy, so I defaulted to using them for this as well. I'll walk through the sections, but here's the entire circle.yml file I ended up with:

machine:
  node:
    version: 7.10.0
dependencies:
  override:
    - sudo apt-get update; sudo apt-get install jq
    - go get -u github.com/tcnksm/ghr
    - npm install -g web-ext
compile:
  override:
    - web-ext build
    - mv web-ext-artifacts $CIRCLE_ARTIFACTS
test:
  override:
    - web-ext lint --self-hosted
deployment:
  release:
    tag: /v[0-9]+(\.[0-9]+)*/
    owner: Osmose
    commands:
      - jq --arg tag "${CIRCLE_TAG:1}" '.version = $tag' manifest.json > tmp.json && mv tmp.json manifest.json
      - web-ext sign --api-key $AMO_API_KEY --api-secret $AMO_API_SECRET
      - ghr -u Osmose $CIRCLE_TAG web-ext-artifacts

If you want to adapt this to your own project, you'll want to change the deployment.release.owner field to the Github account hosting your WebExtension, and add the following environment variables to your CircleCI project config (NOT your circle.yml file, which is committed to your repo): AMO_API_KEY and AMO_API_SECRET, the API credentials passed to web-ext sign, and GITHUB_TOKEN, which ghr uses to authenticate with Github.

How does it work?

circle.yml files are split into phases. Each phase has a default action that is overridden with the override key.

machine:
  node:
    version: 7.10.0

The machine phase defines the machine used to run your build. Here we're just making sure that we have a recent version of Node.

dependencies:
  override:
    - sudo apt-get update; sudo apt-get install jq
    - go get -u github.com/tcnksm/ghr
    - npm install -g web-ext

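The dependencies step is for installing libraries and programs that your build needs. Our build process has three dependencies: jq, for editing manifest.json; ghr, for uploading release files to Github; and web-ext, for building, linting, and signing the add-on.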

compile:
  override:
    - web-ext build
    - mv web-ext-artifacts $CIRCLE_ARTIFACTS

The compile step is used to build your project before testing. While we aren't running any tests that need a built add-on, this is a good time to build the add-on and upload it to the $CIRCLE_ARTIFACTS directory, which is saved and made available for download once the build is complete. This makes it easy to pull a ready-to-test build of the add-on from open pull requests.

test:
  override:
    - web-ext lint --self-hosted

The test step is for actually running your tests. We don't have automated tests for mailman-admin-helper, but web-ext comes with a handy lint command to help catch common errors.

One thing to note about CircleCI is that any commands that return non-zero return codes will stop the build immediately and mark it as failed, except for commands in the test step. test step commands will mark a build as failed, but will not stop other commands in the test step from running. This is useful for running multiple types of tests or lints because it allows you to see all of your failures instead of exiting early before running all of your tests.

deployment:
  release:
    tag: /v[0-9]+(\.[0-9]+)*/
    owner: Osmose
    commands:
      - jq --arg tag "${CIRCLE_TAG:1}" '.version = $tag' manifest.json > tmp.json && mv tmp.json manifest.json
      - web-ext sign --api-key $AMO_API_KEY --api-secret $AMO_API_SECRET
      - ghr -u Osmose $CIRCLE_TAG web-ext-artifacts

The deployment section only runs on successful builds, and handles deploying your code. It's made up of multiple named sections, and each section must either have a branch or tag field describing the branches or tags that the section will run for.

In our case, we're using a regex that matches tags named like version numbers prefixed with v, e.g. v0.1.2. We also set the owner to my Github account so that forks will not run the deployment process.

The commands do three things:

  1. Use jq to modify the version key in manifest.json to match the version number from the tag. The v prefix is removed before the replacement.

  2. Use web-ext to build and sign the WebExtension, using API keys stored in environment variables. This creates an XPI file in the web-ext-artifacts directory.

  3. Use ghr to upload the contents of web-ext-artifacts (which should just be the signed XPI) to the tag on Github. This uses the GITHUB_TOKEN environment variable for authentication.
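The tag matching and the version rewrite from the first step can be sketched in JavaScript. This is only an illustration of the logic, not what CircleCI or jq run internally; in particular, anchoring the pattern so that the whole tag has to match is an assumption.

```javascript
// Same pattern as the `tag` field, anchored here on the assumption that
// the whole tag name has to match.
const TAG_PATTERN = /^v[0-9]+(\.[0-9]+)*$/;

function isReleaseTag(tag) {
  return TAG_PATTERN.test(tag);
}

// Equivalent of the shell expansion ${CIRCLE_TAG:1}: drop the first
// character, which is the "v" prefix.
function versionFromTag(tag) {
  return tag.slice(1);
}

// Equivalent of the jq filter `.version = $tag`: a copy of the manifest
// with only the version key replaced.
function withVersion(manifest, tag) {
  return { ...manifest, version: versionFromTag(tag) };
}

const manifest = {
  manifest_version: 2,
  name: "mailman-admin-helper",
  version: "0.1.1",
};
const updated = withVersion(manifest, "v0.1.2");
```

Deriving the version from the tag means the tag is the single source of truth for a release, so the manifest in the repository never has to be bumped by hand before tagging.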

The end result is that, whenever a new tag is pushed to the repository, CircleCI adds a signed XPI to the release page on Github automatically, without any human intervention. Convenient!


Feel free to steal this for your own WebExtension, or share any comments or suggestions either in the comments or directly on the mailman-admin-helper repository. Thanks for reading!

mailman-admin-helper: Mildly Easier Mailman Spam Management

April 30, 2017 mozilla

Mozilla hosts a few Mailman instances [1], and I run a few mailing lists on them. Our interface for managing incoming spam is... okay.

Mailman default admindb interface

The form inputs are tiny. And it takes, like, 3 clicks to discard and blacklist spam per-sender. And, because I only learned about the options for filtering by spam headers within the past month, I had to use this interface on a daily basis for years.

Finally, about a year or so ago, I got fed up and wrote a bookmarklet that auto-clicked every form element needed to discard and blacklist every email on the page. Since it's rare for the lists I moderate to get legitimate emails that are marked for moderation, I didn't need anything more complex.

However, we recently updated our Mailman pages to use CSP, specifically the script-src 'none' directive. Because the pages no longer accept any URL as valid for script execution, my bookmarklet stopped working. I searched online for workarounds and didn't find anything informative [2].

Luckily, I happen to have experience making WebExtensions that inject content scripts into web pages. It's as simple as creating a manifest.json file:

{
  "manifest_version": 2,
  "name": "mailman-admin-helper",
  "version": "0.1.1",
  "applications": {
    "gecko": {
      "id": "mailman-admin-helper@mkelly.me"
    }
  },

  "description": "Adds useful shortcuts to Mozilla Mailman admin.",

  "content_scripts": [
    {
      "matches": [
        "*://mail.mozilla.org/admindb/*",
        "*://lists.mozilla.org/admindb/*"
      ],
      "js": ["index.js"],
      "css": ["index.css"]
    }
  ]
}

The content_scripts key is where the magic happens. List some domains, write some JavaScript and CSS, and you're done! The web-ext tool makes testing, building, and signing the extension pretty painless.
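If the matches syntax is unfamiliar, this rough sketch shows which URLs the two patterns above would select. It is a simplified approximation that only handles the pattern forms used in this manifest (real match patterns have more rules), and the helper names are my own.

```javascript
// Escape regex metacharacters, then turn "*" into a wildcard.
function globToRegex(glob) {
  return glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
}

// Simplified conversion of a "scheme://host/path" match pattern into a
// RegExp. A "*" scheme is treated as "http or https".
function matchPatternToRegex(pattern) {
  const [, scheme, host, path] = /^(\*|https?):\/\/([^/]+)(\/.*)$/.exec(pattern);
  const schemePart = scheme === "*" ? "https?" : scheme;
  return new RegExp(`^${schemePart}://${globToRegex(host)}${globToRegex(path)}$`);
}

const patterns = [
  "*://mail.mozilla.org/admindb/*",
  "*://lists.mozilla.org/admindb/*",
].map(matchPatternToRegex);

function isMatched(url) {
  return patterns.some((regex) => regex.test(url));
}
```

For example, isMatched("https://mail.mozilla.org/admindb/some-list") is true, while a listinfo page on the same host is not, so the scripts only run on the moderation pages.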

An hour or two later, and I had finished my new WebExtension, mailman-admin-helper. After it is installed, the admin interface is greatly simplified:

Mailman admindb interface as modified by the mailman-admin-helper extension

The block of checkboxes and radio buttons has been replaced by 4 buttons that immediately make their changes and refresh the page when clicked. And if you need to inspect and modify an individual email, you can still click through the email subject to get to the normal moderation page.

Granted, it cuts out a lot of functionality, but this extension is mostly meant for myself to use. Pull requests are welcome, though, in case anyone wants to add functionality that they commonly use.

Big thanks to the Add-ons team and community for making WebExtensions super-easy to use!


  1. I'm not entirely sure why we have two, but it's cool.

  2. I did find bug 866522, which discusses the reason bookmarklets don't work with CSP, as well as some proposed fixes to Firefox and the (in my opinion, correct) wisdom that bookmarklets are a dead-end anyway.

content-UITour.js

April 25, 2017 mozilla

Recently I found myself trying to comprehend an unfamiliar piece of code. In this case, it was content-UITour.js, a file that handles the interaction between unprivileged webpages and UITour.jsm.

UITour allows webpages to highlight buttons in the toolbar, open menu panels, and perform other tasks involved in giving Firefox users a tour of the user interface. The event-based API allows us to iterate quickly on the onboarding experience for Firefox by controlling it via easily-updated webpages. Only a small set of Mozilla-owned domains are allowed access to the UITour API.

Top-level View

My first step when trying to grok unfamiliar JavaScript is to check out everything at the top-level of the file. If we take content-UITour.js and remove some comments, imports, and constants, we get:

var UITourListener = {
  handleEvent(event) {
    /* ... */
  },

  /* ... */
};

addEventListener("mozUITour", UITourListener, false, true);

Webpages that want to use UITour emit synthetic events with the name "mozUITour". In the snippet above, UITourListener is the object that receives these events. Normally, event listeners are functions, but they can also be EventListeners, which are simply objects with a handleEvent function.
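Object-style listeners aren't specific to Firefox internals. Here's a minimal sketch using the standard EventTarget and Event globals (available in browsers and recent Node.js); the detail payload is made up for the example.

```javascript
// Any object with a handleEvent method can be registered as a listener.
const listener = {
  received: [],
  handleEvent(event) {
    // `this` is the listener object itself, not the event target.
    this.received.push({ type: event.type, detail: event.detail });
  },
};

const target = new EventTarget();
target.addEventListener("mozUITour", listener);

// Stand-in for the synthetic event a webpage would emit.
const event = new Event("mozUITour");
event.detail = { action: "hypotheticalAction" };
target.dispatchEvent(event);
```

The nice part of this form is that handleEvent keeps its this binding, so the listener can call its own helper methods, which is exactly how UITourListener calls this.ensureTrustedOrigin().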

According to Mossop's comment, content-UITour.js is loaded in browser.js. A search for firefox loadFrameScript brings up two useful pages:

It looks like content-UITour.js is loaded for each tab with a webpage open, but it can do some more privileged stuff than a normal webpage. Also, the global object seems to be window, referring to the browser window containing the webpage, since events from the webpage are bubbling up to it. Neat!

Events from Webpages

So what about handleEvent?

handleEvent(event) {
  if (!Services.prefs.getBoolPref("browser.uitour.enabled")) {
    return;
  }
  if (!this.ensureTrustedOrigin()) {
    return;
  }
  addMessageListener("UITour:SendPageCallback", this);
  addMessageListener("UITour:SendPageNotification", this);
  sendAsyncMessage("UITour:onPageEvent", {
    detail: event.detail,
    type: event.type,
    pageVisibilityState: content.document.visibilityState,
  });
},

If UITour itself is disabled, or if the origin of the webpage we're registered on isn't trustworthy, events are thrown away. Otherwise, we register UITourListener as a message listener, and send a message of our own.

I remember seeing addMessageListener and sendAsyncMessage on the browser message manager documentation; they look like a fairly standard event system. But where are these events coming from, and where are they going to?

In lieu of any better leads, our best bet is to search DXR for "UITour:onPageEvent", which leads to nsBrowserGlue.js. Luckily for us, I've actually heard of this file before: it's a grab-bag for things that need to happen to set up Firefox that don't fit anywhere else. For our purposes, it's enough to know that stuff in here gets run once when the browser starts.

The lines in question:

// Listen for UITour messages.
// Do it here instead of the UITour module itself so that the UITour module is lazy loaded
// when the first message is received.
var globalMM = Cc["@mozilla.org/globalmessagemanager;1"].getService(Ci.nsIMessageListenerManager);
globalMM.addMessageListener("UITour:onPageEvent", function(aMessage) {
  UITour.onPageEvent(aMessage, aMessage.data);
});

Oh, I remember reading about the global message manager! It covers every frame. This seems to be where all the events coming up from individual frames get gathered and passed to UITour. That UITour variable is coming from a clever lazy-import block at the top:

[
/* ... */
["UITour", "resource:///modules/UITour.jsm"],
/* ... */
].forEach(([name, resource]) => XPCOMUtils.defineLazyModuleGetter(this, name, resource));

In other words, UITour refers to the module in UITour.jsm, but it isn't loaded until we receive our first event, which helps make Firefox startup snappier.
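The general lazy-getter trick can be sketched in a few lines of plain JavaScript. This is a simplified stand-in, not the real XPCOMUtils.defineLazyModuleGetter; defineLazyGetter and the load callback are hypothetical names, with load playing the role of importing the module.

```javascript
// Define a property whose value is computed on first access and then
// cached as a plain data property.
function defineLazyGetter(target, name, load) {
  Object.defineProperty(target, name, {
    configurable: true,
    enumerable: true,
    get() {
      const value = load();
      // Replace the getter with the loaded value so later reads are
      // ordinary property lookups.
      Object.defineProperty(target, name, {
        value,
        writable: true,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
  });
}

let loadCount = 0;
const scope = {};
defineLazyGetter(scope, "UITour", () => {
  loadCount += 1;
  return { onPageEvent() {} };
});

const countBeforeUse = loadCount; // Nothing has been loaded yet.
scope.UITour.onPageEvent();
scope.UITour.onPageEvent();
```

Defining the getter is cheap; the expensive work only happens if something actually touches the property, and then only once.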

For our purposes, we're not terribly interested in what UITour does with these messages, as long as we know how they're getting there. We are, however, interested in the messages that we're listening for: "UITour:SendPageCallback" and "UITour:SendPageNotification". Another DXR search tells me that those are in UITour.jsm. A skim of the results shows that these messages are used for things like notifying the webpage when an operation is finished, or returning information that was requested by the webpage.


To summarize:

The rest of the content-UITour.js is split between origin verification and sending events back down to the webpage.

Verifying Webpage URLs

Next, let's take a look at ensureTrustedOrigin:

ensureTrustedOrigin() {
  if (content.top != content)
    return false;

  let uri = content.document.documentURIObject;

  if (uri.schemeIs("chrome"))
    return true;

  if (!this.isSafeScheme(uri))
    return false;

  let permission = Services.perms.testPermission(uri, UITOUR_PERMISSION);
  if (permission == Services.perms.ALLOW_ACTION)
    return true;

  return this.isTestingOrigin(uri);
},

MDN tells us that content is the Window object for the primary content window; in other words, the webpage. top, on the other hand, is the topmost window in the window hierarchy (relevant for webpages that get loaded in iframes). Thus, the first check is to make sure we're not in some sort of frame. Without this, a webpage could control when UITour executes things by loading a whitelisted origin in an iframe [1].

documentURIObject lets us check the origin of the loaded webpage. chrome:// URIs get passed immediately, since they're already privileged. The next three checks are more interesting:

isSafeScheme

isSafeScheme(aURI) {
  let allowedSchemes = new Set(["https", "about"]);
  if (!Services.prefs.getBoolPref("browser.uitour.requireSecure"))
    allowedSchemes.add("http");

  if (!allowedSchemes.has(aURI.scheme))
    return false;

  return true;
},

This function checks the URI scheme to see if it's considered "safe" enough to use UITour functions. By default, https:// and about: pages are allowed. http:// pages are also allowed if the browser.uitour.requireSecure preference is false (it defaults to true).

Permissions

The next check is against the permissions system. The Services.jsm documentation says that Services.perms refers to an instance of the nsIPermissionManager interface. The check itself is easy to understand, but what's missing is how these permissions get added in the first place. A fresh Firefox profile has some sites already whitelisted for UITour, but where does that whitelist come from?

This is where DXR really shines. If we look at nsIPermissionManager.idl and click the name of the interface, a dropdown appears with several options. The "Find subclasses" option performs a search for "derived:nsIPermissionManager", which leads to the header file for nsPermissionManager.

We're looking for where the default permission values come from, so an in-page search for the word "default" eventually lands on a function named ImportDefaults. Clicking that name and selecting "Jump to definition" lands us inside nsPermissionManager.cpp, and the very first line of the function is:

nsCString defaultsURL = mozilla::Preferences::GetCString(kDefaultsUrlPrefName);

An in-page search for kDefaultsUrlPrefName leads to:

// Default permissions are read from a URL - this is the preference we read
// to find that URL. If not set, don't use any default permissions.
static const char kDefaultsUrlPrefName[] = "permissions.manager.defaultsUrl";

On my Firefox profile, the "permissions.manager.defaultsUrl" preference is set to resource://app/defaults/permissions:

# This file has default permissions for the permission manager.
# The file-format is strict:
# * matchtype \t type \t permission \t host
# * "origin" should be used for matchtype, "host" is supported for legacy reasons
# * type is a string that identifies the type of permission (e.g. "cookie")
# * permission is an integer between 1 and 15
# See nsPermissionManager.cpp for more...

# UITour
origin  uitour  1   https://www.mozilla.org
origin  uitour  1   https://self-repair.mozilla.org
origin  uitour  1   https://support.mozilla.org
origin  uitour  1   https://addons.mozilla.org
origin  uitour  1   https://discovery.addons.mozilla.org
origin  uitour  1   about:home

# ...

Found it! A quick DXR search reveals that this file is in /browser/app/permissions in the tree. I'm not entirely sure where that defaults bit in the URL is coming from, but whatever.

With this, we can confirm that the permissions check is where most valid uses of UITour are passed, and that this permissions file is where the whitelist of allowed domains lives.
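
To make the file format concrete, here's a rough sketch of reading it from JavaScript. This is not how nsPermissionManager.cpp does it (that code is C++); parseDefaultPermissions is a hypothetical helper, and it assumes fields are separated by tabs or runs of spaces, as in the excerpt above.

```javascript
// Hypothetical helper: turn the text of a default-permissions file into entries.
// Assumption: fields are separated by tabs (per the file header) or by runs
// of spaces (as they appear in the excerpt above).
function parseDefaultPermissions(text) {
  const entries = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) {
      continue;
    }
    const [matchType, type, permission, origin] = line.split(/\s+/);
    entries.push({ matchType, type, permission: Number(permission), origin });
  }
  return entries;
}

const sample = [
  "# UITour",
  "origin\tuitour\t1\thttps://www.mozilla.org",
  "origin\tuitour\t1\tabout:home",
].join("\n");

// Collect the origins that are granted the uitour permission.
const uitourOrigins = parseDefaultPermissions(sample)
  .filter((entry) => entry.type === "uitour" && entry.permission === 1)
  .map((entry) => entry.origin);

console.log(uitourOrigins);
```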

isTestingOrigin

The last check in ensureTrustedOrigin falls back to isTestingOrigin:

isTestingOrigin(aURI) {
  if (Services.prefs.getPrefType(PREF_TEST_WHITELIST) != Services.prefs.PREF_STRING) {
    return false;
  }

  // Add any testing origins (comma-separated) to the whitelist for the session.
  for (let origin of Services.prefs.getCharPref(PREF_TEST_WHITELIST).split(",")) {
    try {
      let testingURI = Services.io.newURI(origin);
      if (aURI.prePath == testingURI.prePath) {
        return true;
      }
    } catch (ex) {
      Cu.reportError(ex);
    }
  }
  return false;
},

Remember those boring constants we ignored earlier? Here's one of them in action! Specifically, it's PREF_TEST_WHITELIST, which is set to "browser.uitour.testingOrigins".

This function appears to parse the preference as a comma-separated list of URIs. It fails early if the preference isn't a string, then splits the string and loops over each entry, converting them to URI objects.

The nsIURI documentation notes that prePath is everything in the URI before the path, including the protocol, hostname, port, etc. Using prePath, the function iterates over each URI in the preference and checks it against the URI of the webpage. If it matches, then the page is considered safe!

(And if anything fails when parsing URIs, errors are reported to the console using reportError and discarded.)

As the preference name implies, this is useful for developers who want to test a webpage that uses UITour without having to set up their local development environment to fake being one of the whitelisted origins.
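
The same comparison can be approximated outside of Firefox with the standard URL class. In this sketch, matchesTestingOrigin is a hypothetical stand-in for isTestingOrigin, and it assumes that for http(s) URLs, URL.origin (scheme, host, and port) is roughly equivalent to nsIURI.prePath.

```javascript
// Hypothetical stand-in for isTestingOrigin, using the standard URL class.
// Assumption: for http(s) URLs, URL.origin is close enough to prePath
// for illustration purposes.
function matchesTestingOrigin(pageUrl, prefValue) {
  // Fail early if the "preference" isn't a string.
  if (typeof prefValue !== "string") {
    return false;
  }

  const pageOrigin = new URL(pageUrl).origin;
  for (const entry of prefValue.split(",")) {
    try {
      if (new URL(entry).origin === pageOrigin) {
        return true;
      }
    } catch (ex) {
      // Entries that fail to parse are reported and skipped.
      console.error(ex);
    }
  }
  return false;
}

const pref = "https://localhost:8000,not a url,http://example.test";
console.log(matchesTestingOrigin("https://localhost:8000/tour/index.html", pref)); // true
console.log(matchesTestingOrigin("https://localhost:9000/tour/index.html", pref)); // false
```

The path is ignored entirely; only the part before it has to match, which is why a different port fails the check.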

Sending Messages Back to the Webpage

The other remaining logic in content-UITour.js handles messages sent back to the content process from UITour.jsm:

receiveMessage(aMessage) {
  switch (aMessage.name) {
    case "UITour:SendPageCallback":
      this.sendPageEvent("Response", aMessage.data);
      break;
    case "UITour:SendPageNotification":
      this.sendPageEvent("Notification", aMessage.data);
      break;
  }
},

You may remember the Message manager overview, which links to documentation for several functions, including addMessageListener. We passed in UITourListener as the listener, which the documentation says should implement the nsIMessageListener interface. Thus, UITourListener.receiveMessage is called whenever messages are received from UITour.jsm.

The function itself is simple; it defers to sendPageEvent with slightly different parameters depending on the incoming message.

sendPageEvent(type, detail) {
  if (!this.ensureTrustedOrigin()) {
    return;
  }

  let doc = content.document;
  let eventName = "mozUITour" + type;
  let event = new doc.defaultView.CustomEvent(eventName, {
    bubbles: true,
    detail: Cu.cloneInto(detail, doc.defaultView)
  });
  doc.dispatchEvent(event);
}

sendPageEvent starts off with another trusted origin check, to avoid sending results from UITour to untrusted webpages. Next, it creates a custom event and dispatches it on the webpage's document. Webpages register an event listener on the document to receive data returned from UITour.
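
Here's a minimal sketch of both halves of that exchange. It uses a plain EventTarget as a stand-in for the page's document so it can run outside a browser, and the shape of the detail object (callbackID, data) is an assumption for illustration rather than something taken from the code above.

```javascript
// CustomEvent is built into browsers and recent Node.js versions; fall back
// to a minimal Event subclass where it is missing.
const PageEvent = typeof CustomEvent === "function"
  ? CustomEvent
  : class extends Event {
      constructor(name, init = {}) {
        super(name, init);
        this.detail = init.detail;
      }
    };

// Stand-in for the webpage's document (an assumption for this sketch).
const fakeDocument = new EventTarget();

// Webpage side: listen for responses coming back from UITour.
const received = [];
fakeDocument.addEventListener("mozUITourResponse", (event) => {
  received.push(event.detail);
});

// Privileged side: simplified version of sendPageEvent, without the
// trusted origin check or cloneInto.
function sendPageEvent(type, detail) {
  const event = new PageEvent("mozUITour" + type, { bubbles: true, detail });
  fakeDocument.dispatchEvent(event);
}

sendPageEvent("Response", { callbackID: "abc", data: { ok: true } });
sendPageEvent("Notification", { event: "example" }); // no listener registered for this one

console.log(received.length); // 1
```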

defaultView returns the window object for the document in question.

Describing cloneInto could take up an entire post on its own. In short, cloneInto is being used here to copy the object from UITour in the chrome process (a privileged context) for use in the webpage (an unprivileged context). Without this, the webpage would not be able to access the detail value at all.

And That's It!

It takes effort, but I've found that deep-dives like this are a great way to both understand a single piece of code, and to learn from the style of the code's author(s). Hopefully y'all will find this useful as well!


  1. While this isn't a security issue on its own, it gives some level of control to an attacker, which generally should be avoided where possible.

Caching Async Operations via Promises

August 22, 2016 mozilla

I was working on a bug in Normandy the other day and remembered a fun little trick for caching asynchronous operations in JavaScript.

The bug in question involved two asynchronous actions happening within a function. First, we made an AJAX request to the server to get an "Action" object. Next, we took an attribute of the action, the implementation_url, and injected a <script> tag into the page with the src attribute set to the URL. The JavaScript being injected would then call a global function and pass it a class function, which was the value we wanted to return.

The bug was that if we called the function multiple times with the same action, the function would make multiple requests to the same URL, even though we really only needed to download data for each Action once. The solution was to cache the responses, but rather than caching the responses directly, I found it was cleaner to cache the Promise returned when making the request:

export function fetchAction(recipe) {
  const cache = fetchAction._cache;

  if (!(recipe.action in cache)) {
    cache[recipe.action] = fetch(`/api/v1/action/${recipe.action}/`)
      .then(response => response.json());
  }

  return cache[recipe.action];
}
fetchAction._cache = {};

Another neat trick in the code above is storing the cache as a property on the function itself; it helps avoid polluting the namespace of the module, and also allows callers to clear the cache if they wish to force a re-fetch (although if you actually needed that, it'd be better to add a parameter to the function instead).
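
To see why caching the Promise (rather than the resolved value) matters, here's a small sketch where two calls happen before the first request has finished. fakeFetchAction is a hypothetical stand-in for the real network request, and fetchActionCached mirrors the function above without the module export.

```javascript
// Hypothetical stand-in for the network request; counts how often it runs.
let requestCount = 0;
function fakeFetchAction(name) {
  requestCount += 1;
  return new Promise((resolve) => {
    setTimeout(() => resolve({ name, implementation_url: `/actions/${name}.js` }), 10);
  });
}

function fetchActionCached(recipe) {
  const cache = fetchActionCached._cache;

  if (!(recipe.action in cache)) {
    // Store the pending promise, not the eventual value, so callers that
    // arrive while the request is in flight share it.
    cache[recipe.action] = fakeFetchAction(recipe.action);
  }

  return cache[recipe.action];
}
fetchActionCached._cache = {};

// Both calls happen before the first "request" has resolved.
const first = fetchActionCached({ action: "console-log" });
const second = fetchActionCached({ action: "console-log" });

console.log(first === second); // true
Promise.all([first, second]).then(([a, b]) => {
  console.log(a === b, requestCount); // true 1
});
```

If the cache held resolved values instead, the second call would find nothing cached yet and kick off a second request.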

After I got this working, I puzzled for a bit on how to achieve something similar for the <script> tag injection. Unlike an AJAX request, the only thing I had to work with was an onload handler for the tag. Eventually I realized that nothing was stopping me from wrapping the <script> tag injection in a Promise and caching it in exactly the same way:

export function loadActionImplementation(action) {
  const cache = loadActionImplementation._cache;

  if (!(action.name in cache)) {
    cache[action.name] = new Promise((resolve, reject) => {
      const script = document.createElement('script');
      script.src = action.implementation_url;
      script.onload = () => {
        if (!(action.name in registeredActions)) {
          reject(new Error(`Could not find action with name ${action.name}.`));
        } else {
          resolve(registeredActions[action.name]);
        }
      };
      document.head.appendChild(script);
    });
  }

  return cache[action.name];
}
loadActionImplementation._cache = {};

From a nitpicking standpoint, I'm not entirely happy with this function:

But these are minor, and the patch got merged, so I guess it's good enough.