Goals

Testing in production is an important core competency, both for exposing risks and for mitigating them.

This page provides context, use cases, and information on how to get involved.

Use Cases / KPIs

  1. How many users actually complete set-up?

  2. How many turn on location data?

  3. How many subscribe to a Health Authority (HA)?

  4. How many open the app after they install it? (As I understand it, the number who open it on day 28 is a standard KPI.)

  5. How many get an alert that they may be infected?

  6. How many location data points does their app log? (The worry is that there may be bugs where we fail to log.)

  7. How many infection data points do they have from the HAs they are subscribed to?
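The first few KPIs above form a funnel, and one way to count them is from per-install event records. The following is a minimal sketch only: the event names and the `(install_id, event_name)` record shape are hypothetical and do not describe telemetry the app actually collects.

```python
# Hypothetical event names; any real telemetry may be named differently.
FUNNEL = ["installed", "setup_completed", "location_enabled", "ha_subscribed"]


def funnel_counts(events):
    """Count distinct installs that reached each funnel step.

    `events` is an iterable of (install_id, event_name) pairs.
    Repeated events from the same install are counted once.
    """
    seen = {step: set() for step in FUNNEL}
    for install_id, name in events:
        if name in seen:
            seen[name].add(install_id)
    return {step: len(ids) for step, ids in seen.items()}


def completion_rate(counts, step, base="installed"):
    """Share of base installs that reached `step` (0.0 if there are none)."""
    return counts[step] / counts[base] if counts[base] else 0.0
```

Counting distinct installs rather than raw events keeps a single device that repeats a step from inflating the numbers.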

Epic / User Story

Page Properties

Target release

MVP

Epic / User Story

https://pathcheck.atlassian.net/browse/TEST-7

Document status

IN PROGRESS

Document owner

Jonathon Wright

Designer

Tech lead

Technical writers

QA

...

https://github.com/tripleblindmarket/covid-safe-paths/releases/tag/v0.9.4

Background

1. Make layers – like a stack of pancakes

The idea of “testing in production” can actually mean different things. Are you testing a bunch of test servers from within your production data center? Or are your test applications running separately on top of your production platform? Or are you truly running live tests against 100% production-deployed code? The answer should be all of these. Layer your production testing to give you the ability to test different aspects of the production environment in different ways. Then match up your test cases so as to minimize the impact that your testing – and maintenance of the test environment – has on production users.

2. Time your tests when usage is light

Non-functional testing can have an impact on your entire user base if you let it. It can make a server environment sluggish, and that’s something no one wants. Study your analytics and determine the best time to schedule your tests. For example, look for the lowest levels of:

  • Number of users on the site

  • Resource-intensive processes within the environment
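As an illustration of picking a test window from analytics, the sketch below finds the hour of day with the lowest average number of active users. The `(hour_of_day, active_users)` sample format is an assumption about what an analytics export might look like, not a format defined on this page.

```python
from collections import defaultdict
from statistics import mean


def quietest_hour(samples):
    """Return the hour of day (0-23) with the lowest average active users.

    `samples` is an iterable of (hour_of_day, active_users) observations,
    for example several days of hourly readings.
    """
    by_hour = defaultdict(list)
    for hour, users in samples:
        by_hour[hour].append(users)
    if not by_hour:
        raise ValueError("no samples")
    # Compare hours by their mean load across all observed days.
    return min(by_hour, key=lambda h: mean(by_hour[h]))
```

The same approach would work for the second criterion by feeding in a measure of resource-intensive background work instead of user counts.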

3. Collect real traffic data and replay it to your test systems

Make sure to use actual traffic data you have been collecting in production (such as user workflows, user behavior and resources) to drive the generation of load for test cases. That way when you exercise your tests within your production environment, you’ll have confidence that the simulated behavior is realistic.
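A replay can be sketched as walking through recorded requests in timestamp order and keeping the gaps between them. This is a minimal illustration: the `(timestamp_seconds, request)` record shape and the caller-supplied `send` function are assumptions, and a real replay tool would also need to handle concurrency and scrub personal data.

```python
import time


def replay(recorded, send, speed=1.0, sleep=time.sleep):
    """Replay recorded requests in order, preserving relative timing.

    `recorded` is a list of (timestamp_seconds, request) pairs captured in
    production. `send` delivers one request to the test system and is
    supplied by the caller. A `speed` above 1 compresses the gaps.
    """
    ordered = sorted(recorded, key=lambda item: item[0])
    previous = None
    results = []
    for timestamp, request in ordered:
        if previous is not None:
            # Wait for the original gap between requests, scaled by speed.
            sleep((timestamp - previous) / speed)
        results.append(send(request))
        previous = timestamp
    return results
```

Passing `sleep` as a parameter lets a dry run skip the waiting while still exercising the ordering logic.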

4. Introduce a chaos monkey

According to Netflix engineers Cory Bennett and Ariel Tseitlin, “The best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient.” Netflix built a tool called Chaos Monkey into their production environment. It randomly introduces failures into production, forcing engineers to design recovery systems and develop a stronger, more adaptive platform. Netflix released the code on GitHub, so you can put your own chaos monkey in place.
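The idea of random failure injection can be shown at small scale with a wrapper that makes a fraction of calls fail on purpose. This sketch is not Netflix's Chaos Monkey and does not use its API; `with_chaos` and `InjectedFailure` are hypothetical names chosen for illustration.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real call to simulate a dependency failing."""


def with_chaos(func, failure_rate, rng=random.random):
    """Wrap `func` so that a fraction of calls fail on purpose.

    `failure_rate` is the probability (0.0 to 1.0) that a call raises
    InjectedFailure instead of running `func`.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")

    def wrapper(*args, **kwargs):
        if rng() < failure_rate:
            raise InjectedFailure("chaos: call failed on purpose")
        return func(*args, **kwargs)

    return wrapper
```

Callers of a wrapped function then have to handle `InjectedFailure`, which is the point: the recovery path gets exercised regularly rather than only during a real outage.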

5. Monitor like crazy

When you are running a production test, keep your eye on key user performance metrics so that you know if the test is having any kind of unacceptable impact on the user experience. Be prepared to shut the test down if that’s the case.
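One way to make "be prepared to shut the test down" concrete is to check a user-facing metric before every test step and stop as soon as it crosses a limit. The sketch below assumes a hypothetical `read_metric` callable (for example, returning p95 latency in milliseconds) and a threshold chosen by the team.

```python
def run_guarded(test_steps, read_metric, threshold):
    """Run test steps one at a time, aborting if a user metric degrades.

    `test_steps` is a list of zero-argument callables. `read_metric`
    returns the current value of a user-facing metric. The metric is read
    before each step, and the run stops once it exceeds `threshold`.
    Returns (steps_completed, aborted).
    """
    completed = 0
    for step in test_steps:
        if read_metric() > threshold:
            return completed, True
        step()
        completed += 1
    return completed, False
```

Keeping the steps small matters here, because the guard can only react between steps.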

6. Create an “Opt-in” experience for experimental testing

A great way to test how your application performs with real users is to let some of them opt in to new feature releases. This allows you to monitor and collect data from real users in real time and adjust your testing strategy accordingly, with less concern about impacting their experience. After all – they’ve already agreed to become test subjects, so a little hiccup here and there won’t come as a surprise.
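An opt-in gate can be as simple as a registry of users who agreed to experimental releases, consulted whenever the application chooses which code path to run. The class and method names below are hypothetical, and the in-memory set stands in for whatever storage a real system would use.

```python
class OptInRegistry:
    """Tracks which users have agreed to receive experimental features."""

    def __init__(self):
        self._opted_in = set()

    def opt_in(self, user_id):
        self._opted_in.add(user_id)

    def opt_out(self, user_id):
        # discard() does nothing if the user never opted in.
        self._opted_in.discard(user_id)

    def variant_for(self, user_id):
        """Return which release a user should see."""
        return "experimental" if user_id in self._opted_in else "stable"
```

Defaulting to the stable variant means users who never made a choice are not exposed to the experimental path.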

Open Questions

Question

Answer

Date Answered

...