Goals

Testing in production is a core competency: it exposes risks that only appear under real-world conditions, so that they can be mitigated.

This page provides context, use cases, and information on how to get involved.

Use Cases

Safe Paths - Crowd-Testing

Safe Paths - Location-Data

Safe Places - Machine-Data / APM

Epic / User Story

Target release	MVP
Epic / User Story	https://pathcheck.atlassian.net/browse/TEST-7
Document status	IN PROGRESS
Document owner	@Jonathon Wright
Designer	@Todd DeCapua
Tech lead	@Eran Kinsbruner
Technical writers
QA	@Diarmid Mackenzie

Task / Work in Progress (WIP)

Task / Goal / WIP	Status
Use Case - Undiagnosed users	https://pathcheck.atlassian.net/browse/TEST-10?atlOrigin=eyJpIjoiMDA2ZGI1NjJiMjhhNGRkNDlmNDJhNjMxZjQxOWQ0NzQiLCJwIjoiaiJ9
Use Case - Diagnosed users	https://pathcheck.atlassian.net/browse/TEST-11?atlOrigin=eyJpIjoiMjExOWVlNWJlNmVlNDk1Zjk1NTE3NDQ4NjViMzMxMDciLCJwIjoiaiJ9
Use Case - Contact tracer	https://pathcheck.atlassian.net/browse/TEST-12?atlOrigin=eyJpIjoiOGNhYTAwNjhiOTRkNDBkZDhkYWE0ZGVhZGJmZjJmMGEiLCJwIjoiaiJ9
Use Case - Health authority (HA)	https://pathcheck.atlassian.net/browse/TEST-13?atlOrigin=eyJpIjoiMTY4MGY4YzcwM2QwNDkzYmIwNjk1YjcxMzkxOWYwNWUiLCJwIjoiaiJ9
Location Data - Diagnostics	https://pathcheck.atlassian.net/browse/TEST-14?atlOrigin=eyJpIjoiMmU4MWZjMThjYTFmNDc1ZGI4NmI2ZjUzOTQwN2QxYWIiLCJwIjoiaiJ9
Location Data - Lifecycle Management	https://pathcheck.atlassian.net/browse/TEST-15?atlOrigin=eyJpIjoiMGU0NDBlYzhlOTc4NDJkODkwNDQ0YjBlOWRmZDIxNGIiLCJwIjoiaiJ9
Location Data - Boston Area (GPX)	https://pathcheck.atlassian.net/browse/TEST-17?atlOrigin=eyJpIjoiYTkzNWJmNjA1ODZlNGQwOWI3YmExMDc5NDAxMDE0NmEiLCJwIjoiaiJ9
Safe Places - Machine Data (APM)	https://pathcheck.atlassian.net/browse/TEST-16?atlOrigin=eyJpIjoiMGEzMzdjNGM5OTQ3NGQ4MGIxYzIxODc5YjdlOWFmMjMiLCJwIjoiaiJ9
Safe Paths - Crowdtesting (Mobile Labs)	https://pathcheck.atlassian.net/browse/TEST-18?atlOrigin=eyJpIjoiMTE0MDJkMmZhNjRmNGM5OGE4MjQzMTJlNGQyNDI2NjMiLCJwIjoiaiJ9
Safe Paths - Crowdtesting (TestFlight)	https://pathcheck.atlassian.net/browse/TEST-19?atlOrigin=eyJpIjoiYzljZDRjYzk5ZjI4NDU0MDhlYWE5MTQzNTU3OTViNDIiLCJwIjoiaiJ9
Safe Paths - Crowdtesting (Google Play)	https://pathcheck.atlassian.net/browse/TEST-20?atlOrigin=eyJpIjoiOWYxY2U4MGE2N2NiNDljOTkzYWJkY2QxYzM1MzRjZjEiLCJwIjoiaiJ9
Provide Access to Local Mobile Devices	https://pathcheck.atlassian.net/browse/TEST-5?atlOrigin=eyJpIjoiMzRmY2MwZjFmZWNkNDRmZjg4NjUwN2U3ODBlYzg1OWUiLCJwIjoiaiJ9
Provide Access to Remote Cloud Devices	https://pathcheck.atlassian.net/browse/TEST-4?atlOrigin=eyJpIjoiZTAwOTFlOTI1NWU2NGRlMmE2ZmU0ODgxOWIzMzQyYmQiLCJwIjoiaiJ9
Provide Access to TestFlight	https://pathcheck.atlassian.net/browse/TEST-3

Measurements / KPIs

How many users actually complete set-up?
No Analytics > Conversions Event to track event cadence for the app 'onboarding process' during startup.
How many turn on location data?
No Analytics > Unable to capture telemetry / instrumentation metadata.
How many subscribe to an HA?
Splunk for Good > Track event cadence for the ‘subscribe to HA’ event.
How many open the app after they install it?
No Analytics > Retentions to track high level ‘engagement’.
How many get an alert that they may be infected?
No Analytics > Push notifications would need to be implemented.
How many location data points does their app log?
Splunk for Good > Depending on the device, this calculation can be made during the export function.
How many infection data points do they have from HAs they are subscribed to?
No Analytics > Unable to track this information from the device (PII / GDPR).
How is performance affected by network location (NV)?
Splunk for Good: Quality > Performance. We can look at both Network Response Latency (NRL) and Device Performance (Duration Traces), as well as Custom Traces (CPU, Memory, Device Attributes) and Data Aggregation (Network / URL).
How can we A/B test or run a canary rollout?
Splunk for Good: Testers can be defined with a subset of user behaviors that can be flagged for rollout.
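
One common way to flag a subset of users for an A/B test or canary rollout is deterministic hash bucketing. The sketch below is illustrative only and is not taken from the Safe Paths code base: the function names, the feature name, and the idea of an anonymous per-install identifier are all assumptions.

```python
import hashlib


def rollout_bucket(install_id: str, feature: str) -> int:
    """Map an (assumed) anonymous install identifier to a stable bucket in [0, 100).

    Hashing the feature name together with the identifier means the same
    install can land in different buckets for different features.
    """
    digest = hashlib.sha256(f"{feature}:{install_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(install_id: str, feature: str, percent: int) -> bool:
    """Return True if this install falls inside the first `percent` buckets."""
    return rollout_bucket(install_id, feature) < percent
```

Because the bucket is derived from a hash rather than stored, raising `percent` from 5 to 20 keeps every install that was already in the canary and only adds new ones.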

Deliverables

Apple Build (iOS / IPA) - COVID Safe Paths

Check out the pre-requisites section for any dependencies.

Pre-requisites

Clone the Git repositories

Download the latest IPA / APK files

https://github.com/tripleblindmarket/covid-safe-paths/releases/tag/v0.9.4

Good Practices

1. Make layers – like a stack of pancakes

The idea of “testing in production” can actually mean different things. Are you testing a bunch of test servers from within your production data center? Or are your test applications running separately on top of your production platform? Or are you truly running live tests against 100% production-deployed code? The answer should be all of these. Layer your production testing to give you the ability to test different aspects of the production environment in different ways. Then match up your test cases so as to minimize the impact that your testing – and maintenance of the test environment – has on production users.

2. Time your tests when usage is light

Non-functional testing can have an impact on your entire user base if you let it. It can make a server environment sluggish, and that’s something no one wants. Study your analytics and determine the best time to schedule your tests. For example, look for the lowest levels of:

Number of users on the site
Resource-intensive processes within the environment
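
As a minimal sketch of picking a test window from analytics, the function below takes hourly samples and returns the hour of day with the lowest average load. The sample format `(hour_of_day, active_users)` is an assumption; the same approach would work with a measure of resource-intensive processes in place of user counts.

```python
from collections import defaultdict
from statistics import mean


def quietest_hour(samples):
    """Return the hour of day (0-23) with the lowest mean active-user count.

    `samples` is an iterable of (hour_of_day, active_users) pairs, typically
    covering several days. Ties are broken in favour of the earlier hour.
    """
    by_hour = defaultdict(list)
    for hour, users in samples:
        by_hour[hour].append(users)
    if not by_hour:
        raise ValueError("no samples provided")
    return min(by_hour, key=lambda h: (mean(by_hour[h]), h))


# Example with made-up numbers: hour 2 averages 15 users, hour 3 averages 21.
samples = [(2, 10), (2, 20), (14, 500), (3, 12), (3, 30)]
print(quietest_hour(samples))
```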

3. Collect real traffic data and replay it to your test systems

Make sure to use actual traffic data you have been collecting in production (such as user workflows, user behavior and resources) to drive the generation of load for test cases. That way when you exercise your tests within your production environment, you’ll have confidence that the simulated behavior is realistic.
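
A rough sketch of replaying captured traffic is shown below. It assumes requests were recorded with a time offset from the start of the capture; the `RecordedRequest` type and the `send` callback are hypothetical, and a real replay would plug in an HTTP client pointed at the test systems.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedRequest:
    offset_s: float  # seconds since the start of the capture
    method: str
    path: str


def replay(requests, send, speedup=1.0, sleep=time.sleep):
    """Replay recorded requests in order, preserving their relative timing.

    `send` is called once per request. `speedup` > 1 compresses the gaps
    between requests to generate a heavier load from the same capture.
    Returns the number of requests sent.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(requests, key=lambda r: r.offset_s)
    if not ordered:
        return 0
    previous = ordered[0].offset_s
    for req in ordered:
        delay = (req.offset_s - previous) / speedup
        if delay > 0:
            sleep(delay)
        send(req)
        previous = req.offset_s
    return len(ordered)
```

Passing `sleep` and `send` as parameters keeps the timing logic testable without touching the network.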

4. Introduce a chaos monkey

According to Netflix engineers Cory Bennett and Ariel Tseitlin, “The best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient.” Netflix built what’s called a Chaos Monkey into their production environment. This code actually introduces failures into the production environment randomly, forcing engineers to design recovery systems and develop a stronger, more adaptive platform. You can put your own chaos monkey in place because Netflix released their code to GitHub.
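
The idea can be illustrated with a toy failure injector. This is not Netflix's Chaos Monkey code or its API; it is a simplified sketch in which `instances` is any list of identifiers and `terminate` is a hypothetical callback that would stop one of them.

```python
import random


def maybe_terminate_one(instances, terminate, probability=0.1, rng=None):
    """With the given probability, pick one instance at random and terminate it.

    Returns the chosen instance, or None if nothing was terminated this round.
    """
    if rng is None:
        rng = random.Random()
    candidates = list(instances)
    if not candidates or rng.random() >= probability:
        return None
    victim = rng.choice(candidates)
    terminate(victim)
    return victim
```

Run on a schedule, a function like this produces the frequent small failures that force recovery paths to be exercised.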

5. Monitor like crazy

When you are running a production test, keep your eye on key user performance metrics so that you know if the test is having any kind of unacceptable impact on the user experience. Be prepared to shut the test down if that’s the case.
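
A simple guard for shutting a test down might compare the current 95th-percentile latency against a baseline. The sketch below is one possible rule, not a prescribed one: the choice of p95, the 1.5× tolerance, and the function names are all assumptions.

```python
def p95(values):
    """Return the 95th percentile of `values` using the nearest-rank method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no latency samples")
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def should_abort(latencies_ms, baseline_p95_ms, tolerance=1.5):
    """Return True when p95 latency exceeds `tolerance` times the baseline."""
    return p95(latencies_ms) > baseline_p95_ms * tolerance
```

A test runner would call `should_abort` on each monitoring interval and stop generating load as soon as it returns True.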

6. Create an “Opt-in” experience for experimental testing

A great way to test how your application performs with real users is to have some of them opt in to new feature releases. This will allow you to monitor and collect data from real users and adjust your testing strategy accordingly, without as much concern about impacting their experience. After all – they’ve already agreed to become test subjects, so a little hiccup here and there won’t come as a surprise.

Open Questions

Question	Answer	Date Answered

Testing

Testing in Production (TiP)