The outline plan is to run a Beta Trial in Boston.

What are we trying to achieve?

Learn whether we are on track to deliver our users the experience that we had envisioned.
Insofar as we are not, identify what we need to do to get back on track.

What would success look like & how might we measure it?

Let’s start qualitatively. Drawing on (but condensing) the Quality Map (this is quite old, but I think mostly still relevant):

Undiagnosed users:

Users get notified when they have been in contact with an infected person, with few false positives, and few false negatives.
Notifications are timely, relative to the contact tracing interview that identified the “points of concern”
The app provides clear information to users about what has been detected, and what steps they should take
The app provides a high-quality user experience: slick, attractive, usable.
The app does not cause frustration
The app does not inconvenience (battery usage, data costs, unhelpful notifications, other problems)
The user trusts the app.
The app user is very privacy conscious - match behavior with expectations
Asymptomatic vs Symptomatic users - contact tracing impact

Diagnosed users:

The contact tracing experience is clear, straightforward, and informative.
Contact tracing based on data from the app is superior to contact tracing without the app.
The user continues to have a positive experience after the contact tracing is complete.
Is the user able to identify where they got the infection from?
Is the user able to ask their family/friends before sharing any details with contact tracers?

Contact tracer

The contact tracing process is clear and straightforward
It is straightforward to publish points of concern in Safe Places.
Contact tracing based on data from the app is superior to contact tracing without the app.
It is straightforward to redact data to meet a user’s privacy needs.
Does it reduce the load on the contact tracer considerably - able to handle more patients (how many more?) in the same time?
Does it accelerate publication of data significantly?

Health authority

The app supports contact tracing efforts
The app helps to reduce the spread of COVID-19
Does it mean health facilities require fewer staff?

…and how might we measure it?

Objective	Measurement	Implementation
Users get notified when they have been in contact with an infected person, with few false positives, and few false negatives.	What % of notifications did / did not seem to match an exposure?	Daily report from all experiment participants, reporting screenshots of any notifications, and their own view of when/where they might have been. One day later, we publish commentary describing the “points of concern” locations published the previous day. Participants then follow up with their classification of events as true positive / false positive / false negative, with explanations.
Notifications are timely, relative to the contact tracing interview that identified the “points of concern”	Time lag between completion of contact tracing interview, and notifications.	For each point of concern published, we track the time of the contact tracing interview (real or imagined) that generated it. Participants sharing screenshots of notifications also indicate the arrival time of the notification.
The app provides clear information to users about what has been detected, and what steps they should take	Qualitative user input	Covered by a standard set of questions that the participant answers for each notification they receive.
The app provides a high-quality user experience: slick, attractive, usable.	Participant feedback via survey	Participant survey after 2d, 1 week, 2 weeks?
The app does not cause frustration	Participant feedback via survey	Participant survey after 2d, 1 week, 2 weeks?
The app does not inconvenience (battery usage, data costs, unhelpful notifications, other problems)	Participant feedback via survey	Participant survey after 2d, 1 week, 2 weeks?
The user trusts the app.	Participant feedback via survey	Participant survey after 2d, 1 week, 2 weeks?
The app user is very privacy conscious - match behavior with expectations	Participant feedback via survey	Participant survey after 2d, 1 week, 2 weeks?
Asymptomatic vs Symptomatic users - contact tracing impact	Deepti gulati pahwa - I didn’t understand this one…
Diagnosed users
The contact tracing experience is clear, straightforward, and informative.	Participant feedback via survey	Survey after contact tracing interview
Contact tracing based on data from the app is superior to contact tracing without the app.	Compare surveys of participants interviewed having used, or not used, the app.	Some participants have the app & are contact traced Some participants are contact traced without having installed the app. Both fill in the same survey questions.
The user continues to have a positive experience after the contact tracing is complete.	Follow-up survey	Everyone is asked to install the app after the contact tracing interview (if they didn’t have it already). Specific survey 3d after contact tracing interview
Is the user able to identify where they got the infection from?	Not sure about this as a thing to try to measure… Deepti gulati pahwa	I am not sure that our narrative for fictional contact tracing events needs to include a fictional point of origin for the infection - especially in the case where the user does not have the app. Infection will typically be 3-4 days prior to symptoms, test & contact trace.
Is the user able to ask their family/friends before sharing any details with contact tracers?	I am not sure we have designed for this… Deepti gulati pahwa	May be better addressed to Design team in the first instance, rather than trying to answer this in Beta trial?
Contact tracer
The contact tracing process is clear and straightforward	Survey with contact tracer	Survey after each contact tracing interview & general survey after completing several of them. This should cover the cases with & without the App.
It is straightforward to publish points of concern in Safe Places.	Survey with contact tracer	Include in contact tracer survey
Contact tracing based on data from the app is superior to contact tracing without the app (experience, speed, completeness of data).	Survey with contact tracer.	Ask explicit questions on this. But also compare scores between the two types of contact trace exercise. We could also do some contact trace experiments where the participant has been running the app, but does not make use of it. We can then review the data afterwards with the contact tracer & participant to determine whether any significant data points were missed.
It is straightforward to redact data to meet a user’s privacy needs.	Survey with contact tracer.	Include in contact tracer survey
Does it reduce the load on the contact tracer considerably - able to handle more patients (how many more?) in the same time?	Measure duration of contact trace interview & any follow-up work.	Compare interviews with & without the app.
Does it accelerate publication of data significantly?	Measure time from interview to publication of data with & without the app.	Record time lag from contact trace interview completing to (a) data being published, and (b) participants getting notifications from that data.
Health authority
The app supports contact tracing efforts	Interview with HA administrators.	After some number of contact tracing interviews have been completed, we allow HA administrators to conduct their own interviews of contact tracers and participants, before responding to our survey.
The app helps to reduce the spread of COVID-19	Interview with HA administrators.	This interview should be informed by data & analysis of points above.
Does it mean health facilities require fewer staff?	Interview with HA administrators.	This interview should be informed by data & analysis of points above.

Diagnostics

It’s not enough to determine that we are falling short on some goal. We need information that will allow us to understand and rectify the cause of the problem. We expect the following diagnostics will be useful:

Daily GPS movement logs from every participant’s phone. (Can we re-enable Share Location Data via email in the App, with a Feature Flag?)
Data published by the HA in unencrypted format. Saved
A daily journal of movement. We only want this from some participants, as we also want some participants to have the experience of contact tracing without the benefit of such a journal.
Participants who complete surveys should be available for follow-up interviews to clarify any points raised by their survey data.

Data lifecycle

We expect the data collected will be useful to learn from when planning future cycles of Beta testing, therefore we will seek agreement from participants to retain the data for 6 months, at which point it will be destroyed.
Resources derived from the data will become part of the project documentation, and therefore be kept indefinitely. However, these resources will not contain any PII.
Question: Do we need to identify a responsible person at Path Check for the proper management of this data?

Experiment Design

Installing the App

Participants install the App from Google Beta / Apple TestFlight.

This is a custom build, which allows us to

include a “Boston” Health Authority for the Beta program, which we ask them to register with.
include the ability for them to email their location data to us daily (as per the v1.0 function, which will be replaced by “secure transfer” in MVP1)

Daily Reporting

Each day, we ask participants to:

Send us their location data. We ask them not to analyze it themselves, as doing this might render their experience inauthentic vs. a real user.
Share details of any exposure notifications received
Classify prior exposure notifications as “true positive” / “false positive” / “false negative” based on points of concern “public descriptions” we provide (see below).

We also ask them to complete

A more general survey after 2, 7 and 14 days
specific surveys if they are involved in contact tracing - see below.

Exact mechanisms for these reports TBC. We can start with email, but may need some better tech to scale past ~10 participants. Tools like Survey Monkey or Google Forms might work better. Some design work needed.

Points of Concern

From Day 2, we begin publishing a set of “Points of Concern” in the area.

The protocol is as follows:

We design & document a set of points of concern in a register. This is not published yet.
Using Safe Places, we build a JSON file representing the points of concern.
We publish this, and record the time of publication.
About 24 hours later, we publish a register of the points of concern, which enables our participants to classify exposure notifications as false positive / false negative / true positive.
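The timing in this protocol could be tracked along the following lines. This is only a sketch: the `PointOfConcern` and `Publication` records and their field names are hypothetical (they are not the Safe Places JSON format), and the date in the usage example is made up. The only things taken from the plan are that we record the publication time and release the register about 24 hours later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Delay between publishing the JSON and releasing the written register,
# as described in the protocol ("about 24 hours later").
REGISTER_DELAY = timedelta(hours=24)


@dataclass
class PointOfConcern:
    """Hypothetical register entry; NOT the Safe Places JSON schema."""
    place: str
    start: datetime
    end: datetime


@dataclass
class Publication:
    """One batch of points of concern and the time its JSON went live."""
    points: list
    published_at: datetime

    def register_release_time(self) -> datetime:
        """Earliest time the register should be shared with participants."""
        return self.published_at + REGISTER_DELAY

    def register_is_due(self, now: datetime) -> bool:
        """True once the 24 hour delay has elapsed."""
        return now >= self.register_release_time()


# Usage example (the date is invented for illustration).
batch = Publication(
    points=[
        PointOfConcern(
            place="Boston Burger Company, Remington Street",
            start=datetime(2020, 6, 2, 12, 5),
            end=datetime(2020, 6, 2, 12, 55),
        )
    ],
    published_at=datetime(2020, 6, 2, 18, 0),
)
print(batch.register_release_time())
```

Keeping the publication time on the batch itself means the register release time never has to be worked out by hand.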

The set of points of concern may include (but won’t be limited to) the output from contact tracing interviews. It may also be informed by the daily location data we receive from participants, but we need to take care here not to bias the points of concern to match the data that those participants recorded.

When contact tracing interview output feeds into the points of concern, we make sure we follow the same procedures that we’d expect an HA to follow in terms of when the JSON data is published. We also follow the rest of the protocol above: i.e. we also generate and share ~24 hours later, a register of the published points of concern.

An example of a register of the points of concern might be:

12:05 to 12:55 at the Boston Burger Company, Remington Street
14:00 to 16:00 at Cafe Pamplona, Bow Street
Etc.

This allows users to make their own assessment of whether they spent time at any points of concern, and therefore determine whether the exposure notifications are false positives, false negatives, or true positives.
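Once participants have sent in their classifications, we could summarise them roughly as follows. This is a sketch under assumptions: the label strings and the `summarize` function are made up for illustration, the sample data is invented, and precision and recall are just one possible way of putting numbers on “few false positives” and “few false negatives”.

```python
from collections import Counter

# Labels participants are asked to use when classifying events.
VALID_LABELS = {"true positive", "false positive", "false negative"}


def summarize(classifications: list) -> dict:
    """Tally participant classifications and derive two summary rates.

    precision = TP / (TP + FP): share of notifications that matched an exposure.
    recall    = TP / (TP + FN): share of exposures that produced a notification.
    A rate is None when its denominator is zero.
    """
    unknown = set(classifications) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(classifications)
    tp = counts["true positive"]
    fp = counts["false positive"]
    fn = counts["false negative"]

    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Usage example with invented data.
sample = ["true positive"] * 3 + ["false positive"] + ["false negative"] * 2
print(summarize(sample))
```

Rejecting unknown labels early should help catch free-text answers before they skew the counts.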

We publish this with 24 hours delay to allow participants to have an authentic response to any exposure notifications, not mediated by this information.

Concern: how large is this set of points of concern going to be? We will need to generate many such points in order to get meaningful numbers of exposure notifications - will the resulting list be consumable by participants, or will it be too long?

Contact Tracing Interviews

We select a small number of participants each day to participate in contact tracing interviews.

Ideally these are conducted with professional contact tracers, so that we can assess both sides of the contact tracing experience. However, if we don’t have enough contact tracers, a Safe Paths volunteer may play the role of the contact tracer.

In all cases, we conduct the interview using Safe Places in the way that a contact tracer would.

By the end of the interview, we aim to have recorded a set of points of concern that should be published, representing the locations where the participant may have exposed others to infection, and that they are willing to share (i.e. not to be redacted).

This is then published, as per “Points of Concern” above.

We also record:

A written register of the points of concern published - to be published to our participants with 24h delay as per above.
The time & duration of the contact tracing interview.
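The interview time recorded here, combined with the publication time and the notification arrival times that participants report, gives us the time lags we want to measure. A minimal sketch follows; the function name, the dictionary keys and the timestamps in the usage example are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def time_lags(interview_end: datetime,
              published_at: datetime,
              notified_at: datetime) -> dict:
    """Return the lags for one point of concern and one notification.

    A negative lag most likely points to a recording error (for example a
    participant misreading the arrival time on a screenshot).
    """
    return {
        "interview_to_publication": published_at - interview_end,
        "publication_to_notification": notified_at - published_at,
        "interview_to_notification": notified_at - interview_end,
    }


# Usage example with invented timestamps.
lags = time_lags(
    interview_end=datetime(2020, 6, 2, 15, 0),
    published_at=datetime(2020, 6, 2, 16, 30),
    notified_at=datetime(2020, 6, 2, 21, 30),
)
for name, lag in lags.items():
    print(name, lag)
```

Running the same calculation for interviews with and without the app would give the comparison on speed of publication.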

Surveys are then conducted of both the participant and the contact tracer (we’ll do the 2nd survey even if the role of the contact tracer is a stand-in, but we’ll take care to separate the data from real contact tracers vs. stand-ins).

After the contact tracing interview, the participant is asked to install the app, if they haven't already.

The participant then receives another follow-up survey about 3d later, seeking input on their experience with the app after the contact trace interview.

Who are our Participants? How Many?

Open question - see below.

A concentrated geographic area will be best for triggering exposure notifications - this remains true whether they are generated from contact tracing interviews, or entirely synthetic.

On number of participants, we need to ensure that our capability to administer & consume data from participants scales up.

Ideally participants would be

Willing to follow a daily process of reporting data & assessing their own experience
Moving around a reasonably wide area (i.e. a square mile or so - we still prefer to be geographically concentrated as per above)

My view is that the reporting obligations & the desire for a concentrated area make Boston students a better fit than FedEx or Healthcare workers.

A square kilometer is ~1,400 geohash tiles (which we use as a basis for matching), so plenty of space for us to have both false positives & false negatives.

If our participants' roaming space was much below 100 geotiles (7 hectares, or 300 yards x 300 yards), that might be a problem.
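The tile arithmetic above can be sanity-checked with a short calculation. This sketch assumes 8-character geohashes (the precision is not stated here, so that is an assumption) and a spherical Earth; the function names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def geohash_tile_size_m(precision: int, latitude_deg: float = 0.0) -> tuple:
    """Approximate (width, height) in metres of a geohash cell.

    A geohash of `precision` characters encodes 5 * precision bits,
    alternating longitude and latitude bits, starting with longitude.
    """
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    # Cells get narrower east-west as we move away from the equator.
    width = (360 / 2 ** lon_bits) * metres_per_degree * math.cos(math.radians(latitude_deg))
    height = (180 / 2 ** lat_bits) * metres_per_degree
    return width, height


def tiles_per_km2(precision: int, latitude_deg: float = 0.0) -> float:
    """Approximate number of geohash cells covering one square kilometre."""
    width, height = geohash_tile_size_m(precision, latitude_deg)
    return 1_000_000 / (width * height)


# Usage example: equator vs. roughly Boston's latitude.
print(round(tiles_per_km2(8)))
print(round(tiles_per_km2(8, latitude_deg=42.36)))

width, height = geohash_tile_size_m(8)
print(round(100 * width * height / 10_000, 1), "hectares for 100 tiles")
```

Under these assumptions the equator figure comes out at about 1,370 tiles per square kilometer, in line with the ~1,400 quoted, and 100 tiles cover about 7 hectares. At Boston's latitude the tiles are narrower, so the count per square kilometer is higher (about 1,860), which only strengthens the point that there is plenty of space.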

Level of Direction

How much do we want participants to be directed, vs. interacting naturally?

I think we will get the most learning about how this tech fits with the real world if movements can be natural and self-directed, rather than directed. However, this could become a problem if:

participants don’t move around very much at all (e.g. because of lockdown).
we have a small number of participants across a very large area.

If participants are not crossing paths, we can compensate for this with a set of synthetic points of concern - I prefer this approach over directing movement. However if the participants are spread over a very large area, then the set of “points of concern” required to generate interesting numbers of exposure notifications will become large, and harder for participants to review in terms of spotting false positives & false negatives.

Technology Considerations

Experiment design above assumes some minimal changes to the App technology:

Enable a “Boston” Health Authority
Re-enable sharing of location data by email.

It does not currently envisage much more detailed instrumentation of the app, using e.g. Firebase/Crashlytics etc.

Data from such frameworks could be useful, but the trade-off in terms of Dev cost vs. benefit is not clear, given that these tools can only be used in this specific experiment (we don’t want to use them with the general public for privacy reasons).

We should discuss this trade-off with the Mobile App Dev team.

Implementation Punch List

List of items/tasks needed to deliver the above

Review, feedback & sign-off of this plan
Resolve unanswered questions re: target participant group & desired numbers
Recruit participants
Recruit contact tracers
Get special build of the App from Dev with appropriate feature flags as required
Agree with Dev whether to include Firebase / Crashlytics
Define “daily reporting” ask for participants, and supporting technology
Define who will receive daily reports from participants, and what analysis they will do on them
Define what the daily register of points of concern looks like, and how it is published
Design & test procedures for pushing points of concern that don’t come from contact tracing interviews
Set up supporting systems for contact tracing & publishing to take place (Safe Places instance)
Set schedule for contact tracing interviews
Define participant surveys post-contact tracing (immediate & 3 days later)
Define contact tracer survey post-contact tracing (immediate & 3 days later)
Processes & tech to administer post-contact tracing surveys
Define who will collate & analyze info from contact tracing surveys
Overall onboarding brief for participants
Overall onboarding brief for contact tracers
Identify Safe Paths volunteers to stand in for contact tracers if we don’t have enough.
Plan for follow-up with Health Authority admins (giving them access to participating contact tracers & participants).
Participation agreement for participants? (probably needed since PII shared)
Participation agreement for contact tracers? (maybe not needed since no PII shared)
Create overall project plan (covering all items above + whatever else) and identify a PM to run this.
Establish target dates for program, up to a first report with real data from the Beta trial.
Define governance plan for this program: regular reviews of whether we are achieving the goals we set out to achieve, any other issues.
Define & implement data retention policies.
Define person at Path Check responsible for us acting responsibly with participants’ data.

Previous Design Notes

These notes were recorded in an early draft of this article, and may still contain useful ideas and insights, so I am preserving them here…

What can we do with this group that we can’t do with regular people?

Social needs / Objectives of the Simulation project :

Drive predictable volumes of “contact trace” interviews
Tech validation
1. Collect complete location data from non-infected patients to assess what matched & what didn’t
2. test location contexts from a wide area
3. test moving objects with moving paths?
4. test location related interactive behaviors between multiple mobile users and objects
5. Location services on and off between different test users - impact
Social system validation
1. Get feedback from the person who participated in the interview
2. Get feedback from non-infected patients as to who matched a given location.

Volunteer Base to be used (question):

Harvard students - putting them at risk?
Health officials working daily - also with actual COVID patients ( Mayo Clinic staff, or something similar)
FedEx / logistics company delivery people - as they move around the city.
Small geographical area to consider (Boston? Or smaller?) - to ensure paths are crossed often.

Tech Aspects to consider

Run detailed analytics / diagnostics on individual phones: firebase, crashlytics etc.
Collect daily? location logs from every phone & check for reliability of logging.
Get qualitative feedback from individuals
Set up lots of infections, and get a much higher rate of notifications than we could do otherwise.
Push up scale / number of infections/ data points to download.
What features we want to test - only crossed Paths?
Check the difference in mapping when we use bluetooth only vs / bluetooth and GPS? - possibility MVP2 - is ready to test?
Contact tracers perspective - safe places - new

Risks related to experimentation: Product dependencies - must be in place before we start

GPS logging reliability - else all we will learn is that GPS reliability is not good enough!!
(not sure there is much else that is really essential…
… secure transport of location data is nice but not essential
… ditto hashing of location data on HA JSON server…
… Diarmid to read through full MVP1 spec & decide what else is a “must have” here - suspect not much…
(maybe some of the consent stuff; chance to review the redacted trail before it is published…?)

Key tech enablers (non-product)

Firebase / crashlytics - how easy to set up?
Rig to consume daily GPS data for analysis
Analysis engine to determine what should have matched vs. what did.
Pre-production Path Check environment to direct to Mock HA Server.
Mock HA server to receive encrypted data transmissions
Mock HA server into which we can feed data
Safe Places server for contact tracers
Synthetic data generator to generate large data sets
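As a rough idea of what the analysis engine in this list might do, the sketch below works out which points of concern a participant's location log should have matched, then compares that with the notifications they actually reported. Everything here is an assumption made for illustration: the data shapes, the tile ids, the function names, and the matching rule itself (same tile, timestamp inside the time window), which may differ from how the app really matches.

```python
from datetime import datetime


def expected_matches(location_log: list, points_of_concern: dict) -> set:
    """Return ids of points of concern that the location log overlaps.

    location_log: list of (timestamp, tile_id) samples from one participant.
    points_of_concern: dict mapping id -> (tile_id, start, end).
    Assumed rule: a match is any sample in the same tile within [start, end].
    """
    hits = set()
    for poc_id, (tile, start, end) in points_of_concern.items():
        if any(sample_tile == tile and start <= ts <= end
               for ts, sample_tile in location_log):
            hits.add(poc_id)
    return hits


def compare(expected: set, notified: set) -> dict:
    """Split point-of-concern ids into the three outcome classes."""
    return {
        "true_positive": expected & notified,
        "false_negative": expected - notified,
        "false_positive": notified - expected,
    }


# Usage example with invented data; "tileA" and "tileB" stand in for tile ids.
log = [(datetime(2020, 6, 2, 12, 30), "tileA")]
pocs = {
    "burger": ("tileA", datetime(2020, 6, 2, 12, 5), datetime(2020, 6, 2, 12, 55)),
    "cafe": ("tileB", datetime(2020, 6, 2, 14, 0), datetime(2020, 6, 2, 16, 0)),
}
print(compare(expected_matches(log, pocs), notified={"cafe"}))
```

With 5-10 participants this comparison could be done by hand; something like it would only be needed as the group grows.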

People enablers

Volunteer MoPs, willing to share their location data & a daily report.
Volunteer MoPs to participate in contact trace interviews
Volunteer contact tracers
Data controller who can monitor how we use PII
Overall people to run the analysis daily
Someone to direct the experiments
PM/Dev engagement to learn from this & fix problems.

What to measure?

How do we measure “effectiveness” ? Define KPIs to make this prototype/ simulation successful ( Tech as well as non-tech)
User accounts of their movements vs. what the location trails tell us vs. what notifications triggered? Use this info to try to account for numbers of false negatives, true positives & false positives…
Epidemiological view of effectiveness - independent review of stuff in previous bullet?
User feedback on messaging - how seriously do they take it? Do they know what to do? Other feedback?
Contact tracer feedback.
MoP feedback on contact tracing experience
60% penetration vs 20% penetration of app for contact tracing to work. (Is there a way to test that through simulation?)

How many people?

Depends how active / mobile / engaged people are going to be.
Directed interactions only need a small number of people (< 10) to be able to do some effective stuff, (controlled experiment)
Larger groups of people enable more non-directed learnings, much more unexpected stuff will start to happen as we get to 50+ people.
Much more tech/automation needed to process from 50+ people than from 5-10 people - with 5-10 lots could be manual.
Suspect we should aim for ~10 people for a week, then grow by ~20 people/week so we are at 50 people after 3 weeks. Not obviously going to get lots more benefit from scaling above 50 people, and will become increasingly challenging to organize….
2 groups -
- 10 people - directed learnings with pre-determined KPIs
  - They should follow plans, and on purpose cross paths to validate our idea of false positives and false negatives. (public transportation, cafe, office building)
- 50 people - 2nd group - non- directed - get unexpected feedback. (extra feedback on what info they trust - and how important is privacy to them at all levels. Develop a questionnaire) non-technological part.

Use cases for directed learnings -

Cafes
Grocery stores
Public transportation - false positives/ negatives possibility
Office buildings - floors
Difference in logging - wifi vs 3g vs 4g

User journey aspects to consider:

App perspective - user experience
iOS versions (different) vs Android versions - range of phone devices to be checked. Select and test a set of popular mobile devices.
User behavior in surroundings
Demographics behaviour - old age 60+, parents and kids, essential workers
If social distancing is maintained - people practicing, people not practicing.
Mobility impacts - on feet, bicycle, by car, by metro/train, bus/tram

Contact tracer/interview:

Dashboard - what helps (through interviews generate features)