The outline plan is to run a Beta Trial in Boston.

What are we trying to achieve?

  • Learn whether we are on track to deliver our users the experience that we had envisioned.

  • Insofar as we are not, identify what we need to do to get back on track.

What would success look like & how might we measure it?

Let’s start qualitatively. Drawing on (but condensing) the Quality Map (this is quite old, but I think mostly still relevant):

Undiagnosed users:

  • Users get notified when they have been in contact with an infected person, with few false positives, and few false negatives.

  • Notifications are timely, relative to the contact tracing interview that identified the “points of concern”

  • The app provides clear information to users about what has been detected, and what steps they should take

  • The app provides a high-quality user experience: slick, attractive, usable.

  • The app does not cause frustration

  • The app does not inconvenience the user (battery usage, data costs, unhelpful notifications, other problems)

  • The user trusts the app.

  • For privacy-conscious users, the app’s behavior matches their expectations

  • Asymptomatic vs Symptomatic users - contact tracing impact

Diagnosed users:

  • The contact tracing experience is clear, straightforward, and informative.

  • Contact tracing based on data from the app is superior to contact tracing without the app.

  • The user continues to have a positive experience after the contact tracing is complete.

  • Is the user able to identify where they got the infection from?

  • Is the user able to ask their family/friends before sharing any details with contact tracers?

Contact tracer

  • The contact tracing process is clear and straightforward

  • It is straightforward to publish points of concern in Safe Places.

  • Contact tracing based on data from the app is superior to contact tracing without the app.

  • It is straightforward to redact data to meet a user’s privacy needs.

  • Does it considerably reduce the load on the contact tracer - can they handle more patients (how many more, in the same time)?

  • Does it accelerate publication of data significantly?

Health authority

  • The app supports contact tracing efforts

  • The app helps to reduce the spread of COVID-19

  • Does it mean the health facility requires fewer staff?

…and how might we measure it?

Each item below gives the objective, how we would measure it, and how we would implement that measurement.

Objective: Users get notified when they have been in contact with an infected person, with few false positives, and few false negatives.
Measurement: What % of notifications did / did not seem to match an exposure?
Implementation: Daily report from all experiment participants, reporting screenshots of any notifications, and their own view of when/where they might have been. One day later, we publish commentary describing the “points of concern” locations published the previous day. Participants then follow up with their classification of events as true positive / false positive / false negative, with explanations.

Objective: Notifications are timely, relative to the contact tracing interview that identified the “points of concern”.
Measurement: Time lag between completion of the contact tracing interview and notifications.
Implementation: For each point of concern published, we track the time of the contact tracing interview (real or imagined) that generated it. Participants sharing screenshots of notifications also indicate the arrival time of the notification.

Objective: The app provides clear information to users about what has been detected, and what steps they should take.
Measurement: Qualitative user input.
Implementation: Covered by a standard set of questions that the participant answers for each notification they receive.

Objective: The app provides a high-quality user experience: slick, attractive, usable.
Measurement: Participant feedback via survey.
Implementation: Participant survey after 2 days, 1 week, 2 weeks?

Objective: The app does not cause frustration.
Measurement: Participant feedback via survey.
Implementation: Participant survey after 2 days, 1 week, 2 weeks?

Objective: The app does not inconvenience the user (battery usage, data costs, unhelpful notifications, other problems).
Measurement: Participant feedback via survey.
Implementation: Participant survey after 2 days, 1 week, 2 weeks?

Objective: The user trusts the app.
Measurement: Participant feedback via survey.
Implementation: Participant survey after 2 days, 1 week, 2 weeks?

Objective: For privacy-conscious users, the app’s behavior matches their expectations.
Measurement: Participant feedback via survey.
Implementation: Participant survey after 2 days, 1 week, 2 weeks?

Objective: Asymptomatic vs. symptomatic users - contact tracing impact.
Note: Deepti Gulati Pahwa - I didn’t understand this one…

Diagnosed users:

Objective: The contact tracing experience is clear, straightforward, and informative.
Measurement: Participant feedback via survey.
Implementation: Survey after the contact tracing interview.

Objective: Contact tracing based on data from the app is superior to contact tracing without the app.
Measurement: Compare surveys of participants interviewed having used, or not used, the app.
Implementation: Some participants have the app & are contact traced. Some participants are contact traced without having installed the app. Both fill in the same survey questions.

Objective: The user continues to have a positive experience after the contact tracing is complete.
Measurement: Follow-up survey.
Implementation: Everyone is asked to install the app after the contact tracing interview (if they didn’t have it already). Specific survey 3 days after the contact tracing interview.

Objective: Is the user able to identify where they got the infection from?
Note: Not sure about this as a thing to try to measure… Deepti Gulati Pahwa
I am not sure that our narrative for fictional contact tracing events needs to include a fictional point of origin for the infection - especially in the case where the user does not have the app. Infection will typically be 3-4 days prior to symptoms, test & contact trace.

Objective: Is the user able to ask their family/friends before sharing any details with contact tracers?
Note: I am not sure we have designed for this… Deepti Gulati Pahwa
May be better addressed to the Design team in the first instance, rather than trying to answer this in the Beta trial?

Contact tracer

Objective: The contact tracing process is clear and straightforward.
Measurement: Survey with the contact tracer.
Implementation: Survey after each contact tracing interview & a general survey after completing several of them. This should cover the cases with & without the app.

Objective: It is straightforward to publish points of concern in Safe Places.
Measurement: Survey with the contact tracer.
Implementation: Include in the contact tracer survey.

Objective: Contact tracing based on data from the app is superior to contact tracing without the app, in terms of:

  • experience

  • speed

  • completeness of data

Measurement: Survey with the contact tracer.
Implementation: Ask explicit questions on this, but also compare scores between the two types of contact trace exercise. We could also do some contact trace experiments where the participant has been running the app, but does not make use of it. We can then review the data afterwards with the contact tracer & participant to determine whether any significant data points were missed.

Objective: It is straightforward to redact data to meet a user’s privacy needs.
Measurement: Survey with the contact tracer.
Implementation: Include in the contact tracer survey.

Objective: Does it considerably reduce the load on the contact tracer - can they handle more patients (how many more, in the same time)?
Measurement: Measure the duration of the contact trace interview & any follow-up work.
Implementation: Compare interviews with & without the app.

Objective: Does it accelerate publication of data significantly?
Measurement: Measure time from interview to publication of data, with & without the app.
Implementation: Record the time lag from the contact trace interview completing to (a) data being published, and (b) participants getting notifications from that data.
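A minimal sketch of this lag computation (the record layout here is hypothetical - all we really need are the three timestamps):

```python
from datetime import datetime

# Hypothetical trial records: the three timestamps we capture per point of concern.
events = {
    "interview_completed": datetime(2020, 6, 1, 14, 30),
    "data_published":      datetime(2020, 6, 1, 16, 5),   # from our publication log
    "notification_seen":   datetime(2020, 6, 2, 9, 40),   # from participant screenshot
}

publication_lag = events["data_published"] - events["interview_completed"]      # (a)
notification_lag = events["notification_seen"] - events["interview_completed"]  # (b)

print(f"interview -> publication: {publication_lag}")
print(f"interview -> notification: {notification_lag}")
```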

Health authority

Objective: The app supports contact tracing efforts.
Measurement: Interview with HA administrators.
Implementation: After some number of contact tracing interviews have been completed, we allow HA administrators to conduct their own interviews of contact tracers and participants, before responding to our survey.

Objective: The app helps to reduce the spread of COVID-19.
Measurement: Interview with HA administrators.
Implementation: This interview should be informed by data & analysis of the points above.

Objective: Does it mean the health facility requires fewer staff?
Measurement: Interview with HA administrators.
Implementation: This interview should be informed by data & analysis of the points above.

Diagnostics

It’s not enough to determine that we are falling short on some goal. We need information that will allow us to understand and rectify the cause of the problem. We expect the following diagnostics will be useful:

  • Daily GPS movement logs from every participant’s phone (can we re-enable Share Location Data via email in the App, with a Feature Flag?). See the reliability-check sketch after this list.

  • Data published by the HA in unencrypted format, saved for later analysis.

  • A daily journal of movement. We only want this from some participants, as we also want some participants to have the experience of contact tracing without the benefit of such a journal.

  • Participants who complete surveys are available for follow-up interviews to clarify any points raised by their survey data.
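For the GPS movement logs, a minimal sketch of the reliability check we might run on each daily log, assuming the email export is a JSON array of points with epoch-millisecond “time” fields (the exact v1.0 export format needs confirming):

```python
import json
from datetime import datetime, timedelta

def coverage_gaps(path, max_gap=timedelta(minutes=15)):
    """Return periods in a participant's daily log where logging went quiet."""
    with open(path) as f:
        # Assumed format: [{"time": epoch_ms, "latitude": ..., "longitude": ...}, ...]
        points = json.load(f)
    times = sorted(datetime.fromtimestamp(p["time"] / 1000) for p in points)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]

# Hypothetical file name; one export per participant per day.
for start, end in coverage_gaps("participant_42_2020-06-01.json"):
    print(f"logging gap: {start} -> {end}")
```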

Data lifecycle

  • We expect the data collected will be useful to learn from when planning future cycles of Beta testing; we will therefore seek agreement from participants to retain the data for 6 months, at which point it will be destroyed.

  • Resources derived from the data will become part of the project documentation, and therefore be kept indefinitely. However, these resources will not contain any PII.

  • Question: Do we need to identify a responsible person at Path Check for the proper management of this data?

Experiment Design

Installing the App

Participants install the App from Google Play Beta / Apple TestFlight.

This is a custom build, which allows us to

  • include a “Boston” Health Authority for the Beta program, which we ask them to register with.

  • include the ability for them to email their location data to us daily (as per the v1.0 function, which will be replaced by “secure transfer” in MVP1)

Daily Reporting

Each day, we ask participants to:

  • Send us their location data. We ask them not to analyze it themselves, as doing this might render their experience inauthentic vs. a real user.

  • Share details of any exposure notifications received

  • Classify prior exposure notifications as “true positive” / “false positive” / “false negative” based on points of concern “public descriptions” we provide (see below).

We also ask them to complete

  • A more general survey after 2, 7 and 14 days

  • specific surveys if they are involved in contact tracing - see below.

Exact mechanisms for these reports TBC. We can start with email, but may need some better tech to scale past ~10 participants. Tools like SurveyMonkey or Google Forms might work better. Some design work is needed.

Points of Concern

From Day 2, we begin publishing a set of “Points of Concern” in the area.

The protocol is as follows:

  • We design & document a set of points of concern in a register. This is not published yet.

  • Using Safe Places, we build a JSON file representing the points of concern.

  • We publish this, and record the time of publication.

  • About 24 hours later, we publish a register of the points of concern, which enables our participants to classify exposure notifications as false positive / false negative / true positive.

The set of points of concern may include (but won’t be limited to) the output from contact tracing interviews. It may also be informed by the daily location data we receive from participants, but we need to take care here not to bias the points of concern to match the data that those participants recorded.

When contact tracing interview output feeds into the points of concern, we make sure we follow the same procedures that we’d expect an HA to follow in terms of when the JSON data is published. We also follow the rest of the protocol above: i.e. we also generate and share, ~24 hours later, a register of the published points of concern.
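For illustration, a sketch of building the published JSON (field names follow our understanding of the v1.0 Safe Places format; the actual schema may differ and should be verified):

```python
import json
import time

# Field names assumed from the v1.0 format; verify against real Safe Places output.
points_of_concern = {
    "authority_name": "Boston (Beta Trial)",
    "publish_date_utc": int(time.time()),  # we record this as the publication time
    "concern_points": [
        # "time" assumed to be epoch milliseconds, matching the app's location points
        {"time": 1591012500000, "latitude": 42.3656, "longitude": -71.1097},
        {"time": 1591019400000, "latitude": 42.3732, "longitude": -71.1189},
    ],
}

with open("points_of_concern.json", "w") as f:
    json.dump(points_of_concern, f, indent=2)
```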

An example of a register of the points of concern might be:

  • 12:05 to 12:55 at the Boston Burger Company, Remington Street

  • 14:00 to 16:00 at Cafe Pamplona, Bow Street

  • Etc.

This allows users to make their own assessment of whether they spent time at any points of concern, and therefore determine whether the exposure notifications are false positives, false negatives, or true positives.

We publish this with 24 hours delay to allow participants to have an authentic response to any exposure notifications, not mediated by this information.

Concern: how large is this set of points of concern going to be? We will need to generate many such points in order to get meaningful numbers of exposure notifications - will the resulting list be consumable by participants, or will it be too long?
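We can also cross-check participants’ manual classifications against their submitted GPS logs. A minimal sketch, with hypothetical in-memory data and a crude proximity check standing in for real geohash-tile matching:

```python
from datetime import datetime

# One register entry: (start, end, (lat, lon), description) - hypothetical layout.
register = [
    (datetime(2020, 6, 1, 12, 5), datetime(2020, 6, 1, 12, 55),
     (42.3656, -71.1097), "Boston Burger Company, Remington Street"),
]

# Participant GPS log: (time, lat, lon) points from their daily data share.
gps_log = [(datetime(2020, 6, 1, 12, 20), 42.36561, -71.10968)]

def near(lat1, lon1, lat2, lon2, threshold_deg=0.0005):
    # Crude box check, roughly 50m at Boston's latitude; real matching uses geohash tiles.
    return abs(lat1 - lat2) < threshold_deg and abs(lon1 - lon2) < threshold_deg

def expected_exposures(gps_log, register):
    hits = set()
    for start, end, (plat, plon), desc in register:
        if any(start <= t <= end and near(lat, lon, plat, plon) for t, lat, lon in gps_log):
            hits.add(desc)
    return hits

notified = set()  # descriptions the participant was actually notified about
expected = expected_exposures(gps_log, register)
print("false negatives:", expected - notified)
print("false positives:", notified - expected)
print("true positives: ", expected & notified)
```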

Contact Tracing Interviews

We select a small number of participants each day to participate in contact tracing interviews.

Ideally these are conducted with professional contact tracers, so that we can assess both sides of the contact tracing experience. However, if we don’t have enough contact tracers, a Safe Paths volunteer may play the role of the contact tracer.

In all cases, we conduct the interview using Safe Places in the way that a contact tracer would.

By the end of the interview, we aim to have recorded a set of points of concern that should be published, representing the locations where the participant may have exposed others to infection, and that they are willing to share (i.e. not to be redacted).

This is then published, as per “Points of Concern” above.

We also record:

  • A written register of the points of concern published - to be published to our participants with 24h delay as per above.

  • The time & duration of the contact tracing interview.

Surveys are then conducted of both the participant and the contact tracer (we’ll do the second survey even if the contact tracer is a stand-in, but we’ll take care to separate the data from real contact tracers vs. stand-ins).

After the contact tracing interview, the participant is asked to install the app, if they hadn't already.

The participant then receives another follow-up survey about 3 days later, seeking input on their experience with the app after the contact trace interview.

Who are our Participants? How Many?

Open question - see below.

A concentrated geographic area will be best for triggering exposure notifications - this remains true whether they are generated from contact tracing interviews, or entirely synthetic.

On the number of participants, we need to ensure that our capability to administer & consume data from participants scales up.

Ideally participants would be

  • Willing to follow a daily process of reporting data & assessing their own experience

  • Moving around a reasonably wide area (i.e. a square mile or so - we still prefer to be geographically concentrated, as per above)

My view is that the reporting obligations & the desire for a concentrated area makes Boston students a better fit than FedEx or Healthcare workers.

A square kilometer is ~1,400 geohash tiles (which we use as a basis for matching), so plenty of space for us to have both false positives & false negatives.

If our participants’ roaming space were much below 100 geohash tiles (7 hectares, or roughly 300 yards x 300 yards), that might be a problem.
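A quick sanity check on that tile arithmetic, assuming 8-character geohashes (an assumption - the matching precision isn’t pinned down above). The ~1,400/km² figure matches geohash-8 near the equator; cells narrow with latitude, so the count at Boston is somewhat higher:

```python
import math

# An 8-character geohash encodes 40 bits: 20 of longitude, 20 of latitude.
LAT_DEG = 180 / 2**20  # cell height in degrees
LON_DEG = 360 / 2**20  # cell width in degrees

def tiles_per_km2(latitude_deg):
    height_m = LAT_DEG * 111_320  # ~metres per degree of latitude
    width_m = LON_DEG * 111_320 * math.cos(math.radians(latitude_deg))
    return 1_000_000 / (height_m * width_m)

print(round(tiles_per_km2(0.0)))    # ~1,370 near the equator
print(round(tiles_per_km2(42.36)))  # ~1,850 at Boston's latitude
```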

Level of Direction

How much do we want participants to be directed, vs. interacting naturally?

I think we will get the most learning about how this tech fits with the real world if movements can be natural and self-directed, rather than directed. However, this could become a problem if:

  • participants don’t move around very much at all (e.g. because of lockdown).

  • we have a small number of participants across a very large area.

If participants are not crossing paths, we can compensate for this with a set of synthetic points of concern - I prefer this approach over directing movement. However if the participants are spread over a very large area, then the set of “points of concern” required to generate interesting numbers of exposure notifications will become large, and harder for participants to review in terms of spotting false positives & false negatives.

Technology Considerations

Experiment design above assumes some minimal changes to the App technology:

  • Enable a “Boston” Health Authority

  • Re-enable sharing of location data by email.

It does not currently envisage much more detailed instrumentation of the app, using e.g. Firebase/Crashlytics etc.

Data from such frameworks could be useful, but the trade-off in terms of Dev cost vs. benefit is not clear, given that these tools can only be used in this specific experiment (we don’t want to use them with the general public, for privacy reasons).

We should discuss this trade-off with the Mobile App Dev team.

Implementation Punch List

List of items/tasks needed to deliver the above

  1. Review, feedback & sign-off of this plan

  2. Resolve unanswered questions re: target participant group & desired numbers

  3. Recruit participants

  4. Recruit contact tracers

  5. Get special build of the App from Dev with appropriate feature flags as required

  6. Agree with Dev whether to include Firebase / Crashlytics

  7. Define “daily reporting” ask for participants, and supporting technology

  8. Define who will receive daily reports from participants, and what analysis they will do on them

  9. Define what the daily register of points of concern looks like, and how it is published

  10. Design & test procedures for pushing points of concern that don’t come from contact tracing interviews

  11. Set up supporting systems for contact tracing & publishing to take place (Safe Places instance)

  12. Set the schedule for contact tracing interviews.

  13. Define participant surveys post-contact tracing (immediate & 3 days later)

  14. Define contact tracer survey post-contact tracing (immediate & 3 days later)

  15. Processes & tech to administer post-contact tracing surveys

  16. Define who will collate & analyze info from contact tracing surveys

  17. Overall onboarding brief for participants

  18. Overall onboarding brief for contact tracers

  19. Identify Safe Paths volunteers to stand in for contact tracers if we don’t have enough.

  20. Plan for follow-up with Health Authority admins (giving them access to participating contact tracers & participants).

  21. Participation agreement for participants? (probably needed since PII shared)

  22. Participation agreement for contact tracers? (maybe not needed since no PII shared)

  23. Create overall project plan (covering all items above + whatever else) and identify a PM to run this.

  24. Establish target dates for program, up to a first report with real data from the Beta trial.

  25. Define governance plan for this program: regular reviews of whether we are achieving the goals we set out to achieve, any other issues.

  26. Define & implement data retention policies.

  27. Identify the person at Path Check responsible for ensuring we handle participants’ data responsibly.

Previous Design Notes

These notes were recorded in an early draft of this article, and may still contain useful ideas and insights, so I am preserving them here…


What can we do with this group that we can’t do with regular people?

Social needs / Objectives of the Simulation project:

  1. Drive predictable volumes of “contact trace” interviews

...

  1. Tech validation 

    1. Collect complete location data from non-infected patients to assess what matched & what didn’t

    2. test location contexts from a wide area

    3. test moving objects with moving paths? 

    4. test location related interactive behaviors between multiple mobile users and objects 

    5. Location services on and off between different test users - impact

  2. Social system validation 

    1. Get feedback from the person who participated in the interview

    2. Get feedback from non-infected patients as to who matched a given location.

Volunteer Base to be used (question): 

  1. Harvard students - putting them at risk?

  2. Health officials working daily - also with actual COVID patients (Mayo Clinic staff, or something similar)

  3. FedEx / logistics company delivery people - as they move around the city.

  4. Small geographical area to consider (Boston? Or smaller - to ensure paths are crossed often).

Tech Aspects to consider 

  • Run detailed analytics / diagnostics on individual phones: Firebase, Crashlytics etc.

  • Collect daily(?) location logs from every phone & check for reliability of logging.

  • Get qualitative feedback from individuals

  • Set up lots of infections, and get a much higher rate of notifications than we could do otherwise.

  • Push up scale / number of infections/ data points to download.

  • What features do we want to test - only Crossed Paths?

  • Check the difference in mapping when we use Bluetooth only vs. Bluetooth and GPS - possibly MVP2 - is it ready to test?

  • Contact tracer’s perspective - Safe Places - new

Risks related to experimentation: Product dependencies - must be in place before we start

  • GPS logging reliability - else all we will learn is that GPS reliability is not good enough!!

  • (not sure there is much else that is really essential…)

  • … secure transport of location data is nice but not essential

  • … ditto hashing of location data on the HA JSON server…

  • … Diarmid to read through the full MVP1 spec & decide what else is a “must have” here - suspect not much…

  • (maybe some of the consent stuff; chance to review the redacted trail before it is published…?)

...

  • Firebase / Crashlytics - how easy to set up?

  • Rig to consume daily GPS data for analysis

  • Analysis engine to determine what should have matched vs. what did (see the sketch after this list).

  • Pre-production Path Check environment to direct to the Mock HA Server.

  • Mock HA server to receive encrypted data transmissions

  • Mock HA server into which we can feed data

  • Safe Places server for contact tracers

  • Synthetic data generator to generate large data sets
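For the analysis engine noted above, a minimal sketch of determining “what should have matched”, assuming 8-character geohash tiles and the third-party pygeohash package (tile precision and data formats are assumptions; a real engine would also require the time windows to overlap):

```python
import pygeohash  # third-party: pip install pygeohash

PRECISION = 8  # assumed matching tile size (roughly 19m x 38m)

def to_tiles(points, precision=PRECISION):
    """Map (epoch_ms, lat, lon) points to the set of geohash tiles visited."""
    return {pygeohash.encode(lat, lon, precision) for _, lat, lon in points}

def should_have_matched(participant_points, concern_points):
    """Tiles where a participant overlapped the points of concern (time ignored for brevity)."""
    return to_tiles(participant_points) & to_tiles(concern_points)

# Usage: compare against what the app actually flagged to find gaps, e.g.
#   missed = should_have_matched(gps_log, poc_points) - tiles_the_app_notified_on
```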

...

  • Volunteer MoPs (members of the public), willing to share their location data & a daily report.

  • Volunteer MoPs to participate in contact trace interviews

  • Volunteer contact tracers

  • Data controller who can monitor how we use PII

  • People to run the analysis daily

  • Someone to direct the experiments

  • PM/Dev engagement to learn from this & fix problems.

Other considerations - What to measure?

  • How do we measure “effectiveness”? Define KPIs to make this prototype/simulation successful (tech as well as non-tech)

  • User accounts of their movements vs. what the location trails tell us vs. what notifications triggered? Use this info to try to account for numbers of false negatives, true positives & false positives…

  • Epidemiological view of effectiveness - independent review of stuff in previous bullet?

  • User feedback on messaging - how seriously do they take it?  Do they know what to do?  Other feedback?

  • Contact tracer feedback.

  • MoP feedback on contact tracing experience

  • 60% penetration vs. 20% penetration of the app for contact tracing to work (is there a way to test that through simulation?).

How many people?

  • Depends how active / mobile / engaged people are going to be.

  • Directed interactions only need a small number of people (<10) to be able to do some effective stuff (controlled experiment)

  • Larger groups of people enable more non-directed learnings; much more unexpected stuff will start to happen as we get to 50+ people.

  • Much more tech/automation is needed to process data from 50+ people than from 5-10 people - with 5-10, lots could be manual.

  • Suspect we should aim for ~10 people for a week, then grow by ~20 people/week so we are at 50 people after 3 weeks.  Not obviously going to get lots more benefit from scaling above 50 people, and will become increasingly challenging to organize….

  • 2 groups - 

    • 10 people - directed learnings with pre-determined KPIs

      • They should follow plans, and on purpose cross paths to validate our idea of false positives and false negatives. (public transportation, cafe, office building)

    • 50 people - 2nd group - non-directed - get unexpected feedback (extra feedback on what info they trust, and how important privacy is to them at all levels; develop a questionnaire). Non-technological part.

Use cases for directed learnings -

Cafes
Grocery stores
Public transportation - false positives/ negatives possibility
Office buildings - floors
Difference in logging - WiFi vs 3G vs 4G


User journey aspects to consider:

  • App perspective - user experience

  • Different iOS versions vs. Android versions - range of phone devices to be checked. Select and test a set of popular mobile devices.

  • User behavior in surroundings 

  • Demographic behaviour - ages 60+, parents and kids, essential workers

  • If social distancing is maintained - people practicing, people not practicing.

  • Mobility impacts - on foot, by bicycle, by car, by metro/train, by bus/tram

Contact tracer/interview:

  • Dashboard - what helps (generate feature ideas through interviews)