Safe Paths Privacy & Ethics concerns from Twitter discussion with Software Testers - April 17 2020
Diarmid Mackenzie - 18 April 2020
This document summarizes a significant number of concerns related to the privacy & ethics of the Safe Paths app.
The context for this was a Tweet I put out to several prominent people in the Software Testing community seeking volunteers on the project. Many of these people have a strong interest in privacy and ethical concerns, and a wide range of concerns were voiced about the project.
I answered some points inline in Twitter (you can see these by following the links), but I’ve not attempted to replicate any of that here. This document simply states the concerns as raised, without trying to give a definitive position on them.
The purpose of this document is primarily for sharing within the Safe Paths project, to make sure that we fully understand and appreciate these concerns, and can take the right actions in response to them.
I will also share this document with those who contributed those concerns, so they can point out if I have overlooked or misrepresented anything, or so that they can add further points of concern if they have them.
In total I have identified 30 broad points of concern (some emphasized multiple times, in multiple ways). I have grouped them into broad categories in an attempt to help make this digestible.
I’m hugely grateful to all those who contributed their thoughts on this:
DanAshby04
keithklain
FionaCCharles
TheTestDoctor
michaelbolton
jamesmarcusbach
Manish_Awasthi
QualityFrog
Ethics
As well as Privacy concerns, there are a wide range of other Ethical concerns. If the project is Privacy-first, where do Ethics come in?
https://twitter.com/DanAshby04/status/1251264590962470912?s=20
https://twitter.com/keithklain/status/1251188094847668224?s=20
https://twitter.com/FionaCCharles/status/1251184838104829959?s=20
https://twitter.com/TheTestDoctor/status/1251185866065825792?s=20
https://twitter.com/TheTestDoctor/status/1251184486143922177?s=20
https://twitter.com/keithklain/status/1251186130202083328?s=20
https://twitter.com/keithklain/status/1251183964666068993?s=20
https://twitter.com/DanAshby04/status/1251182407836983299?s=20
https://twitter.com/keithklain/status/1251181557269790721?s=20
https://twitter.com/DanAshby04/status/1251181281372708866?s=20
https://twitter.com/DanAshby04/status/1251180860059070464?s=20
https://twitter.com/TheTestDoctor/status/1251148153488719872?s=20
https://twitter.com/keithklain/status/1251146635863711747?s=20
https://twitter.com/DanAshby04/status/1251144715161882627?s=20
Concern that key ethical considerations were not adequately engaged with prior to project inception (at least there is no clear public evidence of that having happened).
https://twitter.com/keithklain/status/1251184891200442368?s=20
https://twitter.com/keithklain/status/1251166689577795585?s=20
https://twitter.com/keithklain/status/1251165664926224384?s=20
https://twitter.com/FionaCCharles/status/1251164377488728064?s=20
https://twitter.com/keithklain/status/1251146943511724032?s=20
https://twitter.com/FionaCCharles/status/1251143902939295744?s=20
https://twitter.com/keithklain/status/1251135917810683906?s=20
By focusing on users with access to smartphones, doesn’t this project simply augment existing inequality, and cause other social issues?
https://twitter.com/keithklain/status/1251240509751930880?s=20
https://twitter.com/keithklain/status/1251171202686746624?s=20
https://twitter.com/keithklain/status/1251170119742611456?s=20
There are specific issues in the US with the cost to an individual of testing (even if the government says tests are paid for, the reality is not so). This potentially impacts the efficacy of the App in the US.
https://twitter.com/michaelbolton/status/1251243298972413952?s=20
https://twitter.com/michaelbolton/status/1251214999693021185?s=20
Concerns about the organizations involved in the project, and their intentions.
https://twitter.com/keithklain/status/1251209163360940033?s=20
https://twitter.com/keithklain/status/1251207436268167169?s=20
https://twitter.com/keithklain/status/1251206598871130113?s=20
https://twitter.com/michaelbolton/status/1251205090691727361?s=20
https://twitter.com/michaelbolton/status/1251203820098596871?s=20
https://twitter.com/keithklain/status/1251201771688640514?s=20
https://twitter.com/keithklain/status/1251191735184756740?s=20
https://twitter.com/keithklain/status/1251185377878081539?s=20
https://twitter.com/keithklain/status/1251167094869233664?s=20
https://twitter.com/keithklain/status/1251166322748264460?s=20
https://twitter.com/keithklain/status/1251147292461027330?s=20
We need clarity over exactly who is responsible for the systems that contain the data.
https://twitter.com/michaelbolton/status/1251240225558446081?s=20
Impossibility of constraining future behaviour of involved parties
Whatever an organization’s intentions now, that can always change in future. How do we protect against that?
https://twitter.com/michaelbolton/status/1251243565809913865?s=20
https://twitter.com/TheTestDoctor/status/1251244983119753218?s=20
https://twitter.com/TheTestDoctor/status/1251241129313533959?s=20
Concern that eventually the technology will end up used by law enforcement / security services, or for some other currently unintended purpose, which we cannot predict or control.
https://twitter.com/jamesmarcusbach/status/1251214986547875840?s=20
https://twitter.com/jamesmarcusbach/status/1251212587133329408?s=20
https://twitter.com/jamesmarcusbach/status/1251210026015838208?s=20
https://twitter.com/keithklain/status/1251202636117938178?s=20
https://twitter.com/michaelbolton/status/1251199785790517248?s=20
https://twitter.com/keithklain/status/1251232716961579008?s=20
People cannot give informed consent, because there can be no guarantees about what will happen to their data once they have handed it over.
https://twitter.com/michaelbolton/status/1251269599301173256?s=20
Concerns about how long we continue to collect data for. When does this end? (the alternative to it ending is that it becomes the new normal)
https://twitter.com/michaelbolton/status/1251198113559314434?s=20
Transparency & Communications
Transparency: More project information should be in the public domain so that it is open to scrutiny by anyone with an interest, not just people who have signed up to support the project.
https://twitter.com/jamesmarcusbach/status/1251209159694905344?s=20
https://twitter.com/keithklain/status/1251167094869233664?s=20
https://twitter.com/keithklain/status/1251144461830074370?s=20
If we are already following particular guidelines, e.g. Privacy By Design guidelines, this should be clearly stated & evidenced in public.
https://twitter.com/FionaCCharles/status/1251171328939393024?s=20
White paper is vague & high level - not enough technical details to review properly.
https://twitter.com/jamesmarcusbach/status/1251289037542797314?s=20
In a video, Ramesh Raskar talks about what the Health Authority can do with the data, without explicitly highlighting the fact that this must all be done only with the user’s participation & consent.
https://twitter.com/keithklain/status/1251249769332322304?s=20
https://twitter.com/keithklain/status/1251226759271514113?s=20
https://twitter.com/keithklain/status/1251208684702773249?s=20
Precise language is very important, e.g. data creation vs. data collection; “government agencies” “your town’s website” (plus see specific previous point)
https://twitter.com/michaelbolton/status/1251207143480573953?s=20
https://twitter.com/keithklain/status/1251205653160394752?s=20
https://twitter.com/DanAshby04/status/1251204102098427904?s=20
https://twitter.com/michaelbolton/status/1251202917266317315?s=20
Trust of Health Authorities
The idea that what Health Authorities do with the data they receive will be constrained by what we write in our Requirements docs is naive.
It appears we have no mechanism to oblige the HA to act with the consent of the user. Health Authorities in the US may be tightly regulated, but this may not be true in other jurisdictions.
https://twitter.com/keithklain/status/1251270504251625478?s=20
https://twitter.com/TheTestDoctor/status/1251249028861431810?s=20
https://twitter.com/keithklain/status/1251242031227305984?s=20
https://twitter.com/keithklain/status/1251225159790362624?s=20
https://twitter.com/keithklain/status/1251204966464860168?s=20
https://twitter.com/keithklain/status/1251204058519666689?s=20
What protections do we have against coercion of users to provide data against their will?
https://twitter.com/jamesmarcusbach/status/1251222688086888449?s=20
https://twitter.com/jamesmarcusbach/status/1251223192963657728?s=20
How do we define what is a Health Authority? How can we be sure that we won’t change that definition over time & include other agencies (e.g. security)?
https://twitter.com/keithklain/status/1251250364243943425?s=20
Concerns about how long data is kept for
https://twitter.com/michaelbolton/status/1251207485509316619?s=20
https://twitter.com/Manisha_Awasthi/status/1251190679126114309?s=20
Location Data, Redaction & Anonymization
Access to location data can be used to gain all sorts of important intelligence about another person. Hence location privacy matters a lot.
https://twitter.com/jamesmarcusbach/status/1251215793150287872?s=20
Data must be redacted, since e.g. home address makes you easily identifiable
https://twitter.com/QualityFrog/status/1251241779938066432?s=20
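To illustrate what redaction of a location trail might look like in practice, here is a minimal sketch. All names (redact_trail, PRIVATE radius, the sample coordinates) are hypothetical illustrations, not the project’s actual implementation: the idea is simply that any recorded point within a chosen radius of a sensitive location (e.g. home) is dropped before the trail leaves the device.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def redact_trail(points, sensitive, radius_m=200.0):
    """Drop any (lat, lon, timestamp) point that falls within radius_m
    of a sensitive location, before the trail is shared onward."""
    return [
        p for p in points
        if all(haversine_m(p[0], p[1], s[0], s[1]) > radius_m for s in sensitive)
    ]
```

A real implementation would also need to consider points *near* the boundary and repeated visit patterns, which can re-identify a location even after naive redaction.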
There is no good reason for unredacted data to be shared with the Health Authority. Maybe it makes the implementation simpler, but if we are Privacy-first, Privacy should come first.
https://twitter.com/QualityFrog/status/1251249244415111169?s=20
https://twitter.com/jamesmarcusbach/status/1251221232608919552?s=20
Could the data be anonymous even prior to the contact trace interview, in the sense that the health official, and the systems, have no idea what the user’s identity is?
https://twitter.com/DanAshby04/status/1251207307695882241?s=20
Certain locations such as “home” could be configured on the device and geo-fenced such that data is not even recorded on the device in these locations. That would improve privacy.
https://twitter.com/QualityFrog/status/1251249244415111169?s=20
https://twitter.com/QualityFrog/status/1251245748198481924?s=20
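The geo-fencing idea above can be sketched as a simple pre-recording check. This is a hypothetical illustration (zone list, function names and radii are invented for the example, not taken from the project): a point inside any user-configured private zone is never written to the on-device log at all, which is stronger than redacting it later.

```python
import math

# Hypothetical user-configured private zones: (lat, lon, radius in metres).
PRIVATE_ZONES = [
    (51.5014, -0.1419, 200.0),  # e.g. "home", 200 m radius
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def should_record(lat, lon):
    """Return False if the point falls inside any private zone,
    so it is never written to the on-device location log."""
    return all(
        haversine_m(lat, lon, zlat, zlon) > radius
        for zlat, zlon, radius in PRIVATE_ZONES
    )
```

Note this only prevents *recording*; the absence of data around a fixed point can itself leak information, which is one reason the surrounding concerns still apply.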
“Anonymized location data” is a myth. It is quite possible to extract individual trails from an anonymized pool of data points.
https://twitter.com/jamesmarcusbach/status/1251208093251194880?s=20
https://twitter.com/jamesmarcusbach/status/1251223463722807296?s=20
https://twitter.com/jamesmarcusbach/status/1251220388538757120?s=20
https://twitter.com/keithklain/status/1251206247208177667?s=20
There is a specific issue with 3rd party location-tracking apps, which may leak data, which could then be correlated to public data published by a Health Authority, de-anonymizing the user. It may not be safe to deploy Safe Paths alongside 3rd party apps that have location data enabled.
https://twitter.com/dhmackenzie/status/1251210171478691842?s=20
There are many ways a curious individual or agency could explore & pick apart the public data - especially in e.g. a small island community. The same applies to any security agency.
https://twitter.com/jamesmarcusbach/status/1251291029942095872?s=20
https://twitter.com/jamesmarcusbach/status/1251290704732553216?s=20
https://twitter.com/jamesmarcusbach/status/1251290253177962496?s=20
https://twitter.com/jamesmarcusbach/status/1251289548119662593?s=20
Location services might be better implemented as a variable-precision parameter, rather than a binary on/off (one for Apple / Google primarily, though we could pioneer this approach in this app?).
https://twitter.com/QualityFrog/status/1251242495431892993?s=20
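One simple way to realise variable precision, sketched below purely as an illustration (the function name and precision levels are assumptions, not anything Apple, Google, or the project has specified): coordinates are rounded to a chosen number of decimal places, trading location usefulness against identifiability. Roughly, 2 decimals of latitude corresponds to about 1.1 km of resolution, 3 decimals to about 110 m, and 4 decimals to about 11 m.

```python
def coarsen(lat, lon, decimals):
    """Reduce location precision by rounding coordinates to a fixed
    number of decimal places. Approximate latitude resolution:
    2 decimals ~ 1.1 km, 3 decimals ~ 110 m, 4 decimals ~ 11 m."""
    return (round(lat, decimals), round(lon, decimals))
```

Snapping to a fixed grid like this is crude (points near cell boundaries still separate cleanly), but it shows how "precision" could be a dial rather than a switch.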
General concerns about viability of the approach
There are high risks of false positives or false negatives - in all cases, but specifically in dense urban situations, with mixed-use and single-use multi-storey buildings.
https://twitter.com/DanAshby04/status/1251180737056968706?s=20
https://twitter.com/QualityFrog/status/1251246417827573768?s=20
https://twitter.com/michaelbolton/status/1251246561289482240?s=20
https://twitter.com/DanAshby04/status/1251145983662014464?s=20
https://twitter.com/DanAshby04/status/1251145943736442883?s=20
Bluetooth-only apps are flawed in a number of ways.
https://twitter.com/keithklain/status/1251216397591547904?s=20
Other points
Security & Privacy need to be fundamental concerns of everyone on the project, not just specialists.
https://twitter.com/TheTestDoctor/status/1251146868140109825?s=20
A few days after this discussion, Keith Klain organized a panel discussion podcast to follow up on themes that had been raised in the Twitter discussion.
http://qualityremarks.com/qr-podcast-ethics-panel/
It ranges from specific concerns about Safe Paths, to broader concerns about privacy & ethics across the whole tech industry.
Some of the concerns raised are based on mistaken assumptions about the Safe Paths project (which in turn are due to our own shortcomings on transparency - something we are working to address).
Other concerns are highly relevant, and in need of good answers. The points that specifically stood out for me were the following:
Threat modelling, in particular the STRIDE threat model. We are stepping up our Security analysis & this absolutely needs to be a part of that.
Genuine disbelief that private companies can be contributing out of altruism. Transparency: we need a position on what private companies are contributing, and with what motivation.
Not enough clarity about exactly what is a “Health Authority” and exactly what “Government Agencies” may have access to data.
Concern that data heat maps could lead to “kettling” of deprived communities to contain the spread.
Being inside the project creates a cognitive bias - outside perspectives are valuable because they do not have this bias.
Ramesh interview - not just slack use of language about who makes redaction decisions (mentioned above), but also seems concerningly relaxed about infection data being posted on a public website!
We need very rigorous documentation of exactly where & when data is created, transported, collected, destroyed etc. This should all be public.
Concern about harm to small businesses, because of information being spread that there was a COVID infection there.
Concern about Open Source model. Anyone can come in & modify the project from the original vision. The technology could also be forked and re-used for some other purpose beyond the original design intent.
If the real issue is social inequality, it could be argued that Digital Contact Tracing cannot improve things, and it certainly risks making things worse for certain communities.
If we do get good testers on the project, they will ask awkward questions. Does testing have an important enough seat at the table on this project? Is it listened to when decisions are taken?
COVID-19 is disproportionately hitting ethnic minority communities in the US. Poor people will use this app (and therefore be at risk of any harms it may cause), while rich people never will.
Concern about diversity on the project. Open Source in general has diversity issues. Poorer people can’t make time around their job & other commitments to volunteer on a project like this.
Key questions must be answered in public.
Some key misconceptions / misinformation.
The problem is that the product has already been built, and you can’t retro-fit Ethics, Privacy & Security. In fact, the product is very far from already built. We have some early versions, with many known flaws, and a vast amount of work still to do.
That this project will inevitably collect lots of data - e.g. a comparison drawn to the volume of data that Facebook collects. In fact we are being very disciplined in not extending data beyond GPS co-ordinates & Timestamps (there is a risk that potentially someone could fork the project & take it in a different direction - that’s covered in the “Open Source” concern above).
Assumption that some central authority is able to see who contacted whom (“guilt by association” concerns). My understanding is that there is no way the Health Authority could do this analysis except possibly between two infected people - they have no access to the data of uninfected persons. (We should not ignore this latter case, but it falls under the general umbrella of what HAs will do with the case data they have, and whether they can be trusted to act appropriately.)
Peter Thiel has been contracted to build out a contact tracing app with MIT (45:30). Keith links to the article, but this is a contract awarded by Trump to collate government data on the spread. It has nothing to do with MIT or the Safe Paths project. https://gizmodo.com/trump-admin-gives-coronavirus-tracking-contract-to-pete-1842994647
Contact tracing is not going to solve the problems of social isolation, isolation of communities, mental health issues. Actually, this is exactly what it is intended to do - by offering a finer-grained view of risk, it allows broad generic lockdown measures to be lifted earlier than they might otherwise be.
Assumption that there are NDAs put in place on project participants. There are no NDAs for project volunteers.