Info |
---|
This article is out of date. At the time it was written, there was no plan to encrypt published HA JSON data for MVP1 |
Our current plan for MVP1 is to ship with the current implementation of redacted location data shared in plain text from Public URLs.
...
This means that it will be common for case data from individuals to have a publication delay of 2 or 3 days, perhaps even longer.
The jurisdictions we are targeting have total cases as follows. At least 4 of the 7 targets would have this issue.
Mexico: 33,460, Puerto Rico: 2,173, KCMO: 664 (on 5 May), Lake County FL: 223, Haiti: 151, Guam: 151, Teton County: 97.
...
This chart from Oxford (https://science.sciencemag.org/content/368/6491/eabb6936) shows why this is such a big problem.
This shows transmission from day of infection.
...
In short - for the solution to be effective, we need to be able to publish individual users’ data promptly (within 12 hours; sooner would actually be significantly better). We can’t do that if we are dependent on “crowd” effects for privacy, because the crowds simply will not be there in many of our small scale deployments.
Note also, my assumption that a crowd of 10 will be enough for privacy purposes may be a major underestimate. In some jurisdictions, it is not even clear that there is any value N for which the pooled data on N users can be considered to result in adequate privacy. Adam Leon Smith tells me that this would be the case in the EU (though that may change if we got a more explicit consent from users).
And beyond what the law says, I would not be surprised if there were significant public outcry when it is discovered that we publish this personal data in plain text. The only defense I can see against that is total transparency with patients about how their data will be published, and how exposed this could make them - and I suspect that such transparency will massively hinder uptake.
What options are available?
There are a few options available that can help here.
The first is to publish location data points with a one-way hash function applied to them.
The second is a more sophisticated scheme, with 2 separate independent servers, used to provide stronger cryptographic protection.
Another option might be to have the HA Servers authenticate Safe Paths Apps based on a secret built into release builds of the app, from outside our Open Source repo - I am not sure why this approach does not seem to be under consideration.
ALS: I think that this should be done regardless, it provides an additional control, albeit weak.
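One possible shape for such a check, as a minimal sketch only: the app signs each request with an HMAC keyed by the build-time secret, and the HA server recomputes and compares it. Nothing above specifies a mechanism, so the function names, the message layout, and the five-minute freshness window are all hypothetical. The control stays weak because a secret shipped inside an app binary can be extracted.

```python
import hashlib
import hmac


def sign_request(secret: bytes, path: str, timestamp: int) -> str:
    """App side: sign the requested path and a timestamp (hypothetical layout)."""
    message = f"{path}|{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, path: str, timestamp: int,
                   signature: str, now: int, max_age_seconds: int = 300) -> bool:
    """Server side: reject stale requests, then compare in constant time."""
    if abs(now - timestamp) > max_age_seconds:
        return False
    expected = sign_request(secret, path, timestamp)
    return hmac.compare_digest(expected, signature)


# Placeholder value; a real secret would be injected at release-build time
# from outside the Open Source repo.
BUILD_SECRET = b"example-secret-not-for-production"

signature = sign_request(BUILD_SECRET, "/safe_paths.json", 1588680000)
accepted = verify_request(BUILD_SECRET, "/safe_paths.json", 1588680000,
                          signature, now=1588680060)
```

A request signed with a different secret, or replayed after the freshness window, is rejected by `verify_request`.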
The first solution is described in this paper:
https://arxiv.org/pdf/2003.14412v2.pdf
...
And this WIRED interview with Ramesh.
https://www.wired.com/story/covid-19-contact-tracing-apps-cryptography/
It is known to have weaknesses (vulnerability to brute-force attacks), but it provides considerably more protection than plain text, and is relatively inexpensive to implement (Abhishek Singh tells me it is mostly implemented already).
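As a rough illustration of both the idea and its weakness, the sketch below snaps each point to a grid cell and time bucket, then publishes only a hash of that cell. The grid size, the time bucket, the use of SHA-256, and all function names are assumptions made for this example; they are not taken from the paper or from our implementation.

```python
import hashlib

GRID_DEG = 0.0001      # assumed grid size (about 11 m of latitude)
BUCKET_SECONDS = 300   # assumed time bucket (5 minutes)


def quantize(lat, lon, unix_time):
    """Snap a point to an integer (lat cell, lon cell, time bucket) triple."""
    return (round(lat / GRID_DEG), round(lon / GRID_DEG),
            unix_time // BUCKET_SECONDS)


def hash_cell(cell):
    """One-way hash of a quantized cell."""
    return hashlib.sha256("{},{},{}".format(*cell).encode("utf-8")).hexdigest()


def hash_point(lat, lon, unix_time):
    return hash_cell(quantize(lat, lon, unix_time))


# The HA publishes hashes only; the app hashes its own trail and compares.
published = {hash_point(39.0997, -94.5786, 1588680000)}
match = hash_point(39.09971, -94.57862, 1588680100) in published


def brute_force(target, lat_cells, lon_cells, time_buckets):
    """The known weakness: the input space is small enough to enumerate."""
    for i in lat_cells:
        for j in lon_cells:
            for t in time_buckets:
                if hash_cell((i, j, t)) == target:
                    return (i, j, t)
    return None


recovered = brute_force(next(iter(published)),
                        range(390990, 391000),
                        range(-945790, -945780),
                        range(5295598, 5295602))
```

Here `recovered` is the original cell, found after at most 400 hash evaluations. A real attack would need to cover a whole town and a longer time window, which is more work but still feasible for a determined attacker.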
The second solution is described in this paper (as solution #4), and also referred to in the WIRED interview above.
https://github.com/PrivateKit/PrivacyDocuments/blob/master/GpsEncryption.pdf
...
Our current thinking is that this is the ideal solution to the problem, but we are concerned that it is too complex & expensive to implement for MVP1 (1 June).
Based on my discussion above, I don’t believe that a plain text solution is acceptable.
I understand there is reluctance to deploy the hashing solution. There is a concern that we will be subject to ridicule for deploying such a solution.
However:
It is not clear to me that we will be subject to any less ridicule for deploying a plain text file with zero protection.
This is a solution that has already been presented publicly as an intermediate option in both the MIT paper above, and Ramesh’s interview with WIRED. I am not aware of us having been ridiculed for that yet.
There are many circumstances in which imperfect security measures deliver significant protection in spite of their imperfections - the locks that we mostly have on our front doors being a good example.
The main group I would see this protecting against would be the tech-literate (but not highly skilled with security) “concerned public” who might well be attracted to digging through a plain text file, but would mostly be put off by hashed data that would require substantial effort to reverse.
The key question for me regarding the Hashing solution is whether or not it would deliver enough protection that we could consider dropping the crowd-size N that represents the minimum number of cases that we can publish in one go.
If it does allow us to reduce this number to low figures, 1, 2 or 3, say - then I think it makes MVP1 viable (as per above, I don’t think the current MVP1 plan is viable).
If it does not, then there is little point in spending time on a Hashing solution, and we should be working on the “full” solution as a priority, as a necessary part of MVP1.
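To make the effect of N concrete, here is a back-of-envelope sketch: if new cases arrive at a steady rate, the time needed to accumulate a batch of N cases is roughly N divided by the daily rate. The function name and the daily rate used below are illustrative assumptions, not figures from any of our target deployments.

```python
def hours_to_fill_batch(crowd_size: int, cases_per_day: float) -> float:
    """Approximate hours needed to accumulate `crowd_size` new cases,
    assuming cases arrive at a steady `cases_per_day`."""
    if cases_per_day <= 0:
        raise ValueError("cases_per_day must be positive")
    return 24 * crowd_size / cases_per_day


# Hypothetical small deployment seeing 4 new cases per day.
wait_n10 = hours_to_fill_batch(10, 4)  # 60.0 hours, about 2.5 days
wait_n1 = hours_to_fill_batch(1, 4)    # 6.0 hours
```

Under this assumed rate, a crowd size of 10 misses the 12 hour target by a wide margin, while a crowd size of 1 fits within it.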
References:
https://www.worldometers.info/coronavirus/
https://www.worldometers.info/coronavirus/country/us/
https://www.kcmo.gov/Home/Components/News/News/332/16